There are two categories of people eligible to participate in the Harappa Ancestry Project.

First, if you have any real ancestry from a South Asian origin (except for Romany/Roma/Gypsy ancestry), you are eligible to participate. Those with partial South Asian ancestry are also welcome. The countries of origin I count as South Asian are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • India
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka

Note that 2-3% South Asian from Dr. McDonald's BGA or Dodecad Project does not count as South Asian ancestry.

Second, if all four of your grandparents are from one of the following countries or regions, you can also send me your data.

  • Burma
  • Tibet
  • Uyghur from Xinjiang, China
  • Tajikistan
  • Kyrgyzstan
  • Kazakhstan
  • Uzbekistan
  • Turkmenistan
  • Iran
  • Turkey
  • Azerbaijan
  • Armenia
  • Georgia
  • North Caucasian Federal District, Russia
  • Iraq
  • Syria
  • Lebanon
  • Jordan

Everyone else can use DIY HarappaWorld to estimate their admixture results.

Right now, I am accepting raw data samples from people who have tested with 23andme, FTDNA Family Finder, or AncestryDNA.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure whether you are eligible to participate, please send me an email to inquire about it before sending off your raw data.

What to send?
Please send your "All DNA" raw data text file (zipped is better) downloaded from 23andme or FTDNA, along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, their mother tongue, the caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.

Here's my privacy policy.



  1. 23andme v3 Data | Harappa Ancestry Project - pingback on January 25, 2011 at 8:01 am
  2. Gene Expression » Notes on the future - pingback on January 26, 2011 at 11:56 pm
  3. Harappa Ancestry Project @ N ~ 50 | Biology News by Biologged - pingback on March 13, 2011 at 2:32 am
  4. Accepting FTDNA Family Finder | Harappa Ancestry Project - pingback on May 3, 2011 at 12:01 am
  5. Lahori Punjabis coming your way!

  6. I have sent my data to you. I am of Jat ethnicity, from Haryana State, India.

  7. Dear Mr. Zack Ajmal,

    I have come across your web site, the Harappa Ancestry Project. I found it very interesting and would like to know the haplogroup and defining marker of the Arain sample you have studied. I would appreciate your prompt response.


    Aftab Malik

  8. Thanks, I'll appreciate that. By the way, did you study Arain genealogy, as mentioned on the Arain page in Wikipedia, in the section captioned Genetics?

  9. I have sent my raw data. I am a Rajasthan Rajput.

  10. Hi Zack. I found a few errors in the ethnicity spreadsheet.

    - HRP0054 (Bengali Brahmin) is grouped under the Bihari-Brahmin label.
    - HRP0222 (Iyer) is grouped under the Iyengar-Brahmin label.

    As an aside, it'd also be great, as discussed elsewhere (DNA-Forums), if HRP0106 and HRP0107 were labeled under a new group, Punjabi Rajput, including the Pahari Rajput participant (HRP0135).

  11. Sent you an email (Mar 26, 2012 EST 1:35 AM) with 23andme data, and I have a feeling it got caught in the spam filter.
    I had sent a previous email too (Mar 14, 2012 EST 10:52 PM).

    The data is also available here

    Plan to use a lot of info from your and Razib's sites to put as much of my info as possible in the public domain.


  12. Hi Zack. Just informed a Sindhi Pushkarana Brahmin individual about your project - expect their raw-data in your inbox!

  13. Zack,

    The gujarati_a (Hapmap) sample you are using in the Project breaks your own Project rules because that cluster is a bunch of close relatives.

    You stipulate this clause in your Project rules:

    "Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample."

    The fact that you continue to use the Hapmap Gujaratis while:

    1. Being unable to source and identify what caste those Gujaratis are, and;

    2. Knowing full well that gujarati_a are closely related and form an outgroup to everyone else -

    - is extremely unscientific of you, and in fact discredits the legitimacy of the Harappa Project. Without properly sourcing these samples you lose intellectual credibility, and thus this Project comes across as a charade that is not more than akin to someone's pastime hobby.

    Either you should take off the Hapmap gujarati_a because it has already been substantiated by Razib Khan himself that this cluster are close relatives, or you must add a disclaimer acknowledging that this group is not wholly representative of the Gujaratis, but do redeem yourself.

    I await a response from you.

    • Please provide me with the link where Razib has declared the HapMap Gujarati-a to be close relatives. They are definitely related, but are not close relatives, as is obvious to anyone who has analyzed them.

      "this Project comes across as a charade that is not more than akin to someone's pastime hobby."

      I should point out that this project is definitely a hobby for me.

      • I think Razib found a couple of them to be related.
        "the definitely related individuals seem to be in the Gujarati_A cluster!"

        To which you and John Hawks responded:

        "I removed NA20900 and NA20909 from my datasets due to they being too similar to other samples. I used simple IBS similarity. Their IBD numbers are really weird though."

        "I started working with the Gujarati sample a couple of years ago and went to talk to a South Asianist about it. His immediate reaction when I described the sample was that it would give all kinds of weird results because the Houston Gujarati community is really uncharacteristic in many ways, mostly Brahmins from the same areas and therefore full of distant relatives."

        Though Razib's Gujarati_A is your gujaratis-b and that may be causing a confusion here.

        • @Parasar

          A colleague of mine recently got in touch with John Hawks and asked him to substantiate that comment he made two years ago, but he said he could not and that it was based on his friend's conjecture.

  14. Zack,

    Do you have a legitimate source for the individual admixtures of the Gujarati_a Hapmap? I can't seem to access this on the spreadsheet to be able to compare them with the Harappa equivalent participants.

    You are surely correct that the Gujarati_a are not purportedly close relatives, however they are related individuals (from some hitherto unknown caste that forum scientists can only postulate and make grand inferences about... which is highly unprofessional in my view). One cannot really know with certainty that the Gujarati_a from Houston are Patels, and thus there is the real danger of presenting grossly misleading factual data from the near assumption that two Patels who have participated on the Project supposedly have similar admixture. In other words:

    1. You can't be 100% sure that the Gujarati_a are Patels - but given that two Patel participants have similar admixtures, this is good enough proof for you.

    2. I hope you know there is no one caste of Patels; they are sub-divided into dozens of highly endogamous clans who have historically favoured hypergamy with the emphasis on socially competitive gols, and also practice a form of caste hierarchy among themselves. In other words, there are upper-caste Patels, and then lower-caste Patels.

    The issue I am raising, thus, is that I think it is rather unfair of you to use the Gujarati Hapmap samples without making a disclaimer that there is some controversy surrounding the origin and assessment of the samples. There are people who look at these samples on anthropology forums and think they are wholly representative of Gujaratis (which is again unfair, because the Gujarati_a are a tight cluster of related individuals) and given the level of endogamy they practice, they won't be so genetically close to other Gujarati castes either. Believe it or not but I've had arguments with other South Asians about this because they wrongly assumed that the Gujarati_a are wholly representative of the whole Gujarati populace.

    Even if, for one moment, we were to believe that the Gujarati_a are Patels, they would still not be wholly representative of Gujaratis as a whole because back in Gujarat they constitute not more than approx. 20% of the populace!

    • There is no controversy regarding the HapMap Gujaratis. It is well known among academic researchers and genome bloggers that about two-thirds of the samples, those I label Gujarati-a, are an extremely endogamous group.

      No one has said that these HapMap Gujaratis represent all of Gujarat. Same as no one should think that the HGDP Sindhis represent all of Sindh. While those Sindhis are labeled only as Sindhis, it is clear to anyone who has analyzed their data and knows Sindh that there are a few Balochis settled in Sindh included as well as a few with partial African ancestry.

  15. @Parasar

    Razib Khan has this to say about his Gujarati_a (Zack's Gujarati_b):

    "Gujarat_A has some individuals with much more “West Eurasian” ancestry"

    Still, that "tight" cluster I'm talking about is Razib's Gujarati_b (or Zack's Gujarati_a) - this is the one which forms an outgroup to everyone else and the autosomal admixtures for these individuals seems consistent:

    "Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set"

    Then Razib says this:

    "But my guess is that Gujarati_B are a subset of Patels. In other words, they’re a genetically distinct jati. I suspect that Gujarati_A are a more diverse bunch from a number of different jatis."

    Then he changes his tune and goes on to say this later:

    "In any case, to my surprise the definitely related individuals seem to be in the Gujarati_A cluster!"

    Which is strange because Zack's Gujarati_b individuals differ in their autosomal admixture components quite a bit. In other words, both Gujarati_a and Gujarati_b are two distinct jatis, which means neither group could be wholly representative of Gujaratis. It's the "tight" cluster group which gives off weirdo results...

  16. Hi Jack did my own DNA some time back, But decided to have my parents done this year and they both have come in. I had some oddities showing up in my DNA and this why I ran theirs. When you Iran, Turkey, and so on. You wonder why. Could run theirs for me.

  17. need to know why I get following error:

    12 ancestral populations
    166462 total SNPs
    At line 142 of file DIYDodecad.f90 (Unit 50 "genotype.txt")
    Traceback: not available, compile with -ftrace=frame or -ftrace=full
    Fortran runtime error: End of file

  18. need to know if anyone came across following error during standardize step:
    Error in : object 'X' not found

  19. any replies to my prior posts appreciated.

  20. Any thoughts on using the National Geographic Genographic 2.0 project service? They claim to test for a wider range of markers that are optimized for ancestry studies. If I do use their service, would the data be in a usable format for your project? Thanks.

  21. Hello, my brother-in-law is Vietnamese and has 4 grandparents from Vietnam. Would you be interested in his data? I noticed on one of the spreadsheets Vietnamese was mentioned, but I wasn't sure if there was a need for the data or not. If so, I'll let him know! Thank you.

  22. Hello, I also had a question regarding the GEDmatch Harappa test. According to it, I have: S-Indian 0.63% --- Baloch 8.50% -- Caucasian 8.85% --- SW-Asian 1.61% (with European the most predominant: NE-Euro 49.52% ----Mediterranean 29.77%).

    On the Dodecad K12b test, I also get 7.01% for Gedrosia, which I believe means Balochistan, which is the same as or similar to Baloch? Does this indicate that I have South Asian ancestry and, if so, could I be of help to the project?

    I don't know if it's helpful, but my mtDNA is U4c1.

    My brother who was also tested with 23&me, in the Harappa test gets slightly higher numbers: S-Indian 0.90% ----Baloch 9.34% ---Caucasian 9.34% ---SW-Asian 2.21%---Siberian 1.43%

    I'm not entirely sure what this indicates, but I'd love to find out. Does anyone have any ideas? Or does it not amount to much of anything? (Or at least, what does 8-10% mean in regard to generations away from me?)

    • The names of these admixture components are just mnemonics based on which group a component is highest in. Also these components are not some pure ancestry. Having 5% Baloch for example doesn't mean you have Baloch ancestry.

      The best way to look at your results is to compare them with the various group averages.

      • I see. Thank you for the quick reply. I'm new to this stuff and looking at all the admixture models often confuses me. My family is trying to discover the mystery of my father's ancestry as we were told he was Native American (and skin-color wise he looks it), but for my brother and I, no NA turned up. So, we were suspecting other origins, even the Roma (and looking at the Baloch people it could have fit as well), but it's hard trying to see anything in the GEDmatch admixture profiles that makes any sense, at least under mine and my brother's profile. He's going to be tested soon.

  23. Welcome | South Asian Ancestry - pingback on February 1, 2014 at 3:40 pm
  24. AncestryDNA | Harappa Ancestry Project - pingback on February 13, 2014 at 1:09 pm
  25. If I uploaded my data to GedMatch would you have access to it or would I have to submit it via email? Would the data I submit affect my results on Gedmatch or no?

    Thank you

  26. Hello,
    I sent you my data in 2013 and you sent me back an ID, but you still haven't sent me a password. My ID is HRP0355.

  27. jamwal dogra rajput

    I was just curious- did you ever have any participants from Jammu or the Northern Himachal (Chamba, Kangra) region?

  28. I am kit hrp299 and am a Punjabi Jatt, and on the participants' ethnicity sheet kit hrp299 is shown as some Bengali, which is a gross error. I think that there is no need for separate sheets for haplogroup and ethnicity. Column B on the haplogroup sheet can be used for ethnicity to avoid such errors. Can you please correctly match the ethnicity to each kit and add it in column B of the haplogroup sheet?

    • Because anyone can edit the haplogroup spreadsheet, it's been a problem keeping it correct. Therefore, I am no longer using it. The ethnicity spreadsheet linked on the right sidebar of this blog shows your ethnicity correctly.

  29. hello I was wondering If I meet what ur looking for GEDmatch number a529919

  30. Hello,

    All 4 of my grandparents were born in Iran. Are you still accepting data from 23andme?

    Also, if I send in data, would I be able to see my own results afterward (your project seems to be more specific than 23andme, so I am hoping I can learn more by sending data in).

  31. Zack, for a participant who has one grandparent from Afghanistan and the rest of whom are Arab, do you consider the participant to fall under the Afghan category or Arab (Maghrebi) or some other category (mixed, born and living in America)?

  32. Hey Zack,

    Have data for myself, Bengali Muslim and an interested friend too, Punjabi Muslim. Are you still accepting data? If so, can zip across the files..

  33. i was able to see my results by using the harappaworld calculator through gedmatch. thank you for creating it.

    how did you come up with your reference populations? are they based on human settlement 8-10000 years ago? are they somewhat arbitrary?

  34. I used the HarappaWorld calculator through GEDmatch and I am beyond surprised at some results. From everything I was told, everyone came out of Ireland. I know in my 23&me they showed Mediterranean, European, South Asian, South Pacific, Native American, South African. I have noticed on . I was matching to people of South Pacific background with no Irish in them as very distant cousins.

    S-Indian -
    Baloch 10.05
    Caucasian 4.29
    NE-Euro 50.17
    SE-Asian -
    Siberian -
    NE-Asian -
    Papuan 0.17
    American 0.16
    Beringian 0.71
    Mediterranean 34.39
    SW-Asian -
    San 0.06
    E-African -
    Pygmy -
    W-African -

  35. Have you ever found an African-American with no idea where his South Asian grandparent came from?

  36. So a Kerala-Malayali is 42% S. Indian, 39% Baloch, 7% caucasian 3% SW Asian

  37. Still accepting Nepalis?

    Think you should also categorize Brahmins and Chettris separately if you have not already, since Chettris have a higher Mongolian component. Same goes for Maithali Brahmins and Hill Brahmins: two separate regions and peoples.

  38. Hi. Name is Sam. Just bought a kit from 23andme. My paternal Grandparents are from Tarn Taran who settled in Renala Khurd (85 km from HARAPPA). My mom is from Nankana Saab. Interested to provide you my RAW data post Christmas

  39. Hey man! I emailed you my dna and a brief history of my known ancestry as well! Would be sweet if you can reply man, im very interested in taking part of this project!

  40. Hi Zack
    I'm curious as to the accuracy of the admixture analysis that GEDmatch/HarappaWorld provides. It tells me I'm Kashmiri in both single and two population approximations (this is accurate enough as my family have always said we're Kashmiri. We're from Azad/Pak Kashmir as well). What is the source of the Kashmiri DNA data labelled "harappa" and "reich" in the results? Is it from studies involving Kashmir Valley individuals?

    S-Indian 32.22
    Baloch 37.41
    Caucasian 14.54
    NE-Euro 10.39
    SE-Asian 0.34
    Siberian 1.51
    NE-Asian 1.61
    These are the results of my 23andme upload (missed the negligibles). I'm confused about Baloch and Caucasian percentages. Are they based on ancient baloch/caucasian DNA? What specific details can I get from those segments if any?


    • My results are similar to yours. Was always told we had Kashmiri in us.

      1 Baloch 35.64
      2 S-Indian 34.28
      3 Caucasian 15.83
      4 NE-Euro 7.72
      5 Siberian 2.38
      6 NE-Asian 1.6
      7 SW-Asian 0.69
      8 American 0.65
      9 Papuan 0.58
      10 San 0.37
      11 W-African 0.19
      12 Pygmy 0.08

  41. Hello. I have emailed you my data and ancestral information. Hope you can help me out. Thanks!

  42. Thanks for your hard work. I am a Marthoma Christian from Pathanamthitta district,Kerala and my results are 50% S. Indian, 37% Baloch, 5% Caucasian, etc. Is that pretty normal when compared to the rest of the Kerala People/ other South Indian states? Thanks

  43. A Brahui person from both paternal and maternal side considering getting a 23andme DNA test if it would be helpful.

  44. I belong to Bengali Namasudra community. I have emailed you my 23andme raw data. Hope you can help me out and thanks a ton for your efforts.

  45. Muhammad aqib niazai

    Do you have any results from Niazi Pashtuns, Khattaks and Bangash Pashtuns? And can I send you results from any lab in Pakistan? 23andme does not provide services in Pakistan.

  46. Hey Zack,

    I sent you the raw DNA file I got from 23andMe and would love to hear your thoughts!
