Participation

There are two categories of people eligible to participate in the Harappa Ancestry Project.

First, if you have any real ancestry from a South Asian origin (except for Romany/Roma/Gypsy ancestry), you are eligible to participate. Those with partial South Asian ancestry are also welcome. The countries of origin I count as South Asian are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • India
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka

Note that a 2-3% South Asian result from Dr. McDonald's BGA or the Dodecad Project does not count as South Asian ancestry.

Second, if all four of your grandparents are from one of the following countries or regions, you can also send me your data.

  • Burma
  • Tibet
  • Uyghur from Xinjiang, China
  • Tajikistan
  • Kyrgyzstan
  • Kazakhstan
  • Uzbekistan
  • Turkmenistan
  • Iran
  • Turkey
  • Azerbaijan
  • Armenia
  • Georgia
  • North Caucasian Federal District, Russia
  • Iraq
  • Syria
  • Lebanon
  • Jordan

Everyone else can use DIY HarappaWorld to estimate their admixture results.
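
As a rough illustration only, the two eligibility rules above can be sketched in Python. The function name, the return strings, and the use of grandparents' origins as a stand-in for "any real ancestry" are assumptions of this sketch, not part of the project rules. It also reads the second rule as "each of the four grandparents is from some place on the second list", and it does not model the Roma exception or the 2-3% calculator caveat. If in doubt, email before sending data.

```python
# Hypothetical sketch of the eligibility rules; not an official project tool.

# First list: countries of origin counted as South Asian.
SOUTH_ASIA = frozenset({
    "Afghanistan", "Bangladesh", "Bhutan", "India",
    "Maldives", "Nepal", "Pakistan", "Sri Lanka",
})

# Second list: all four grandparents must come from these places.
SECOND_LIST = frozenset({
    "Burma", "Tibet", "Uyghur from Xinjiang, China", "Tajikistan",
    "Kyrgyzstan", "Kazakhstan", "Uzbekistan", "Turkmenistan", "Iran",
    "Turkey", "Azerbaijan", "Armenia", "Georgia",
    "North Caucasian Federal District, Russia",
    "Iraq", "Syria", "Lebanon", "Jordan",
})


def eligibility(grandparents):
    """Classify a person by the origins of their four grandparents."""
    if len(grandparents) != 4:
        raise ValueError("expected exactly four grandparent origins")
    # Rule 1: any South Asian origin qualifies, partial ancestry included.
    if any(origin in SOUTH_ASIA for origin in grandparents):
        return "eligible: South Asian ancestry"
    # Rule 2: every grandparent must be from the second list.
    if all(origin in SECOND_LIST for origin in grandparents):
        return "eligible: neighbouring region"
    # Everyone else is pointed to the do-it-yourself calculator.
    return "not eligible: use DIY HarappaWorld"
```

For example, `eligibility(["India", "Iran", "Iran", "Iran"])` falls under the first rule, because partial South Asian ancestry is welcome.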

Right now, I am accepting raw data samples from people who have tested with 23andme, FTDNA Family Finder, or ancestry.com DNA.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme or FTDNA to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.

Here's my privacy policy.

37 Comments.

  1. 23andme v3 Data | Harappa Ancestry Project - pingback on January 25, 2011 at 8:01 am
  2. Gene Expression » Notes on the future - pingback on January 26, 2011 at 11:56 pm
  3. Harappa Ancestry Project @ N ~ 50 | Biology News by Biologged - pingback on March 13, 2011 at 2:32 am
  4. Accepting FTDNA Family Finder | Harappa Ancestry Project - pingback on May 3, 2011 at 12:01 am
  5. Lahori Punjabis coming your way!

  6. I have sent my data to you. I am of Jat ethnicity, from Haryana State, India.

  7. Dear Mr. Zack Ajmal,

    I have come across your web site, the Harappa Ancestry Project. I found it very interesting and want to know the haplogroup and defining marker of the Arain sample you have studied. I would appreciate your prompt response.

    Sincerely,

    Aftab Malik

  8. Thanks, I'll appreciate that. By the way, did you study Arain genealogy, as mentioned on the Arain page, under the heading Genetics, in Wikipedia?
    http://en.wikipedia.org/wiki/Arain

  9. I have sent my raw data. I am a Rajasthan Rajput.

  10. Hi Zack. I found a few errors in the ethnicity spreadsheet.

    - HRP0054 (Bengali Brahmin) is grouped under the Bihari-Brahmin label.
    - HRP0222 (Iyer) is grouped under the Iyengar-Brahmin label.

    As an aside, it'd also be great, as discussed elsewhere (DNA-Forums), if HRP0106 and HRP0107 were labeled under a new group, Punjabi Rajput, including the Pahari Rajput participant (HRP0135).

  11. Sent you an email (Mar 26, 2012 EST 1:35 AM) with 23andme data, and I have a feeling it got caught in the spam filter.
    Had sent a previous email too, on Mar 14, 2012 EST 10:52 PM.

    The data is also available here

    Plan to use a lot of info from your and Razib's sites to put as much of my info as possible in the public domain.

    cheers

  12. Hi Zack. Just informed a Sindhi Pushkarana Brahmin individual about your project - expect their raw-data in your inbox!

  13. Zack,

    The gujarati_a (Hapmap) sample you are using in the Project breaks your own Project rules because that cluster is a bunch of close relatives.

    You stipulate this clause in your Project rules:

    "Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample."

    The fact that you continue to use the Hapmap Gujaratis while:

    1. Being unable to source and identify what caste those Gujaratis are, and;

    2. Knowing full well that gujarati_a are closely related and form an outgroup to everyone else -

    - is extremely unscientific of you, and in fact discredits the legitimacy of the Harappa Project. Without properly sourcing these samples you lose intellectual credibility, and thus this Project comes across as a charade that is not more than akin to someone's pastime hobby.

    Either you should take off the Hapmap gujarati_a because it has already been substantiated by Razib Khan himself that this cluster are close relatives, or you must add a disclaimer acknowledging that this group is not wholly representative of the Gujaratis, but do redeem yourself.

    I await a response from you.

    • Please provide me with the link where Razib has declared the HapMap Gujarati-a to be close relatives. They are definitely related, but are not close relatives, as is obvious to anyone who has analyzed them.

      "this Project comes across as a charade that is not more than akin to someone's pastime hobby."

      I should point out that this project is definitely a hobby for me.

      • I think Razib found a couple of them to be related.
        "the definitely related individuals seem to be in the Gujarati_A cluster!"

        To which you and John Hawks responded:

        "I removed NA20900 and NA20909 from my datasets due to they being too similar to other samples. I used simple IBS similarity. Their IBD numbers are really weird though."

        "I started working with the Gujarati sample a couple of years ago and went to talk to a South Asianist about it. His immediate reaction when I described the sample was that it would give all kinds of weird results because the Houston Gujarati community is really uncharacteristic in many ways, mostly Brahmins from the same areas and therefore full of distant relatives."
        http://blogs.discovermagazine.com/gnxp/2011/03/looking-for-relatedness-in-the-hapmap-gujaratis/#.URQFjR1Ii2w

        Though Razib's Gujarati_A is your gujaratis-b, and that may be causing confusion here.

        • @Parasar

          A colleague of mine recently got in touch with John Hawks and asked him to substantiate that comment he made two years ago, but he said he could not and that it was based on his friend's conjecture.

  14. Zack,

    Do you have a legitimate source for the individual admixtures of the Gujarati_a Hapmap? I can't seem to access this on the spreadsheet to be able to compare them with the Harappa equivalent participants.

    You are surely correct that the Gujarati_a are not purportedly close relatives; however, they are related individuals (from some hitherto unknown caste that forum scientists can only postulate and make grand inferences about, which is highly unprofessional in my view). One cannot really know for certain that the Gujarati_a from Houston are Patels, and thus there is a real danger of presenting grossly misleading data based on the mere assumption that two Patels who have participated in the Project supposedly have similar admixture. In other words:

    1. You can't be 100% sure that the Gujarati_a are Patels - but given that two Patel participants have similar admixtures, this is good enough proof for you.

    2. I hope you know there is no one caste of Patels; they are sub-divided into dozens of highly endogamous clans who have historically favoured hypergamy with the emphasis on socially competitive gols, and also practice a form of caste hierarchy among themselves. In other words, there are upper-caste Patels, and then lower-caste Patels.

    The issue I am raising, then, is that I think it is rather unfair of you to use the Gujarati Hapmap samples without adding a disclaimer that there is some controversy surrounding the origin and assessment of the samples. There are people who look at these samples on anthropology forums and think they are wholly representative of Gujaratis (which is again unfair, because the Gujarati_a are a tight cluster of related individuals), and given the level of endogamy they practice, they won't be so genetically close to other Gujarati castes either. Believe it or not, I've had arguments with other South Asians about this because they wrongly assumed that the Gujarati_a are wholly representative of the whole Gujarati populace.

    Even if, for one moment, we were to believe that the Gujarati_a are Patels, they would still not be wholly representative of Gujaratis as a whole because back in Gujarat they constitute not more than approx. 20% of the populace!

    • There is no controversy regarding the HapMap Gujaratis. It is well known among academic researchers and genome bloggers that about two-thirds of the samples, those I label Gujarati-a, are an extremely endogamous group.

      No one has said that these HapMap Gujaratis represent all of Gujarat. Same as no one should think that the HGDP Sindhis represent all of Sindh. While those Sindhis are labeled only as Sindhis, it is clear to anyone who has analyzed their data and knows Sindh that there are a few Balochis settled in Sindh included as well as a few with partial African ancestry.

  15. @Parasar

    Razib Khan has this to say about his Gujarati_a (Zack's Gujarati_b):

    "Gujarat_A has some individuals with much more “West Eurasian” ancestry"
    http://blogs.discovermagazine.com/gnxp/2011/02/who-are-those-houston-gujus/#.URbqWB3krSk

    Still, that "tight" cluster I'm talking about is Razib's Gujarati_b (or Zack's Gujarati_a). This is the one which forms an outgroup to everyone else, and the autosomal admixtures for these individuals seem consistent:

    "Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set"
    http://blogs.discovermagazine.com/gnxp/2011/02/who-are-those-houston-gujus/#.URbqWB3krSk

    Then Razib says this:

    "But my guess is that Gujarati_B are a subset of Patels. In other words, they’re a genetically distinct jati. I suspect that Gujarati_A are a more diverse bunch from a number of different jatis."

    Then he changes his tune and goes on to say this later:

    "In any case, to my surprise the definitely related individuals seem to be in the Gujarati_A cluster!"

    Which is strange because Zack's Gujarati_b individuals differ in their autosomal admixture components quite a bit. In other words, both Gujarati_a and Gujarati_b are two distinct jatis, which means neither group could be wholly representative of Gujaratis. It's the "tight" cluster group which gives off weirdo results...

  16. Hi Zack, I did my own DNA some time back, but decided to have my parents done this year, and both results have come in. I had some oddities showing up in my DNA, and this is why I ran theirs. When you see Iran, Turkey, and so on, you wonder why. Could you run theirs for me?
    Robin

  17. Need to know why I get the following error:

    12 ancestral populations
    166462 total SNPs
    At line 142 of file DIYDodecad.f90 (Unit 50 "genotype.txt")
    Traceback: not available, compile with -ftrace=frame or -ftrace=full
    Fortran runtime error: End of file

  18. Need to know if anyone came across the following error during the standardize step:
    Error in is.data.frame(x) : object 'X' not found

  19. any replies to my prior posts appreciated.

  20. Any thoughts on using the National Geographic Genographic 2.0 project service? They claim to test for a wider range of markers that are optimized for ancestry studies. If I do use their service, would the data be in a usable format for your project? Thanks.

  21. Hello, my brother-in-law is Vietnamese and has 4 grandparents from Vietnam. Would you be interested in his data? I noticed on one of the spreadsheets Vietnamese was mentioned, but I wasn't sure if there was a need for the data or not. If so, I'll let him know! Thank you.

  22. Hello, I also had a question regarding the GEDmatch Harappa test. According to it, I have: S-Indian 0.63% --- Baloch 8.50% -- Caucasian 8.85% --- SW-Asian 1.61% (with European the most predominant: NE-Euro 49.52% ----Mediterranean 29.77%).

    On the Dodecad K12b test, I also get 7.01% for Gedrosia, which I believe means Balochistan, which is the same as or similar to Baloch? Does this indicate that I have South Asian ancestry, and if so, could I be of help to the project?

    I don't know if it's helpful, but my mtDNA is U4c1.

    My brother, who was also tested with 23&me, gets slightly higher numbers in the Harappa test: S-Indian 0.90% ----Baloch 9.34% ---Caucasian 9.34% ---SW-Asian 2.21%---Siberian 1.43%

    I'm not entirely sure what this indicates, but I'd love to find out. Does anyone have any ideas? Or does it not amount to much of anything? (Or at least, what does 8-10% mean in regard to generations away from me?)

    • The names of these admixture components are just mnemonics based on which group a component is highest in. Also these components are not some pure ancestry. Having 5% Baloch for example doesn't mean you have Baloch ancestry.

      The best way to look at your results is to compare them with the various group averages.

      • I see. Thank you for the quick reply. I'm new to this stuff, and looking at all the admixture models often confuses me. My family is trying to solve the mystery of my father's ancestry, as we were told he was Native American (and skin-color wise he looks it), but for my brother and me, no NA turned up. So we were suspecting other origins, even the Roma (and looking at the Baloch people, it could have fit as well), but it's hard to see anything in the GEDmatch admixture profiles that makes any sense, at least in my and my brother's profiles. He's going to be tested soon.

  23. Welcome | South Asian Ancestry - pingback on February 1, 2014 at 3:40 pm
  24. AncestryDNA | Harappa Ancestry Project - pingback on February 13, 2014 at 1:09 pm
