There are two categories of people eligible to participate in the Harappa Ancestry Project.
First, if you have any real ancestry of South Asian origin, you are eligible to participate. Those with partial South Asian ancestry are also welcome. The countries of origin I count as South Asian are as follows:
- Afghanistan
- Bangladesh
- Bhutan
- India
- Maldives
- Nepal
- Pakistan
- Sri Lanka
Note that a 2-3% South Asian result from Dr. McDonald's BGA or the Dodecad Project does not count as South Asian ancestry.
Secondly, if you have all four of your grandparents from one of the following countries or regions, you can also send me your data.
- Burma
- Tibet
- Uyghur from Xinjiang, China
- Tajikistan
- Kyrgyzstan
- Kazakhstan
- Uzbekistan
- Turkmenistan
- Iran
- Turkey
- Azerbaijan
- Armenia
- Georgia
- North Caucasian Federal District, Russia
- Iraq
- Syria
- Lebanon
- Jordan
Everyone else can use DIY HarappaWorld to estimate their admixture results.
Right now, I am accepting raw data samples from people who have tested with 23andme or FTDNA Family Finder.
Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.
If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.
What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme or FTDNA to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.
Here's my privacy policy.

Lahori Punjabis coming your way!
Great!
I have sent my data to you. I am of Jat ethnicity, from Haryana State, India.
Dear Mr. Zack Ajmal,
I have come across your web site, the Harappa Ancestry Project. I found it very interesting and would like to know the haplogroup and defining marker of the Arain sample you have studied. I would appreciate your prompt response.
Sincerely,
Aftab Malik
I haven't looked at the haplogroups for a while since I spend most of my time using autosomal data. So I'll have to look it up.
Thanks, I'll appreciate that. By the way, did you study Arain genealogy, as mentioned in the Genetics section of the Arain page on Wikipedia?
http://en.wikipedia.org/wiki/Arain
I have sent my raw data. I am a Rajasthan Rajput.
Thanks
Hi Zack. I found a few errors in the ethnicity spreadsheet.
- HRP0054 (Bengali Brahmin) is grouped under the Bihari-Brahmin label.
- HRP0222 (Iyer) is grouped under the Iyengar-Brahmin label.
As an aside, it'd also be great, as discussed elsewhere (DNA-Forums), if HRP0106 and HRP0107 were labeled under a new group, Punjabi Rajput, including the Pahari Rajput participant (HRP0135).
Thanks, fixed the errors.
Sent you an email (Mar 26, 2012, 1:35 AM EST) with 23andme data, and I have a feeling it got caught in the spam filter.
I had also sent a previous email on Mar 14, 2012, at 10:52 PM EST.
The data is also available here
I plan to use a lot of info from your and Razib's sites to put as much of my info as possible in the public domain.
cheers
Hi Zack. Just informed a Sindhi Pushkarana Brahmin individual about your project - expect their raw-data in your inbox!
Zack,
The gujarati_a (Hapmap) sample you are using in the Project breaks your own Project rules because that cluster is a bunch of close relatives.
You stipulate this clause in your Project rules:
''Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.''
The fact that you continue to use the Hapmap Gujaratis while:
1. Being unable to source and identify what caste those Gujaratis are, and;
2. Using the samples knowing full well that gujarati_a are closely related and form an outgroup to everyone else -
- is extremely unscientific of you, and in fact discredits the legitimacy of the Harappa Project. Without properly sourcing these samples you lose intellectual credibility, and thus this Project comes across as a charade that is no more than someone's pastime hobby.
Either you should take off the Hapmap gujarati_a because it has already been substantiated by Razib Khan himself that this cluster are close relatives, or you must add a disclaimer acknowledging that this group is not wholly representative of the Gujaratis, but do redeem yourself.
I await a response from you.
Please provide me with the link where Razib has declared the HapMap Gujarati-a to be close relatives. They are definitely related, but are not close relatives, as is obvious to anyone who has analyzed them.
I should point out that this project is definitely a hobby for me.
I think Razib found a couple of them to be related.
"the definitely related individuals seem to be in the Gujarati_A cluster!"
To which you and John Hawks responded:
"I removed NA20900 and NA20909 from my datasets due to they being too similar to other samples. I used simple IBS similarity. Their IBD numbers are really weird though."
"I started working with the Gujarati sample a couple of years ago and went to talk to a South Asianist about it. His immediate reaction when I described the sample was that it would give all kinds of weird results because the Houston Gujarati community is really uncharacteristic in many ways, mostly Brahmins from the same areas and therefore full of distant relatives."
http://blogs.discovermagazine.com/gnxp/2011/03/looking-for-relatedness-in-the-hapmap-gujaratis/#.URQFjR1Ii2w
Though Razib's Gujarati_A is your gujaratis-b, and that may be causing confusion here.
@Parasar
A colleague of mine recently got in touch with John Hawks and asked him to substantiate that comment he made two years ago, but he said he could not and that it was based on his friend's conjecture.
Zack,
Do you have a legitimate source for the individual admixtures of the Gujarati_a Hapmap? I can't seem to access this on the spreadsheet to be able to compare them with the Harappa equivalent participants.
You are surely correct that the Gujarati_a are not close relatives as purported; however, they are related individuals (from some hitherto unknown caste that forum scientists can only postulate and make grand inferences about... which is highly unprofessional in my view). One cannot really know for certain that the Gujarati_a from Houston are Patels, and thus there is the real danger of presenting grossly misleading data as fact on the mere assumption that two Patels who have participated in the Project supposedly have similar admixture. In other words:
1. You can't be 100% sure that the Gujarati_a are Patels - but given that two Patel participants have similar admixtures, this is good enough proof for you.
2. I hope you know there is no one caste of Patels; they are sub-divided into dozens of highly endogamous clans who have historically favoured hypergamy with the emphasis on socially competitive gols, and also practice a form of caste hierarchy among themselves. In other words, there are upper-caste Patels, and then lower-caste Patels.
The issue I am raising, then, is that I think it is rather unfair of you to use the Gujarati Hapmap samples without making a disclaimer that there is some controversy surrounding the origin and assessment of the samples. There are people who look at these samples on anthropology forums and think they are wholly representative of Gujaratis (which is again unfair, because the Gujarati_a are a tight cluster of related individuals), and given the level of endogamy they practice, they won't be so genetically close to other Gujarati castes either. Believe it or not, I've had arguments with other South Asians about this because they wrongly assumed that the Gujarati_a are wholly representative of the whole Gujarati populace.
Even if, for one moment, we were to believe that the Gujarati_a are Patels, they would still not be wholly representative of Gujaratis as a whole because back in Gujarat they constitute not more than approx. 20% of the populace!
There is no controversy regarding the HapMap Gujaratis. It is well known among academic researchers and genome bloggers that about two-thirds of the samples, those I label Gujarati-a, are an extremely endogamous group.
No one has said that these HapMap Gujaratis represent all of Gujarat. Same as no one should think that the HGDP Sindhis represent all of Sindh. While those Sindhis are labeled only as Sindhis, it is clear to anyone who has analyzed their data and knows Sindh that there are a few Balochis settled in Sindh included as well as a few with partial African ancestry.
@Parasar
Razib Khan has this to say about his Gujarati_a (Zack's Gujarati_b):
"Gujarat_A has some individuals with much more “West Eurasian” ancestry"
http://blogs.discovermagazine.com/gnxp/2011/02/who-are-those-houston-gujus/#.URbqWB3krSk
Still, that "tight" cluster I'm talking about is Razib's Gujarati_b (or Zack's Gujarati_a) - this is the one which forms an outgroup to everyone else, and the autosomal admixtures for these individuals seem consistent:
"Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set"
http://blogs.discovermagazine.com/gnxp/2011/02/who-are-those-houston-gujus/#.URbqWB3krSk
Then Razib says this:
"But my guess is that Gujarati_B are a subset of Patels. In other words, they’re a genetically distinct jati. I suspect that Gujarati_A are a more diverse bunch from a number of different jatis."
Then he changes his tune and goes on to say this later:
"In any case, to my surprise the definitely related individuals seem to be in the Gujarati_A cluster!"
Which is strange because Zack's Gujarati_b individuals differ in their autosomal admixture components quite a bit. In other words, both Gujarati_a and Gujarati_b are two distinct jatis, which means neither group could be wholly representative of Gujaratis. It's the "tight" cluster group which gives off weirdo results...
Hi Zack, I did my own DNA some time back, but decided to have my parents done this year, and both of their results have come in. I had some oddities showing up in my DNA, and this is why I ran theirs. When you see Iran, Turkey, and so on, you wonder why. Could you run theirs for me?
Robin