Participation | Harappa Ancestry Project

Participation

There are two categories of people eligible to participate in the Harappa Ancestry Project.

First, if you have any real ancestry of South Asian origin (except for Romany/Roma/Gypsy ancestry), you are eligible to participate. Those with partial South Asian ancestry are also welcome. The countries of origin I count as South Asian are as follows:

Afghanistan
Bangladesh
Bhutan
India
Maldives
Nepal
Pakistan
Sri Lanka

Note that 2-3% South Asian from Dr. McDonald's BGA or Dodecad Project does not count as South Asian ancestry.

Secondly, if you have all four of your grandparents from one of the following countries or regions, you can also send me your data.

Burma
Tibet
Uyghur from Xinjiang, China
Tajikistan
Kyrgyzstan
Kazakhstan
Uzbekistan
Turkmenistan
Iran
Turkey
Azerbaijan
Armenia
Georgia
North Caucasian Federal District, Russia
Iraq
Syria
Lebanon
Jordan

Everyone else can use DIY HarappaWorld to estimate their admixture results.

Right now, I am accepting raw data samples from people who have tested with 23andme, FTDNA Family Finder, or ancestry.com DNA.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme or FTDNA to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.

Here's my privacy policy.

70 Comments.

23andme v3 Data | Harappa Ancestry Project - pingback on January 25, 2011 at 8:01 am
Gene Expression » Notes on the future - pingback on January 26, 2011 at 11:56 pm
Harappa Ancestry Project @ N ~ 50 | Biology News by Biologged - pingback on March 13, 2011 at 2:32 am
Accepting FTDNA Family Finder | Harappa Ancestry Project - pingback on May 3, 2011 at 12:01 am
{ Brown Pundits } » 100+ participants in the Harappa Ancestry Project - pingback on May 3, 2011 at 4:41 pm
Vasishta May 11, 2011 at 12:33 am

Lahori Punjabis coming your way!

Reply
- Zack May 11, 2011 at 7:08 am
  
  Great!
  
  Reply
Vikram Malik September 4, 2011 at 1:28 am

I have sent my data to you. I am of Jat ethnicity, from Haryana State, India.

Reply
Aftab Malik December 7, 2011 at 4:18 am

Dear Mr. Zack Ajmal,

I have come across your web site, the Harappa Ancestry Project. I found it very interesting and want to know the haplogroup and defining marker of the Arain sample you have studied. I would appreciate your prompt response.

Sincerely,

Aftab Malik

Reply
- Zack December 8, 2011 at 8:38 pm
  
  I haven't looked at the haplogroups for a while since I spend most of my time using autosomal data. So I'll have to look it up.
  
  Reply
Aftab Malik December 9, 2011 at 7:22 am

Thanks, I'd appreciate that. By the way, did you study Arain genealogy, as mentioned on the Arain page, under the Genetics section, in Wikipedia?
http://en.wikipedia.org/wiki/Arain

Reply
P Rajput March 7, 2012 at 9:27 pm

I have sent my raw data. I am a Rajasthan Rajput.

Reply
- Zack March 9, 2012 at 8:10 am
  
  Thanks
  
  Reply
AV March 15, 2012 at 1:43 pm

Hi Zack. I found a few errors in the ethnicity spreadsheet.

- HRP0054 (Bengali Brahmin) is grouped under the Bihari-Brahmin label.
- HRP0222 (Iyer) is grouped under the Iyengar-Brahmin label.

As an aside, it'd also be great, as discussed elsewhere (DNA-Forums), if HRP0106 and HRP0107 were labeled under a new group, Punjabi Rajput, including the Pahari Rajput participant (HRP0135).

Reply
- Zack March 15, 2012 at 2:17 pm
  
  Thanks, fixed the errors.
  
  Reply
sbarrkum March 26, 2012 at 10:54 am

Sent you an email (Mar 26, 2012 EST 1:35 AM) with 23andme data and I have a feeling it got caught in the spam filter.
Had sent a previous email too, on Mar 14, 2012 EST 10:52 PM.

The data is also available here

I plan to use a lot of info from your and Razib's sites to put as much of my info as possible in the public domain.

cheers

Reply
AV April 15, 2012 at 7:14 am

Hi Zack. Just informed a Sindhi Pushkarana Brahmin individual about your project - expect their raw-data in your inbox!

Reply
springtime February 4, 2013 at 12:59 pm

Zack,

The gujarati_a (Hapmap) sample you are using in the Project breaks your own Project rules because that cluster is a bunch of close relatives.

You stipulate this clause in your Project rules:

''Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.''

The fact that you continue to use the Hapmap Gujaratis while:

1. Being unable to source and identify what caste those Gujaratis are, and;

2. Knowing full well that gujarati_a are closely related and form an outgroup to everyone else -

- is extremely unscientific of you, and in fact discredits the legitimacy of the Harappa Project. Without properly sourcing these samples you lose intellectual credibility, and thus this Project comes across as a charade that is not more than akin to someone's pastime hobby.

Either you should take off the Hapmap gujarati_a because it has already been substantiated by Razib Khan himself that this cluster are close relatives, or you must add a disclaimer acknowledging that this group is not wholly representative of the Gujaratis, but do redeem yourself.

I await a response from you.

Reply
- Zack February 4, 2013 at 1:33 pm
  
  Please provide me with the link where Razib has declared the HapMap Gujarati-a to be close relatives. They are definitely related, but are not close relatives, as is obvious to anyone who has analyzed them.
  
  "this Project comes across as a charade that is not more than akin to someone's pastime hobby."
  
  I should point out that this project is definitely a hobby for me.
  
  Reply
  - Parasar February 7, 2013 at 3:02 pm
    
    I think Razib found a couple of them to be related.
    "the definitely related individuals seem to be in the Gujarati_A cluster!"
    
    To which you and John Hawks responded:
    
    "I removed NA20900 and NA20909 from my datasets due to they being too similar to other samples. I used simple IBS similarity. Their IBD numbers are really weird though."
    
    "I started working with the Gujarati sample a couple of years ago and went to talk to a South Asianist about it. His immediate reaction when I described the sample was that it would give all kinds of weird results because the Houston Gujarati community is really uncharacteristic in many ways, mostly Brahmins from the same areas and therefore full of distant relatives."
    http://blogs.discovermagazine.com/gnxp/2011/03/looking-for-relatedness-in-the-hapmap-gujaratis/#.URQFjR1Ii2w
    
    Though Razib's Gujarati_A is your Gujarati-b, and that may be causing confusion here.
    
    Reply
    - springtime February 9, 2013 at 7:57 pm
      
      @Parasar
      
      A colleague of mine recently got in touch with John Hawks and asked him to substantiate the comment he made two years ago, but he said he could not and that it was based on his friend's conjecture.
      
      Reply
springtime February 4, 2013 at 2:51 pm

Zack,

Do you have a legitimate source for the individual admixtures of the Gujarati_a Hapmap? I can't seem to access this on the spreadsheet to be able to compare them with the Harappa equivalent participants.

You are surely correct that the Gujarati_a are not purportedly close relatives, however they are related individuals (from some hitherto unknown caste that forum scientists can only postulate and make grand inferences about... which is highly unprofessional in my view). One cannot really know with certainty that the Gujarati_a from Houston are Patels, and thus there is the real danger of presenting grossly misleading factual data from the mere assumption that two Patels who have participated on the Project supposedly have similar admixture. In other words:

1. You can't be 100% sure that the Gujarati_a are Patels - but given that two Patel participants have similar admixtures, this is good enough proof for you.

2. I hope you know there is no one caste of Patels; they are sub-divided into dozens of highly endogamous clans who have historically favoured hypergamy with the emphasis on socially competitive gols, and also practice a form of caste hierarchy among themselves. In other words, there are upper-caste Patels, and then lower-caste Patels.

The issue I am raising, then, is that I think it is rather unfair of you to use the Gujarati Hapmap samples without making a disclaimer that there is some controversy surrounding the origin and assessment of the samples. There are people who look at these samples on anthropology forums and think they are wholly representative of Gujaratis (which is again unfair, because the Gujarati_a are a tight cluster of related individuals), and given the level of endogamy they practice, they won't be so genetically close to other Gujarati castes either. Believe it or not, I've had arguments with other South Asians about this because they wrongly assumed that the Gujarati_a are wholly representative of the whole Gujarati populace.

Even if, for one moment, we were to believe that the Gujarati_a are Patels, they would still not be wholly representative of Gujaratis as a whole because back in Gujarat they constitute not more than approx. 20% of the populace!

Reply
- Zack February 6, 2013 at 3:58 pm
  
  There is no controversy regarding the HapMap Gujaratis. It is well known among academic researchers and genome bloggers that about two-thirds of the samples, those I label Gujarati-a, are an extremely endogamous group.
  
  No one has said that these HapMap Gujaratis represent all of Gujarat. Same as no one should think that the HGDP Sindhis represent all of Sindh. While those Sindhis are labeled only as Sindhis, it is clear to anyone who has analyzed their data and knows Sindh that there are a few Balochis settled in Sindh included as well as a few with partial African ancestry.
  
  Reply
springtime February 9, 2013 at 7:53 pm

@Parasar

Razib Khan has this to say about his Gujarati_a (Zack's Gujarati_b):

"Gujarat_A has some individuals with much more “West Eurasian” ancestry"
http://blogs.discovermagazine.com/gnxp/2011/02/who-are-those-houston-gujus/#.URbqWB3krSk

Still, that "tight" cluster I'm talking about is Razib's Gujarati_b (or Zack's Gujarati_a) - this is the one which forms an outgroup to everyone else and the autosomal admixtures for these individuals seems consistent:

"Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set"
http://blogs.discovermagazine.com/gnxp/2011/02/who-are-those-houston-gujus/#.URbqWB3krSk

Then Razib says this:

"But my guess is that Gujarati_B are a subset of Patels. In other words, they’re a genetically distinct jati. I suspect that Gujarati_A are a more diverse bunch from a number of different jatis."

Then he changes his tune and goes on to say this later:

"In any case, to my surprise the definitely related individuals seem to be in the Gujarati_A cluster!"

Which is strange because Zack's Gujarati_b individuals differ in their autosomal admixture components quite a bit. In other words, both Gujarati_a and Gujarati_b are two distinct jatis, which means neither group could be wholly representative of Gujaratis. It's the "tight" cluster group which gives off weirdo results...

Reply
Robin April 15, 2013 at 1:46 pm

Hi Zack, I did my own DNA some time back, but decided to have my parents done this year, and both results have come in. I had some oddities showing up in my DNA and this is why I ran theirs. When you see Iran, Turkey, and so on, you wonder why. Could you run theirs for me?
Robin

Reply
nik monroe June 29, 2013 at 3:58 am

need to know why I get following error:

12 ancestral populations
166462 total SNPs
At line 142 of file DIYDodecad.f90 (Unit 50 "genotype.txt")
Traceback: not available, compile with -ftrace=frame or -ftrace=full
Fortran runtime error: End of file

Reply
nik monroe June 29, 2013 at 12:24 pm

need to know if anyone came across following error during standardize step:
Error in is.data.frame(x) : object 'X' not found

Reply
nik monroe June 30, 2013 at 1:56 pm

any replies to my prior posts appreciated.

Reply
S R August 12, 2013 at 2:32 pm

Any thoughts on using the National Geographic Genographic 2.0 project service? They claim to test for a wider range of markers that are optimized for ancestry studies. If I do use their service, would the data be in a usable format for your project? Thanks.

Reply
- Zack August 12, 2013 at 2:51 pm
  
  I do accept Geno2 data and can analyze it. However, my current admixture analysis only has 14,000 SNPs in common with Geno2.
  
  Reply
Asterope September 27, 2013 at 5:02 pm

Hello, my brother-in-law is Vietnamese and has 4 grandparents from Vietnam. Would you be interested in his data? I noticed on one of the spreadsheets Vietnamese was mentioned, but I wasn't sure if there was a need for the data or not. If so, I'll let him know! Thank you.

Reply
- Zack September 28, 2013 at 9:59 am
  
  Vietnamese are not eligible to participate.
  
  Reply
Asterope September 27, 2013 at 9:17 pm

Hello, I also had a question regarding the GEDmatch Harappa test. According to it, I have: S-Indian 0.63% --- Baloch 8.50% -- Caucasian 8.85% --- SW-Asian 1.61% (with European the most predominant: NE-Euro 49.52% ----Mediterranean 29.77%).

On the Dodecad K12b test, I also get 7.01% for Gedrosia, which I believe means Balochistan, which is the same as or similar to Baloch? Does this indicate that I have South Asian ancestry and, if so, could I be of help to the project?

I don't know if it's helpful, but my mtDNA is U4c1.

My brother who was also tested with 23&me, in the Harappa test gets slightly higher numbers: S-Indian 0.90% ----Baloch 9.34% ---Caucasian 9.34% ---SW-Asian 2.21%---Siberian 1.43%

I'm not entirely sure what this indicates, but I'd love to find out. Does anyone have any ideas? Or does it not amount to much of anything? (Or at least, what does 8-10% mean in regard to generations away from me?)

Reply
- Zack September 28, 2013 at 10:03 am
  
  The names of these admixture components are just mnemonics based on which group a component is highest in. Also these components are not some pure ancestry. Having 5% Baloch for example doesn't mean you have Baloch ancestry.
  
  The best way to look at your results is to compare them with the various group averages.
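
A minimal sketch of such a comparison, assuming Euclidean distance over the component percentages as the similarity measure (an assumption here; nothing in this thread prescribes a particular measure). The individual's numbers are the ones quoted in the question above. The group averages and the names `placeholder_group_A`, `placeholder_group_B`, `distance`, and `rank_groups` are hypothetical placeholders for illustration, not project results.

```python
import math

# Components to compare; names follow the HarappaWorld labels quoted above.
COMPONENTS = ["S-Indian", "Baloch", "Caucasian", "NE-Euro", "Mediterranean", "SW-Asian"]

# One individual's percentages, copied from the question above.
individual = {
    "S-Indian": 0.63, "Baloch": 8.50, "Caucasian": 8.85,
    "NE-Euro": 49.52, "Mediterranean": 29.77, "SW-Asian": 1.61,
}

# HYPOTHETICAL group averages: made-up placeholder numbers, NOT real project data.
# Replace them with averages taken from the project's own spreadsheets.
group_averages = {
    "placeholder_group_A": {
        "S-Indian": 1.0, "Baloch": 8.0, "Caucasian": 9.0,
        "NE-Euro": 50.0, "Mediterranean": 30.0, "SW-Asian": 2.0,
    },
    "placeholder_group_B": {
        "S-Indian": 40.0, "Baloch": 38.0, "Caucasian": 10.0,
        "NE-Euro": 5.0, "Mediterranean": 1.0, "SW-Asian": 2.0,
    },
}


def distance(a, b, components=COMPONENTS):
    """Euclidean distance between two admixture profiles (missing components count as 0)."""
    return math.sqrt(sum((a.get(c, 0.0) - b.get(c, 0.0)) ** 2 for c in components))


def rank_groups(person, groups):
    """Return (group name, distance) pairs sorted from most to least similar."""
    pairs = [(name, distance(person, avg)) for name, avg in groups.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, dist in rank_groups(individual, group_averages):
        print(f"{name}: {dist:.2f}")
```

A small distance only says that the overall profile resembles that group's average; as noted above, it does not by itself show ancestry from that group.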
  
  Reply
  - Asterope September 28, 2013 at 2:01 pm
    
    I see. Thank you for the quick reply. I'm new to this stuff and looking at all the admixture models often confuses me. My family is trying to discover the mystery of my father's ancestry as we were told he was Native American (and skin-color wise he looks it), but for my brother and I, no NA turned up. So, we were suspecting other origins, even the Roma (and looking at the Baloch people it could have fit as well), but it's hard trying to see anything in the GEDmatch admixture profiles that makes any sense, at least under mine and my brother's profile. He's going to be tested soon.
    
    Reply
Welcome | South Asian Ancestry - pingback on February 1, 2014 at 3:40 pm
AncestryDNA | Harappa Ancestry Project - pingback on February 13, 2014 at 1:09 pm
Fas March 6, 2014 at 6:06 pm

If I uploaded my data to GEDmatch, would you have access to it or would I have to submit it via email? Would the data I submit affect my results on GEDmatch or not?

Thank you

Reply
- Zack April 18, 2014 at 11:18 am
  
  I don't have access to data uploaded to GEDMatch.
  
  Reply
ali May 25, 2014 at 2:46 pm

hello
I sent you my data in 2013 and you sent me back an ID, but you still haven't sent me a password. My ID is hrp0355.

Reply
- Zack June 11, 2014 at 5:50 pm
  
  You don't need a password.
  
  Reply
jamwal dogra rajput June 7, 2014 at 2:48 am

I was just curious- did you ever have any participants from Jammu or the Northern Himachal (Chamba, Kangra) region?

Reply
- Zack June 11, 2014 at 5:50 pm
  
  No, I don't.
  
  Reply
Paul Gill July 1, 2014 at 7:35 pm

I am kit hrp299 and am a Punjabi Jatt, but on the participants' ethnicity sheet kit hrp299 is shown as Bengali, which is a gross error. I think that there is no need for separate sheets for haplogroup and ethnicity. Column B on the haplogroup sheet can be used for ethnicity to avoid such errors. Can you please correctly match the ethnicity to each kit and add it to column B of the haplogroup sheet?

Reply
- Zack July 2, 2014 at 8:10 pm
  
  Because anyone can edit the haplogroup spreadsheet, it's been a problem keeping it correct. Therefore, I am no longer using it. The ethnicity spreadsheet linked on the right sidebar of this blog shows your ethnicity correctly.
  
  Reply
brad April 12, 2015 at 10:35 am

Hello, I was wondering if I meet what you're looking for. My GEDmatch number is a529919.

Reply
- Zack July 5, 2015 at 10:24 am
  
  Not sure what you are asking.
  
  Reply
Aaron May 21, 2015 at 9:29 am

Hello,

All 4 of my grandparents were born in Iran. Are you still accepting data from 23andme?

Also, if I send in data, would I be able to see my own results afterward (your project seems to be more specific than 23andme, so I am hoping I can learn more by sending data in).

Reply
- Zack July 5, 2015 at 10:25 am
  
  I have been too busy IRL to do much recently. But hopefully I will be more active this summer. I am accepting data.
  
  Reply
  - Aaron September 17, 2015 at 2:17 pm
    
    How does one send data?
    
    Reply
    - Aaron September 24, 2015 at 12:22 pm
      
      Actually, I just emailed you the data. Thank you very much.
      
      Reply
Salim May 26, 2015 at 11:20 pm

Zack, for a participant who has one grandparent from Afghanistan and the rest of whom are Arab, do you consider the participant to fall under the Afghan category, Arab (Maghrebi), or some other category (mixed, born and living in America)?

Reply
- Zack July 5, 2015 at 10:26 am
  
  Arab from the Maghreb?
  
  That would be mixed; I don't usually put mixed ancestry into a single group.
  
  Reply
Reza September 23, 2015 at 4:23 pm

Hey Zack,

Have data for myself, Bengali Muslim and an interested friend too, Punjabi Muslim. Are you still accepting data? If so, can zip across the files..

Reply
aaron December 10, 2015 at 9:40 am

i was able to see my results by using the harappaworld calculator through gedmatch. thank you for creating it.

how did you come up with your reference populations? are they based on human settlement 8-10000 years ago? are they somewhat arbitrary?

Reply
Mary May 19, 2016 at 6:02 pm

I used the HarappaWorld calculator through GEDmatch and I am beyond surprised at some of the results. From everything I was told, everyone came out of Ireland. I know in my 23&me they showed Mediterranean, European, South Asian, South Pacific, Native American, and South African. I have noticed on Ancestry.com that I was matching people of South Pacific background, with no Irish in them, as very distant cousins.

S-Indian -
Baloch 10.05
Caucasian 4.29
NE-Euro 50.17
SE-Asian -
Siberian -
NE-Asian -
Papuan 0.17
American 0.16
Beringian 0.71
Mediterranean 34.39
SW-Asian -
San 0.06
E-African -
Pygmy -
W-African -

Reply
- Mary May 19, 2016 at 6:06 pm
  
  Correction: 23 & me showed North africa/middle east , western European,northern european & south Asia
  
  Reply
Kevin July 12, 2016 at 5:31 pm

Have you ever found an African-American with no idea where his South Asian grandparent came from?

Reply
Amaan July 17, 2016 at 10:56 pm

So a Kerala-Malayali is 42% S. Indian, 39% Baloch, 7% caucasian 3% SW Asian

Reply
Ri July 19, 2016 at 3:31 pm

Still accepting Nepalis?

I think you should also categorize Brahmins and Chettris separately, if you have not already, since Chettris have a higher Mongolian component. The same goes for Maithali Brahmins and Hill Brahmins: two separate regions and peoples.

Reply
Samuel Bakhshi December 13, 2016 at 7:47 pm

Hi. Name is Sam. Just bought a kit from 23andme. My paternal Grandparents are from Tarn Taran who settled in Renala Khurd (85 km from HARAPPA). My mom is from Nankana Saab. Interested to provide you my RAW data post Christmas

Reply
Arnold September 2, 2017 at 10:26 pm

Hey man! I emailed you my DNA and a brief history of my known ancestry as well! It would be sweet if you could reply, man. I'm very interested in taking part in this project!

Reply
Zain Saleem October 9, 2017 at 4:01 pm

Hi Zack
I'm curious as to the accuracy of the admixture analysis that GEDmatch/HarappaWorld provides. It tells me I'm Kashmiri in both single and two population approximations (this is accurate enough as my family have always said we're Kashmiri. We're from Azad/Pak Kashmir as well). What is the source of the Kashmiri DNA data labelled "harappa" and "reich" in the results? Is it from studies involving Kashmir Valley individuals?

S-Indian 32.22
Baloch 37.41
Caucasian 14.54
NE-Euro 10.39
SE-Asian 0.34
Siberian 1.51
NE-Asian 1.61
These are the results of my 23andme upload (missed the negligibles). I'm confused about Baloch and Caucasian percentages. Are they based on ancient baloch/caucasian DNA? What specific details can I get from those segments if any?

Thanks

Reply
- Shaz June 14, 2018 at 4:45 pm
  
  My results are similar to yours. Was always told we had Kashmiri in us.
  
  1 Baloch 35.64
  2 S-Indian 34.28
  3 Caucasian 15.83
  4 NE-Euro 7.72
  5 Siberian 2.38
  6 NE-Asian 1.6
  7 SW-Asian 0.69
  8 American 0.65
  9 Papuan 0.58
  10 San 0.37
  11 W-African 0.19
  12 Pygmy 0.08
  
  Reply
Ryan Khan December 26, 2017 at 10:29 pm

Hello. I have emailed you my data and ancestral information. Hope you can help me out. Thanks!

Reply
Jessin March 17, 2018 at 9:27 pm

Thanks for your hard work. I am a Marthoma Christian from Pathanamthitta district, Kerala, and my results are 50% S. Indian, 37% Baloch, 5% Caucasian, etc. Is that pretty normal compared with the rest of the people of Kerala and other South Indian states? Thanks

Reply
Owais August 15, 2018 at 6:39 pm

I am Brahui on both my paternal and maternal sides and am considering getting a 23andme DNA test if it would be helpful.

Reply
A Khan September 21, 2018 at 5:04 am

I belong to Bengali Namasudra community. I have emailed you my 23andme raw data. Hope you can help me out and thanks a ton for your efforts.

Reply
Muhammad aqib niazai March 14, 2019 at 6:10 am

Do you have any results from Niazi Pashtuns, Khattaks, and Bangash Pashtuns? And can I send you results from any lab in Pakistan? I ask because 23andme is not providing services in Pakistan.

Reply
Nawal Hussain August 6, 2020 at 5:48 pm

Hey Zack,

I sent you the raw DNA file I got from 23andMe and would love to hear your thoughts!

Reply
