Project Anniversary

It has been one year for the Harappa Ancestry Project. I announced it on January 17, 2011 and then moved it to its own domain on January 19.

It started out fast and furious, with participants sending their data every day and me blogging about it multiple times a day. Now it has slowed down quite a bit, with only one unrelated South Asian participant (not counting any Romany) in the last 2 months.

Speaking of participants, there have been a couple of complaints.

The decision to include non-South Asian participants from countries that do not neighbour the Subcontinent contradicts the Harappa Project's original inclusion criterion.

I concur with DMXX. While I'm certainly not anyone to tell Zack how to run his own show, I think the project is losing its original focus. Accepting folks from West-Asia was also fine, given that South-Asians derive a lot of ancient ancestry from the area and thus may be deemed a secondary focus area. The same could also be said for those with partial Roma Gypsy ancestry. But, some of the runs seem to be almost entirely dominated by non-South Asian participants, who have absolutely no connection with the subcontinent. I can't help but ask on what basis these Brazilian, Belizean, Mexican, Hispanic, Somali, African-American and European participants were accepted into the project.

My approach has been to make it clear to any potential participants that my focus is on South Asia. Thus any Admixture components I have computed have been with a heavily South Asian dataset. Also, a number of PCA and clustering analyses that I do are at times limited to South Asians etc. On the other hand, I have accepted any participant who has asked to be included. I run a basic Admixture run for everyone which is a fairly automated process and sometimes include them in other analyses.

Let me illustrate with an example. I am working on a new Admixture calculator. For computing the components, I am using all the South Asian project participants in addition to reference datasets. I am going to select the data for that in such a way that we get Admixture components which give us a better idea of South Asian genetic ancestry.

While participation in the project has slowed down, the research on South Asian genetics has picked up. We had the Metspalu et al. paper and dataset. Also, 1000genomes is expected to release 400 South Asian samples (100 Lahori Punjabis, 100 Bangladeshis, 100 Sri Lankan Tamils and 100 Indian Telugu) over the summer.


  1. Thanks Zack. I'm one of the small% So. Asian folks with a tradition of Roma ancestry---I was delighted to be included and I feel that upon seeing my results it was a valid inclusion.

    I'm fine with being excluded on various runs. However I want folks to know---even those of us with smaller amounts appreciate being included. #194

  2. I congratulate on the anniversary.

  3. Paul Givargidze

    Congrats on the anniversary, Zack. 🙂 And, a big thanks for all your hard work.

  4. Thank you Zack for all the work you put into this!

  5. Hi Zack,

    This is interesting. Do you know whether the results from 1000genomes will be made available to FTDNA? The crazy thing is that I am the only one on FTDNA that doesn't have a match on my Y-DNA side. Since my grandfather was born in Lahore, it would be interesting to see the results of the Lahoris.


  6. congratulations!

  7. Congratulations! Here's to another year of interesting discoveries in the world of South-Asian genetics and genealogy!

  8. Congrats on the 1 year anniversary Zack.
    I very much appreciate all the work you do for us and appreciate you allowing Romany participants in your study. 🙂