Another Update

I now have a total of 51 participants in the project who have sent me their raw data. This does not count three people who have relatives participating and who therefore have to be filtered out of most analyses, other than individual admixture percentages and similar analyses where I divide participants into small groups.

The following groups are represented:

  • Punjab: 7
  • Iran: 7
  • Tamil: 6
  • Bengal: 5
  • Andhra Pradesh: 2
  • Bihar: 2
  • Karnataka: 2
  • Caribbean Indian: 2
  • Kashmir: 2
  • Uttar Pradesh: 2
  • Sri Lankan: 2
  • Kerala: 2
  • Iraqi Arab: 2
  • Anglo-Indian: 1
  • Roma: 1
  • Goa: 1
  • Rajasthan: 1
  • Baloch: 1
  • Unknown: 1
  • Egyptian/Iraqi Jew: 1
  • Maharashtra: 1

I haven't received data from any new participants for more than a week, which is the longest lull since I started the Harappa Ancestry Project. So go out there and get people to send me their 23andMe raw data.

Also, does anyone know if a significant number of South Asians have done FamilyTreeDNA's Family Finder test? Is there a good overlap of SNPs between their test and 23andMe's?

We have enough Punjabis, Iranians, Tamils and Bengalis that they deserve separate analysis posts.


  1. I have a few relatives on 23andMe in the 0.48%-1.32% DNA shared (3-12 segments) range who I could try to press into submitting data. I know your criterion was no second cousins or closer, and none of them appear to be (we still haven't figured out exactly how we're related, but I know all my second cousins and they're not among them) but they might still be too similar.

    By the way, neither of the samples from Karnataka is really a Kannadiga, at least in the linguistic sense. HRP0017 is Tamil-speaking, and HRP0025 is Konkani, presumably like HRP0026, the Goan Catholic Brahmin. I'm not sure it matters -- most of the South Indian Brahmin participants don't seem to be particularly heterogeneous in any case -- but I thought it might be worth noting since the Punjabis and Bengalis aren't divided up according to political subdivision (Pakistani Punjab vs. Indian Punjab, Bangladesh vs. West Bengal, etc.).

    • 1% shared DNA would likely be dissimilar enough to be useful.

      Regarding the Karnataka classification, thanks for the detail. Unfortunately my knowledge of south India is based on Indian Americans, which leaves big gaps. So any corrections/elaborations are welcome.

  2. Harappa Ancestry Project @ N ~ 50 | Biology News by Biologged - pingback on March 13, 2011 at 2:32 am
  3. I've been trying my very best to spread the word about your project by:

    -Starting threads about the same on DNA-Forums and two other anthropology/genetics forums.

    -Contacting prospective project members on 23andMe via private messages; plenty are unaware of your project. So, via the account I manage (HRP36), I've been inviting people of SA descent to join the project. Many of the more recent participants actually joined the project upon my persuasion; HRP44 and HRP48 are two members I clearly remember corresponding with in regard to this project. Many of the people I messaged don't seem to check their 23andMe accounts on a regular basis, so I haven't gotten replies from everyone I contacted. Perhaps they disabled the e-mail notification option.

    -You should have my own data set in a week or two; hopefully I will have my own results by then.

    Looking forward to the ethno-linguistic group specific analyses you plan to do in the future, Zack.

  4. As far as I've seen, there are plenty of Indians who have done the Y-DNA and mtDNA tests over at FT-DNA. Perhaps folks such as Parasar or Simranjit should start a discussion on the FT-DNA forums informing the SA descent folks there of the project. Are you accepting FT-DNA data now, Zack?

  5. Just FYI, Zack, I've done the Family Finder test.

    • Would you mind sending it to me so I can see how much overlap there is with the rest of the data and how easy/hard it is to convert to ped format? Thanks.

  6. I've messaged a couple of 3rd/4th cousins; no response thus far. I suppose one could get an idea of recent ancestry within a homogeneous group with sufficient numbers?

  7. Privacy seems to be an issue for some. Here's what one such person I contacted had to say on the same -

    Here's what he had to say when I told him/her that they could simply drop in their raw data with a word or two about their ancestry via an anonymous ID, and when I directed them towards your statement regarding privacy :-

  8. Ah, messed up the HTML tags.

    Anyway, as I was saying, privacy seems to be an issue for many. Here's what one such individual had to say -

    (QUOTE) I am very much aware of the project as I follow Razib Khan on Twitter. My only concern has been the lack of confidentiality. If there is a mechanism to drop the files anonymously - with the necessary supplemental information of course, then I think more people will gladly participate. Shouldn't be too difficult to establish a secure FTP site tied to a questionnaire for the supplemental info.

    Can you help enable that [put that forward]? (UNQUOTE)

    When I directed him towards your privacy statement, and advised that they could send you their raw data file via an anonymous e-mail ID, rather than their official one, simply mentioning a word or two on their ethnic background if privacy was the issue, here is what they had to say to that :-

    If he sets up an anonymous way to drop the file and other info (caste/ethnic background/geo etc) and get the ID code in the same tool, it is done. I am sure it's not very difficult. If that's there, I'll send my data and spread the word as well.

    I think many people are in the same boat as they are not sure what's possible with one's raw data. We live in times where we don't even give out phone numbers and hide them behind Google Voice numbers! I am sure you are aware of the extreme privacy concerns, especially in the West.


    • People have the misconception that their genome file is their entire genome, without realizing it is a mere fraction (and a tiny one at that) of it. Sad really.

    • While the easiest thing for now is to create a throw-away email address to send me the data 🙂, I'll look into what I can do for anonymous submissions. I can't use a PHP-based upload since that's for small file sizes only.

  9. Iranians | Harappa Ancestry Project - pingback on March 24, 2011 at 10:21 am
