Reference Dataset II

Combining my reference population with the Xing et al. data gets me 3,161 samples (3,222 before the removal noted in the update below), but with only about 23,000 SNPs after LD-pruning.

The good thing is that this dataset has 544 South Asian samples from 24 ethnic groups. So it'll be useful for some analyses despite the low number of SNPs. I'll try to run parallel analyses on my reference population and this dataset so we can compare the pros and cons of both.
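To illustrate why a merged dataset ends up with so few SNPs, here is a rough Python sketch of the two steps involved: keeping only the SNPs typed in both datasets, then greedily LD-pruning what is left. This is not the pipeline used for this post, which does not say which tool or thresholds were applied. The function names, the toy genotypes, the `window` of 50 and the `r2_max` of 0.5 are all assumptions made up for the example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as uncorrelated
    return cov * cov / (var_x * var_y)


def ld_prune(genotypes, window=50, r2_max=0.5):
    """Greedy pruning (illustrative thresholds, not the post's actual settings).

    `genotypes` maps SNP id -> list of allele counts, inserted in genomic order.
    A SNP is kept only if its r^2 with each of the last `window` kept SNPs
    is below `r2_max`.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        recent = kept[-window:]
        if all(r_squared(calls, genotypes[k]) < r2_max for k in recent):
            kept.append(snp_id)
    return kept


def merge_and_prune(dataset_a, dataset_b, **kwargs):
    """Keep SNPs present in both datasets, concatenate the samples, then prune."""
    shared = [snp for snp in dataset_a if snp in dataset_b]
    merged = {snp: dataset_a[snp] + dataset_b[snp] for snp in shared}
    return ld_prune(merged, **kwargs)


# Toy data: 4 samples in one dataset, 3 in the other (hypothetical values).
dataset_a = {"rs1": [0, 1, 2, 1], "rs2": [0, 1, 2, 1], "rs3": [1, 0, 1, 2], "rs9": [1, 1, 0, 2]}
dataset_b = {"rs1": [2, 0, 1], "rs2": [2, 0, 1], "rs3": [1, 1, 0], "rs4": [1, 1, 1]}

pruned = merge_and_prune(dataset_a, dataset_b)
print(pruned)
```

In the toy data, rs9 and rs4 drop out because each is typed in only one dataset, and rs2 is pruned because it is perfectly correlated with rs1. The same two effects, at scale, are what shrink the SNP count when chip sets with little overlap are combined.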

UPDATE: I removed 61 pygmy and San samples.


  1. Reference II Admixture Analysis K=2-5 | Harappa Ancestry Project - pingback on February 11, 2011 at 8:27 am
  2. Reference II PCA | Harappa Ancestry Project - pingback on March 26, 2011 at 10:36 am
  3. Ref 2 South Asians + Harappa PCA | Harappa Ancestry Project - pingback on March 30, 2011 at 1:07 am
  4. Harappa Gene Similarity | Harappa Ancestry Project - pingback on April 8, 2011 at 7:32 am
  5. Admixture: Choice of K | Harappa Ancestry Project - pingback on April 13, 2011 at 12:19 pm
  6. Zack, I have one request: could you release a calculator based on the DIY Dodecad files, like David from Eurogenes has done, for your initial HAP Ref I/II K=12 analysis?

    • I'll try to release files for DIYDodecad soon.

      • Awesome, I am eagerly looking forward to it. Thanks for the effort, Zack.

        As an aside, I've been told by one or two people that, while they submitted their raw-data file to you [harappa at zackvision dot com] a while ago, they still haven't been assigned an alphanumeric ID (and consequently, not included in a K=11 admixture run). Perhaps these individuals' mails to you are getting caught up in your spam folder? It'd be cool if you could check, Zack.

        • How long ago? If it was before the weekend, then it likely ended up in spam. It would be nice if you or they emailed me some info (email address/name) so I could dig them out of the mountains of spam.
