One PED File to Rule Them All

I am interested in North African populations because of my own heritage. So when Razib alerted me that Henn et al had a paper out about the southern African origins of humans, and that their African dataset was publicly available and included populations from all over Africa, I immediately downloaded it.

I have also been considering a closer look at the East Asian admixture in South Asians and Iranians to see where it originates: Southeast Asia, Chinese/Japanese/Koreans, or the Turkic/Mongolian/Siberian populations of interior northeastern Asia. At a quick glance, Razib is correct:

The eastern Asian components are enriched among Bengalis, as you’d expect, but they’re found in different proportions among many individuals who hail from the northern fringe of South Asia more generally. It seems clear that the further west you go, the more likely the “eastern” element is going to be Turk, while the further east (and to some extent south) the more likely it is to be more southernly in provenance.

To do a better job, though, I need more than the Yakut as an exemplar of the Siberian component, which is all I have used till now. Therefore, I downloaded the Arctic populations dataset from Rasmussen et al.

Combining Henn et al and Rasmussen et al with my previous datasets (HapMap, HGDP, SGVP, Behar et al and Xing et al), I got 3,970 samples with a total of 1,716,031 SNPs represented, though at 99% genotyping rate it gets reduced to about 27,000 SNPs.
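The drop from 1.7 million SNPs to about 27,000 makes sense once you remember that the datasets were typed on different chips: a SNP that is absent from one chip is missing for every sample from that dataset, so it fails a 99% genotyping rate cutoff. Here is a minimal Python sketch of that effect on toy data. The function names, sample IDs and SNP IDs are all made up for illustration, and this is not the actual pipeline I ran. It mirrors what a per-SNP missingness filter such as PLINK's `--geno 0.01` does.

```python
from typing import Dict, List, Optional

# Toy genotypes: sample ID -> {SNP ID -> genotype call, or None if missing}.
# All IDs below are hypothetical.
Genotypes = Dict[str, Dict[str, Optional[str]]]

def merge_datasets(*datasets: Genotypes) -> Genotypes:
    """Pool the samples of several datasets into one dictionary."""
    merged: Genotypes = {}
    for dataset in datasets:
        for sample, calls in dataset.items():
            merged[sample] = dict(calls)
    return merged

def call_rates(merged: Genotypes) -> Dict[str, float]:
    """Fraction of samples with a non-missing call, for every SNP seen anywhere."""
    snps = sorted({snp for calls in merged.values() for snp in calls})
    n_samples = len(merged)
    return {
        snp: sum(1 for calls in merged.values() if calls.get(snp) is not None) / n_samples
        for snp in snps
    }

def filter_snps(merged: Genotypes, min_rate: float = 0.99) -> List[str]:
    """Keep only SNPs whose genotyping rate is at least min_rate."""
    return [snp for snp, rate in call_rates(merged).items() if rate >= min_rate]

# Two toy "chips" that only partly overlap in the SNPs they type.
chip_a: Genotypes = {
    "A1": {"rs1": "AA", "rs2": "AG", "rs3": "CC"},
    "A2": {"rs1": "AG", "rs2": "GG", "rs3": "CT"},
}
chip_b: Genotypes = {
    "B1": {"rs2": "AG", "rs3": "CC", "rs4": "TT"},
    "B2": {"rs2": "AA", "rs3": None, "rs4": "TG"},  # rs3 failed for B2
}

merged = merge_datasets(chip_a, chip_b)
kept = filter_snps(merged, min_rate=0.99)
print(len(call_rates(merged)), "SNPs represented,", len(kept), "kept:", kept)
```

In this toy case four SNPs are represented in total, but only the one typed and called in every sample survives the cutoff.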

I did not remove any populations or individuals except for any duplicates and non-founders.
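For anyone curious what that filtering looks like on a PED file, here is a rough Python sketch. It relies on the standard PED layout, where the first six columns are family ID, individual ID, paternal ID, maternal ID, sex and phenotype, and a founder has `0` for both parental IDs. Treating "duplicate" as "exactly identical genotype columns" is a simplifying assumption for the example; real duplicates usually need a similarity threshold because of genotyping errors and missing calls. The function names and the toy lines are hypothetical.

```python
from typing import List, Set, Tuple

def parse_ped_line(line: str) -> Tuple[str, str, str, str, Tuple[str, ...]]:
    """Split a PED line into FID, IID, paternal ID, maternal ID and allele columns."""
    fields = line.split()
    fid, iid, pat, mat = fields[:4]
    # fields[4] is sex and fields[5] is phenotype; alleles start at column 7.
    return fid, iid, pat, mat, tuple(fields[6:])

def keep_unique_founders(ped_lines: List[str]) -> List[str]:
    """Drop non-founders, then drop samples whose genotypes repeat an earlier sample."""
    kept: List[str] = []
    seen: Set[Tuple[str, ...]] = set()
    for line in ped_lines:
        if not line.strip():
            continue  # skip blank lines
        _fid, _iid, pat, mat, alleles = parse_ped_line(line)
        if pat != "0" or mat != "0":
            continue  # a known parent means the sample is a non-founder
        if alleles in seen:
            continue  # exact genotype match with an earlier sample (assumed duplicate)
        seen.add(alleles)
        kept.append(line)
    return kept

# Toy PED lines with two SNPs each (hypothetical samples).
toy_ped = [
    "F1 S1 0 0 1 -9 A A G G",
    "F1 S2 0 0 2 -9 A C G T",
    "F1 S3 S1 S2 1 -9 A C G T",  # child of S1 and S2: non-founder
    "F2 S4 0 0 2 -9 A A G G",    # same genotypes as S1: treated as a duplicate
]

for line in keep_unique_founders(toy_ped):
    print(line)
```

On the toy input only S1 and S2 remain: S3 is dropped as a non-founder and S4 as a duplicate of S1.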

Here's the information on the populations represented in this dataset.

Now I am on the lookout for more datasets that are public, have enough SNPs in common with this set, and can easily be converted into the Plink PED format. So if you know of any, let me know. Maybe, with your help, I will end up with the biggest and most diverse dataset.


  1. Hey Zack, if you can figure out how to get this stuff into a ped file without too much fuss (scroll down to the bottom of the page), please let me know how. I'm getting a headache just from thinking about it.

  2. There is no doubt that the eastern Asian component in the north west is Turko-Mongol (as is best seen in the Hazara). Some of the eastern Asian in eastern Indians will be Turki too. There was a significant Turki incursion in the Tirhut to Assam region. The Turks from Kunduz had crossed over to the Kuch Bihar region and devastated Tirhut and Bengal. The Turki lineage of Piran Visah was reported in Assam.

  3. It would be great if you could do a test to see how much Turkic ancestry we have in us. I personally have an interest in Turks and Turkic history, as I've been learning about how much they've managed to accomplish throughout history and also how their culture has influenced ours. So, undertaking that test would be nice.

