Xing et al Data

The data for Xing et al's paper "Toward a more uniform sampling of human genetic diversity: a survey of worldwide populations by high-density genotyping" is available online.

This dataset consists of 850 individuals, of whom 259 overlap with HapMap. I removed another 15 samples because they were too similar to other samples, and I also removed the Native American samples. This leaves us with 529 samples.

Ethnic group Count
Slovenian 25
Punjabi Arain 25
N. European 25
Nepalese 25
Kyrgyzstani 25
Iban 25
Buryat 25
Bambaran 25
Andhra Pradesh Brahmin 25
Kurd 24
Dogon 24
Irula 23
Thai 22
Pygmy 22
Urkarah 18
Tamil Nadu Brahmin 14
Hema 14
Tongan 13
Tamil Nadu Dalit 13
Samoan 13
!Kung 13
Japanese 13
Andhra Pradesh Mala 11
Pedi 10
Andhra Pradesh Madiga 10
Alur 10
Nguni 9
Sotho/Tswana 8
Vietnamese 7
Stalskoe 5
Chinese 5
Khmer Cambodian 3
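
As a quick sanity check on the numbers above, here is a minimal sketch that sums the per-group counts and compares them against the stated total. The variable names are my own, and the Native American figure it derives is only implied by the arithmetic, assuming no removals other than the three mentioned in the text.

```python
# Per-group sample counts, copied from the table above.
counts = {
    "Slovenian": 25, "Punjabi Arain": 25, "N. European": 25, "Nepalese": 25,
    "Kyrgyzstani": 25, "Iban": 25, "Buryat": 25, "Bambaran": 25,
    "Andhra Pradesh Brahmin": 25, "Kurd": 24, "Dogon": 24, "Irula": 23,
    "Thai": 22, "Pygmy": 22, "Urkarah": 18, "Tamil Nadu Brahmin": 14,
    "Hema": 14, "Tongan": 13, "Tamil Nadu Dalit": 13, "Samoan": 13,
    "!Kung": 13, "Japanese": 13, "Andhra Pradesh Mala": 11, "Pedi": 10,
    "Andhra Pradesh Madiga": 10, "Alur": 10, "Nguni": 9, "Sotho/Tswana": 8,
    "Vietnamese": 7, "Stalskoe": 5, "Chinese": 5, "Khmer Cambodian": 3,
}

# Figures stated in the text.
START_TOTAL = 850      # individuals in the original dataset
HAPMAP_OVERLAP = 259   # samples that overlap with HapMap
TOO_SIMILAR = 15       # samples removed for being too similar to others

kept = sum(counts.values())
after_stated_removals = START_TOTAL - HAPMAP_OVERLAP - TOO_SIMILAR

# Assumption: the only remaining difference is the Native American samples.
implied_native_american = after_stated_removals - kept

print(f"groups: {len(counts)}, samples kept: {kept}")
print(f"implied Native American samples removed: {implied_native_american}")
```

The table total matches the 529 quoted in the text, which leaves 47 samples for the Native American groups under that assumption.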

This dataset is valuable because it contains several South Asian, Central Asian, Southeast Asian and Caucasus groups. However, its SNP overlap with 23andMe and the other datasets is poor: it has only about 29,000 SNPs in common with 23andMe v2 data, and combining HapMap, HGDP, SGVP, Behar et al and Xing et al with 23andMe data leaves us with just 25,000 SNPs. For that reason, I'll be using the Xing et al data for only a few analyses.
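
For readers who want to reproduce this kind of overlap count, here is a rough sketch of one way to do it, assuming each dataset has already been converted to PLINK format and has a `.bim` file (whose second column is the variant ID). The function names and file paths are hypothetical, and this is not necessarily how the numbers above were obtained.

```python
import sys


def read_bim_snp_ids(path):
    """Return the set of variant IDs (second column) from a PLINK .bim file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()  # .bim files are whitespace-delimited
            if len(fields) >= 2:
                ids.add(fields[1])
    return ids


def shared_snps(bim_paths):
    """Return the variant IDs present in every one of the given .bim files."""
    id_sets = [read_bim_snp_ids(path) for path in bim_paths]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


if __name__ == "__main__":
    # Example (hypothetical file names):
    #   python snp_overlap.py xing.bim 23andme_v2.bim
    paths = sys.argv[1:]
    if paths:
        print(f"{len(shared_snps(paths))} SNPs shared across {len(paths)} files")
```

Matching on IDs alone is a simplification: it ignores SNPs that were renamed between dbSNP builds and any strand or allele mismatches, so the count it gives is only a first approximation of the usable overlap.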


  1. Any chance you can post the PLINK files of the public datasets you've assembled for your analysis?

    • I won't provide the PLINK-format data files, since I am not sure I have the right to distribute these datasets. I can do the next best thing, though, which is to provide the scripts I am using to convert the downloaded data to PLINK format.
