Reich et al. and Pan-Asian Datasets

Posted by Zack on March 17, 2011

I got access to the Reich et al. (Nature, 2009) dataset used in their paper "Reconstructing Indian population history".

It has the following populations:

Aonaga	Aus	Bhil
Chenchu	Great_Andamanese	Hallaki
Kamsali	Kashmiri_Pandit	Kharia
Kurumba	Lodi	Madiga
Mala	Meghawal	Naidu
Nysha	Onge	Sahariya
Santhal	Satnami	Siddi
Somali	Srivastava	Tharu
Vaish	Velama	Vysya

Their dataset has 141 individuals genotyped at 587,753 SNPs and, conveniently, is already in PED format.
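Because PED is a plain-text format, figures like these are easy to check. Here is a minimal Python sketch that counts individuals and SNPs. It assumes the standard PLINK layout: six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), followed by two allele columns per SNP. The function name `summarize_ped` is hypothetical.

```python
from typing import Iterable, Tuple


def summarize_ped(lines: Iterable[str]) -> Tuple[int, int]:
    """Count individuals and SNPs in PLINK PED text.

    Assumes the standard layout: six leading columns (family ID,
    individual ID, paternal ID, maternal ID, sex, phenotype) followed by
    two allele columns per SNP.
    """
    n_individuals = 0
    n_snps = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        genotype_cols = len(fields) - 6
        if genotype_cols < 0 or genotype_cols % 2 != 0:
            raise ValueError(f"malformed PED line starting {fields[:2]}")
        snps_here = genotype_cols // 2
        if n_snps is None:
            n_snps = snps_here
        elif snps_here != n_snps:
            raise ValueError("individuals have different numbers of SNPs")
        n_individuals += 1
    return n_individuals, (n_snps or 0)


# Two made-up individuals genotyped at two SNPs ("0 0" would be a missing call).
sample = """FAM1 IND1 0 0 1 -9 A G C C
FAM2 IND2 0 0 2 -9 G G C T
"""
print(summarize_ped(sample.splitlines()))  # (2, 2)
```

For a real file you could pass an open file handle, e.g. `with open("reich.ped") as f: summarize_ped(f)` (the filename is hypothetical). The function reads one line at a time, so it does not load the whole dataset into memory. The SNP count should also equal the number of rows in the accompanying MAP file, which has one row per SNP.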

Also, Blaise pointed me to the Pan-Asian SNP data used in the December 2009 Science paper "Mapping Human Genetic Diversity in Asia".

It includes the following 71 populations:

Maya	Auca	Quechua	Karitiana	Pima
Ami	Atayal	Melanesians	Zhuang	Han_Cantonese
Hmong	Jiamao	Jinuo	Han_Shanghai	Uyghur
Wa	Alorese	Dayak	Javanese	Batak_Karo
Lamaholot	Lembata	Malay	Mentawai	Manggarai
Kambera	Sunda	Batak_Toba	Toraja	Andhra_Pradesh
Karnataka	Bengali-Assamese	Rajasthan	Uttaranchal	Uttar Pradesh
Haryana	Spiti	Bhili	Marathi	Japanese
Ryukyuan	Korean	Bidayuh	Jehai	Kelantan
Kensiu	Temuan	Ayta	Agta	Ati
Iraya	Minanubu	Mamanwa	Filipino	Singapore_Chinese
Singapore_Indian	Singapore_Malay	Hmong (Miao)	Karen	Lawa
Mlabri	Mon	Paluang	Plang	Tai_Khuen
Tai_Lue	H'tin	Tai_Yuan	Tai_Yong	Yao
Hakka	Minnan

It has 1,719 individuals genotyped at 54,794 SNPs. I wish it had more SNPs, given the wealth of populations.

Also, the Pan-Asian data is coded as minor allele counts (0, 1 or 2 per individual per SNP), so I need to convert it back to A/C/G/T. Since the dataset includes some HapMap populations, whose genotypes are known, that shouldn't be too hard.
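Here is a minimal Python sketch of that idea. It works under several assumptions, and none of them describe the actual layout of the Pan-Asian files. First, counts are 0, 1 or 2, with `None` for a missing call. Second, each SNP's two alleles are known from HapMap. Third, a HapMap individual's known genotype can be paired with that same individual's count. Under these assumptions, a homozygote settles which allele was counted: a count of 2 means its allele is the counted one, and a count of 0 means it is the other one. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]
MISSING: Genotype = ("0", "0")  # PLINK's default missing-genotype code


def infer_counted_allele(
    alleles: Tuple[str, str],
    reference: Sequence[Tuple[Genotype, Optional[int]]],
) -> Optional[str]:
    """Work out which of a SNP's two alleles the counts refer to.

    `reference` pairs each HapMap individual's known genotype with that
    individual's count in the count-coded file. Only homozygotes are
    informative: a count of 2 means their allele is the counted one,
    and a count of 0 means the other allele is.
    """
    for (g1, g2), count in reference:
        if count is None or g1 != g2 or g1 not in alleles:
            continue  # missing, heterozygous, or unexpected allele
        if count == 2:
            return g1
        if count == 0:
            return alleles[1] if g1 == alleles[0] else alleles[0]
    return None  # no informative HapMap individual at this SNP


def counts_to_genotypes(
    counts: Sequence[Optional[int]], counted: str, other: str
) -> List[Genotype]:
    """Turn 0/1/2 counts of `counted` into allele pairs (None = missing)."""
    lookup = {0: (other, other), 1: (counted, other), 2: (counted, counted)}
    genotypes = []
    for c in counts:
        if c is None:
            genotypes.append(MISSING)
        elif c in lookup:
            genotypes.append(lookup[c])
        else:
            raise ValueError(f"unexpected allele count: {c}")
    return genotypes


# Hypothetical A/G SNP: the first HapMap sample is heterozygous (uninformative),
# the second is G/G with a count of 0, so the counted allele must be A.
counted = infer_counted_allele(("A", "G"), [(("A", "G"), 1), (("G", "G"), 0)])
print(counted)  # A
print(counts_to_genotypes([0, 1, 2, None], counted, "G"))
# [('G', 'G'), ('A', 'G'), ('A', 'A'), ('0', '0')]
```

Matching on individual genotypes rather than on HapMap allele frequencies avoids trouble when the allele that is minor in the Pan-Asian sample is not the minor allele in HapMap. A more careful version would check that every informative HapMap individual agrees, instead of trusting the first one it finds.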

I am going to include both of these datasets in my big reference set.
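Combining several datasets into one reference set generally means restricting them to the SNPs they share. The Python sketch below shows that overlap check. It assumes SNPs are matched by the ID in the second column of each PLINK MAP file; a real merge must also reconcile alleles and strand. The helper names are hypothetical. The same check against a 23andMe SNP list shows whether a dataset overlaps enough to be useful.

```python
from typing import Iterable, List


def snp_ids_from_map(lines: Iterable[str]) -> List[str]:
    """SNP IDs from PLINK MAP text (columns: chromosome, SNP ID,
    genetic distance, base-pair position)."""
    ids = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def common_snps(*id_lists: List[str]) -> List[str]:
    """SNPs present in every list, in the order of the first list."""
    shared = set(id_lists[0]).intersection(*id_lists[1:])
    # dict.fromkeys drops repeated IDs while keeping first-seen order
    return [snp for snp in dict.fromkeys(id_lists[0]) if snp in shared]


# Tiny made-up MAP files standing in for two datasets.
map_a = ["1 rs1 0 100", "1 rs2 0 200", "2 rs3 0 300"]
map_b = ["1 rs2 0 200", "2 rs3 0 300", "2 rs4 0 400"]
shared = common_snps(snp_ids_from_map(map_a), snp_ids_from_map(map_b))
print(shared, len(shared))  # ['rs2', 'rs3'] 2
```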


16 Comments.

sv March 17, 2011 at 9:34 pm

Great finds!
Vasishta March 17, 2011 at 11:13 pm

I second SV! I cannot believe you got your hands on THE data-set! Very, very awesome!
- Vasishta March 17, 2011 at 11:43 pm
  
  Wait, this means that an ANI-ASI division is now possible, right?
razib March 18, 2011 at 1:21 am

"Wait, this means that an ANI-ASI division is now possible, right?"

The ANI-ASI division was created using a different methodology from the one ADMIXTURE uses. The whole point of Reich et al. was that the South Asian cluster, which always pops out of ADMIXTURE/STRUCTURE/frappe, doesn't naturally disaggregate, so they had to use a different technique.
Paul Ó Duḃṫaiġ March 18, 2011 at 3:57 am

Great find on the Pan-Asian data. If you want, I can send you my OH's 23andMe (v3) data. She's Filipino, though she has some Spanish admixture (1/8th, from a great-grandfather).

-Paul
- Zack March 18, 2011 at 11:32 am
  
  It will be interesting to look at her data. Maybe later, when I expand the scope of this project. 😀
alex greber March 18, 2011 at 4:48 pm

Good job, Zack. Will there be more cool admixture analyses now, with the Pan-Asian dataset included?

By the way, will you run K=17 and K=18 analyses too?
- Zack March 18, 2011 at 6:13 pm
  
  Yes, I'll have more admixture results after I have incorporated the new data.
  
  For the current reference 1, K=17 is probably the highest I'll go.
Dienekes on ANI/ASI | Harappa Ancestry Project - pingback on March 22, 2011 at 8:27 am
Vasishta April 10, 2011 at 2:10 am

Zack, will you be able to run these two datasets at K=16 sometime in the future, like you did for the Xing dataset a while ago? It would be great if you could, as the second dataset has Uttar Pradesh, Rajasthan and Maharashtra, some of the most populous states in India.

Thanks.
-V
- Zack April 12, 2011 at 7:26 am
  
  I have incorporated Reich in my new reference which I am testing right now.
  
  As for the Pan-Asian dataset, I am going to run some ADMIXTURE and PCA experiments on it soon. Unfortunately, it has too few SNPs in common with 23andMe to be of much use for the Harappa analysis.
Pan-Asian Dataset Duplicates and Relatives | Harappa Ancestry Project - pingback on April 11, 2011 at 7:11 am
Pan-Asian to PED Conversion | Harappa Ancestry Project - pingback on April 16, 2011 at 8:05 am
Pan-Asian Ref3 K=11 Admixture | Harappa Ancestry Project - pingback on March 13, 2012 at 6:27 am
manglalayag August 10, 2013 at 1:10 pm

Zack, I'm sooo bad at finding stuff online. Can you post the link to the genotype (PED) files you found?

I'm mainly interested in the Andamanese and Austroasiatic populations. I'm Filipino and would like to see my admixture with those groups.

Many thanks and keep on posting!!!
- Zack August 10, 2013 at 1:20 pm
  
  Neither of these are available publicly online.


Harappa Ancestry Project

Genetics and South Asia