Reference II Admixture Analysis K=6-9

Posted by Zack on February 12, 2011

Continuing with the admixture analysis of the Reference II dataset, here's the spreadsheet.

Besides the differences from the Reference I analysis, take a look at the additional ethnic groups included in this dataset, especially the 8 South Asian groups: Tamil Nadu Dalit, Irula, Andhra Pradesh Madiga, Andhra Pradesh Mala, Tamil Nadu Brahmin, Andhra Pradesh Brahmin, Punjabi Arain, Nepali.

Let's start with K=6.

Reference II Admixture K=6

Note the difference between Tamil Nadu Dalits and Brahmins. The Dalits lack the European ancestral component of the Brahmins.

For K=7, the East Asian component splits into Northeast Asian and Southeast Asian.

Reference II Admixture K=7

Punjabi Arain are about the same as Sindhis (excluding those with some African ancestry) in terms of their ancestral components.

Comparing the Andhra Brahmins to the Mala and Madiga, we see the same pattern as in Tamil Nadu: Brahmins have more European and Southwest/West Asian while Mala and Madiga have more Southeast Asian and South Asian.

At K=8, the African component splits into West African and East African.

Reference II Admixture K=8

The Nepalese samples are interesting. They have about 49% South Asian, 19% Northeast Asian, 16% European and 10% Southeast Asian. So they look like a mix of South Asian and East Asian.
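A quick tally of the four Nepalese averages quoted above shows where the "mix" reading comes from. This uses only those four rounded figures; how the remainder splits among the other components is not stated here.

$$
\underbrace{19\%}_{\text{NE Asian}} + \underbrace{10\%}_{\text{SE Asian}} = 29\%\ \text{East Asian in total}, \qquad 100\% - (49\% + 19\% + 16\% + 10\%) \approx 6\%\ \text{in the other four components}
$$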

Here's the mean absolute difference between the Reference I and Reference II K=8 results for each ancestral component, taken over the populations present in both datasets:

Ancestral Component	Mean absolute difference (Ref I vs Ref II)
South Asian (C1)	2.17%
Southwest Asian (C2)	1.32%
European (C3)	1.70%
Southeast Asian (C4)	2.16%
Papuan (C5)	0.33%
Northeast Asian (C6)	1.93%
West African (C7)	0.27%
East African (C8)	0.48%

The groups with the largest differences are Balochi, Cambodian, Dai, Han, Kalash, Lahu, Miao, Naxi, She, Singapore Chinese, Tu, Tujia, US Chinese, and Yi, so they are mostly East Asian groups.
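For readers who want to reproduce this kind of comparison from their own runs, here is a minimal Python sketch. It assumes each run's results are stored as a dictionary mapping population name to the eight K=8 component fractions in the order C1 to C8; that layout, the toy populations `PopA`/`PopB`, and their values are hypothetical and not taken from the spreadsheet. Ranking populations by their summed absolute differences is likewise only one plausible way to pick out "larger differences"; the post does not say how that list was made.

```python
from statistics import mean

# Component order assumed to match the K=8 table above (C1..C8).
COMPONENTS = [
    "South Asian (C1)", "Southwest Asian (C2)", "European (C3)",
    "Southeast Asian (C4)", "Papuan (C5)", "Northeast Asian (C6)",
    "West African (C7)", "East African (C8)",
]

# Hypothetical toy data (NOT from the spreadsheet): population -> fractions C1..C8.
TOY_REF1 = {
    "PopA": [0.50, 0.10, 0.10, 0.10, 0.00, 0.20, 0.00, 0.00],
    "PopB": [0.00, 0.00, 0.05, 0.45, 0.00, 0.50, 0.00, 0.00],
}
TOY_REF2 = {
    "PopA": [0.48, 0.12, 0.10, 0.10, 0.00, 0.20, 0.00, 0.00],
    "PopB": [0.00, 0.00, 0.05, 0.55, 0.00, 0.40, 0.00, 0.00],
}


def mean_abs_diff_per_component(ref1, ref2):
    """For each component, average |Ref I - Ref II| over populations in both runs."""
    shared = sorted(ref1.keys() & ref2.keys())
    return {
        name: mean(abs(ref1[pop][i] - ref2[pop][i]) for pop in shared)
        for i, name in enumerate(COMPONENTS)
    }


def populations_by_difference(ref1, ref2):
    """Rank shared populations by absolute differences summed over all components."""
    shared = ref1.keys() & ref2.keys()
    totals = {
        pop: sum(abs(a - b) for a, b in zip(ref1[pop], ref2[pop]))
        for pop in shared
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, diff in mean_abs_diff_per_component(TOY_REF1, TOY_REF2).items():
        print(f"{name}\t{diff:.2%}")
    for pop, total in populations_by_difference(TOY_REF1, TOY_REF2):
        print(f"{pop}\t{total:.2%}")
```

Averaging only over populations present in both runs keeps groups that were added in Reference II (such as the new South Asian samples) from distorting the comparison.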

For K=9, the ancestral components inferred from Reference II start to diverge from those inferred from Reference I. Instead of the Kalash component in Reference I analysis, we get the Polynesian component here. This is likely due to the inclusion of Tongan and Samoan samples.

Reference II Admixture K=9

Here's a summary of the ancestral components inferred from the Reference II dataset:

K	Ancestral components
K=2	Eurasian, African
K=3	European, E Asian, African
K=4	S Asian, European, E Asian, African
K=5	S Asian, European, E Asian, SW Asian, African
K=6	S Asian, European, E Asian, SW Asian, Papuan, African
K=7	S Asian, European, SE Asian, SW Asian, Papuan, NE Asian, African
K=8	S Asian, SW Asian, European, SE Asian, Papuan, NE Asian, W African, E African
K=9	S Asian, European, SW Asian, SE Asian, Papuan, NE Asian, Polynesian, W African, E African

I might do some admixture runs for Reference II with Harappa participants later.


11 Comments.

sv February 12, 2011 at 1:40 pm

"Instead of the Kalash component in Reference I analysis, we get the Polynesian component here. This is likely due to the inclusion of Tongan and Samoan samples."

Interesting. I understand that it might take some time, but if you run it without those latter populations, might we get a component more like the Kalash one?
- Zack February 12, 2011 at 2:22 pm
  
  It is possible we might get the Kalash component in reference II at a higher K. Let's see.
  - sv February 12, 2011 at 5:20 pm
    
    Yes, that seems reasonable. As always, I am looking forward to your future posts!
Mithra February 12, 2011 at 8:47 pm

Almost all the Chinese samples are now around 50% SE Asian. I didn't see this before; is it right?
- Zack February 12, 2011 at 10:38 pm
  
  The Northeast Asian component is modal among the Japanese. Thus the Chinese probably should be somewhat mixed with what I have termed Southeast Asian. However, I am going to look at the individual samples of the Chinese to see if there's variation between individuals.
  - razib February 12, 2011 at 11:34 pm
    
    Some of the Xing results are weird for brownz, IMO. Not sure I trust it totally.
RK February 12, 2011 at 9:31 pm

I've been playing around a little with the Xing dataset. Here's a PCA, minus the African populations. (Maybe I should've removed the Amerindian ones as well.) The lousy labeling is due to my not really knowing how to use gnuplot.

Here are the South Asian populations. Note that since I included myself in this second run, there were only 40,808 SNPs after pruning, though it doesn't look like anyone's shifted much as a result. (As Zack noted earlier, the Xing dataset doesn't have that many SNPs in common with 23andMe's chip.) With that caveat in mind, it looks like some AP Brahmins are shifted towards the tribal/Dalit cluster. I'm the red cross with a blue box, by the way.

I tried merging in the HapMap Gujaratis, but something went wrong and they ended up clustering far away from everyone else. (And defining their own component to boot.) Maybe I forgot to extract the common SNPs, in which case they're going in tomorrow!
- Zack February 12, 2011 at 10:38 pm
  
  Great! I just did some PCA plots too. Expect something to be up on the blog by morning.
- sv February 13, 2011 at 2:16 am
  
  Nice work!
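RK's comment above mentions extracting the common SNPs before merging the HapMap Gujaratis. A minimal Python sketch of that step follows, assuming both datasets are in PLINK binary format so each has a `.bim` file (six whitespace-separated columns: chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2). The script name and file names are hypothetical. It keeps only SNPs with the same ID, chromosome, position, and allele pair in both files; strand-flipped SNPs are dropped rather than reconciled, and ambiguous A/T or C/G SNPs are not treated specially. The resulting ID list could then be given to PLINK's `--extract` option for each dataset before merging. This is a sketch of the general step, not a diagnosis of what went wrong in that run.

```python
import sys
from pathlib import Path


def read_bim(lines):
    """Parse PLINK .bim lines into {snp_id: (chromosome, bp_position, allele_set)}.

    A .bim line has six whitespace-separated columns: chromosome, variant ID,
    genetic distance (cM), base-pair position, allele 1, allele 2.
    """
    snps = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, snp_id, _cm, bp, allele1, allele2 = fields
        # frozenset makes "A G" and "G A" compare equal.
        snps[snp_id] = (chrom, int(bp), frozenset((allele1, allele2)))
    return snps


def common_snps(bim_lines_a, bim_lines_b):
    """Return sorted IDs of SNPs found in both datasets at the same chromosome
    and position with the same allele pair. Strand-flipped SNPs (e.g. A/G vs
    T/C) are dropped here rather than reconciled."""
    a = read_bim(bim_lines_a)
    b = read_bim(bim_lines_b)
    return sorted(snp for snp in a.keys() & b.keys() if a[snp] == b[snp])


if __name__ == "__main__":
    # Hypothetical usage: python common_snps.py first.bim second.bim > common.txt
    first = Path(sys.argv[1]).read_text().splitlines()
    second = Path(sys.argv[2]).read_text().splitlines()
    print("\n".join(common_snps(first, second)))
```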
Chinese Samples | Harappa Ancestry Project - pingback on February 16, 2011 at 7:21 am
Reference II Admixture Analysis K=10 | Harappa Ancestry Project - pingback on March 2, 2011 at 1:48 pm


Harappa Ancestry Project

Genetics and South Asia