Reference II Admixture Analysis K=6-9

Continuing with the admixture analysis of the Reference II dataset, here's the spreadsheet.

Besides the differences from the Reference I analysis, do take a look at the additional ethnic groups included in this dataset, especially the 8 South Asian groups: Tamil Nadu Dalit, Irula, Andhra Pradesh Madiga, Andhra Pradesh Mala, Tamil Nadu Brahmin, Andhra Pradesh Brahmin, Punjabi Arain, Nepali.

Let's start with K=6.

Reference II Admixture K=6

Note the difference between Tamil Nadu Dalits and Brahmins. The Dalits lack the European ancestral component of the Brahmins.

For K=7, the East Asian component splits into Northeast Asian and Southeast Asian.

Reference II Admixture K=7

Punjabi Arain are about the same as Sindhis (excluding those with some African ancestry) in terms of their ancestral components.

Comparing the Andhra Brahmins to the Mala and Madiga, we see the same pattern as in Tamil Nadu: the Brahmins have more of the European and Southwest/West Asian components, while the Mala and Madiga have more of the Southeast Asian and South Asian components.

At K=8, the African component splits into West African and East African.

Reference II Admixture K=8

The Nepalese samples are interesting. They have about 49% South Asian, 19% Northeast Asian, 16% European and 10% Southeast Asian. So they look like a mix of South Asian and East Asian.

Similar to the previous post, here's a comparison of the K=8 admixture results between the Reference I and Reference II datasets.

Here's the average absolute difference between the two datasets for each ancestral component:

Ancestral Component     Mean |Ref I - Ref II|
South Asian (C1)        2.17%
Southwest Asian (C2)    1.32%
European (C3)           1.70%
Southeast Asian (C4)    2.16%
Papuan (C5)             0.33%
Northeast Asian (C6)    1.93%
West African (C7)       0.27%
East African (C8)       0.48%
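The statistic in this table can be sketched as follows: for each component, take the absolute difference between a population's Reference I and Reference II averages, then average that over populations. This is a minimal sketch, assuming the average is taken over the populations shared by both datasets and that the two runs' components have already been matched up so C1 to C8 mean the same thing in each; the population names and percentages are made-up placeholders, not values from the spreadsheet.

```python
def mean_abs_diff(ref1, ref2):
    """Mean over shared populations of |ref1 - ref2| for each component.

    ref1, ref2: dict mapping population name -> list of component
    percentages, with components in the same order in both runs.
    """
    shared = sorted(set(ref1) & set(ref2))  # only populations in both datasets
    if not shared:
        raise ValueError("no populations in common")
    n_comp = len(ref1[shared[0]])
    totals = [0.0] * n_comp
    for pop in shared:
        for k, (a, b) in enumerate(zip(ref1[pop], ref2[pop])):
            totals[k] += abs(a - b)
    return [t / len(shared) for t in totals]


# Hypothetical placeholder values (percent), three components only.
ref1 = {"PopA": [50.0, 30.0, 20.0], "PopB": [10.0, 60.0, 30.0]}
ref2 = {"PopA": [48.0, 33.0, 19.0], "PopB": [12.0, 57.0, 31.0],
        "PopC": [5.0, 5.0, 90.0]}

print(mean_abs_diff(ref1, ref2))  # PopC is ignored: it is only in ref2
```

Restricting to shared populations keeps groups that appear only in Reference II (like the Tongans and Samoans) from skewing the comparison.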

The larger differences are for Balochi, Cambodian, Dai, Han, Kalash, Lahu, Miao, Naxi, She, Singapore Chinese, Tu, Tujia, US Chinese, and Yi. Thus, they are mostly East Asian groups.
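One possible way to flag the populations with larger differences is to sum the absolute differences over all components for each population and sort. This is a hypothetical sketch: the post doesn't say how these groups were picked, and the population names and values are placeholders.

```python
def rank_by_total_diff(ref1, ref2):
    """Return (population, total |ref1 - ref2|) pairs, largest first.

    ref1, ref2: dict mapping population name -> list of component
    percentages, with components in the same order in both runs.
    """
    scores = []
    for pop in set(ref1) & set(ref2):  # populations present in both runs
        total = sum(abs(a - b) for a, b in zip(ref1[pop], ref2[pop]))
        scores.append((pop, total))
    # Largest difference first; ties broken by name for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


# Hypothetical placeholder values (percent), three components only.
ref1 = {"PopA": [50.0, 30.0, 20.0], "PopB": [10.0, 60.0, 30.0]}
ref2 = {"PopA": [49.0, 30.0, 21.0], "PopB": [20.0, 50.0, 30.0]}

for pop, total in rank_by_total_diff(ref1, ref2):
    print(pop, total)
```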

For K=9, we see some divergence between the ancestral components inferred from Reference II and those inferred from Reference I. Instead of the Kalash component in the Reference I analysis, we get a Polynesian component here. This is likely due to the inclusion of the Tongan and Samoan samples.

Reference II Admixture K=9

Here's a summary of the ancestral components inferred from Reference II dataset:

K=2        K=3        K=4        K=5        K=6        K=7        K=8        K=9
Eurasian   European   S Asian    S Asian    S Asian    S Asian    S Asian    S Asian
African    E Asian    European   European   European   European   SW Asian   European
           African    E Asian    E Asian    E Asian    SE Asian   European   SW Asian
                      African    SW Asian   SW Asian   SW Asian   SE Asian   SE Asian
                                 African    Papuan     Papuan     Papuan     Papuan
                                            African    NE Asian   NE Asian   NE Asian
                                                       African    W African  Polynesian
                                                                  E African  W African
                                                                             E African

I might do some admixture runs for Reference II with Harappa participants later.


  1. "Instead of the Kalash component in Reference I analysis, we get the Polynesian component here. This is likely due to the inclusion of Tongan and Samoan samples."

    Interesting. I understand that it might take some time, but if you run it without those latter populations, might we get a component more like the Kalash one?

  2. Almost all the Chinese are now around 50% SE Asian. I didn't see this before; is it right?

    • The Northeast Asian component is modal among the Japanese. Thus the Chinese probably should be somewhat mixed with what I have termed Southeast Asian. However, I am going to look at the individual samples of the Chinese to see if there's variation between individuals.

  3. I've been playing around a little with the Xing dataset. Here's a PCA, minus the African populations. (Maybe I should've removed the Amerindian ones as well.) The lousy labeling is due to my not really knowing how to use gnuplot.

    Here are the South Asian populations. Note that since I included myself in this second run, there were only 40,808 SNPs after pruning -- though it doesn't look like anyone's shifted that much as a result. (As Zack noted earlier, the Xing dataset doesn't have that many SNPs in common with 23andMe's chip.) With that caveat in mind, it looks like some AP Brahmins are shifted towards the tribal/Dalit cluster. I'm the red cross with a blue box, by the way.

    I tried merging in the HapMap Gujaratis, but something went wrong and they ended up clustering far away from everyone else. (And defining their own component to boot.) Maybe I forgot to extract the common SNPs -- in which case they're going in tomorrow!
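The step the commenter suspects was missed, keeping only the SNPs present in both datasets before merging, can be sketched as below. This assumes both datasets are in PLINK binary format, whose .bim files list one SNP per line with the SNP ID in the second column; the function names and file paths are hypothetical. The resulting list could then be applied to each dataset (for example with PLINK's --extract option) before merging.

```python
def read_snp_ids(bim_path):
    """Return the set of SNP IDs (second column) from a PLINK .bim file."""
    ids = set()
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                ids.add(fields[1])
    return ids


def write_common_snps(bim_a, bim_b, out_path):
    """Write SNP IDs found in both .bim files, one per line; return the count."""
    common = sorted(read_snp_ids(bim_a) & read_snp_ids(bim_b))
    with open(out_path, "w") as out:
        for snp in common:
            out.write(snp + "\n")
    return len(common)
```

For instance, `write_common_snps("xing.bim", "gujarati.bim", "common.txt")` (hypothetical file names) would write the shared SNP IDs to common.txt and return how many there were.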

  4. Chinese Samples | Harappa Ancestry Project - pingback on February 16, 2011 at 7:21 am
