Reference II Admixture Analysis K=2-5

Our Reference II dataset has 3,161 samples, including 544 South Asians belonging to 24 ethnic groups. Unfortunately, we can do our admixture analysis on only about 23,000 SNPs.

The ancestral population averages for each ethnic group from the admixture analysis can be seen in this spreadsheet. I have also calculated the standard deviation of the ancestral components for the samples in each ethnic group.
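As a rough illustration of how per-group averages and standard deviations like these could be computed, here is a minimal Python sketch. It assumes each sample is a pair of an ethnic group label and a list of ancestral component fractions. The function name `summarize_by_group`, the group labels, and the numbers are all made up for illustration, and the use of the sample standard deviation (rather than the population one) is also an assumption, not necessarily what the spreadsheet uses.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(samples):
    """Return {group: (means, stdevs)} for an iterable of (group, fractions)."""
    by_group = defaultdict(list)
    for group, fractions in samples:
        by_group[group].append(fractions)

    summary = {}
    for group, rows in by_group.items():
        # Transpose so that each column holds one ancestral component.
        columns = list(zip(*rows))
        means = [mean(col) for col in columns]
        # Sample standard deviation needs at least two samples (assumed choice).
        sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
        summary[group] = (means, sds)
    return summary


# Hypothetical toy data, not taken from the actual dataset.
toy_samples = [
    ("GroupA", [0.60, 0.30, 0.05, 0.05]),
    ("GroupA", [0.70, 0.20, 0.05, 0.05]),
    ("GroupB", [0.10, 0.80, 0.05, 0.05]),
]
toy_summary = summarize_by_group(toy_samples)
for group, (means, sds) in toy_summary.items():
    print(group, [round(m, 3) for m in means], [round(s, 3) for s in sds])
```

A group with a single sample gets a standard deviation of 0.0 here, which is only a convenience for the sketch.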

Here are the results for K=2.

Reference II Admixture K=2

For K=3, we get the ancestral populations: European, East Asian and African.

Reference II Admixture K=3

For K=4, the ancestral populations are South Asian, European, East Asian and African.

Reference II Admixture K=4

Let's compare the results of the K=4 admixture analysis for the Reference I and Reference II datasets.

While there is some difference in the average percentages of ancestral components computed with the two reference datasets, most of the differences are 1% or less. The mean absolute difference for the four components is as follows:

Ancestral Component    Mean(Abs(Ref1-Ref2))
South Asian (C1)       0.92%
European (C2)          0.58%
East Asian (C3)        0.52%
African (C4)           0.32%
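To make the Mean(Abs(Ref1-Ref2)) statistic concrete, here is a minimal Python sketch. It assumes the per-group averages from each reference dataset are available as dictionaries mapping a group name to its four component percentages, and that only groups present in both datasets are compared. The function name `mean_abs_diff`, the group names, and the values are hypothetical and are not the actual results.

```python
def mean_abs_diff(ref1, ref2):
    """Mean absolute difference per component over groups in both datasets."""
    shared = sorted(set(ref1) & set(ref2))
    if not shared:
        raise ValueError("no groups in common")
    n_components = len(ref1[shared[0]])
    return [
        sum(abs(ref1[g][k] - ref2[g][k]) for g in shared) / len(shared)
        for k in range(n_components)
    ]


# Hypothetical per-group averages in percent (C1..C4), for illustration only.
toy_ref1 = {"GroupA": [60.0, 30.0, 5.0, 5.0], "GroupB": [10.0, 80.0, 6.0, 4.0]}
toy_ref2 = {"GroupA": [61.0, 29.5, 5.0, 4.5], "GroupB": [9.0, 80.5, 7.0, 3.5]}

toy_mad = mean_abs_diff(toy_ref1, toy_ref2)
print(toy_mad)  # [1.0, 0.5, 0.5, 0.5]
```

Taking the absolute value before averaging keeps increases in one group from cancelling decreases in another.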

I have highlighted the larger differences, which affect the Balochi, Kalash, Malayan, Melanesian, Papuan, and Samaritans. Even then, the largest change is about 5%.

Let's also look at the Fst divergences. Here are the values for the Reference I admixture results:

      C1      C2      C3
C2    0.071
C3    0.083   0.109
C4    0.152   0.152   0.184

And for Reference II:

      C1      C2      C3
C2    0.074
C3    0.086   0.118
C4    0.156   0.159   0.194

The Fst numbers for Reference II are somewhat higher for every pair of components, by between 0.003 and 0.010.
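The pairwise comparison can be tabulated with a short Python sketch. The Fst values are copied from the two matrices above; the dictionary layout and variable names are just one convenient way to hold them and are my own choice.

```python
# Pairwise Fst values from the Reference I and Reference II K=4 runs.
fst_ref1 = {
    ("C1", "C2"): 0.071,
    ("C1", "C3"): 0.083, ("C2", "C3"): 0.109,
    ("C1", "C4"): 0.152, ("C2", "C4"): 0.152, ("C3", "C4"): 0.184,
}
fst_ref2 = {
    ("C1", "C2"): 0.074,
    ("C1", "C3"): 0.086, ("C2", "C3"): 0.118,
    ("C1", "C4"): 0.156, ("C2", "C4"): 0.159, ("C3", "C4"): 0.194,
}

# Difference (Reference II minus Reference I) for each component pair.
fst_diff = {pair: fst_ref2[pair] - fst_ref1[pair] for pair in fst_ref1}
fst_mean_diff = sum(fst_diff.values()) / len(fst_diff)

for (a, b), d in sorted(fst_diff.items()):
    print(f"{a}-{b}: {d:+.3f}")
print(f"mean difference: {fst_mean_diff:.3f}")
```

The relative ordering of the pairs stays the same in both runs: C1-C2 is the closest pair and C3-C4 the most divergent.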

Considering that Reference II has only one-eighth of the SNPs of Reference I, the results are fairly good.

Here's K=5 admixture analysis for Reference II:

Reference II Admixture K=5

Higher K values to follow.

