Reference II PCA

Posted by Zack on March 26, 2011

I ran PCA on the Reference II dataset, which includes 3,161 samples from various populations but with only 23,000 SNPs in common.

Here are the top ten eigenvalues:

219.225396
146.835968
20.719760
9.721733
7.552482
6.216977
3.991663
3.484690
3.106919
2.805874

The first two eigenvalues are much bigger than the rest: the first explains 7.12% of the variation and the second 4.77%. Even so, the Tracy-Widom stats show that about 54 eigenvectors are significant.
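
The quoted percentages can be cross-checked against the eigenvalues listed above. This is a minimal sketch, assuming each share is the eigenvalue divided by the sum of all eigenvalues. The post does not give that sum, so it is backed out from the stated 7.12%.

```python
# Top ten eigenvalues as listed in the post.
eigenvalues = [
    219.225396, 146.835968, 20.719760, 9.721733, 7.552482,
    6.216977, 3.991663, 3.484690, 3.106919, 2.805874,
]

# Share of variation attributed to the first component in the post.
stated_share_pc1 = 0.0712

# Assumption: share = eigenvalue / sum of ALL eigenvalues, so the total
# (not given in the post) can be backed out from the first eigenvalue.
implied_total = eigenvalues[0] / stated_share_pc1

# Share of the second component under that implied total.
share_pc2 = eigenvalues[1] / implied_total

# Ratio of the first eigenvalue to the second.
ratio_1_to_2 = eigenvalues[0] / eigenvalues[1]

# Combined share of the ten listed components.
share_top10 = sum(eigenvalues) / implied_total

print(f"implied total of eigenvalues: {implied_total:.0f}")
print(f"PC2 share: {share_pc2:.2%}")
print(f"PC1/PC2 ratio: {ratio_1_to_2:.2f}")
print(f"top-10 share: {share_top10:.2%}")
```

Under that assumption the second component's share comes out close to the 4.77% quoted, so both percentages appear to use the same denominator. The ten listed components together account for well under a fifth of the total variation.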

Here are the plots for the first 10 principal components. Remember that the 1st eigenvalue is about 1.5 times the 2nd.

Here is a 3-D PCA plot (hat tip: Doug McDonald) showing the first three eigenvectors. The plot is rotating about the 1st eigenvector which is vertical. Also, I have stretched the principal components based on the corresponding eigenvalues.

If your browser does not support frames, go to http://www.harappadna.org/wp-content/uploads/2011/03/ref2_pca.html to see the animation.
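
The post does not say exactly how the components were stretched. One common convention, assumed here purely for illustration, is to multiply each component by the square root of its eigenvalue, so that the spread along each plotted axis reflects the variance that axis explains. The `stretch` helper and the sample coordinates below are hypothetical.

```python
import math

# First three eigenvalues from the list in the post.
eigenvalues = [219.225396, 146.835968, 20.719760]


def stretch(coords, eigenvalues):
    """Scale each PC coordinate by sqrt(eigenvalue) (an assumed convention)."""
    return [c * math.sqrt(ev) for c, ev in zip(coords, eigenvalues)]


# Hypothetical sample with coordinates on PC1..PC3 (made-up values).
sample = [0.01, -0.02, 0.03]
stretched = stretch(sample, eigenvalues)
print(stretched)

# Relative axis scales after stretching, normalised to PC1.
scales = [math.sqrt(ev / eigenvalues[0]) for ev in eigenvalues]
print([round(s, 2) for s in scales])
```

With this convention the third axis is drawn at roughly a third of the scale of the first, which keeps the minor component from looking as important as the two dominant ones.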

I also ran MClust on the PCA data and got 17 clusters. The results are in a spreadsheet. I am sure with more principal components than the 10 I used, I would be able to deduce finer population structure.

Do take a look at the clusters assigned to the South Asian populations from Xing et al.


2 Comments.

razib March 26, 2011 at 2:27 pm

my exp. seems to be that fine-grained distinctions in PCA are more robust to sub-100 K SNP marker samples. so i think ref ii is more useful for PCA than ADMIXTURE. or more explicitly, the gap in robustness between ref i and ref ii is far greater in ADMIXTURE than in PCA.

humayun luwi libi March 26, 2011 at 5:24 pm

Why does the Europe component form the apex of the thing?
