Reference 3 PCA

Here's the Principal Component Analysis (PCA) of the Reference 3 data.

First, the 3-D plot of the first three eigenvectors. The plot rotates about the 1st eigenvector, which is the vertical axis. I have also stretched each principal component according to its eigenvalue.
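For readers who want to try this kind of stretching themselves, here is a minimal sketch in Python. It assumes "stretched" means multiplying each axis by its eigenvalue relative to the largest one; the post does not say whether the eigenvalue itself or its square root was used, so treat that as a guess. The function name `stretch` and the sample coordinates are hypothetical; only the three eigenvalue percentages come from the list further down.

```python
def stretch(points, eigenvalues):
    """Scale each coordinate by its eigenvalue relative to the largest one.

    Assumption: linear scaling by eigenvalue share; the original plot may
    have used a different rule (for example, the square root).
    """
    largest = max(eigenvalues)
    weights = [ev / largest for ev in eigenvalues]
    return [tuple(c * w for c, w in zip(point, weights)) for point in points]


# Eigenvalue shares (percent) of the first three components, from the post.
top3 = [6.417, 4.045, 0.746]

# Hypothetical sample coordinates on PC1, PC2, PC3 (not real project data).
samples = [(0.02, -0.01, 0.03), (-0.01, 0.04, -0.02)]

for original, scaled in zip(samples, stretch(samples, top3)):
    print(original, "->", scaled)
```

With this rule the first axis keeps its scale, while the second and third shrink to roughly 63% and 12% of their original spread.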

And now the plots of the first 24 principal components. Please note that, unlike in the 3-D plot, the eigenvectors in these plots are not scaled by their eigenvalues.

Here are the first 24 eigenvalues (each expressed as a percentage of the sum of all eigenvalues):

6.417%
4.045%
0.746%
0.624%
0.336%
0.330%
0.296%
0.250%
0.218%
0.166%
0.140%
0.131%
0.119%
0.112%
0.108%
0.105%
0.098%
0.087%
0.086%
0.080%
0.075%
0.073%
0.073%
0.071%

Together, the first 24 eigenvectors explain 14.79% of the variation in the data.
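The total can be checked by summing the percentages listed above. This is a small sketch in plain Python; the variable names are mine, and the values are copied from the list.

```python
# Percentages of total variance for the first 24 eigenvalues, as listed.
eigen_pct = [
    6.417, 4.045, 0.746, 0.624, 0.336, 0.330, 0.296, 0.250,
    0.218, 0.166, 0.140, 0.131, 0.119, 0.112, 0.108, 0.105,
    0.098, 0.087, 0.086, 0.080, 0.075, 0.073, 0.073, 0.071,
]

# Running total: cumulative[k] is the share explained by the first k+1 components.
cumulative = []
running = 0.0
for pct in eigen_pct:
    running += pct
    cumulative.append(running)

total = cumulative[-1]
print(f"First 2 components:  {cumulative[1]:.2f}%")  # prints 10.46%
print(f"First 24 components: {total:.2f}%")          # prints 14.79%
```

The first two components alone account for about 10.46%, so the remaining 22 add only about 4.3 percentage points between them.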

According to the Tracy-Widom statistics from EIGENSOFT, the number of significant principal components is 118.

UPDATE: I thought eigenvectors 2 and 4 looked interesting for South Asians, so I plotted them together.

12 Comments.

  1. i wish we could see tesseracts 🙂

    • While you work on that, I can generate another 3-D plot. Lol

      Which dimensions do you want? 2-3-4, 1-2-4, or 1-3-4?

  2. How accurate is it to categorize Iranians, Iranian Jews and Uzbekistan Jews as Central Asian and Hazaras as South Asian based on the genetic results? I am not saying you should categorize every population based on genetics, but I think the Central Asian Turkic populations and Hazaras have a unique and relatively homogeneous genetic character that deserves a special category.

    BTW, is it possible to add a function that highlights on the plot the reference population you choose from a list?

    • Not very accurate, but I needed a small number of categories and I used a mix of genetics and geography.

      A larger number of categories would just have cluttered the plot even more.

      Using genetics to categorize groups would require some of the northeastern Indian populations (like the Aonaga) to be classified as East Asian.

    • About the highlighting on the 3-D plot, I'll see what I can do.

  3. Where do the Onge cluster?

  4. Reference 3 + HAP PCA | Harappa Ancestry Project - pingback on June 5, 2011 at 9:02 am
  5. South Asian PCA + Mclust | Harappa Ancestry Project - pingback on December 21, 2011 at 8:29 am