Tag Archives: eigenvectors

Ref 2 South Asians + Harappa PCA

I ran PCA on the South Asian populations included in the Reference II dataset as well as 38 South Asian participants of the Harappa Project. This analysis complements the Ref 1 South Asian one, since it includes Kalash, Hazara, and the additional South Asian groups from Xing et al.

The reference populations included are: Andhra Brahmin, Andhra Madiga, Andhra Mala, Balochi, Bnei Menashe Jews, Brahui, Burusho, Cochin Jews, Gujaratis, Gujaratis-B, Hazara, Irula, Kalash, Makrani, Malayan, Nepalese, North Kannadi, Paniya, Pathan, Punjabi Arain, Sakilli, Sindhi, Singapore Indians, Tamil Nadu Brahmin, and Tamil Nadu Dalit.

Here's the spreadsheet showing the eigenvalues and the first 15 principal components for each sample.

I computed the PCA using Eigensoft, which removed 13 samples as outliers. The Tracy-Widom statistics show that about 25 eigenvectors are significant.

Here are the first 15 eigenvalues:

1 6.374483
2 3.650626
3 3.270121
4 2.999767
5 1.937818
6 1.713315
7 1.538295
8 1.503051
9 1.458331
10 1.448079
11 1.433288
12 1.414678
13 1.408943
14 1.390791
15 1.381010

Here is a 3-D PCA plot (hat tip: Doug McDonald) showing the first three eigenvectors. The plot rotates about the 1st eigenvector, which is vertical. I have stretched the principal components based on the corresponding eigenvalues. You can also highlight individual project participants in the plot using the dropdown list below it.
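As a rough illustration of what stretching could look like, the sketch below scales each coordinate of a sample by the square root of that component's eigenvalue. The square-root scaling is an assumption (the exact scaling used for the plot is not stated here), and the sample coordinates are made up for illustration, not taken from the spreadsheet.

```python
import math

# First three eigenvalues from the list above.
EIGENVALUES = [6.374483, 3.650626, 3.270121]


def stretch(coords, eigenvalues):
    """Scale each coordinate by sqrt(eigenvalue) of its component.

    Assumption: square-root scaling. Multiplying by the eigenvalue
    itself would be another possible reading of "stretched".
    """
    return [c * math.sqrt(ev) for c, ev in zip(coords, eigenvalues)]


# Hypothetical unstretched coordinates for one sample (not real data).
sample = [0.05, -0.02, 0.01]
stretched = stretch(sample, EIGENVALUES)
print(stretched)
```

With this scaling, distances along the 1st eigenvector are drawn larger than equal-sized distances along the 2nd or 3rd, so the plot reflects how much variation each axis carries.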

Now here are plots of the first 14 eigenvectors. In this case, I have not stretched the principal components, so keep in mind that the first eigenvector explains 6.374483/3.650626=1.75 times as much variation as the 2nd eigenvector.
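The 1.75 figure is just the ratio of the first two eigenvalues listed above. A quick check, with `variation_ratio` being a helper name introduced here for illustration:

```python
# First two eigenvalues from the Ref 2 list above.
EV1 = 6.374483
EV2 = 3.650626


def variation_ratio(ev_a, ev_b):
    """How many times as much variation component a explains as component b."""
    return ev_a / ev_b


ratio = variation_ratio(EV1, EV2)
print(round(ratio, 2))  # 1.75
```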

Ref 1 South Asians + Harappa PCA

I ran PCA on the South Asian populations included in the Reference I dataset (excluding Kalash and Hazara) as well as 38 South Asian participants of the Harappa Project. I excluded Kalash and Hazara because they are so distinct that they usually dominate a South Asian PCA plot.

The reference populations included are: Balochi, Bnei Menashe Jews, Brahui, Burusho, Cochin Jews, Gujaratis (divided into two groups), Makrani, Malayan, North Kannadi, Paniya, Pathan, Sakilli, Sindhi, and Singapore Indians.

Here's the spreadsheet showing the eigenvalues and the first 15 principal components for each sample.

I computed the PCA using Eigensoft, which removed 26 samples as outliers. The Tracy-Widom statistics show that about 30 eigenvectors are significant.

Here are the first 15 eigenvalues:

1 3.874124
2 1.819077
3 1.663232
4 1.335721
5 1.293500
6 1.242984
7 1.230921
8 1.225775
9 1.222177
10 1.214539
11 1.212808
12 1.204000
13 1.198930
14 1.195450
15 1.192848

Here is a 3-D PCA plot (hat tip: Doug McDonald) showing the first three eigenvectors. The plot rotates about the 1st eigenvector, which is vertical. I have stretched the principal components based on the corresponding eigenvalues.

Now here are plots of the first 14 eigenvectors. In this case, I have not stretched the principal components, so keep in mind that the first eigenvector explains 3.874124/1.819077=2.13 times as much variation as the 2nd eigenvector.
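The same comparison can be carried down the whole list to see how quickly the eigenvalues level off. This is only a descriptive sketch using the 15 values above; it is not a substitute for the Tracy-Widom statistics.

```python
# The 15 eigenvalues listed above for the Ref 1 South Asian PCA.
EIGENVALUES = [
    3.874124, 1.819077, 1.663232, 1.335721, 1.293500,
    1.242984, 1.230921, 1.225775, 1.222177, 1.214539,
    1.212808, 1.204000, 1.198930, 1.195450, 1.192848,
]

# Ratio of each eigenvalue to the next one in the list.
ratios = [a / b for a, b in zip(EIGENVALUES, EIGENVALUES[1:])]

for i, r in enumerate(ratios, start=1):
    print(f"{i} vs {i + 1}: {r:.2f}")
```

The first line of output reproduces the 2.13 figure. Ratios close to 1 further down the list mean that neighboring eigenvectors explain nearly the same amount of variation.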

UPDATE: At the bottom of the 3-D plot, you can see a dropdown. Just select one of the project participants from there and that participant's dot in the plot will become bigger so it is easy to spot.