I ran PCA on the South Asian populations in the Reference I dataset (excluding Kalash and Hazara) together with 38 South Asian participants of the Harappa Project. I excluded the Kalash and Hazara because they are so distinct that they usually dominate a South Asian PCA plot.
The reference populations included are: Balochi, Bnei Menashe Jews, Brahui, Burusho, Cochin Jews, Gujaratis (divided into two groups), Makrani, Malayan, North Kannadi, Paniya, Pathan, Sakilli, Sindhi, and Singapore Indians.
Here's the spreadsheet showing the eigenvalues and the first 15 principal components for each sample.
I computed the PCA using Eigensoft, which removed 26 samples as outliers. The Tracy-Widom statistics show that about 30 eigenvectors are significant.
Here are the first 15 eigenvalues:
1 | 3.874124 |
2 | 1.819077 |
3 | 1.663232 |
4 | 1.335721 |
5 | 1.293500 |
6 | 1.242984 |
7 | 1.230921 |
8 | 1.225775 |
9 | 1.222177 |
10 | 1.214539 |
11 | 1.212808 |
12 | 1.204000 |
13 | 1.198930 |
14 | 1.195450 |
15 | 1.192848 |
Here is a 3-D PCA plot (hat tip: Doug McDonald) showing the first three eigenvectors. The plot rotates about the 1st eigenvector, which is vertical. I have also stretched the principal components according to their corresponding eigenvalues.
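The post does not say exactly how the stretching was done. One common choice, sketched below in Python as an assumption rather than the method actually used for the plot, is to multiply each sample's coordinate on an eigenvector by the square root of that eigenvector's eigenvalue, so the spread along each axis reflects how much variation it explains. The function name stretch_components and the sample coordinates are made up for illustration; only the three eigenvalues come from the list above.

```python
import math

# First three eigenvalues from the list in this post.
EIGENVALUES = [3.874124, 1.819077, 1.663232]


def stretch_components(coords, eigenvalues):
    """Scale each coordinate by the square root of its eigenvalue.

    Assumption: sqrt(eigenvalue) scaling. The post does not state
    the exact factor that was used for the 3-D plot.
    """
    return [c * math.sqrt(ev) for c, ev in zip(coords, eigenvalues)]


# Hypothetical unstretched coordinates for one sample (not real data).
sample = [0.05, -0.02, 0.01]
stretched = stretch_components(sample, EIGENVALUES)
print([round(x, 4) for x in stretched])
```

With this scaling the first axis is stretched the most, so distances along it count for more in the plot than distances along the 2nd or 3rd axis.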
Now here are plots of the first 14 eigenvectors. In this case I have not stretched the principal components, so keep in mind that the first eigenvector explains 3.874124/1.819077 = 2.13 times as much variation as the 2nd eigenvector.
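As a quick check of that arithmetic, here is a small Python sketch that computes how the first eigenvalue compares with each of the next few. The eigenvalues are copied from the list above; the variable names are just for illustration.

```python
# First five eigenvalues from the list in this post.
eigenvalues = [3.874124, 1.819077, 1.663232, 1.335721, 1.293500]

# Ratio of the first eigenvalue to each later one.
ratios = [eigenvalues[0] / ev for ev in eigenvalues[1:]]

for index, ratio in enumerate(ratios, start=2):
    print(f"eigenvector 1 vs {index}: {ratio:.2f}")
```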
UPDATE: At the bottom of the 3-D plot, you can see a dropdown. Just select one of the project participants from there and that participant's dot in the plot will become bigger so it is easy to spot.