reich | Search Results | Harappa Ancestry Project

Ref 1 South Asians + Harappa PCA

Posted by Zack on March 16, 2011 28 comments

I ran PCA on the South Asian populations in the Reference I dataset (excluding Kalash and Hazara) together with 38 South Asian participants of the Harappa Project. I excluded the Kalash and Hazara because they are so distinct that they usually dominate a South Asian PCA plot.

The reference populations included are: Balochi, Bnei Menashe Jews, Brahui, Burusho, Cochin Jews, Gujaratis (divided into two groups), Makrani, Malayan, North Kannadi, Paniya, Pathan, Sakilli, Sindhi, and Singapore Indians.

Here's the spreadsheet showing the eigenvalues and the first 15 principal components for each sample.

I computed the PCA using Eigensoft, which removed 26 samples as outliers. The Tracy-Widom statistics show that about 30 eigenvectors are significant.

Here are the first 15 eigenvalues:

1	3.874124
2	1.819077
3	1.663232
4	1.335721
5	1.293500
6	1.242984
7	1.230921
8	1.225775
9	1.222177
10	1.214539
11	1.212808
12	1.204000
13	1.198930
14	1.195450
15	1.192848

Here is a 3-D PCA plot (hat tip: Doug McDonald) showing the first three eigenvectors. The plot is rotating about the 1st eigenvector which is vertical. Also, I have stretched the principal components based on the corresponding eigenvalues.

The animation is available here: http://www.harappadna.org/wp-content/uploads/2011/03/r1_sa_hrp_pca.html

Now here are plots of the first 14 eigenvectors. In this case, I have not stretched the principal components, so keep in mind that the first eigenvector explains 3.874124/1.819077 ≈ 2.13 times as much variation as the 2nd eigenvector.
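As a quick check on the ratio quoted above, here is a minimal Python sketch using the eigenvalues listed in this post. The `stretch` helper scales each coordinate by the square root of its eigenvalue, which is one common convention and purely an assumption here; the post does not say exactly how the 3-D plot was stretched. The names `ratio_to_first` and `stretch` are made up for illustration.

```python
import math

# First 15 eigenvalues as listed in the post.
eigenvalues = [
    3.874124, 1.819077, 1.663232, 1.335721, 1.293500,
    1.242984, 1.230921, 1.225775, 1.222177, 1.214539,
    1.212808, 1.204000, 1.198930, 1.195450, 1.192848,
]

def ratio_to_first(evals):
    """Return evals[0] / evals[i] for each component i."""
    return [evals[0] / v for v in evals]

def stretch(coords, evals):
    """Scale one sample's coordinates by sqrt(eigenvalue) per axis.

    Assumed convention, not necessarily the one used for the 3-D plot.
    """
    return [c * math.sqrt(v) for c, v in zip(coords, evals)]

ratios = ratio_to_first(eigenvalues)
print(f"eig1 / eig2 = {ratios[1]:.2f}")
print(f"eig1 / eig3 = {ratios[2]:.2f}")
```

The same list of ratios shows how quickly the spectrum flattens: from the 5th eigenvalue onward, the values differ from one another only slightly.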

UPDATE: At the bottom of the 3-D plot, you can see a dropdown. Just select one of the project participants from there and that participant's dot in the plot will become bigger so it is easy to spot.

Admixture K=10-12, HRP0001 to HRP0010

Posted by Zack on February 19, 2011 5 comments

Let's continue our admixture analysis of the first batch of Harappa participants.

Here are their ethnic backgrounds and their admixture analysis results.

You might want to refer to the admixture analysis of the reference dataset.

At K=10,

Batch 1 Admixture K=10

C1	South Asian	C2	Kalash
C3	Southwest Asian	C4	Southeast Asian
C5	European	C6	Papuan
C7	Northeast Asian	C8	Siberian
C9	West African	C10	East African

At K=11,

Batch 1 Admixture K=11

C1	South Asian	C2	Balochistan/Caucasus
C3	Kalash	C4	Southeast Asian
C5	Southwest Asian	C6	European
C7	Papuan	C8	Northeast Asian
C9	Siberian	C10	West African
C11	East African

Note the C2 component. It sounds a bit like the ANI (Ancestral North Indian) of Reich et al., though hold off on your conclusions and your excitement for now.

Also note that this split differs from the Reference I admixture run at K=11, where the East African split happened. However, at K=12 we get similar components.

At K=12,

Batch 1 Admixture K=12

C1	South Asian	C2	Balochistan/Caucasus
C3	Kalash	C4	Southeast Asian
C5	Southwest Asian	C6	European
C7	Papuan	C8	Northeast Asian
C9	Siberian	C10	East African Bantus
C11	West African	C12	East African

I am going to explore even higher values of K since the crossvalidation errors are still decreasing.
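For readers who want to track this themselves, here is a rough Python sketch of checking whether cross-validation error is still falling as K grows. It assumes the errors are available as log lines of the form `CV error (K=10): 0.55`; that format, the helper names, and the numbers in `example_log` are all assumptions for illustration, not results from this project.

```python
import re

# Matches lines such as "CV error (K=10): 0.51234" (assumed log format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def parse_cv_errors(log_text):
    """Map each K found in the log text to its cross-validation error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def still_decreasing(cv_errors):
    """True if the error drops at every step as K increases."""
    ks = sorted(cv_errors)
    return all(cv_errors[a] > cv_errors[b] for a, b in zip(ks, ks[1:]))

# Hypothetical numbers for illustration only; not results from this project.
example_log = """
CV error (K=10): 0.5500
CV error (K=11): 0.5490
CV error (K=12): 0.5485
"""

errors = parse_cv_errors(example_log)
best_k = min(errors, key=errors.get)  # K with the lowest error seen so far
print(still_decreasing(errors), best_k)
```

If `still_decreasing` returns True and the lowest error sits at the largest K tried, that is a hint that higher values of K are worth running.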

South Asian PCA

Posted by Zack on February 13, 2011 8 comments

I used Eigensoft to create a PCA plot of the South Asians in our Reference I dataset (a total of 398 samples) along with the first batch of South Asian Harappa Project participants (HRP0001 to HRP0009).

The PCA software removed 2 Makranis, 1 Sindhi, 1 Balochi and 1 Brahui as outliers, thus leaving us with 402 samples to perform a PCA on.

Here are the plots for the first four eigenvectors. Click to see bigger images.

South Asian PCA eig1 vs eig2

South Asian PCA eig1 vs eig3

South Asian PCA eig2 vs eig3

South Asian PCA eig1 vs eig4

South Asian PCA eig2 vs eig4

South Asian PCA eig3 vs eig4

If you have seen the South Asian plot at 23andMe, the first plot here isn't very different, except that it seems rotated.

UPDATE: Eigenvectors 1 through 4 explain 1.12%, 0.77%, 0.71% and 0.44% of the total variance.
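Percentages like these are usually obtained by dividing each eigenvalue by the sum of all eigenvalues; the sketch below assumes that is how they were computed here. The full set of eigenvalues for this run is not given in the post, so the input values are toy numbers, and `variance_explained` is a hypothetical helper name.

```python
def variance_explained(eigenvalues):
    """Return each eigenvalue as a percentage of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [100.0 * v / total for v in eigenvalues]

# Toy eigenvalues for illustration only; not the values from this PCA run.
toy = [4.0, 3.0, 2.0, 1.0]
percentages = variance_explained(toy)
print([f"{p:.1f}%" for p in percentages])
```

Note that the denominator must include every eigenvalue, not just the top few, which is why the leading components of a genome-wide PCA with hundreds of samples each explain only around 1% of the total variance.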

Harappa Ancestry Project

Genetics and South Asia

Search Results for: reich - Page 4
