Ref1 South Asians + Harappa PCA Clusters

Posted by Zack on March 19, 2011

Using the PCA results of the South Asians in Reference I as well as Harappa participants, I ran a couple of clustering algorithms.

First, I scaled the principal components by their respective eigenvalues.
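As a rough illustration of this scaling step, here is a minimal sketch in NumPy. It assumes the PC scores are a samples × components matrix and that "scaled by the eigenvalues" means multiplying each PC column by its eigenvalue, so that components explaining more variance weigh more in later Euclidean distances. Some pipelines use the square root of the eigenvalue instead. The function name `scale_pcs` and the toy numbers are hypothetical.

```python
import numpy as np

def scale_pcs(scores, eigenvalues):
    """Multiply each principal-component column by its eigenvalue (hypothetical helper).

    scores: (n_samples, n_components) PC coordinates.
    eigenvalues: (n_components,) eigenvalue for each component.
    """
    scores = np.asarray(scores, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    if scores.shape[1] != eigenvalues.shape[0]:
        raise ValueError("need exactly one eigenvalue per component")
    # Broadcasting multiplies column j by eigenvalues[j].
    return scores * eigenvalues

# Hypothetical toy data: 3 samples, 2 PCs.
pcs = np.array([[0.10, -0.02], [0.05, 0.03], [-0.12, 0.01]])
evals = np.array([10.0, 2.0])
scaled = scale_pcs(pcs, evals)
print(scaled)
```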

Here's the dendrogram for the Harappa Project participants, from hierarchical clustering with complete linkage on Euclidean distances:
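For readers who want to try something similar, here is a sketch using SciPy's `linkage` with `method="complete"` and `metric="euclidean"`. This is an assumed tool choice; the post does not say which software produced its dendrogram. The participant IDs and coordinates below are hypothetical stand-ins for the scaled PC matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, leaves_list

# Hypothetical scaled PC coordinates, one row per participant.
ids = ["P1", "P2", "P3", "P4"]
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# Complete linkage: the distance between two clusters is the largest
# Euclidean distance between any pair of their members.
Z = linkage(X, method="complete", metric="euclidean")

# Cut the tree into 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")

# Left-to-right leaf order of the dendrogram.
order = [ids[i] for i in leaves_list(Z)]
print(labels, order)
```

To draw the tree itself, `scipy.cluster.hierarchy.dendrogram(Z, labels=ids)` can be called with matplotlib installed.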

You can compare this to the Admixture-based dendrogram:

The most obvious thing is that I (HRP0001) am by far the biggest outlier.

The three major clusters we inferred from the admixture results are still intact, though they have changed a little.

I also ran MClust on the PCA data. The optimal number of clusters was 14. The resulting cluster assignments are available in a spreadsheet.
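MClust is an R package that fits Gaussian mixture models and picks the number of clusters (and covariance structure) by BIC. As a rough Python analogue, not the MClust code itself, here is a sketch with scikit-learn's `GaussianMixture`. Note the sign convention: scikit-learn's `bic()` is lower-is-better, while mclust reports BIC as higher-is-better. The helper `best_gmm` and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm(X, max_k=20, seed=0):
    """Fit GMMs over k and covariance types; keep the lowest-BIC model (hypothetical helper)."""
    best, best_bic = None, np.inf
    for cov in ("full", "tied", "diag", "spherical"):
        for k in range(1, max_k + 1):
            gm = GaussianMixture(n_components=k, covariance_type=cov,
                                 random_state=seed).fit(X)
            bic = gm.bic(X)  # lower is better in scikit-learn
            if bic < best_bic:
                best, best_bic = gm, bic
    return best

# Simulated stand-in for scaled PC coordinates: two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(4.0, 0.3, (50, 2))])
model = best_gmm(X, max_k=5)
print(model.n_components, model.covariance_type)
```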

For the Harappa Project participants, the numbers give the probability of assignment to each cluster. For example, HRP0009 has a 72% probability of belonging to cluster 4. For the reference populations, the numbers give the expected number of samples assigned to each cluster.
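One plausible way to get such expected counts, assumed here rather than taken from the post, is to sum each population's per-sample membership probabilities over its samples. A minimal sketch follows; the helper name, probabilities, and population labels are hypothetical.

```python
import numpy as np

def expected_counts(probs, populations):
    """Sum membership probabilities within each population (hypothetical helper).

    probs: (n_samples, n_clusters) posterior probabilities; each row sums to 1.
    populations: length-n_samples sequence of population labels.
    Returns a dict mapping population -> expected sample count per cluster.
    """
    probs = np.asarray(probs, dtype=float)
    pops = np.asarray(populations)
    return {p: probs[pops == p].sum(axis=0) for p in np.unique(pops)}

# Hypothetical posteriors for three samples over two clusters.
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
pops = ["PopA", "PopA", "PopB"]
counts = expected_counts(probs, pops)
print(counts["PopA"])  # 1.5 expected samples in cluster 0, 0.5 in cluster 1
```

Each population's counts add up to its sample size, since every row of probabilities sums to 1.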

Clusters, PCA, dendrogram, harappa, mclust, south asia

4 Comments.

sv March 19, 2011 at 10:42 am

While the three major clusters remain roughly intact in both dendrograms, they are linked in different patterns.

Also, I noticed you have gotten some more participants, including some more from U.P. Nice!

Parasar March 20, 2011 at 1:01 am

Zack,
Re: Saurashtrian sample - Is that person a Gujarat Saurashtrian or a Tamil Nadu Saurashtrian?
- Vasishta March 20, 2011 at 2:16 am
  
  He's Gujarati.
Harappa Clustering | Procrastination - pingback on March 21, 2011 at 8:14 am

Harappa Ancestry Project

Genetics and South Asia