Ref1 South Asians + Harappa PCA Clusters

Using the PCA results of the South Asians in Reference I as well as Harappa participants, I ran a couple of clustering algorithms.

First, I scaled the principal components by the respective eigenvalues.
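As a rough illustration of that scaling step, here is a minimal sketch in Python with NumPy. It assumes "scaled by the eigenvalues" means multiplying each principal-component column by its eigenvalue; weighting by the square root of the eigenvalue is another common choice. The function name and the toy numbers are made up for the example.

```python
import numpy as np

def scale_components(scores, eigenvalues):
    """Multiply each principal-component column by its eigenvalue.

    Assumption: plain multiplication by the eigenvalue. Swap in
    np.sqrt(eigenvalues) if the square-root weighting is wanted instead.
    """
    scores = np.asarray(scores, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    if scores.ndim != 2 or scores.shape[1] != eigenvalues.shape[0]:
        raise ValueError("need one eigenvalue per principal-component column")
    # Broadcasting applies eigenvalues[j] to every row of column j.
    return scores * eigenvalues

# Toy data: 3 samples, 2 components (not real project values).
scores = np.array([[1.0, 2.0],
                   [-1.0, 0.5],
                   [0.0, -2.5]])
eigenvalues = np.array([4.0, 1.0])
scaled = scale_components(scores, eigenvalues)
```

The effect is that components explaining more variance count for more in any distance computed afterwards.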

Here's the dendrogram for the Harappa Project participants, built by hierarchical clustering with Euclidean distance and complete linkage.
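For anyone who wants to try the same kind of clustering, the following is a small sketch using SciPy rather than whatever tool produced the dendrogram in this post. The sample labels and coordinates are hypothetical toy values, not project data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Hypothetical sample IDs and toy 2-D coordinates standing in for scaled PCs.
labels = ["A1", "A2", "A3", "B1", "B2", "B3", "OUT"]
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],    # tight group A
    [5.0, 5.0], [5.1, 5.2], [5.2, 4.9],    # tight group B
    [20.0, -20.0],                          # a far-away sample
])

# Complete linkage: the distance between two clusters is the largest
# Euclidean distance between any pair of their members.
tree = linkage(points, method="complete", metric="euclidean")

# Cut the tree into at most three flat clusters.
groups = fcluster(tree, t=3, criterion="maxclust")

# Leaf order of the dendrogram, computed without drawing anything.
leaf_order = dendrogram(tree, labels=labels, no_plot=True)["ivl"]
```

Dropping `no_plot=True` draws the tree, provided matplotlib is installed.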

You can compare this to the Admixture-based dendrogram:

The most obvious thing is that I (HRP0001) am an outlier by far.

We inferred three major clusters from the admixture results. Those clusters are still intact here, though their membership has changed a little.

I also ran MClust on the PCA data. The optimum number of clusters was 14. The resulting cluster assignments can be seen in a spreadsheet.

For the Harappa Project participants, the numbers give the probability of assignment to a cluster. For example, HRP0009 has a 72% probability of belonging to cluster 4. For the reference populations, the numbers give the expected number of samples assigned to a cluster.
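To make those two kinds of numbers concrete, here is a hedged sketch using scikit-learn's `GaussianMixture` as a stand-in for MClust. It is not the MClust procedure itself: it tries only full covariance matrices and 1 to 4 components, and scikit-learn's BIC is lower-is-better. The data and population names are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-in for scaled PCA coordinates: two well-separated groups.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(40, 2)),
    rng.normal(loc=10.0, scale=1.0, size=(40, 2)),
])
population = np.array(["pop_a"] * 40 + ["pop_b"] * 40)  # hypothetical labels

# Fit mixtures with 1..4 components and keep the one with the lowest BIC.
models = [
    GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
    for k in range(1, 5)
]
bics = [m.bic(X) for m in models]
best = models[int(np.argmin(bics))]
best_k = best.n_components

# Per-sample probability of belonging to each cluster (rows sum to 1).
probs = best.predict_proba(X)

# Per-population expected number of samples in each cluster:
# the sum of the membership probabilities of that population's samples.
expected = {
    name: probs[population == name].sum(axis=0)
    for name in np.unique(population)
}
```

A row of `probs` corresponds to what is reported for an individual participant, and an entry of `expected` to what is reported for a reference population.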


  1. While the three major clusters remain roughly intact in both dendrograms, they are linked in different patterns.

    Also, I noticed you have gotten some more participants, including some more from U.P. Nice!

  2. Zack,
    Re: Saurashtrian sample - Is that person a Gujarat Saurashtrian or a Tamil Nadu Saurashtrian?
