Ref2 South Asian + Harappa Admixture

Using the reference II dataset of 548 South Asians and 38 Harappa Project South Asians that I have been working on, I ran Admixture.

The optimum number of ancestral components was 5 or 6, so I used K=6. The components are highest among the following groups:

C1: Brahui, Makrani, Balochi
C2: TN Dalit, North Kannadi
C3: Irula
C4: Gujaratis
C5: Hazara
C6: Kalash
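
For anyone who wants to reproduce a table like this from their own run, here is a minimal sketch of one way to find the group where each component peaks: average the per-individual admixture fractions within each group, then take the group with the largest average for each component. The function name, the group labels, and the numbers below are made up for illustration; they are not the project's data, and this is not necessarily how the table above was produced.

```python
from collections import defaultdict


def top_group_per_component(rows):
    """Return, for each component, the group with the highest mean fraction.

    rows: iterable of (group_label, [q1, ..., qK]), one entry per individual.
    """
    sums = {}
    counts = defaultdict(int)
    for group, q in rows:
        if group not in sums:
            sums[group] = [0.0] * len(q)
        for k, value in enumerate(q):
            sums[group][k] += value
        counts[group] += 1

    # Mean fraction of each component within each group.
    means = {g: [s / counts[g] for s in sums[g]] for g in sums}
    n_components = len(next(iter(means.values())))
    return [max(means, key=lambda g: means[g][k]) for k in range(n_components)]


# Toy input (hypothetical): three individuals, two components.
rows = [
    ("GroupA", [0.8, 0.2]),
    ("GroupA", [0.6, 0.4]),
    ("GroupB", [0.1, 0.9]),
]
peaks = top_group_per_component(rows)
print(peaks)
```

Note that a component peaking in a group only means that group has the highest average; it does not mean the component originated there.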

I consider the Irulas, a Scheduled Tribe from Tamil Nadu, to be problematic in a similar way to the Kalash, except that the Irulas are well scattered in their own space in the PCA plot.

Also, note that all the European, West Asian, etc. ancestry is being represented by C1. Similarly, all the East Asian ancestry is being collected in C5.

The spreadsheet showing the admixture results is here. The first sheet shows the individual results for the project participants.

The 2nd sheet shows the average (and standard deviation) for the reference populations.

The 3rd sheet shows the average and standard deviation for each cluster computed by MClust from PCA.

The 4th sheet shows the average and standard deviation for each cluster computed by MClust from MDS.

Also, take a look at the admixture percentage standard deviations. You'll notice that those are generally lower for the clusters compared to the population groups.
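
The comparison is roughly what you would expect, since the clusters are built from genetic similarity while the population labels are not. Here is a minimal sketch of the average and standard deviation calculation for one component, grouped once by population label and once by cluster label. The labels and fractions are invented for illustration, and the use of the sample standard deviation is an assumption; the spreadsheet may use a different convention.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(labels, fractions):
    """Return {label: (mean, sample standard deviation)} for one component."""
    grouped = defaultdict(list)
    for label, frac in zip(labels, fractions):
        grouped[label].append(frac)
    # stdev needs at least two values; report 0.0 for singleton groups.
    return {
        label: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for label, vals in grouped.items()
    }


# Toy data (hypothetical): one component's fraction for four individuals.
fractions = [0.10, 0.12, 0.50, 0.52]
population = ["P1", "P2", "P1", "P2"]  # labels as reported by the samples
cluster = ["c1", "c1", "c2", "c2"]     # labels assigned by a clustering step

by_population = summarize(population, fractions)
by_cluster = summarize(cluster, fractions)
print(by_population)
print(by_cluster)
```

In this toy case each cluster holds individuals with nearly identical fractions, so its standard deviation is much smaller than that of either population label.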


  1. Excellent, something I can use to clusterise South Asia now.

  2. Regarding the "well-scattered" Irula:
    "Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations."
    "Among Indian populations, the tribal Irula and HapMap GIH have the shortest distance to East Asian populations while Brahmin has the largest distance."

  3. If you look at the Gujarati component, it seems to confirm some connection between Gujaratis and Kayasthas.
    1. The titles Datta, Gupta, Nandi, Ghosh, Das, Nagadatta, Mitra, etc. were used among Gujarati Nagars, and they are still common among Bengali Kayasthas.
    2. Both groups show brachycephaly to a high degree.
    3. Both were/are very literate groups, and the names of two main north Indian scripts derive from them: Kaithi and Nagari.