South Asian PCA + Mclust

I combined reference 3 with the Metspalu et al. data and Harappa Ancestry Project participants (up to HRP0200). Then I kept only those individuals whose combined proportion of the South Asian and Onge components in my reference 3 admixture results was more than 50%.
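As a rough illustration of that filter, here is a minimal sketch in Python. The sample IDs, the proportions, and the "European" component are made up for the example and are not taken from the reference 3 results; only the "South Asian" and "Onge" component names and the 50% cutoff come from the description above.

```python
# Hypothetical admixture results: sample ID -> component proportions.
# These numbers are invented for illustration, not real project data.
admixture = {
    "sample_a": {"South Asian": 0.45, "Onge": 0.10, "European": 0.45},
    "sample_b": {"South Asian": 0.30, "Onge": 0.05, "European": 0.65},
    "sample_c": {"South Asian": 0.50, "Onge": 0.00, "European": 0.50},
}


def is_south_asian(components, threshold=0.5):
    """Return True if South Asian + Onge is strictly more than the threshold."""
    combined = components.get("South Asian", 0.0) + components.get("Onge", 0.0)
    return combined > threshold


# Keep only the samples that pass the filter.
kept = sorted(s for s, comps in admixture.items() if is_south_asian(comps))
print(kept)
```

Note that the comparison is strict, so a sample sitting exactly at 50% (like the hypothetical sample_c) is dropped.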

I ran PCA on these South Asian samples and kept 31 dimensions. Running Mclust on the PCA results gave me 37 clusters.

The clustering results are in a spreadsheet.

For an individual, the value under a specific cluster shows the probability of that person belonging to that cluster. For example, HRP0152 has a 58% probability of belonging to cluster CL8 and 42% probability of being in cluster CL14.

For the populations in the first sheet, I added up the probabilities of all the samples in that population to get the expected number of individuals of that ethnicity belonging to a specific cluster.
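The population totals can be sketched in a few lines of Python. The population labels and most of the probabilities below are hypothetical; only the 58% / 42% split in the first row mirrors the HRP0152 example above, and that row's population label is a placeholder.

```python
from collections import defaultdict

# Hypothetical per-sample membership probabilities (not real project data).
# Each entry is (population label, {cluster: probability}).
# The first row mirrors the 58% / 42% split described for HRP0152.
memberships = [
    ("PopA", {"CL8": 0.58, "CL14": 0.42}),
    ("PopA", {"CL8": 1.0}),
    ("PopB", {"CL14": 0.25, "CL3": 0.75}),
]


def expected_counts(rows):
    """Sum membership probabilities per population and cluster."""
    totals = defaultdict(lambda: defaultdict(float))
    for population, probs in rows:
        for cluster, p in probs.items():
            totals[population][cluster] += p
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {pop: dict(clusters) for pop, clusters in totals.items()}


counts = expected_counts(memberships)
for pop, clusters in counts.items():
    print(pop, clusters)
```

Because each sample's probabilities sum to 1, a population's row adds up to the number of its samples that were assigned to clusters, which is why the entries can be read as expected head counts.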

In the second sheet, I have listed all the individual samples' clustering results.

There are some outliers who didn't belong in any cluster: HRP0001 (me, of course), 7 (out of 18) Makranis, 4 (out of 23) Sindhis, 3 (all) Great Andamanese, 1 (out of 20) Balochi, 1 (out of 4) Madiga, and 1 (only) Onge.


  1. Looks like my parents are just in a grab-bag of pops with ~10% or so East Asian admixture (e.g., Kol, Burusho, Uttaranchal Brahmins, etc.)

  2. Is it possible to illustrate the results in the form of a plot? Just wondering..

  3. Given the mythology of origins and historically strict endogamy, it's not surprising at all that the three Bengali Brahmins are in cluster 3 with most of the Metspalu UP Brahmins and not with the other Bengalis.

    • Yes, makes sense. Plus the mythology goes the other way too, whereby many Haryana, Punjab, Konkan (Trihotra), and UP Brahmins claim a Gaur origin. There is a theory that originally Gaur (Sravasti) was close to the location where these Metspalu UP Brahmins were sampled, but the main authorities hold that Gaur was in Bihar and W. Bengal too.

      "The local Gaurs are found in two places separated by a wide interval,—namely, the vicinity of Delhi, and Bengal. Common tradition points to Bengal as their original seat; but as we know that the eastern part of the country was occupied by the Brahmans at a period subsequently to their immigration into the western provinces, this is manifestly erroneous. As Hariana, Hastinapur, and the neighbouring country formed one of the earliest seat of the Brahmans in India, it is not improbable that the modern Gaurs of that quarter, together with those in Bengal and elsewhere, who have branched off from them, are their lineal descendants. The antiquity of these primitive Gaurs, combined with their wandering character, may have gradually given rise to the custom of designating the Brahmans generally over a wide extent of country as Gaurs, and so may have been adopted as a term applicable to all the tribes within its bounds. But the subject is involved in mystery and uncertainty."

      "that the strongest clan of Gaur is in the Central Doab. They say that they came from Narnal, from which place Nar in Rasulabad, the residence of a Gaur Raja, derives its name. The Rajas of Saket, Kishtawar, Mandi and Keonthal, in the Himalayas, between Simla and Kashmir, are all Gaur Rajpoots. He of Saket is a Chamar-Gaur. They all state that their families came originally from Bengal"

  4. CL1 is interesting - 62 Gujaratis A and the 2 Gujarati participants of our project. Likely a Patel cluster.

