Distance Measures

Referring to the dendrogram computed from the admixture results of Harappa Project participants, Thorfinn asked a long time ago:

Interesting that South Indian/Cow Belt Brahmins cluster together; while Punjabi Brahmins are closer to Punjabis.

I can understand the first clustering, assuming that Southern Brahmin communities are a spinoff of northern communities and have maintained relative genetic isolation; and the source Northern Brahmin population differed in original origin from other Cow Belt populations.

But how do both Brahmin communities differ equally from Punjabi/Rajasthani Brahmins; and why is that community closer to other Punjabi populations?

In terms of admixture results, that is correct in the case of the project participants. Why this is the case, I have no idea.

However, there is an issue here that we have to consider, and nsriram commented on it:

The euclidean distance doesn’t seem to be the appropriate metric to capture the pairwise similarities. Once you make a commitment to the distance measure then the side effects carry-over into the tree construction.

What is a good distance measure to compute the similarity or dissimilarity of the admixture results of two people? Is the Euclidean distance a good one in this case? It is certainly the most common and the easiest to use, so we usually default to it.

However, if we look at the Fst divergences of the ancestral components, we see that some pairs of components are much more divergent from each other than others. So a 5% difference in C1 might not mean the same thing as a 5% difference in C10.
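To make the point concrete, here is a minimal Python sketch of the plain Euclidean distance between admixture vectors. The three people and their three-component proportions are made up for illustration; they are not project results. Shifting 5% between any two components yields the same distance, no matter how divergent those components are.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two admixture vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("admixture vectors must have the same number of components")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical admixture proportions over three components (each row sums to 1).
person_a = [0.50, 0.30, 0.20]
person_b = [0.45, 0.35, 0.20]  # 5% moved from component 0 to component 1
person_c = [0.45, 0.30, 0.25]  # 5% moved from component 0 to component 2

# Both distances come out the same: the metric cannot tell the two shifts apart.
print(euclidean_distance(person_a, person_b))
print(euclidean_distance(person_a, person_c))
```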

A solution might be to use a weighted distance, but how should we weight it? The Fst numbers give pairwise distances between the ancestral populations. If we are focused on a specific population (e.g., South Asians), we could try weighting each component by the Fst value between that population's main component and the component in question. But I am not sure that's a good solution either.
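One possible reading of this weighting idea, sketched in Python under loud assumptions: the Fst matrix below is invented, the function name `fst_weighted_distance` and its `ref` parameter are hypothetical, and the choice of a weighted Euclidean form is mine rather than an established method. Each component's squared difference is scaled by its Fst from a chosen reference component.

```python
import math

def fst_weighted_distance(a, b, fst, ref):
    """Weighted Euclidean distance; component i is weighted by fst[ref][i]."""
    weights = fst[ref]
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Made-up Fst matrix for three components (symmetric, zero diagonal).
fst = [
    [0.00, 0.03, 0.12],
    [0.03, 0.00, 0.10],
    [0.12, 0.10, 0.00],
]

# Hypothetical admixture proportions (each row sums to 1).
person_a = [0.50, 0.30, 0.20]
person_b = [0.45, 0.35, 0.20]  # 5% moved from component 0 to the nearby component 1
person_c = [0.45, 0.30, 0.25]  # 5% moved from component 0 to the distant component 2

d_ab = fst_weighted_distance(person_a, person_b, fst, ref=0)
d_ac = fst_weighted_distance(person_a, person_c, fst, ref=0)
print(d_ab, d_ac)  # the shift toward the more divergent component counts for more
```

Note one weakness of this sketch: the reference component gets a weight of zero, since its Fst with itself is zero. A change in that component only shows up through the compensating change in the others, which is one reason to doubt that this is a good solution.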

In the end, a Euclidean distance measure gives us a rough idea of the differences between admixture results, but it should not be used to explain minor differences or to infer phylogenies.


  1. I personally think we should take extreme caution before coming to such a conclusion, i.e., that the Cow Belt Brahmins are similar to the South Indian Brahmins, since we have only two individuals as representatives: one Bhumihar Brahmin from U.P. and one Jain Babhan/Bhumihar from Bihar. We need far more samples from Uttar Pradesh before coming to such a conclusion. A realistic prediction is that the U.P. Brahmins as a whole would probably be somewhere in between North West Indians and Bihari Brahmins. The Bihari Brahmins and SI Brahmins should be similar, as they've had proximity with locals who are presumed to have large amounts of non-ANI/ASI admixture.

    • Rather than say 'locals,' I would either specifically define who the locals are or term them non-Brahmans, as Babhans have been attested in the region in written records since the Asokan period (please see Dhauli & Delhi inscriptions).
      Regarding Bhumihar Brahmans, the evidence is not that clear as some of them did migrate from further west (please see Deshavali Vivriti by Jagamohan Pandit & Vaijjala Bhupati).

      • Parasar, isn't it clear that Brahmins, before coming to the Indo-Gangetic valley, dispersed from some point further to the northwest? Surely they aren't native to the area, in genetic terms. I do think you have a point when you say that it is partially fallacious to term non-forward class Biharis as locals, as I've come across some excerpts by certain ethnologists that somewhat imply that Brahmins have been in parts of Chota Nagpur even before some of the Harijans. I lost most of these e-book excerpts with my old system, so it'd be great if you could quote them here (if you have any such excerpts, that is).

        But do you agree that we can't really form conclusions based on two Indo-Gangetic Brahmins and 1 Punjabi Brahmin?

        • Vasishta,

          The sequence of occupation in the Bihar region seems to be Sravak (Indo-Aryan), Santhal/Munda, Oraon, Tharu.
          I do believe that the Brahmans have a western, i.e., Indus Valley, origin. As to whether the Sravak also came from the west, that is not clear. As you may well know, in Sanskrit the words for old, ancestral, and east have the same etymology (Purva - east, old; Purvaja - born in the east, old; Prachya - east; Prachin - old).

          Undoubtedly more samples are needed to draw clear conclusions.

    • Yes, we cannot generalize from such a small sample.

  2. I find it very funny that you refer to "2 weeks" as a long time.

    In the fast paced world of desi genetics, every day counts it seems.

    Aryavarta seems to be severely underrepresented in the studies; also, they don't tend to migrate all that much (Mauritius, Fiji and Trinidad? should be good proxies).

    • Yes, Mauritius, Fiji, and Suriname all have a significant Bhojpuri speaking diaspora and would be good proxies. Some of my close relatives are from Mauritius - they speak French and Bhojpuri.
