Harappa Reference Population Similarity

I was not satisfied with the median IBS with reference populations method for checking how similar you are to different populations. So I took inspiration from Dienekes' population concordance ratio to compute another measure.

Let's say we have a Harappa participant h and we want to compare h to a reference population A. We can then divide our reference dataset into the in-group A and the out-group A' (which consists of everyone not in A). Now for every individual a belonging to group A and every individual a' belonging to group A', we can compare the IBS similarities and score them as:

The condition in this equation is true when Harappa participant h is more similar to individual a in population A than he is to individual a' who's not in population A and h and a are closer to each other than a is to a'.

We can then sum up these values over the whole set of populations A and A' and divide by the number of pairs .

This score tells us how similar h is to population A compared to all the reference samples not in population A and varies from 0 (most disimilar) to 1 (most similar).

Let's see how the Harappa participants HRP0001 to HRP0089 score with the different reference 3 populations.

Go to the spreadsheet and click on your Harappa ID to sort the populations by your similarity score with them (click two times if you want to sort in decreasing order which I like better).

The first sheet Sheet1 has all the populations. In the Filtered 1 sheet, I removed 13 African populations that had really low similarity scores with all participants and recomputed the scores.

In Filtered 2, I further removed 9 populations (East Africa, America, Oceania) with low scores for everyone.

In Filtered 3, another 40 populations with low scores with at least 88 (out of 89) Harappa participants were removed. The reason I removed populations and recomputed is that this made the out-group not as different from the in-group as it was before. So we can check if this algorithm can provide us with some meaningful difference in scores with close populations.

In Filtered 4, another 25 populations were removed making it more South Asian centered.

Finally, I used the 68 unmixed South Asian Harappa participants and did a South Asian specific run (though I cheated a bit and kept myself HRP0001 and my sister HRP0035 in). The most interesting thing here is the really high score the Patel Gujaratis get with the Gujarati-A reference population.


  1. The most interesting thing here is the really high score the Patel Gujaratis get with the Gujarati-A reference population.


  2. just a shout out, do some of you get erratic close matches in terms of region/caste rank? i noticed this before, but it occurs with both me and my parents it seems. i wonder if it is due to admixture in my family, as we're a composite.

  3. For me, kashmiri-pandit dropped to number 2, behind gujaratis-a. The %ages vary but the sequence is the same throughout gujaratis-a > kashmiri-pandit > gujaratis-b.

    Reich et al: "Interestingly, one of the GIH subgroups fall outside the main gradient of Indian groups, suggesting that they harbor substantial ancestry that is not a simple mixture of ASI and ANI. A speculative hypothesisis that
    some Gujarati groups descend from the founders of the “Gurjara Pratihara” empire, which is thought to have been founded by Central Asian invaders in the 7th century A.D. and to have ruled parts of northwest India from the 7-12th centuries. I. Karve noted that endogamous groups with names like “Gurjar” are now distributed throughout the northwest of the subcontinent, and hypothesized that that they likely trace their names to this invading group."

    Now, I have not seen any evidence of Central Asia Gurjara invaders, but my connection to the Gurjaras of Kannauj is distinctly possible.

  4. { Brown Pundits } » A Resolution of the Gujaratis? - pingback on April 27, 2011 at 5:42 pm
  5. Zack, I know I request too much, but could you do the same update you did for the previous spreadsheed? Those of us who are not project participants are more interested with investigating the reference populations rather than the project participants.

  6. BTW, the results of this analysis make more sense than the previous IBS analysis.

  7. Vasishta (HRP0072)

    Zack, would it be possible to repeat the same exercise with the Reference II populations? These results seem to be far more plausible for every participant as compared to the previous ones.


  8. { Brown Pundits } » Razib Khan, adivasi token - pingback on April 29, 2011 at 1:10 am
  9. Harappa Reference 2 IBS Concordance | Harappa Ancestry Project - pingback on April 30, 2011 at 3:16 pm

Trackbacks and Pingbacks: