I have removed San and Pygmy groups from my reference datasets. That meant removing 39 samples from Reference Data I and 61 samples from Reference Data II.

The presence of those groups was creating some weird effects in admixture runs at K=8,9. Basically, the African ancestral components I was getting were not stable. Instead, they varied depending on which Harappa participant batches were included. Also, at K=10,11, there were too many Africa-only ancestral components, forcing me to run even higher values of K.

Since we are not really interested in African diversity in this project and any African admixture among South Asians is most likely to be East, West or North African instead of Pygmy or San, the removal of these groups should not have any implications for the Harappa Ancestry Project.
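
For anyone who wants to try a similar exclusion, here is a minimal sketch of the filtering step. It is not the script actually used for the project. It assumes each sample is available as a (family ID, individual ID, population label) row, and the label strings in `EXCLUDED_GROUPS` are hypothetical placeholders that may not match the labels in the reference datasets. The output lines follow the two-column "FID IID" layout that PLINK's `--remove` option reads.

```python
# Hypothetical population labels to drop; the real label strings in the
# reference datasets may be spelled differently.
EXCLUDED_GROUPS = {"San", "Biaka Pygmy", "Mbuti Pygmy"}


def split_samples(samples, excluded=EXCLUDED_GROUPS):
    """Split (family_id, individual_id, population) rows into kept and removed."""
    kept, removed = [], []
    for fid, iid, pop in samples:
        if pop in excluded:
            removed.append((fid, iid, pop))
        else:
            kept.append((fid, iid, pop))
    return kept, removed


def remove_list_lines(removed):
    """Format removed samples as 'FID IID' lines for a PLINK --remove file."""
    return [f"{fid} {iid}" for fid, iid, _ in removed]
```

Writing the returned lines to a text file and passing it to `plink --remove` would drop those individuals. Running ADMIXTURE on both the filtered and unfiltered datasets is one way to check how much the excluded groups were affecting the components.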

  1. better good results than soon results 🙂

  3. If I were you, I would exclude not only San and Pygmy from the reference datasets but other genetically marginal/isolated ethnic groups as well. Kalash are the most obvious one, though since they are a South Asian ethnic group, their inclusion in the reference datasets is understandable to a degree. From Europe, Sardinians and Basques come to mind.

      I am a bit reluctant to exclude groups just because I don't like them ;-). I mean just because they are isolated groups.

      Kalash, I intend to keep because they are South Asian. And while they are isolated, they do seem to have some common ancestry with the other groups in the region.

      Basques haven't shown up as very different yet. But I have only one European ancestral component till K=9.

    • The Kalash, due to their isolation, could give us the makeup from an ancient period. They are off the cline in the same way as other populations that did not merge into the mainstream, such as Sahariya, Kharia, Santhal, Hazara, Hallaki, or other outliers such as the Onge, Great Andamanese, Siddi, Nyshi,
      Aonaga, Chenchu, etc. It may be worthwhile to see whether removing them would change the results, as it did with the Pygmy, whose removal should make people look more African (& vice versa) due to the back-migration component becoming higher.

      • I wouldn't consider the Kalash an exemplar of the ancient populations of the region. They have a small population and likely have had a few bottlenecks. But they are an interesting case to look at.

        • The bottleneck appears to be well founded. The reason I thought of them as an ancient population is their Y-DNA mix: no R2a, a significant H, and R1 much lower than in their neighbors, i.e., they appear to have more F(xMNOPS) than their neighbors.

  5. I have also been recently playing with SNP datasets (that's fascinating), and I think that San and Pygmy populations might be relevant in the study of South Asian ancestry.

    I have a hypothesis according to which the populations that first colonized South and East Asia could be more related to San/Pygmy than to other African populations.

    However, the evidence is not strong and I'm far from being a population genetics expert; this is highly speculative.

    See my discussion of K=14, pages 16-18 in this preliminary version of a paper I'm trying to write:

    My admixture bar plots are available at
    but I should update them, because the analyses contained 2 mis-labelled individuals from the HGDP dataset.

    • Interesting.

      I have also created a bigger dataset with everything included to do such experiments.

      I am also very interested in finding out how to get hold of the Pan-Asian and Reich et al. Indian datasets. Please drop me an email. Thanks!

