Reference 3F(iltered) Admixture

I removed all American populations, as well as the San and Pygmy (i.e., South and Central African) samples, from Reference 3 to focus better on our target populations.

Here are the admixture results. You can choose the number of ancestral components, K, from the dropdown below.

K=13, 14, 15 (in that order) have the lowest cross-validation error.
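
For anyone who wants to reproduce this kind of ranking, here is a minimal sketch of how the cross-validation errors could be collected and sorted. It assumes the runs were made with cross-validation enabled and that each log contains a line of the form `CV error (K=13): 0.51234`; the function name `rank_k_by_cv_error` is hypothetical and this is not necessarily how the ranking above was produced.

```python
import re

# Assumed log line format: "CV error (K=13): 0.51234"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def rank_k_by_cv_error(log_text):
    """Return (K, cv_error) pairs sorted from lowest to highest CV error.

    log_text is the concatenated text of one or more run logs.
    If the same K appears more than once, the last value seen wins.
    """
    results = {}
    for match in CV_LINE.finditer(log_text):
        k = int(match.group(1))
        results[k] = float(match.group(2))
    # Sort by the error value, smallest first.
    return sorted(results.items(), key=lambda item: item[1])
```

Feeding it the combined logs for all K and reading off the first three entries would give the three best-scoring values of K.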

There are a number of interesting results in there. For example, the split into northern and southern European, and the split of Siberian into Siberian and Russian Far East (or Bering Strait). However, the Onge component as a proxy of the ASI does not appear. Also, we don't get as much breakdown of the South Asian populations as we would like.


  1. win some. lose some.

  2. Zack, did you remove the European-like Siberian samples from your references before your first Reference 3 genetic analysis, or do they still remain in your references? David of Eurogenes apparently detected some of them in his studies.

    • I know about them but haven't excluded them because they don't affect my analysis. I am not doing any regional analysis on them. In my datasets, both Siberians and Russians have been included, so those samples would just cluster with Russians or other Europeans.

  3. There is a huge difference in African ancestry between the Siddi samples and the Makrani samples. It seems it will be harder to classify Makranis as a branch of Siddis. Or, Siddis may have great variability in the quantity of African ancestry, and the Siddi samples used in Reich et al. (just 4 samples) may just be from a segment of the general Siddi population with a high level of African ancestry. I wonder which part of South Asia those 4 Siddis are from.

    • The Reich et al. Siddis are from Karnataka. So they are likely not related to Makranis at all.

      There is not a lot of variation in their Admixture results.

    • Just to clarify, Makranis are southern Baloch with some African admixture. The amount of admixture varies; however, it is similar to that found in other communities in Oman and on both sides of the Persian Gulf.

      The term 'Makrani' is sometimes used interchangeably with 'Sheedi' in Karachi/Pakistan, which leads to the confusion; however, this is incorrect. Sheedis are a distinct community of African descent with relatively little South Asian admixture (at least in Pakistan).

  4. "There is not a lot of variation in their Admixture results."

    By variation I meant the variation in the general Siddi population, wherever in South Asia they are from.

    • The general Siddi population, I mean Siddis as a whole, might still have great variability in the quantity of their African ancestry. Four Siddis, all of whom are from Karnataka, aren't enough to evaluate the quantity of African ancestry in Siddis as a whole.

  5. Additionally, how would the Admixture run behave without Andaman Islanders/Onge, Austro-Melanesians and Kalash? Don't these populations generate a lot of noise in relation to South Asians?

    • I am going to do some runs without those populations to see what happens. I am reluctant to exclude them because I feel they are related to South Asians.

      • I think they are too, but it looks to me like real ANI is being fractured between Kalash and Baloch/Caucasus simply because the Kalash have been isolated and have some degree of genetic uniqueness. Likewise, ASI is being fractured between Onge, South Asian (at high enough K) and Papuan. In such a run I would look for a K where the correlation between the South Asian component and ASI is maximized.

        • The correlation method works but it is crude. Other than the Ref3 K=11 run, where the Onge component captured most of the ASI, the correlation between the South Asian component and ASI is high for most of my admixture runs. The problem I see with the correlation data is that we are relying on only 18 populations on the Indian cline from Reich et al. These are the groups where a two-way ANI + ASI admixture makes sense. Extrapolating the results from admixture to other groups that have significant admixture from elsewhere can cause problems and lead to wrong estimates of ASI.

          For example, in the Ref3 K=11 run, I think I am overestimating ASI among the Austroasiatic groups, who have large amounts of Southeast Asian admixture.

          I am looking into how to solve that problem.

  6. Zack, what is the difference between the Kalash and Balochistan/Caucasus at the highest level of K? I thought the Kalash component at the lower levels of K consequently becomes Baloch/Cauc at higher levels of K and the Kalash percentages become rather residual.

    • Yeah, the Kalash component at K>=11 is acting weird. It keeps some of the Caucasian/West Asian elements. Another reason to not be happy with this set of runs.

  7. "However, the Onge component as a proxy of the ASI does not appear"

    "South Asian" becomes a good proxy for ASI when k>=11; fit a regression and see. Before the emergence of the Kalash component at k = 11, the "South Asian" component is more or less composite.

  8. Even in Ref 3 k = 12, "South Asian" is a proxy for ASI. However, at that point "Onge" irregularly fractures off "South Asian" and the SSE of ASI vs. South Asian increases a little, but not by a lot.
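
The regression check discussed in comments 5, 7 and 8 can be sketched roughly as follows: for each K, regress the published ASI estimates on the "South Asian" component and compare the correlation and the SSE. This is only an illustration using ordinary least squares, which is an assumption about what "fit a regression" means here. The names `fit_line` and `best_k_by_sse` and the input layout are hypothetical, and the inputs would have to be restricted to populations that have a published ASI estimate (the 18 Indian-cline groups mentioned above).

```python
def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x.

    Returns (a, b, sse, r): intercept, slope, sum of squared
    residuals, and the Pearson correlation between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    return a, b, sse, r


def best_k_by_sse(component_by_k, asi):
    """Pick the K whose component predicts ASI with the smallest SSE.

    component_by_k maps K -> list of "South Asian" component fractions,
    one per population, in the same population order as asi.
    Returns (best_k, fits) where fits maps K -> (a, b, sse, r).
    """
    fits = {k: fit_line(values, asi) for k, values in component_by_k.items()}
    best_k = min(fits, key=lambda k: fits[k][2])
    return best_k, fits
```

Because the fit only covers populations where a two-way ANI + ASI model makes sense, a low SSE here says nothing about how well the same line extrapolates to groups with substantial admixture from elsewhere.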