Reference 3F(iltered) Admixture

Posted by Zack on May 5, 2011

I removed all American populations, as well as the San and Pygmy (i.e., South and Central African) populations, from Reference 3 to focus better on our target populations.

Here are the admixture results. You can choose the number of ancestral components, K, from the dropdown below.

K=13, 14, 15 (in that order) have the lowest cross-validation error.
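For anyone who wants to reproduce this kind of ranking, here is a minimal sketch. It assumes each run was made with ADMIXTURE's `--cv` option, which prints a line of the form `CV error (K=13): 0.5` to its output. The function name `rank_k_by_cv_error` and the error values in the example log are hypothetical placeholders, not the numbers from these runs.

```python
import re

# Matches lines such as "CV error (K=13): 0.52345" printed by ADMIXTURE with --cv.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def rank_k_by_cv_error(log_text):
    """Return (K, cv_error) pairs sorted from lowest to highest CV error."""
    pairs = [(int(k), float(err)) for k, err in CV_LINE.findall(log_text)]
    return sorted(pairs, key=lambda pair: pair[1])


# Placeholder values for illustration only; NOT the errors from the runs above.
example_log = """\
CV error (K=12): 0.5010
CV error (K=13): 0.5000
CV error (K=14): 0.5002
CV error (K=15): 0.5005
CV error (K=16): 0.5020
"""

ranked = rank_k_by_cv_error(example_log)
best_three = [k for k, _ in ranked[:3]]
print(best_three)
```

In practice you would concatenate the logs from all K runs and pass the combined text to the function. The differences in CV error between neighbouring values of K are often small, so the ranking is a guide rather than a hard rule.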

There are a number of interesting results in there, for example the split of European into northern and southern components, and the split of Siberian into Siberian and Russian Far East (or Bering Strait). However, the Onge component does not appear as a proxy for ASI. Also, we don't get as much breakdown of the South Asian populations as we would like.

19 Comments.

razib May 5, 2011 at 12:35 am

win some. lose some.
Onur May 5, 2011 at 9:09 am

Zack, did you remove the European-like Siberian samples in your references before your first reference 3 genetic analysis, or do they still remain in your references? David of Eurogenes apparently detected some of them in his studies:

http://eurogenes.blogspot.com/2011/05/russian-like-siberians-in-rasmussen-et.html

http://bga101.blogspot.com/2011/05/russian-like-siberians-in-rasmussen-et.html
- Zack May 5, 2011 at 9:58 am
  
  I know about them but haven't excluded them because it doesn't affect my analysis. I am not doing any regional analysis on them. In my datasets, both Siberians and Russians have been included so those samples would just cluster with Russians or other Europeans.
Onur May 5, 2011 at 12:26 pm

There is a huge difference in terms of African ancestry between the Siddi samples and the Makrani samples. It seems it will be harder to classify Makranis as a branch of Siddis. Or, Siddis may have great variability in the quantity of African ancestry, and the Siddi samples used in Reich et al. (just 4 samples) may just be from a high-level African ancestry segment of the general Siddi population. I wonder from which part of South Asia those 4 Siddis are.
- Zack May 5, 2011 at 12:47 pm
  
  The Reich et al Siddis are from Karnataka. So they are likely not related to Makranis at all.
  
  There is not a lot of variation in their Admixture results.
- raz May 6, 2011 at 7:13 am
  
  Just to clarify, Makranis are southern Baloch with some African admixture. The amount of admixture varies, however it is similar to that found in other communities in Oman and on both sides of the Persian gulf.
  
  The term 'Makrani' is sometimes used interchangeably with 'Sheedi' in Karachi/Pakistan, which leads to the confusion; however, this is incorrect. Sheedis are a distinct community of African descent with relatively little South Asian admixture (at least in Pakistan).
  - Onur May 6, 2011 at 3:13 pm
    
    Thanks for the information on Siddis (=Sheedis) and Makranis.
Onur May 5, 2011 at 2:46 pm

"There is not a lot of variation in their Admixture results."

By variation I meant the variation in the general Siddi population, wherever in South Asia they are from.
- Onur May 5, 2011 at 3:39 pm
  
  The general Siddi population, I mean Siddis as a whole, might still have great variability in the quantity of their African ancestry. 4 Siddis all of whom are from Karnataka aren't enough to evaluate the quantity of African ancestry in Siddis as a whole.
Ibra May 5, 2011 at 8:34 pm

Additionally, how would the Admixture run behave without Andaman Islanders/Onge, Austo-Melanesians and Kalash? Don't these populations generate a lot of noise in relation to South Asians?
- Zack May 6, 2011 at 9:10 am
  
  I am going to do some runs without those populations to see what happens. I am reluctant to exclude them because I feel they are related to South Asians.
  - Ibra May 7, 2011 at 11:36 am
    
    I think they are too, but it looks to me like the real ANI is being fractured between Kalash and Baloch-cau simply because the Kalash have been isolated and have some degree of genetic uniqueness. Likewise, ASI is being fractured between Onge, South Asian (at high enough K) and Papuan. In such a run I would look for a K where the correlation between SA and ASI is maximized.
    - Zack May 8, 2011 at 10:22 am
      
      The correlation method works but it is crude. Other than the Ref3 K=11 run where the Onge component captured most of the ASI, the correlation between South Asian component and ASI is high for most of my admixture runs. The problem I see with the correlation data is that we are relying on only 18 populations on the Indian cline from Reich et al. These are the groups where a two-way ANI + ASI admixture makes sense. Extrapolating the results from admixture to other groups who have significant admixture from elsewhere can cause problems and lead to wrong estimates of ASI.
      
      For example, in the Ref3 K=11 run, I think I am overestimating ASI among the Austroasiatic groups who have large amounts of southeast Asian admixture.
      
      I am looking into how to solve that problem.
      - Ibra May 9, 2011 at 3:20 pm
        
        Thank you for your explanation, Zack.
Vasishta May 5, 2011 at 11:32 pm

Zack, what is the difference between the Kalash and Balochistan/Caucasus at the highest level of K? I thought the Kalash component at the lower levels of K consequently becomes Baloch/Cauc at higher levels of K and the Kalash %ages become rather residual.
- Zack May 6, 2011 at 9:12 am
  
  Yeah the Kalash component at K>=11 is acting weird. It keeps some of the Caucasian/West Asian elements. Another reason to not be happy with this set of runs.
Ibra May 6, 2011 at 1:04 am

"However, the Onge component as a proxy of the ASI does not appear"

"South Asian" becomes a good proxy for ASI when k>=11; fit a regression and see. Before the emergence of the Kalash component at k = 11, the "South Asian" component is more or less composite.
- Zack May 6, 2011 at 7:38 am
  
  Yes, you are correct, though it has higher residuals than the Onge component in Ref 3 K=11.
Ibra May 6, 2011 at 12:02 pm

Even in Ref 3 k = 12, "South Asian" is a proxy for ASI. However, at that point "Onge" irregularly fractures off "South Asian" and the SSE of ASI vs South Asian increases a little, but not a lot.
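
The regression check discussed in these comments can be sketched roughly as follows: for each candidate component, fit a straight line predicting the published ASI estimate from the admixture component across the Indian cline populations, then compare the sum of squared errors (SSE) and the correlation. This is only an illustration under stated assumptions. The function names are hypothetical, and the numbers are made-up placeholders, not the Reich et al. estimates or any result from these runs.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(x, y):
    """Sum of squared residuals of y around the least squares line on x."""
    slope, intercept = fit_line(x, y)
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder data: one value per population, same order in each list.
asi_estimate = [0.40, 0.50, 0.60, 0.70]      # stand-in for published ASI fractions
component_a = [0.20, 0.30, 0.40, 0.50]       # stand-in for one admixture component
component_b = [0.20, 0.35, 0.38, 0.50]       # stand-in for a second component

sse_a = sse(component_a, asi_estimate)
sse_b = sse(component_b, asi_estimate)
r_a = pearson_r(component_a, asi_estimate)
r_b = pearson_r(component_b, asi_estimate)
print(sse_a, sse_b, r_a, r_b)
```

The component with the lower SSE (and higher correlation) is the better linear proxy on these populations. As Zack notes above, this only makes sense for groups where a two-way ANI + ASI model is reasonable, so the fit should not be extrapolated to populations with substantial admixture from elsewhere.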

Harappa Ancestry Project

Genetics and South Asia