Ref4C Admixture

I removed the Gujarati-A samples from the previous set of runs and ran admixture on the resulting dataset.

Nothing new pops out except that the Siberian component splits into Turkic/Tungusic and Nganasan components at K=12.

The admixture results are in a spreadsheet as usual.

K=11 & 12 have the lowest errors.

At K=13, Chenchu split off as their own cluster.
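
For anyone who wants to compare errors across K themselves, here is a minimal sketch. It assumes the errors in question are ADMIXTURE's cross-validation errors (runs started with `--cv`), which are printed as lines of the form `CV error (K=3): 0.58`. The helper names and the sample log text are hypothetical, and the numbers are placeholders, not the values from these runs.

```python
import re

# ADMIXTURE run with --cv prints one line per run, e.g. "CV error (K=3): 0.58"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the given log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(errors, n=2):
    """Return the n values of K with the lowest CV error, best first."""
    return sorted(errors, key=errors.get)[:n]

# Placeholder log text (made-up numbers, for illustration only)
sample_log = """\
CV error (K=2): 0.61
CV error (K=3): 0.58
CV error (K=4): 0.59
"""

errors = cv_errors(sample_log)
print(best_k(errors))
```

In practice the log text would be the concatenated output of the runs for each K.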


  1. Zack,

    Small populations with low diversity - Kalash, Onge, Gujarati-A - have shown a propensity to form their own components. Do you think some diversity measure would be appropriate for removing such populations from the admixture analysis?


  2. Zack,

    Yakuts have a strong male founder effect. There were perhaps a hundred Turkic males 1,000 years ago who moved into Evenki lands and have married Tungus women ever since. So it is no surprise that they share a large portion with the Evenki, but only because of the females. The Turkic element, as I said above, was contributed by a very small number of males.

    • OK. Are there other Turkic populations in my reference dataset?

      Do you know of any papers on autosomal SNPs of Central Asians and Turkic populations?

      • Zack,

        You have Tuvinians, Uighurs and Uzbeks. Look at the Uzbeks closely: they seem to form two distinct clusters, and I suspect that one cluster of 4 Uzbeks is actually ethnic Tajiks.
        The Hazara are a mix of Turkic and Mongolian tribes. If they also tend to form two clusters, as the Uzbeks do, then one is a Turkic cluster and the other a Mongolian one.

      • PS. I missed the Altaians, but I would not be surprised if they also split into two parts.

  3. I don't see what this guy has written here:

    This seems completely biased toward a nationalist view!

    Oh god so much politics with the gene!

  4. "At K=13, Chenchu split off as their own cluster."

    Will you be posting K=13?

  5. Misuse of Correlation | Harappa Ancestry Project - pingback on May 28, 2011 at 4:50 pm
  6. Have you considered inserting a dummy (100% ASI) population into this reference set? If you sample from K=8 or similar, that should do the trick, and that way you can bypass "Onge" for measuring the frequencies of people's components.

    • Yes I have been looking into that. There is no direct way to get to 100% ASI, so I am comparing a couple of approaches.

  7. Don't know if this helps, but someone mentioned the use of 'SNP simulation routine' options in PLINK and also to alter the .simfreq file with the .F stats from admixture. I have no idea how any of this works, just passing it along 🙂
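
The idea in the last two comments, building a dummy population from a component's allele frequencies, can be sketched roughly as follows. This is not PLINK's simulation routine and not necessarily the approach being compared above; it only shows the underlying sampling step. It assumes per-SNP allele frequencies for one component are already available, that SNPs are independent, and that genotypes follow Hardy-Weinberg proportions. The function name and the example frequencies are hypothetical.

```python
import random

def simulate_genotypes(freqs, n_individuals, seed=0):
    """Draw diploid genotypes (0, 1 or 2 copies of the allele) per SNP.

    Each genotype is the sum of two independent Bernoulli(p) draws,
    i.e. Binomial(2, p), where p is that SNP's allele frequency.
    Assumes independent SNPs and Hardy-Weinberg proportions.
    """
    rng = random.Random(seed)
    return [
        [sum(rng.random() < p for _ in range(2)) for p in freqs]
        for _ in range(n_individuals)
    ]

# Hypothetical allele frequencies for four SNPs (placeholders, not real data)
freqs = [0.0, 0.25, 0.5, 1.0]
genos = simulate_genotypes(freqs, n_individuals=5)

for row in genos:
    print(row)
```

Ignoring linkage between neighbouring SNPs is the main simplification here, so the simulated individuals will look less structured than real samples.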
