Ref3 + Yunusbayev Caucasus Data Admixture

Posted by Zack on November 7, 2011

To my standard Reference 3 (list of populations), I added the Yunusbayev et al. Caucasus samples, which include the following:

20 abhkasians
16 armenians
19 balkars
13 bulgarians
20 chechens
14 kumyks
6 kurds
15 mordovians
16 nogais
15 north-ossetians
15 tajiks
15 turkmens
20 ukranians

These 204 samples increased the total to 4,090.

Then I applied a stricter IBD relationship cutoff than I had used before. Previously my focus was on removing relatives, but this time I also wanted to remove samples that seemed highly inbred or that belonged to small, highly bottlenecked groups, so that they would not form their own clusters in Admixture. This process removed the following 164 samples:

maasai 30
papuan 15
karitiana 12
pima 12
onge 8
surui 7
luhya 6
melanesian 6
colombian 5
hadza 5
koryaks 5
sandawe 5
san 4
turkmens 4
african-americans 3
east-greenlanders 3
great-andamanese 3
nganassans 3
chenchu 2
evenkis 2
han-chinese-south 2
maya 2
mbutipygmy 2
mexicans 2
utahn-whites 2
aus 1
bantukenya 1
british 1
chinese-americans 1
gujaratis-b 1
iranians 1
naxi 1
north-kannadi 1
samaritians 1
she 1
tuvinians 1
yemenese 1
yoruba 1
yukaghirs 1
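The kind of filtering described above can be sketched as follows. This is only an illustration, not the procedure actually used for this run: it assumes pairwise IBD estimates are available as (sample, sample, value) tuples, for example the PI_HAT column of PLINK's `--genome` output, and the function name `samples_to_drop` and the `cutoff` parameter are hypothetical. The post does not state which cutoff value was applied.

```python
from collections import defaultdict


def samples_to_drop(pairs, cutoff):
    """Greedily pick samples to remove until no pair exceeds `cutoff`.

    pairs:  iterable of (sample_a, sample_b, ibd_estimate) tuples
    cutoff: hypothetical threshold; the value used in the post is not given
    """
    # Link every pair of samples whose IBD estimate is above the cutoff.
    neighbours = defaultdict(set)
    for a, b, ibd in pairs:
        if ibd > cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    dropped = set()
    while neighbours:
        # Drop the sample involved in the most over-cutoff pairs
        # (ties broken by name so the result is deterministic).
        worst = min(neighbours, key=lambda s: (-len(neighbours[s]), s))
        dropped.add(worst)
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
            if not neighbours[other]:
                del neighbours[other]
    return dropped
```

With a greedy rule like this, a small bottlenecked group in which almost every pair exceeds the cutoff loses most of its members, which is the intended effect when the goal is to stop such groups from forming their own clusters.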

Finally, I added the 165 founders from the Harappa Project participants (up to HRP0180).
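As a quick sanity check on the bookkeeping, the counts listed above can be tallied directly. The population labels are copied as they appear in the lists. The final figure is an inference that assumes no other samples were added or dropped between the steps, which the post does not state explicitly.

```python
# Samples added from the Yunusbayev et al. Caucasus dataset.
added = {
    "abhkasians": 20, "armenians": 16, "balkars": 19, "bulgarians": 13,
    "chechens": 20, "kumyks": 14, "kurds": 6, "mordovians": 15,
    "nogais": 16, "north-ossetians": 15, "tajiks": 15, "turkmens": 15,
    "ukranians": 20,
}

# Samples removed by the stricter IBD cutoff.
removed = {
    "maasai": 30, "papuan": 15, "karitiana": 12, "pima": 12, "onge": 8,
    "surui": 7, "luhya": 6, "melanesian": 6, "colombian": 5, "hadza": 5,
    "koryaks": 5, "sandawe": 5, "san": 4, "turkmens": 4,
    "african-americans": 3, "east-greenlanders": 3, "great-andamanese": 3,
    "nganassans": 3, "chenchu": 2, "evenkis": 2, "han-chinese-south": 2,
    "maya": 2, "mbutipygmy": 2, "mexicans": 2, "utahn-whites": 2,
    "aus": 1, "bantukenya": 1, "british": 1, "chinese-americans": 1,
    "gujaratis-b": 1, "iranians": 1, "naxi": 1, "north-kannadi": 1,
    "samaritians": 1, "she": 1, "tuvinians": 1, "yemenese": 1,
    "yoruba": 1, "yukaghirs": 1,
}

total_after_adding = 4090   # stated in the post
harappa_founders = 165      # stated in the post

n_added = sum(added.values())      # matches the 204 stated in the post
n_removed = sum(removed.values())  # matches the 164 stated in the post

# Inferred size of the final dataset, under the assumption noted above.
final_total = total_after_adding - n_removed + harappa_founders
print(n_added, n_removed, final_total)
```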

The crossvalidation error for the admixture results with K (number of ancestral components) from 2 to 20 is plotted here.

Zooming in, the lowest crossvalidation errors are for K=17 and K=12.
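For readers who want to repeat this comparison on their own runs, here is a minimal sketch of ranking K by crossvalidation error. It assumes ADMIXTURE was run with the `--cv` flag and that its log output, which contains lines of the form `CV error (K=12): 0.5`, was saved as text. The function names `parse_cv_errors` and `rank_k` are hypothetical helpers, not part of ADMIXTURE.

```python
import re

# ADMIXTURE run with --cv reports one line per run, e.g. "CV error (K=12): 0.5".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): (\d+(?:\.\d+)?)")


def parse_cv_errors(log_text):
    """Return a {K: cv_error} dict from concatenated ADMIXTURE log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def rank_k(cv_errors, top=2):
    """Return the `top` values of K with the lowest crossvalidation error."""
    return sorted(cv_errors, key=cv_errors.get)[:top]
```

Calling `rank_k(parse_cv_errors(log_text))` on the combined logs for K=2 through K=20 would return the two best-scoring values of K. The differences between neighbouring K values are often small, so it is worth looking at the plot as well as the ranking.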

The admixture results are in a spreadsheet.

In addition to K=17 and K=12, take a look at the results for K=15.

PS. I should point out that the names for the ancestral components are just useful mnemonics based on the current distribution of that component. Also, a component with the same name at one value of K is different from a similarly named component at another K.


7 Comments.

JDP November 7, 2011 at 10:39 am

Where are the Punjabis represented in the reference list? I see Kashmiri Pandits, Sindhis, Iranians and Pathans. I see Singapore Indians, but what about Indians in general?
- AV November 7, 2011 at 12:31 pm
  
  The only autosomal DNA study that has used a Punjabi reference sample is "Toward a more uniform sampling of human genetic diversity: a survey of worldwide populations by high-density genotyping" by Jinchuan Xing et al. They used 25 Punjabi Arains from Pakistan in the study. Zack doesn't use the Xing et al. populations in most of his admixture runs because they have a poor and limited SNP overlap with 23andMe data, if I remember correctly.
AV November 7, 2011 at 12:35 pm

Zack said;
"Finally, I added the 165 founders from the Harappa Project participants (up to HRP0180)."

In light of this, will you be posting individual admixture proportions at all these K's sometime? While I'm assuming this run is only experimental, that'd still be cool.
- Zack November 7, 2011 at 3:52 pm
  
  Yes, I'll post the participant results in a few days.
Ibra November 8, 2011 at 11:42 pm

Nicely done Zack, thanks! Also, what interpretation does one give to the "South European" component after K = 11? It kind of looks like a South West Asian and European mix.
- Zack November 10, 2011 at 11:14 am
  
  It's the Sardinian-centered component other genome bloggers have also found. While Sardinians are a fairly isolated population, this component seems to be present in a fairly wide range especially near the Mediterranean.
Ref3 + Yunusbayev Harappa Admixture Results | Harappa Ancestry Project - pingback on November 10, 2011 at 11:12 pm
