
Admixture K=12, HRP0081-HRP0090

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results.

If you can't see the interactive bar chart above, here's a static image.

The two new Assyrians (HRP0081 & HRP0082) are pretty similar to the earlier Assyrian participant HRP0010.

HRP0087 is an interesting case with ancestry from France, Martinique, Madagascar and India. I can't be certain but the ratio of South Asian to Balochistan/Caucasus components seems to point in the direction of northern Indian ancestry. I definitely need to do a supervised admixture run for the mixed participants.

HRP0089 is Kazakh and has a Siberian component of about one-third. That's higher than the Uygurs (21%) and Uzbeks (23%) in my reference set. HRP0089 also has a little more of the European component than the average Uygur or Uzbek in my reference.

PS. This was run using Admixture version 1.04.

Admixture K=4, HRP0081-HRP0090

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results.

It would be interesting to see how the Kazakh and the mixed French/Madagascar/Martinique/Indian participants come out at K=12.

If you can't see the interactive bar chart above, here's a static image.

PS. This was run using Admixture version 1.04 and using reference I. Probably the last batch for both.

PPS. For some reason, my efforts to reduce the font size in the table have been unsuccessful. Since we are close to 100 participants now, I need to find a better way for you guys to visualize these results. Maybe one slice at a time.

Reference 3 Admixture Data

Onur asked:

BTW, Zack, are you planning to publish (in this blog or in a medium like Rapidshare) the ADMIXTURE results of the reference populations on an individual by individual basis like Dienekes?

So, I have uploaded a zip file which contains admixture results from K=2 to K=17 for all individual samples in the Reference 3 dataset. Do note that K=14 had the lowest cross-validation error, and I actually prefer even lower values of K.
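For anyone who wants to compare cross-validation errors across runs themselves, here is a minimal sketch. It assumes each run's log contains a line of the form `CV error (K=5): 0.51234`, and the numbers in the sample log are placeholders, not the actual Reference 3 errors.

```python
import re

# Assumed log line format: "CV error (K=5): 0.51234"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def best_k(log_text):
    """Return (K, error) for the lowest cross-validation error in log_text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]

# Placeholder values for illustration only.
sample_log = """CV error (K=2): 0.60
CV error (K=3): 0.55
CV error (K=4): 0.57
"""
print(best_k(sample_log))
```

The lowest error is only a guide. As noted above, a lower K can still be preferable.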

I have plotted the population averages on the blog but this contains the individual level data. There are two files for each value of K. One is ref3.K.Q which has the admixture proportions for each individual. The other is ref3.K.F which has the allele frequencies for the inferred ancestral components. I haven't been able to look at the allele frequency files at all, so if you find anything interesting there, do let me know.

There is also an info file (ref3_info.csv) in the archive which has the information about the samples in the same order as their results are listed in the admixture output.
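Since the info file and the Q files share the same sample order, population averages can be rebuilt from the individual data. The sketch below assumes the Q file is whitespace-separated with one individual per row, and that the info file has a header row with a column named `population`. That column name is hypothetical, so check the actual header in ref3_info.csv. The sample data is made up.

```python
import csv
import io
from collections import defaultdict

def population_averages(q_lines, info_lines, pop_column="population"):
    """Average per-individual admixture proportions within each population.

    q_lines: lines of a .Q file (one individual per row, K values per row).
    info_lines: lines of the info CSV, in the same sample order as the .Q file.
    pop_column: name of the population column (hypothetical; check the header).
    """
    rows = [[float(x) for x in line.split()] for line in q_lines if line.strip()]
    info = list(csv.DictReader(info_lines))
    if len(rows) != len(info):
        raise ValueError("Q file and info file have different numbers of samples")

    sums = {}
    counts = defaultdict(int)
    for props, sample in zip(rows, info):
        pop = sample[pop_column]
        if pop not in sums:
            sums[pop] = [0.0] * len(props)
        sums[pop] = [s + p for s, p in zip(sums[pop], props)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}

# Tiny made-up example (K=2, three samples); not real Reference 3 data.
q_text = "0.9 0.1\n0.7 0.3\n0.2 0.8\n"
info_text = "id,population\ns1,PopA\ns2,PopA\ns3,PopB\n"
averages = population_averages(io.StringIO(q_text), io.StringIO(info_text))
print(averages)
```

With the real archive, the two arguments would be open file handles for one of the ref3.K.Q files and for ref3_info.csv.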

Reference 3 Admixture K=14

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=14.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

This one I am going to classify as a bad run. The East Asian splits are weird.

Fst divergences between estimated populations for K=14 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13
C2 0.109
C3 0.110 0.160
C4 0.239 0.264 0.247
C5 0.107 0.080 0.161 0.267
C6 0.116 0.111 0.176 0.284 0.102
C7 0.132 0.180 0.092 0.265 0.176 0.195
C8 0.189 0.237 0.214 0.335 0.239 0.251 0.237
C9 0.178 0.206 0.154 0.324 0.192 0.229 0.164 0.294
C10 0.217 0.246 0.191 0.373 0.242 0.262 0.229 0.338 0.285
C11 0.209 0.220 0.248 0.350 0.230 0.223 0.272 0.314 0.312 0.344
C12 0.266 0.278 0.307 0.417 0.286 0.281 0.333 0.373 0.374 0.406 0.179
C13 0.143 0.143 0.186 0.287 0.149 0.135 0.209 0.254 0.247 0.278 0.117 0.177
C14 0.364 0.368 0.410 0.528 0.372 0.377 0.437 0.490 0.481 0.514 0.334 0.359 0.283
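The numbers above form a lower-triangular matrix: the first line gives the column labels, and each following row starts with its own label. Here is a rough sketch of how to read such a table and list the closest component pairs. The function names are my own, and the excerpt covers only the first four components of the K=14 table.

```python
def parse_lower_triangle(text):
    """Parse a lower-triangular table into {(row_label, col_label): value}.

    The first line holds the column labels; each later line is a row label
    followed by its values for the leading columns.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    columns = lines[0]
    fst = {}
    for row_label, *values in lines[1:]:
        for col_label, value in zip(columns, values):
            fst[(row_label, col_label)] = float(value)
    return fst

def closest_pairs(fst, n=3):
    """Return the n component pairs with the smallest Fst."""
    return sorted(fst.items(), key=lambda item: item[1])[:n]

# First four components of the K=14 table above.
excerpt = """C1 C2 C3
C2 0.109
C3 0.110 0.160
C4 0.239 0.264 0.247
"""
fst = parse_lower_triangle(excerpt)
for (a, b), value in closest_pairs(fst):
    print(a, b, value)
```

The same parser should work on the full tables for the other values of K, since they share this layout.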

This is the last plot I am posting in this series of admixture runs since the crossvalidation error is minimized at K=14.

For some reason, Admixture starts acting weird at values of K higher than about 14-15.

Reference 3 Admixture K=13

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=13.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

The Hadza were expected to split but I thought the San/Pygmy would split first.

Fst divergences between estimated populations for K=13 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
C2 0.093
C3 0.098 0.141
C4 0.179 0.212 0.192
C5 0.100 0.056 0.150 0.224
C6 0.112 0.149 0.075 0.210 0.153
C7 0.109 0.062 0.161 0.234 0.072 0.170
C8 0.181 0.222 0.208 0.279 0.232 0.226 0.239
C9 0.198 0.202 0.239 0.308 0.217 0.254 0.208 0.306
C10 0.164 0.186 0.145 0.276 0.184 0.146 0.217 0.290 0.303
C11 0.320 0.318 0.365 0.443 0.336 0.381 0.325 0.444 0.284 0.437
C12 0.261 0.263 0.302 0.377 0.277 0.318 0.270 0.371 0.153 0.370 0.278
C13 0.137 0.124 0.180 0.248 0.138 0.193 0.121 0.250 0.088 0.241 0.288 0.163

Admixture Onge Component Map

Since the Onge component in my K=11 admixture run was very strongly correlated with Reich et al.'s Ancestral South Indian component, Simranjit has been kind enough to let me share his map of the Onge component in South Asia.

He also has maps of the K=12 admixture run.

Reference 3 Admixture K=12

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=12.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Of course, the K=11 Onge component was too good to last. Onge are too different from the other populations, so of course they get their isolated component.

Fst divergences between estimated populations for K=12 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11
C2 0.089
C3 0.093 0.133
C4 0.172 0.211 0.189
C5 0.103 0.080 0.155 0.234
C6 0.094 0.055 0.140 0.218 0.056
C7 0.113 0.143 0.068 0.213 0.169 0.147
C8 0.179 0.219 0.204 0.280 0.237 0.225 0.228
C9 0.177 0.182 0.214 0.285 0.181 0.187 0.232 0.283
C10 0.164 0.178 0.139 0.276 0.214 0.180 0.143 0.290 0.280
C11 0.151 0.150 0.190 0.260 0.150 0.154 0.207 0.262 0.059 0.255
C12 0.256 0.260 0.295 0.373 0.261 0.265 0.314 0.367 0.116 0.364 0.131

Reference 3 Admixture K=11

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=11.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

You don't know how excited I am to see the Onge (C2) component. Let's compare the Onge component with Reich et al.'s ASI (Ancestral South Indian):

Population        Reich ASI %   Onge Component %
Mala              61.2          39.9
Madiga            59.4          37.9
Chenchu           59.3          38.6
Bhil              57.1          37.5
Satnami           57            36.4
Kurumba           56.8          39.5
Kamsali           55.5          35.5
Vysya             53.8          34.4
Lodi              50.1          31.8
Naidu             49.9          32.1
Tharu             49            32.2
Velama            45.3          28.9
Srivastava        43.6          27.8
Meghawal          39.7          25.4
Vaish             37.4          23.8
Kashmiri-Pandit   29.4          17.6
Sindhi            26.3          13.4
Pathan            23.1          10.6

Let's plot that with a linear regression:

How do you like that?

Now let's take all the reference populations with an Onge component between 10% and 50% and use the regression equation to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian percentage than any of the Reich et al. groups, with Paniya being the highest at 67.4%.
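To reproduce this kind of calculation, here is a minimal sketch of an ordinary least-squares fit on the 18 populations in the table, followed by a helper that applies the fitted line. It assumes a plain linear fit of ASI % on Onge %. The exact equation used for the spreadsheet isn't written out in the text, so the coefficients may differ slightly. `estimate_asi` is an illustrative name.

```python
# (population, Reich ASI %, Onge component %) from the table above
DATA = [
    ("Mala", 61.2, 39.9), ("Madiga", 59.4, 37.9), ("Chenchu", 59.3, 38.6),
    ("Bhil", 57.1, 37.5), ("Satnami", 57.0, 36.4), ("Kurumba", 56.8, 39.5),
    ("Kamsali", 55.5, 35.5), ("Vysya", 53.8, 34.4), ("Lodi", 50.1, 31.8),
    ("Naidu", 49.9, 32.1), ("Tharu", 49.0, 32.2), ("Velama", 45.3, 28.9),
    ("Srivastava", 43.6, 27.8), ("Meghawal", 39.7, 25.4), ("Vaish", 37.4, 23.8),
    ("Kashmiri-Pandit", 29.4, 17.6), ("Sindhi", 26.3, 13.4), ("Pathan", 23.1, 10.6),
]

def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, plus r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

onge = [row[2] for row in DATA]
asi = [row[1] for row in DATA]
slope, intercept, r_squared = fit_line(onge, asi)

def estimate_asi(onge_percent):
    """Predict ASI % from an Onge component %, restricted to the 10-50% range."""
    if not 10 <= onge_percent <= 50:
        raise ValueError("Onge component outside the 10-50% range used here")
    return slope * onge_percent + intercept

print(f"ASI = {slope:.3f} * Onge + {intercept:.3f}, r^2 = {r_squared:.3f}")
```

The range check mirrors the 10% to 50% restriction above, since the line is only supported by data within roughly that span.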

Fst divergences between estimated populations for K=11 in the form of an MDS plot.

I guess you might want to see the Fst dendrogram too. Just remember it's not a phylogeny.

And the numbers:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133
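As a rough illustration of how a dendrogram can be built from Fst numbers like these, here is a sketch of average-linkage (UPGMA) clustering. UPGMA is my assumption here, and the dendrogram mentioned above may have used a different linkage. Only the first five components of the K=11 table are included to keep it short.

```python
def upgma_merge_order(labels, dist):
    """Average-linkage clustering; returns merges as (cluster_a, cluster_b, distance).

    dist maps frozenset({a, b}) to the Fst between components a and b.
    """
    clusters = [(label,) for label in labels]
    merges = []

    def avg_distance(c1, c2):
        # Mean of all pairwise Fst values between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(avg_distance(c1, c2), c1, c2)
                 for i, c1 in enumerate(clusters)
                 for c2 in clusters[i + 1:]]
        d, c1, c2 = min(pairs, key=lambda p: p[0])
        merges.append((c1, c2, d))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Fst values for C1..C5 from the K=11 table above.
values = {
    ("C2", "C1"): 0.165,
    ("C3", "C1"): 0.121, ("C3", "C2"): 0.122,
    ("C4", "C1"): 0.090, ("C4", "C2"): 0.161, ("C4", "C3"): 0.152,
    ("C5", "C1"): 0.071, ("C5", "C2"): 0.152, ("C5", "C3"): 0.137,
    ("C5", "C4"): 0.048,
}
dist = {frozenset(pair): fst for pair, fst in values.items()}
merges = upgma_merge_order(["C1", "C2", "C3", "C4", "C5"], dist)
for a, b, d in merges:
    print(a, b, round(d, 4))
```

As with the dendrogram itself, the merge order only describes similarity between components, not a phylogeny.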

Reference 3 Admixture K=10

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=10.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=10 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9
C2 0.110
C3 0.073 0.148
C4 0.090 0.161 0.065
C5 0.185 0.215 0.222 0.234
C6 0.099 0.038 0.138 0.152 0.201
C7 0.112 0.084 0.142 0.163 0.226 0.058
C8 0.166 0.217 0.182 0.171 0.277 0.211 0.225
C9 0.159 0.156 0.183 0.214 0.287 0.133 0.139 0.276
C10 0.233 0.286 0.248 0.243 0.349 0.280 0.295 0.097 0.349

Reference 3 Admixture K=9

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=9.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=9 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8
C2 0.098
C3 0.073 0.139
C4 0.090 0.152 0.064
C5 0.184 0.201 0.220 0.232
C6 0.113 0.068 0.147 0.166 0.223
C7 0.166 0.210 0.181 0.171 0.275 0.228
C8 0.158 0.139 0.181 0.212 0.285 0.143 0.276
C9 0.233 0.279 0.247 0.243 0.346 0.298 0.096 0.349