Reference 3 Admixture Error Estimation

Since no one paid any attention to the error estimation results for the Reference I admixture, I am back with the standard error and bias estimates for the Reference 3 admixture.

So I ran the default 200 bootstrap replicates to measure the standard error in our Reference 3 K=11 admixture. The spreadsheet with population-level admixture results is here, and participant results are here.
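For readers unfamiliar with bootstrapping, here is a minimal sketch of how a standard error and a bias estimate are usually derived from a set of bootstrap replicates. It follows the generic textbook definitions and is not admixture's internal code; the function name and all the numbers are made up for illustration.

```python
import statistics


def bootstrap_se_and_bias(point_estimate, replicate_estimates):
    """Return (standard error, bias) for one ancestry proportion.

    point_estimate: the proportion from the run on the full data.
    replicate_estimates: the same proportion re-estimated on each
        bootstrap resample (hypothetical input, one value per replicate).
    """
    # Standard error: spread of the replicate estimates.
    se = statistics.stdev(replicate_estimates)
    # Bias: how far the replicates sit, on average, from the original estimate.
    bias = statistics.fmean(replicate_estimates) - point_estimate
    return se, bias


# Hypothetical example: one component for one sample, in percent.
point = 12.0
replicates = [11.2, 12.9, 12.4, 11.6, 13.1, 12.0, 11.8, 12.6]
se, bias = bootstrap_se_and_bias(point, replicates)
print(f"SE = {se:.2f}%, bias = {bias:+.2f}%")
```

A real run would have 200 replicate values per component rather than eight.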

Here are some statistics for the standard error estimates (in percent):

Component       Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
C1 S Asian      0         0.127     0.9848    0.7505    1.2216    1.6833
C2 Onge         0         0.2074    0.56      0.5404    0.8268    1.6914
C3 E Asian      0         0.2013    0.6123    0.6751    1.136     1.9961
C4 SW Asian     0         0.0874    1.1462    0.9246    1.5347    2.1008
C5 Euro         0         0.042     1.3034    0.9684    1.6582    2.3861
C6 Siberian     0         0.2054    0.6566    0.6712    1.0969    2.0099
C7 W African    0         0         0.01905   0.38847   0.75713   2.1588
C8 Papuan       0         0.1936    0.375     0.3648    0.5308    1.9627
C9 American     0         0.1461    0.3958    0.4646    0.6342    2.0831
C10 San/Pygmy   0         0         0.0708    0.2514    0.4471    2.0991
C11 E African   0         0         0.1235    0.3969    0.7315    1.9318
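The column headings above look like a standard six-number summary (minimum, quartiles, median, mean, maximum). The post does not say which tool produced them, so the following is only a sketch of how such a row could be computed for one component, assuming a plain list of per-sample standard errors. The function name and the sample values are hypothetical.

```python
import statistics


def six_number_summary(values):
    """Summarize one component's standard errors across samples."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical standard errors (in percent) for one ancestral component.
example = [0.0, 0.2, 0.6, 1.1, 1.7]
for name, value in six_number_summary(example).items():
    print(f"{name:8s} {value:.4f}")
```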

You can see the mean standard error for each population; note how many are over 1% (marked in red).

As the average error for the Onge component among South Asian populations is a little higher than 1%, the standard error on the ASI (Ancestral South Indian) computation here is about 1.4-1.5% just from admixture. The regression error is in addition to that.
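To make the arithmetic concrete: if ASI is obtained from the Onge component through a straight-line regression, an error in the Onge percentage is scaled by the slope of that line. The sketch below assumes exactly that, and ignores the uncertainty in the regression coefficients themselves (the "regression error" mentioned above). The slope and the Onge standard error are hypothetical values picked only so the result lands in the quoted range; the actual regression coefficients are not given here.

```python
def propagate_se_through_line(slope, se_x):
    """Standard error of y = intercept + slope * x, given only the error in x.

    The intercept shifts y but does not change its spread, so it drops out.
    """
    return abs(slope) * se_x


# Hypothetical inputs, not the actual regression coefficients.
assumed_slope = 1.4    # assumed ASI-per-Onge scaling
assumed_onge_se = 1.05  # "a little higher than 1%"

asi_se = propagate_se_through_line(assumed_slope, assumed_onge_se)
print(f"ASI standard error from admixture alone: about {asi_se:.2f}%")
```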

And here are the statistics for the bias estimates:

Component       Min.       1st Qu.    Median     Mean       3rd Qu.    Max.
C1 S Asian      -0.9069    -0.28408   -0.0349    -0.12196   0.01158    0.5856
C2 Onge         -0.7701    0          0.04005    0.03847    0.153      0.5703
C3 E Asian      -0.5778    -0.0888    0.01645    0.02105    0.13737    0.6127
C4 SW Asian     -0.7701    -0.1657    0          -0.06692   0.01298    0.745
C5 Euro         -1.2917    -0.247675  0          -0.113631  0.008975   0.6763
C6 Siberian     -0.7921    -0.0856    0.0129     0.009492   0.1198     0.6464
C7 W African    -0.5745    0          0          -0.02173   0.0016     0.3426
C8 Papuan       -0.1842    0.05328    0.13175    0.1377     0.21247    0.4712
C9 American     -0.4202    0.0096     0.0811     0.0915     0.1682     0.5129
C10 San/Pygmy   -0.4596    0          0.0002     0.003271   0.023425   0.3447
C11 E African   -0.5766    0          0.0018     0.02276    0.05758    0.6346

You can also see the average value of the bias in each ancestral component for each population.

Reference I Admixture Errors

I have been thinking about error estimation for admixture results for some time, since I have heard a lot of arguments about how even a 0.1% result is significant. I was skeptical of that and have rounded off my admixture run results to the nearest percent.

There was a memory leak issue in the bootstrapping code for admixture which crashed it every time I tried running it. I emailed David Alexander and he fixed it in version 1.12.

So I ran the default 200 bootstrap replicates to measure the standard error in our old Reference I K=12 admixture. The spreadsheet with population-level results is here, and participant results are here.

Here are some statistics for the standard error estimates:

Component       Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
C1 S Asian      0.00%     0.02%     0.33%     0.52%     0.96%     1.93%
C2 Blch/Cauc    0.00%     0.00%     1.02%     0.79%     1.45%     2.63%
C3 Kalash       0.00%     0.01%     0.40%     0.50%     0.99%     3.76%
C4 SE Asian     0.00%     0.09%     0.37%     0.60%     1.27%     1.92%
C5 SW Asian     0.00%     0.00%     0.60%     0.66%     1.28%     2.90%
C6 Euro         0.00%     0.00%     0.35%     0.56%     1.12%     1.82%
C7 Papuan       0.00%     0.07%     0.22%     0.23%     0.36%     1.08%
C8 NE Asian     0.00%     0.07%     0.36%     0.67%     1.36%     2.45%
C9 Siberian     0.00%     0.08%     0.37%     0.51%     0.82%     2.29%
C10 E Bantu     0.00%     0.00%     0.00%     0.35%     0.72%     1.93%
C11 W Afr       0.00%     0.00%     0.00%     0.28%     0.50%     1.51%
C12 E Afr       0.00%     0.00%     0.05%     0.31%     0.60%     1.79%

You can see the mean standard error for each population; note how many are over 1% (marked in red).

And statistics for bias estimates:

Component       Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
C1 S Asian      -1.104%   -0.031%   0.000%    -0.024%   0.075%    1.026%
C2 Blch/Cauc    -0.835%   -0.280%   -0.009%   -0.133%   0.000%    1.049%
C3 Kalash       -1.575%   0.000%    0.020%    0.076%    0.147%    0.615%
C4 SE Asian     -0.629%   -0.021%   0.011%    0.018%    0.087%    0.478%
C5 SW Asian     -0.691%   -0.094%   0.000%    -0.020%   0.035%    0.613%
C6 Euro         -0.572%   -0.086%   0.000%    -0.039%   0.004%    0.468%
C7 Papuan       -0.171%   0.008%    0.059%    0.070%    0.120%    0.312%
C8 NE Asian     -0.739%   0.000%    0.016%    0.034%    0.107%    0.679%
C9 Siberian     -1.044%   0.000%    0.015%    0.035%    0.103%    0.692%
C10 E Bantu     -0.412%   0.000%    0.000%    -0.007%   0.001%    0.370%
C11 W Afr       -0.261%   0.000%    0.000%    0.009%    0.005%    0.304%
C12 E Afr       -0.635%   0.000%    0.000%    -0.017%   0.010%    0.405%

You can also see the average value of the bias in each ancestral component for each population.

Since the bias is smaller than the standard error and is distributed around zero, a small percentage of an ancestral component is more likely to be real, and not just noise, when it shows up in a large number of samples from the same population group.
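One way to see why the number of samples matters: if the per-sample errors were independent and roughly equal, the standard error of the group average would shrink with the square root of the group size. That independence is an assumption made for this sketch and may not hold fully, since all samples go through the same admixture run. The function name and the numbers are illustrative only.

```python
import math


def se_of_group_mean(per_sample_se, n_samples):
    """Standard error of a group average, assuming independent,
    equally sized per-sample errors (an idealization)."""
    return per_sample_se / math.sqrt(n_samples)


# Hypothetical per-sample standard error of 1%.
for n in (1, 4, 25, 100):
    print(f"{n:3d} samples -> SE of the mean = {se_of_group_mean(1.0, n):.2f}%")
```

Note that averaging does not remove bias; it only helps here because the bias estimates are small to begin with.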

Admixture: Note on Precision

As you might have seen in the spreadsheets for the reference and for participants, I am rounding off the percentages to the nearest integer.

There is a reason for that. For one thing, there are lots of factors that can influence these results. If I choose somewhat different reference samples, the ancestral components as well as their proportions in different individuals would vary from the current case. This is especially true for minor ancestral components.

I am running admixture for project participants in batches of 10 together with my entire reference dataset. That way I can be sure that the inferred ancestral components stay the same from one batch to the next.

While the percentages do not vary much for the reference samples from one admixture run to another (with different project participant samples), they do change a little. And I have seen a few changes by as much as 1-2%.

Therefore, these ancestry percentages are at most accurate to the nearest whole number. In my opinion, there is absolutely no difference between, for example, 11.7% and 12.4%.
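As a small illustration of the rounding, here is a sketch that rounds a percentage to the nearest integer. The helper name is made up; it rounds halves upward, which is an assumption about how the spreadsheet values were rounded (Python's built-in round() sends exact halves to the nearest even integer instead).

```python
import math


def round_half_up(percent):
    """Round a non-negative percentage to the nearest whole number."""
    return math.floor(percent + 0.5)


# The two values from the text end up in the same bucket.
print(round_half_up(11.7), round_half_up(12.4))  # both print 12
```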