Reference 3 Admixture Error Estimation

Since no one paid any attention to the error estimation results for the Reference I admixture, I am back with the standard error and bias estimates for the Reference 3 admixture.

I ran the default 200 bootstrap replicates to measure the standard error in our Reference 3 K=11 admixture run. A spreadsheet with the population-level admixture results is here, and the participant results are here.
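For readers who want to see what a bootstrap standard error and bias estimate mean in general, here is a minimal sketch in plain Python. It is not Admixture's implementation: the function name `bootstrap_se_and_bias`, the toy 0/1 marker data and the use of a simple mean as the estimator are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_se_and_bias(data, estimator, replicates=200, seed=0):
    """Return (standard_error, bias) of `estimator` on `data` via the bootstrap."""
    rng = random.Random(seed)
    point_estimate = estimator(data)
    n = len(data)
    estimates = []
    for _ in range(replicates):
        # Resample the data with replacement, keeping the original size.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # Standard error: spread of the replicate estimates.
    standard_error = statistics.stdev(estimates)
    # Bias: how far the replicate average sits from the original estimate.
    bias = statistics.fmean(estimates) - point_estimate
    return standard_error, bias


# Toy data (hypothetical): 0/1 indicators at 500 made-up markers.
toy_rng = random.Random(42)
toy_markers = [1 if toy_rng.random() < 0.1 else 0 for _ in range(500)]
se, bias = bootstrap_se_and_bias(toy_markers, statistics.fmean)
print(f"SE = {se:.4f}, bias = {bias:+.4f}")
```

The 200 replicates here simply mirror the default mentioned above; more replicates give a steadier estimate of both quantities at a proportional cost in run time.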

Here are some statistics for the standard error estimates (values in percent):

Component      Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
C1 S Asian     0         0.127     0.9848    0.7505    1.2216    1.6833
C2 Onge        0         0.2074    0.56      0.5404    0.8268    1.6914
C3 E Asian     0         0.2013    0.6123    0.6751    1.136     1.9961
C4 SW Asian    0         0.0874    1.1462    0.9246    1.5347    2.1008
C5 Euro        0         0.042     1.3034    0.9684    1.6582    2.3861
C6 Siberian    0         0.2054    0.6566    0.6712    1.0969    2.0099
C7 W African   0         0         0.01905   0.38847   0.75713   2.1588
C8 Papuan      0         0.1936    0.375     0.3648    0.5308    1.9627
C9 American    0         0.1461    0.3958    0.4646    0.6342    2.0831
C10 San/Pygmy  0         0         0.0708    0.2514    0.4471    2.0991
C11 E African  0         0         0.1235    0.3969    0.7315    1.9318

You can see the mean value of the standard errors for each population and note how many are over 1% (marked in red).
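The table layout above (minimum, quartiles, median, mean, maximum) can be reproduced for any column of standard errors with a few lines of Python. This is only a sketch: the helper name `six_number_summary` and the toy values are made up, and the quartile method (linear interpolation, `method="inclusive"`) is an assumption about how the numbers above were computed.

```python
import statistics


def six_number_summary(values):
    """Min, quartiles, median, mean and max of a list of numbers."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical standard errors (in percent) for one component across samples.
toy_standard_errors = [0.0, 0.1, 0.4, 0.9, 1.6]
summary = six_number_summary(toy_standard_errors)
for name, value in summary.items():
    print(f"{name:>6}: {value:.4f}")
```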

Because the average standard error for the Onge component among South Asian populations is a little higher than 1%, the standard error on the ASI (Ancestral South Indian) computation here is about 1.4-1.5% from admixture alone. The regression error comes on top of that.
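To make the arithmetic behind that kind of statement concrete, here is a rough sketch of error propagation through a linear regression. The post does not give the regression coefficients, so the slope of 1.4, the Onge standard error of 1.05% and the regression error of 0.5% below are placeholders chosen only for illustration, and the function name `propagated_se` is hypothetical. It also assumes the two error sources are independent, so they add in quadrature.

```python
import math


def propagated_se(slope, component_se, regression_se=0.0):
    """Standard error of slope * component + intercept, in the same units.

    Scaling a quantity by `slope` scales its standard error by |slope|.
    An independent regression error is combined in quadrature.
    """
    scaled = abs(slope) * component_se
    return math.sqrt(scaled ** 2 + regression_se ** 2)


HYPOTHETICAL_SLOPE = 1.4    # placeholder, not taken from the post
HYPOTHETICAL_ONGE_SE = 1.05  # placeholder for "a little higher than 1%"

admixture_only = propagated_se(HYPOTHETICAL_SLOPE, HYPOTHETICAL_ONGE_SE)
with_regression = propagated_se(HYPOTHETICAL_SLOPE, HYPOTHETICAL_ONGE_SE, regression_se=0.5)
print(f"admixture only: {admixture_only:.2f}%")
print(f"with regression error: {with_regression:.2f}%")
```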

And statistics for bias estimates (values in percent):

Component      Min.       1st Qu.    Median     Mean       3rd Qu.    Max.
C1 S Asian     -0.9069    -0.28408   -0.0349    -0.12196   0.01158    0.5856
C2 Onge        -0.7701    0          0.04005    0.03847    0.153      0.5703
C3 E Asian     -0.5778    -0.0888    0.01645    0.02105    0.13737    0.6127
C4 SW Asian    -0.7701    -0.1657    0          -0.06692   0.01298    0.745
C5 Euro        -1.2917    -0.247675  0          -0.113631  0.008975   0.6763
C6 Siberian    -0.7921    -0.0856    0.0129     0.009492   0.1198     0.6464
C7 W African   -0.5745    0          0          -0.02173   0.0016     0.3426
C8 Papuan      -0.1842    0.05328    0.13175    0.1377     0.21247    0.4712
C9 American    -0.4202    0.0096     0.0811     0.0915     0.1682     0.5129
C10 San/Pygmy  -0.4596    0          0.0002     0.003271   0.023425   0.3447
C11 E African  -0.5766    0          0.0018     0.02276    0.05758    0.6346

You can also see the average value of the bias in each ancestral component for each population.

Reference I Admixture Errors

I have been thinking about error estimation for Admixture results for some time, since I have heard a lot of arguments that even a 0.1% result is significant. I was skeptical of that and have rounded off my admixture run results to the nearest percent.

There was a memory leak issue in the bootstrapping code for admixture which crashed it every time I tried running it. I emailed David Alexander and he fixed it in version 1.12.

So I ran the default 200 bootstrap replicates to measure the standard error in our old Reference I K=12 admixture run. A spreadsheet with the population-level results is here, and the participant results are here.

Here are some statistics for the standard error estimates:

Component      Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
C1 S Asian     0.00%     0.02%     0.33%     0.52%     0.96%     1.93%
C2 Blch/Cauc   0.00%     0.00%     1.02%     0.79%     1.45%     2.63%
C3 Kalash      0.00%     0.01%     0.40%     0.50%     0.99%     3.76%
C4 SE Asian    0.00%     0.09%     0.37%     0.60%     1.27%     1.92%
C5 SW Asian    0.00%     0.00%     0.60%     0.66%     1.28%     2.90%
C6 Euro        0.00%     0.00%     0.35%     0.56%     1.12%     1.82%
C7 Papuan      0.00%     0.07%     0.22%     0.23%     0.36%     1.08%
C8 NE Asian    0.00%     0.07%     0.36%     0.67%     1.36%     2.45%
C9 Siberian    0.00%     0.08%     0.37%     0.51%     0.82%     2.29%
C10 E Bantu    0.00%     0.00%     0.00%     0.35%     0.72%     1.93%
C11 W Afr      0.00%     0.00%     0.00%     0.28%     0.50%     1.51%
C12 E Afr      0.00%     0.00%     0.05%     0.31%     0.60%     1.79%

You can see the mean value of the standard errors for each population and note how many are over 1% (marked in red).
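Picking out the populations whose mean standard error exceeds 1% is easy to script. The sketch below assumes the participant results have been flattened into (population, standard error in percent) pairs for one component; the population labels `pop_a` and `pop_b`, the numbers and both helper names are invented for the example.

```python
from collections import defaultdict
import statistics


def mean_se_by_population(rows):
    """Average the per-sample standard errors within each population."""
    grouped = defaultdict(list)
    for population, standard_error in rows:
        grouped[population].append(standard_error)
    return {pop: statistics.fmean(values) for pop, values in grouped.items()}


def flag_above(means, threshold=1.0):
    """Return the populations whose mean exceeds `threshold`, sorted by name."""
    return sorted(pop for pop, mean in means.items() if mean > threshold)


# Hypothetical rows: (population label, standard error in percent).
toy_rows = [("pop_a", 0.8), ("pop_a", 1.6), ("pop_b", 0.2), ("pop_b", 0.4)]
means = mean_se_by_population(toy_rows)
flagged = flag_above(means, threshold=1.0)
print(flagged)
```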

And statistics for bias estimates:

Component      Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
C1 S Asian     -1.104%   -0.031%   0.000%    -0.024%   0.075%    1.026%
C2 Blch/Cauc   -0.835%   -0.280%   -0.009%   -0.133%   0.000%    1.049%
C3 Kalash      -1.575%   0.000%    0.020%    0.076%    0.147%    0.615%
C4 SE Asian    -0.629%   -0.021%   0.011%    0.018%    0.087%    0.478%
C5 SW Asian    -0.691%   -0.094%   0.000%    -0.020%   0.035%    0.613%
C6 Euro        -0.572%   -0.086%   0.000%    -0.039%   0.004%    0.468%
C7 Papuan      -0.171%   0.008%    0.059%    0.070%    0.120%    0.312%
C8 NE Asian    -0.739%   0.000%    0.016%    0.034%    0.107%    0.679%
C9 Siberian    -1.044%   0.000%    0.015%    0.035%    0.103%    0.692%
C10 E Bantu    -0.412%   0.000%    0.000%    -0.007%   0.001%    0.370%
C11 W Afr      -0.261%   0.000%    0.000%    0.009%    0.005%    0.304%
C12 E Afr      -0.635%   0.000%    0.000%    -0.017%   0.010%    0.405%

You can also see the average value of the bias in each ancestral component for each population.

Since the bias is lower than the standard error and distributed around zero, a small percentage of an ancestral component that shows up in a large number of samples from the same population group is more likely to be real than noise.
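The reasoning can be sketched numerically. If the errors in different samples were independent (a simplifying assumption that may not hold for admixture estimates), the standard error of a group average would shrink with the square root of the group size, while any shared bias would not shrink at all. The function names and the 0.5% figures below are made up for the example.

```python
import math


def se_of_group_mean(per_sample_se, n_samples):
    """Standard error of the average of n independent, equally noisy estimates."""
    return per_sample_se / math.sqrt(n_samples)


def signal_to_noise(mean_component, per_sample_se, n_samples):
    """How many standard errors the group average sits above zero."""
    return mean_component / se_of_group_mean(per_sample_se, n_samples)


# Hypothetical case: a 0.5% component with a 0.5% standard error per sample.
single_sample = signal_to_noise(0.5, 0.5, n_samples=1)
group_of_25 = signal_to_noise(0.5, 0.5, n_samples=25)
print(f"one sample: {single_sample:.1f} standard errors above zero")
print(f"25 samples: {group_of_25:.1f} standard errors above zero")
```

A single sample at one standard error above zero is indistinguishable from noise, whereas the same average across 25 samples would be much harder to explain that way, provided the bias really is small.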