Reference 3 Fixed

I have fixed the problem with Reference 3 but if you notice any strange results, do let me know.

While the Reference 3 admixture results were generally good (and I have some nice surprises on the way I hope), the Reich et al populations had some weird behavior. From one K value to the next, their admixture would swing wildly especially among the minor components.

For example, for Chenchu, the 2nd component after South Asian was Southwest Asian (42%) at K=6, European (45%) at K=7 and American (32%) at K=8. That just didn't make any sense. It was similar for other Reich et al populations, but all the other reference populations seemed pretty stable.

The issue was that when I was creating Reference 3, I had to juggle lists of SNPs to figure out a way to include Reich et al with a large (>100,000) number of SNPs in the dataset since Reich doesn't have as many SNPs in common with the other datasets plus 23andme (v2 and v3) and FTDNA. In that effort where I was doing lots of SNP set intersections and unions I messed up. I used 217,000 SNPs. While these SNPs were present in all the other datasets, Reich et al had only 102,000 SNPs common with that set. Ouch! This was a royal mess as the high missing rate of Reich et al caused weird instability in its admixture results even though the rest of the results were mostly stable.

Now, I have pared down Reference 3 to 118,000 SNPs. These have a low missing rate in all the datasets. So I don't expect the same problems.

I am redoing the admixture runs with this new data and will have some of the results up soon.


  1. Reference 3 Admixture K=4 | Harappa Ancestry Project - pingback on April 18, 2011 at 9:41 am
  2. Reference 3 Admixture K=5 | Harappa Ancestry Project - pingback on April 18, 2011 at 6:07 pm

Trackbacks and Pingbacks: