HarappaOracle Limitations

While HarappaOracle is a great tool, it has its limitations.

First of all, do not think of the mixed mode results as showing which populations you are descended from. Use HarappaOracle to get an idea of which populations are similar to you in their admixture results. This function is especially important since admixture results should be understood in relative terms, as I have been stressing.

For mixed-race people, the Oracle sometimes provides a correct result, as it does for me. For others, the known ancestral mix might not show up.

The Oracle calculator is also sensitive to your admixture percentages: small changes can sometimes change the mixed mode results radically.

Let's look at three siblings as an example. (My thanks to them for letting me use their results for this post.) Here are their admixture results:

Component      Sibling 1  Sibling 2  Sibling 3
NE Euro        43.8%      43.1%      43.9%
Mediterranean  27.6%      26.8%      27.0%
Baloch         11.2%      11.1%      12.2%
Caucasian      9.4%       10.8%      8.6%
S Indian       5.3%       6.5%       7.0%
SW Asian       1.5%       1.0%       0.5%
American       0.7%       0.3%       0.7%
NE Asian       0.4%       0.1%       0.1%
Beringian      0.0%       0.3%       0.0%
San            0.0%       0.1%       0.0%

Their admixture results are broadly similar, as expected. Some of you might think that a 1% difference is very significant, but do consider what we know of DNA inheritance and the error margins in ADMIXTURE.
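
To put a number on "broadly similar", here is a small Python sketch that computes, for each component, the gap between the highest and the lowest sibling. The percentages are copied from the table above; the helper name `spread` is only for illustration.

```python
# Admixture percentages for (Sibling 1, Sibling 2, Sibling 3),
# copied from the table in this post.
siblings = {
    "NE Euro": (43.8, 43.1, 43.9),
    "Mediterranean": (27.6, 26.8, 27.0),
    "Baloch": (11.2, 11.1, 12.2),
    "Caucasian": (9.4, 10.8, 8.6),
    "S Indian": (5.3, 6.5, 7.0),
    "SW Asian": (1.5, 1.0, 0.5),
    "American": (0.7, 0.3, 0.7),
    "NE Asian": (0.4, 0.1, 0.1),
    "Beringian": (0.0, 0.3, 0.0),
    "San": (0.0, 0.1, 0.0),
}


def spread(values):
    """Gap between the largest and smallest value, in percentage points."""
    return round(max(values) - min(values), 1)


spreads = {name: spread(values) for name, values in siblings.items()}

# Print components from the largest sibling-to-sibling gap to the smallest.
for name, gap in sorted(spreads.items(), key=lambda item: -item[1]):
    print(f"{name:14s} {gap:.1f}")
```

The largest gap is in the Caucasian component (2.2 points), followed by S Indian (1.7). Gaps of this size between full siblings are worth keeping in mind before reading much into a 1% difference.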

Now let's see their HarappaWorld Oracle results.

Sibling 1            Sibling 2            Sibling 3
romany       3.16    romany       2.85    romany       4.45
hungarian    9.47    hungarian    9.65    utahn-white  10.74
utahn-white  9.88    french       11.05   n-european   10.79
n-european   9.9     slovenian    11.09   hungarian    10.97
french       9.95    n-european   11.38   utahn-white  11.35
utahn-white  10.62   utahn-white  11.54   french       11.54
slovenian    10.95   utahn-white  12.21   british      11.94
british      11.29   british      12.99   slovenian    12.21
orcadian     13.17   orcadian     14.86   orcadian     13.51
ukranian     17.73   romanian     17.14   ukranian     18.24

Again, not unexpected. The top 10 population matches are not too different for the siblings. There are some differences, but nothing extraordinary.
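
For readers who want a feel for what such a ranking involves, here is a minimal Python sketch. It is not the actual Oracle code: it assumes the distance is a plain Euclidean distance between admixture percentages, and the three reference populations (`pop_a`, `pop_b`, `pop_c`) and their three-component averages are made up for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two admixture vectors (in percentages).

    Assumption: the real Oracle may measure distance differently.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_populations(individual, references, top=10):
    """Rank reference populations from closest to farthest."""
    ranked = sorted(
        (distance(individual, average), name)
        for name, average in references.items()
    )
    return [(name, round(d, 2)) for d, name in ranked[:top]]


# Hypothetical population averages over three made-up components.
references = {
    "pop_a": (50.0, 30.0, 20.0),
    "pop_b": (45.0, 30.0, 25.0),
    "pop_c": (10.0, 20.0, 70.0),
}
individual = (48.0, 30.0, 22.0)

result = closest_populations(individual, references)
for name, d in result:
    print(f"{name} {d}")
```

The smaller the distance, the more similar the population average is to the individual's admixture results. Nothing in this calculation says anything about descent.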

Finally, let's look at mixed mode Oracle, where we try to find the 10 closest matches (based on admixture results) assuming that these individuals are mixed from two populations.

Sibling 1
91.3% romany + 8.7% lithuanian               1.58
78.4% romany + 21.6% n-european              1.67
79.7% romany + 20.3% utahn-white             1.69
83.3% romany + 16.7% orcadian                1.76
94.2% romany + 5.8% finnish                  1.88
79.8% romany + 20.2% utahn-white             1.99
90.2% romany + 9.8% belorussian              2.00
82.1% romany + 17.9% british                 2.03
91.4% romany + 8.6% russian                  2.06
85.0% n-european + 15.0% bene-israel         2.16

Sibling 2
93.3% romany + 6.7% lithuanian               1.93
95.4% romany + 4.6% finnish                  1.99
91.9% romany + 8.1% belorussian              2.01
92.4% romany + 7.6% russian                  2.02
92.0% romany + 8.0% mordovian                2.06
90.4% romany + 9.6% ukranian                 2.14
88.7% romany + 11.3% slovenian               2.5
89.2% romany + 10.8% n-european              2.52
95.7% romany + 4.3% chuvash                  2.58
90.7% romany + 9.3% utahn-white              2.58

Sibling 3
82.8% utahn-white + 17.2% bene-israel        1.99
83.7% n-european + 16.3% bene-israel         2.37
83.8% utahn-white + 16.2% bene-israel        2.47
84.8% n-european + 15.2% cochin-jew          2.52
86.0% n-european + 14.0% kerala-christian    2.93
85.0% utahn-white + 15.0% cochin-jew         2.98
85.9% n-european + 14.1% ap-hyderabad        2.99
85.1% n-european + 14.9% up                  3.02
85.5% n-european + 14.5% tn-brahmin          3.13
85.5% n-european + 14.5% brahmin-tamil-nadu  3.15

Sibling 1 and Sibling 2 are again not too different from each other: mostly Romany with some European. However, Sibling 3 is getting vastly different results. Why? No, Sibling 3 wasn't adopted! The reason is simple. Sibling 3 has more of the South Indian component than the average Romany in our dataset. This means that Sibling 3 cannot be represented as a mix of Romany and a European ethnicity without a large error. Instead, a mix of mostly northwest European and a little bit of Indian, especially Indian Jewish, seems to be closest to Sibling 3's results. However, this does not make Sibling 3 Jewish or Indian Jewish (Indian Jews are themselves quite mixed with the local Indian populations).
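
The flip can be reproduced with a toy example. The Python sketch below is not the actual Oracle code. It assumes Euclidean distance between admixture percentages and, for every pair of populations, picks the mixing proportion that brings the blend closest to the individual. The populations (`pop_r`, `pop_n`, `pop_j`), their three-component averages (north, south, other) and the two individuals are all made up for illustration.

```python
import math
from itertools import combinations


def best_mix(individual, pop_a, pop_b):
    """Closest blend w*pop_a + (1-w)*pop_b to the individual, with 0 <= w <= 1.

    Returns (w, distance). The weight comes from projecting the individual
    onto the line through the two populations, then clipping to [0, 1].
    """
    diff_ab = [a - b for a, b in zip(pop_a, pop_b)]
    diff_xb = [x - b for x, b in zip(individual, pop_b)]
    denom = sum(d * d for d in diff_ab)
    w = 0.0 if denom == 0 else sum(p * q for p, q in zip(diff_xb, diff_ab)) / denom
    w = min(1.0, max(0.0, w))
    blend = [w * a + (1 - w) * b for a, b in zip(pop_a, pop_b)]
    dist = math.sqrt(sum((x - m) ** 2 for x, m in zip(individual, blend)))
    return w, dist


def mixed_mode(individual, references, top=10):
    """Rank all two-population blends from closest to farthest."""
    fits = []
    for (name_a, pop_a), (name_b, pop_b) in combinations(references.items(), 2):
        w, dist = best_mix(individual, pop_a, pop_b)
        fits.append((dist, name_a, w, name_b))
    return sorted(fits)[:top]


# Hypothetical averages over three made-up components: (north, south, other).
references = {
    "pop_r": (70.0, 5.0, 25.0),   # stands in for a Romany-like average
    "pop_n": (85.0, 0.0, 15.0),   # stands in for a European-like average
    "pop_j": (5.0, 47.0, 48.0),   # stands in for a distant, south-rich average
}

ind_1 = (73.0, 4.0, 23.0)  # less "south" than pop_r
ind_2 = (73.0, 7.0, 20.0)  # three points more "south", i.e. more than pop_r

fits_1 = mixed_mode(ind_1, references)
fits_2 = mixed_mode(ind_2, references)

for label, fits in (("ind_1", fits_1), ("ind_2", fits_2)):
    print(label)
    for dist, name_a, w, name_b in fits:
        print(f"  {100 * w:.1f}% {name_a} + {100 * (1 - w):.1f}% {name_b}  {dist:.2f}")
```

In this toy setup, `ind_1` is fitted almost perfectly by 80% `pop_r` + 20% `pop_n`. Shifting three points into the south component gives `ind_2`, which now has more "south" than `pop_r` itself. Blending `pop_r` with `pop_n` can only lower that component, so the fit gets much worse, and a `pop_n` + `pop_j` blend takes the top spot instead. Neither individual's ancestry changed; only the percentages moved a little.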

10 Comments.

  1. The new run doesn't work that well for me: large genetic distances in mixed mode. The first/old run and Oracle were better for me. I reached a genetic distance of 1.06.

  2. Zack, so we can get real ancestry components, even if not in everyone's case. But when there is ample certainty that the result is real (e.g. yours), then why not just calculate the age of the components? The age of the components (if true) is vital.

    • I think there is a major disconnect in your understanding of how this works. When you say "Age", what do you mean? How long ago the component first appeared? This is not possible from admixture tests with samples from existing populations alone.
      What admixture is doing is comparing your genome with everybody else's in the dataset and binning it through a statistical process. Some alleles are modal (most frequent) in certain populations, so if one part of your genome is a match for those alleles, then that section of your chromosome is similar to the modal population.
      So let's say that on Chromosome 19, between positions 1 and 10, most Baloch have AAAAAAAAAA, and your chromosome is also exactly AAAAAAAAAA there. Then, in that region, your component would be assigned to Baloch/Gedrosia/Indus/West-Central Asian or whatever you want to call it.
      How are you going to calculate the component's age from this?

  3. I don't know, but I think Dienekes has done it once, and also Metspalu et al. for the minimum age of the ANI component, which was 12500 YBP, though probably not for individuals! I also don't know whether the genome bloggers can do that from the sources they have! But I do know this: the components' age is vital for matching your DNA to any historical event, either proposed or recorded! Otherwise the glass remains half full.

    • Metspalu et al. used linkage disequilibrium for the purpose. Their estimate is very different from that of Moorjani et al.

      Dienekes calculated the age of ADMIXTURE components using effective population estimates (in a post that I am unable to locate right now). While I can easily repeat Dienekes' analysis for HarappaWorld components, I am not sure that is accurate or useful.

      I am thinking about age estimates but still don't have a good solution.

      • Zack, let me tell you what: just do it! Let's just see what kind of results we get! There is always room to improve, isn't there? I mean, it doesn't need to be 100% correct, but the only way to get at something true is by trying again and again along different paths... But there is one goal: the truth.

  4. What is "Utahn-White"? Also while I'm at it, what is "CEU_V" and "CEU30"? I've searched all over and can't find the answer. Am I the only one who doesn't know this??