HarappaOracle Limitations

Posted by Zack on June 12, 2012

While HarappaOracle is a great tool, it has its limitations.

First of all, do not think of the mixed mode results as showing which populations you are descended from. Use HarappaOracle to get an idea of which populations are similar to you in their admixture results. This function is especially important since admixture results should be understood in relative terms, as I have been stressing.

For mixed-race people, the Oracle might sometimes provide a correct result, as it does for me. For others, the known ancestral mix might not show up.

The Oracle calculator is also sensitive to your admixture percentages, and sometimes small changes in them can change the mixed-mode results radically.

Let's look at three siblings as an example. (My thanks to them for letting me use their results for this post.) Here are their admixture results:

Component	Sibling 1	Sibling 2	Sibling 3
NE Euro	43.8%	43.1%	43.9%
Mediterranean	27.6%	26.8%	27.0%
Baloch	11.2%	11.1%	12.2%
Caucasian	9.4%	10.8%	8.6%
S Indian	5.3%	6.5%	7.0%
SW Asian	1.5%	1.0%	0.5%
American	0.7%	0.3%	0.7%
NE Asian	0.4%	0.1%	0.1%
Beringian	0.0%	0.3%	0.0%
San	0.0%	0.1%	0.0%

Their admixture results are broadly similar, as expected. Some of you might think that a difference of 1% or less is very significant, but do consider what we know of DNA inheritance and the error margins in ADMIXTURE.
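To put these small differences in perspective, here is a minimal Python sketch that measures how far apart the siblings' admixture vectors are, using only the percentages in the table above. Scoring the gap as a plain Euclidean distance over the ten components is an assumption made for illustration; it is not necessarily the metric HarappaOracle uses internally.

```python
import math
from itertools import combinations

# Admixture percentages copied from the table above, in the order:
# NE Euro, Mediterranean, Baloch, Caucasian, S Indian,
# SW Asian, American, NE Asian, Beringian, San
SIBLINGS = {
    "Sibling 1": [43.8, 27.6, 11.2, 9.4, 5.3, 1.5, 0.7, 0.4, 0.0, 0.0],
    "Sibling 2": [43.1, 26.8, 11.1, 10.8, 6.5, 1.0, 0.3, 0.1, 0.3, 0.1],
    "Sibling 3": [43.9, 27.0, 12.2, 8.6, 7.0, 0.5, 0.7, 0.1, 0.0, 0.0],
}


def euclidean(a, b):
    """Euclidean distance between two admixture vectors, in percentage points.

    Assumption: used here for illustration only, not as the Oracle's metric.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def largest_component_gap(a, b):
    """Largest absolute difference in any single admixture component."""
    return max(abs(x - y) for x, y in zip(a, b))


for (name_a, a), (name_b, b) in combinations(SIBLINGS.items(), 2):
    print(f"{name_a} vs {name_b}: distance {euclidean(a, b):.2f}, "
          f"largest single-component gap {largest_component_gap(a, b):.1f}")
```

Under this measure the three pairs are roughly 2.3 to 2.7 points apart, and the largest single-component gap is 2.2 points (Caucasian, Siblings 2 and 3).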

Now let's see their HarappaWorld Oracle results.

Sibling 1		Sibling 2		Sibling 3
Population	Distance	Population	Distance	Population	Distance
romany	3.16	romany	2.85	romany	4.45
hungarian	9.47	hungarian	9.65	utahn-white	10.74
utahn-white	9.88	french	11.05	n-european	10.79
n-european	9.9	slovenian	11.09	hungarian	10.97
french	9.95	n-european	11.38	utahn-white	11.35
utahn-white	10.62	utahn-white	11.54	french	11.54
slovenian	10.95	utahn-white	12.21	british	11.94
british	11.29	british	12.99	slovenian	12.21
orcadian	13.17	orcadian	14.86	orcadian	13.51
ukranian	17.73	romanian	17.14	ukranian	18.24

Again, not unexpected. The top 10 population matches are not too different for the siblings. There are some differences, but nothing extraordinary.
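The single-population ranking can be thought of as sorting reference populations by how close their average admixture is to yours. The minimal Python sketch below assumes a plain Euclidean distance over admixture percentages; the population names `pop_r`, `pop_e` and `pop_j` and their three-component profiles are hypothetical toy values, not HarappaWorld reference data.

```python
import math

# Hypothetical toy reference profiles (NOT HarappaWorld data). Components are
# (European-like, South Indian-like, other), in percent.
REFERENCES = {
    "pop_r": [70, 10, 20],
    "pop_e": [90, 0, 10],
    "pop_j": [40, 40, 20],
}


def single_mode(target, references, top_n=10):
    """Rank reference populations by distance from the target admixture vector.

    Assumption: the score is plain Euclidean distance over percentages.
    """
    ranked = sorted(
        (math.dist(target, profile), name) for name, profile in references.items()
    )
    return [(name, round(distance, 2)) for distance, name in ranked[:top_n]]


print(single_mode([72, 9, 19], REFERENCES))
# [('pop_r', 2.45), ('pop_e', 22.05), ('pop_j', 44.56)]
```

The ranking only says which reference profiles are nearest; it says nothing about descent, which is the caution at the start of this post.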

Finally, let's look at the mixed-mode Oracle, which finds the 10 closest matches (based on admixture results) assuming that each individual is a mix of two populations.

Sibling 1		Sibling 2		Sibling 3
Mix	Distance	Mix	Distance	Mix	Distance
91.3% romany + 8.7% lithuanian	1.58	93.3% romany + 6.7% lithuanian	1.93	82.8% utahn-white + 17.2% bene-israel	1.99
78.4% romany + 21.6% n-european	1.67	95.4% romany + 4.6% finnish	1.99	83.7% n-european + 16.3% bene-israel	2.37
79.7% romany + 20.3% utahn-white	1.69	91.9% romany + 8.1% belorussian	2.01	83.8% utahn-white + 16.2% bene-israel	2.47
83.3% romany + 16.7% orcadian	1.76	92.4% romany + 7.6% russian	2.02	84.8% n-european + 15.2% cochin-jew	2.52
94.2% romany + 5.8% finnish	1.88	92.0% romany + 8.0% mordovian	2.06	86.0% n-european + 14.0% kerala-christian	2.93
79.8% romany + 20.2% utahn-white	1.99	90.4% romany + 9.6% ukranian	2.14	85.0% utahn-white + 15.0% cochin-jew	2.98
90.2% romany + 9.8% belorussian	2.00	88.7% romany + 11.3% slovenian	2.5	85.9% n-european + 14.1% ap-hyderabad	2.99
82.1% romany + 17.9% british	2.03	89.2% romany + 10.8% n-european	2.52	85.1% n-european + 14.9% up	3.02
91.4% romany + 8.6% russian	2.06	95.7% romany + 4.3% chuvash	2.58	85.5% n-european + 14.5% tn-brahmin	3.13
85.0% n-european + 15.0% bene-israel	2.16	90.7% romany + 9.3% utahn-white	2.58	85.5% n-european + 14.5% brahmin-tamil-nadu	3.15

Sibling 1 and Sibling 2 are again not too different from each other: mostly Romany with some European. However, Sibling 3 gets vastly different results. Why? No, Sibling 3 wasn't adopted! The reason is simple. Sibling 3 has a larger South Indian component than the average Romany in our dataset. This means that (s)he cannot be represented as a mix of Romany and a European ethnicity without a large error. Instead, mixes of mostly northwest European with a little Indian, especially Indian Jewish, come closest to Sibling 3's results. However, this does not make Sibling 3 Jewish or Indian Jewish (Indian Jews are quite mixed with the local Indian populations).
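The mixed-mode search, and the way a few points in one component can reorder it as happened with Sibling 3, can be sketched in Python as follows. This assumes the fit is scored by Euclidean distance between your admixture vector and a two-population mix; for each pair, the best proportion then has a closed form (project your vector onto the line through the two profiles and clamp the proportion to [0, 1]). The actual Oracle may search proportions differently. The references `pop_r`, `pop_e` and `pop_j` and their three-component profiles are hypothetical toy values chosen only to reproduce this kind of flip.

```python
import math
from itertools import combinations

# Hypothetical toy reference profiles (NOT HarappaWorld data). Components are
# (European-like, South Indian-like, other), in percent.
REFERENCES = {
    "pop_r": [70, 10, 20],
    "pop_e": [90, 0, 10],
    "pop_j": [40, 40, 20],
}


def best_two_way_fit(target, pop_a, pop_b):
    """Find p in [0, 1] minimizing the distance from target to p*pop_a + (1-p)*pop_b.

    Assumption: fit is scored by Euclidean distance. The squared distance is
    quadratic in p, so the optimum is the projection of the target onto the
    line through pop_b and pop_a, clamped to [0, 1]. Returns (p, distance).
    """
    direction = [a - b for a, b in zip(pop_a, pop_b)]
    offset = [t - b for t, b in zip(target, pop_b)]
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        p = 1.0  # identical profiles: every proportion gives the same mix
    else:
        p = sum(o * d for o, d in zip(offset, direction)) / norm_sq
        p = min(1.0, max(0.0, p))
    mix = [p * a + (1 - p) * b for a, b in zip(pop_a, pop_b)]
    return p, math.dist(target, mix)


def mixed_mode(target, references, top_n=10):
    """Rank every pair of reference populations by its best two-way fit."""
    fits = []
    for (name_a, pop_a), (name_b, pop_b) in combinations(references.items(), 2):
        p, distance = best_two_way_fit(target, pop_a, pop_b)
        fits.append((distance, p, name_a, name_b))
    fits.sort()
    return [
        (f"{100 * p:.1f}% {a} + {100 * (1 - p):.1f}% {b}", round(distance, 2))
        for distance, p, a, b in fits[:top_n]
    ]


# Two hypothetical individuals; the second has 2 more points of the
# South Indian-like component (plus 2 more European-like, 4 less "other").
print(mixed_mode([72, 9, 19], REFERENCES)[0])   # ('90.0% pop_r + 10.0% pop_e', 0.0)
print(mixed_mode([74, 11, 15], REFERENCES)[0])  # ('69.3% pop_e + 30.7% pop_j', 2.41)
```

In this toy setup the first individual is an exact `pop_r` + `pop_e` mix, but after the shift the best `pop_r` + `pop_e` fit leaves a distance of about 4.24, so a `pop_e` + `pop_j` mix moves to the top instead.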

10 Comments.

HRP15 June 12, 2012 at 2:53 pm

The new run doesn't work that well for me; I get large genetic distances in mixed mode. The first (old) run and Oracle were better for me. I reached a genetic distance of 1.06.
- Zack June 21, 2012 at 3:15 pm
  
  Small distances do not mean it worked better.
Nirjhar June 12, 2012 at 11:48 pm

Zack, so we can get real ancestry components, even if not in everyone's case. When there is ample certainty that the result is real (e.g., you), why not just calculate the age of the components? The age of the components (if true) is vital.
- HRP142 June 13, 2012 at 12:02 pm
  
  I think there is a major disconnect in your understanding of how this works. When you say "Age", what do you mean? How long ago the component first appeared? This is not possible from admixture tests with samples from existing populations alone.
  What admixture is doing is comparing your genome with everybody else in the dataset and binning it through a statistical process. Some alleles are modal (most frequent) in certain populations, so if one part of your genome is a match for those alleles, then that section of your chromosome is similar to the modal population.
  So let's say that on Chromosome 19, between positions 1 and 10, most Baloch have AAAAAAAAAA, and your chromosome is also exactly AAAAAAAAAA there; then in that region, your component would be assigned to Baloch/Gedrosia/Indus/West-Central Asian or whatever you want to call it.
  How are you going to calculate the component's age from this?
Nirjhar June 13, 2012 at 10:18 pm

I don't know, but I think Dienekes has done it once, and so have Metspalu et al. for the minimum age of the ANI component, which was 12,500 YBP, though probably not on an individual basis! I also don't know whether genome bloggers can do that with the sources they have! But I know this: the component's age is vital for matching your DNA to any historical event, either proposed or recorded! Otherwise the glass will remain half full.
- Zack June 21, 2012 at 3:14 pm
  
  Metspalu et al used linkage disequilibrium for the purpose. Their estimate is very different from Moorjani et al.
  
  Dienekes calculated the age of ADMIXTURE components using effective population estimates (in a post that I am unable to locate right now). While I can easily repeat Dienekes' analysis for HarappaWorld components, I am not sure that is accurate or useful.
  
  I am thinking about age estimates but still don't have a good solution.
  - Nirjhar June 21, 2012 at 10:44 pm
    
    Zack, let me tell you what: just do it! Let's just see what kind of results we get! Things can always be improved, can't they? I mean, it doesn't need to be 100% correct, but the only way to get something true is by trying again and again with different paths... But the goal is one: the truth.
GA October 3, 2013 at 5:39 pm

What is "Utahn-White"? Also while I'm at it, what is "CEU_V" and "CEU30"? I've searched all over and can't find the answer. Am I the only one who doesn't know this??
- SB October 3, 2013 at 8:22 pm
  
  http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?pop=1409
  - Christine October 4, 2013 at 11:44 pm
    
    Thank you for asking; I was just trying to figure that out, too. And thanks for the link, SB.

Harappa Ancestry Project

Genetics and South Asia