Tag Archives: oracle

HarappaWorld Oracle Update

Posted by Zack on June 26, 2013 3 comments

I have updated HarappaWorld Oracle with the latest group averages.

You can download it and use the instructions here and here.

Also, please note the limitations of the Oracle.

HarappaOracle Limitations

Posted by Zack on June 12, 2012 10 comments

While HarappaOracle is a great tool, it has its limitations.

First of all, do not think of the mixed mode results as showing which populations you are descended from. Use HarappaOracle to get an idea of which populations are similar to you in their admixture results. This function is especially important since admixture results should be understood in relative terms, as I have been stressing.

For mixed-race people, the Oracle might sometimes provide a correct result, like it does for me. For others, the known ancestral mix might not show up.

The Oracle calculator is also sensitive to your admixture percentages, and sometimes small changes in them can change the Oracle mixed mode results radically.

Let's look at three siblings as an example. (My thanks to them for letting me use their results for this post.) Here are their admixture results:

	Sibling 1	Sibling 2	Sibling 3
NE Euro	43.8%	43.1%	43.9%
Mediterranean	27.6%	26.8%	27.0%
Baloch	11.2%	11.1%	12.2%
Caucasian	9.4%	10.8%	8.6%
S Indian	5.3%	6.5%	7.0%
SW Asian	1.5%	1.0%	0.5%
American	0.7%	0.3%	0.7%
NE Asian	0.4%	0.1%	0.1%
Beringian	0.0%	0.3%	0.0%
San	0.0%	0.1%	0.0%

Their admixture results are broadly similar, as expected. Some of you might think that a difference of 1% or so is very significant, but do consider what we know of DNA inheritance and the error margins in ADMIXTURE.
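To put a number on "broadly similar", here is a small R sketch that takes the percentages from the table above and computes, for each component, the spread between the highest and lowest sibling, plus the plain Euclidean distance between each pair of siblings. The variable names are mine, and Euclidean distance is used only for illustration.

```r
# Admixture percentages for the three siblings, copied from the table above.
sib <- rbind(
  "NE Euro"       = c(43.8, 43.1, 43.9),
  "Mediterranean" = c(27.6, 26.8, 27.0),
  "Baloch"        = c(11.2, 11.1, 12.2),
  "Caucasian"     = c( 9.4, 10.8,  8.6),
  "S Indian"      = c( 5.3,  6.5,  7.0),
  "SW Asian"      = c( 1.5,  1.0,  0.5),
  "American"      = c( 0.7,  0.3,  0.7),
  "NE Asian"      = c( 0.4,  0.1,  0.1),
  "Beringian"     = c( 0.0,  0.3,  0.0),
  "San"           = c( 0.0,  0.1,  0.0)
)
colnames(sib) <- c("Sibling 1", "Sibling 2", "Sibling 3")

# Spread (max minus min) of each component across the three siblings.
spread <- apply(sib, 1, function(r) max(r) - min(r))
print(round(sort(spread, decreasing = TRUE), 1))

# Plain Euclidean distance between each pair of siblings.
print(round(dist(t(sib)), 2))
```

The largest spread is in the Caucasian component (2.2 percentage points), followed by S Indian (1.7).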

Now let's see their HarappaWorld Oracle results.

Sibling 1		Sibling 2		Sibling 3
romany	3.16	romany	2.85	romany	4.45
hungarian	9.47	hungarian	9.65	utahn-white	10.74
utahn-white	9.88	french	11.05	n-european	10.79
n-european	9.9	slovenian	11.09	hungarian	10.97
french	9.95	n-european	11.38	utahn-white	11.35
utahn-white	10.62	utahn-white	11.54	french	11.54
slovenian	10.95	utahn-white	12.21	british	11.94
british	11.29	british	12.99	slovenian	12.21
orcadian	13.17	orcadian	14.86	orcadian	13.51
ukranian	17.73	romanian	17.14	ukranian	18.24

Again, not unexpected. The top 10 population matches are not too different for the siblings. There are some differences, but nothing extraordinary.

Finally let's look at mixed mode Oracle, where we try to find the 10 closest matches (based on admixture results) assuming that these individuals are mixed from two populations.

Sibling 1		Sibling 2		Sibling 3
91.3% romany + 8.7% lithuanian	1.58	93.3% romany + 6.7% lithuanian	1.93	82.8% utahn-white + 17.2% bene-israel	1.99
78.4% romany + 21.6% n-european	1.67	95.4% romany + 4.6% finnish	1.99	83.7% n-european + 16.3% bene-israel	2.37
79.7% romany + 20.3% utahn-white	1.69	91.9% romany + 8.1% belorussian	2.01	83.8% utahn-white + 16.2% bene-israel	2.47
83.3% romany + 16.7% orcadian	1.76	92.4% romany + 7.6% russian	2.02	84.8% n-european + 15.2% cochin-jew	2.52
94.2% romany + 5.8% finnish	1.88	92.0% romany + 8.0% mordovian	2.06	86.0% n-european + 14.0% kerala-christian	2.93
79.8% romany + 20.2% utahn-white	1.99	90.4% romany + 9.6% ukranian	2.14	85.0% utahn-white + 15.0% cochin-jew	2.98
90.2% romany + 9.8% belorussian	2.00	88.7% romany + 11.3% slovenian	2.5	85.9% n-european + 14.1% ap-hyderabad	2.99
82.1% romany + 17.9% british	2.03	89.2% romany + 10.8% n-european	2.52	85.1% n-european + 14.9% up	3.02
91.4% romany + 8.6% russian	2.06	95.7% romany + 4.3% chuvash	2.58	85.5% n-european + 14.5% tn-brahmin	3.13
85.0% n-european + 15.0% bene-israel	2.16	90.7% romany + 9.3% utahn-white	2.58	85.5% n-european + 14.5% brahmin-tamil-nadu	3.15

Sibling 1 and Sibling 2 are again not too different from each other: mostly Romany with some European. However, Sibling 3 is getting vastly different results. Why? No, Sibling 3 wasn't adopted! The reason is simple. Sibling 3 has a larger South Indian component than the average Romany in our dataset. This means that Sibling 3 cannot be represented as a mix of Romany and a European ethnicity without a large error. Instead, a mix of mostly northwest European and a little bit of Indian, especially Indian Jewish, seems to be closest to Sibling 3's results. However, this does not make Sibling 3 Jewish or Indian Jewish; the Indian Jewish groups are themselves quite mixed with the local Indian populations.

HarappaWorld on GEDmatch

Posted by Zack on May 21, 2012 2 comments

The HarappaWorld Admixture calculator is now available on GEDmatch.

You can compute:

Admixture Proportions
Admixture Proportions by Chromosome
Chromosome Painting
Paint differences between 2 kits, 1 chromosome
Paint differences between 2 kits, 22 chromosomes, reduced size

You do have to upload your genetic data to GEDmatch to use it.

If you are a Harappa participant and try GEDmatch too, please let me know if there's any difference between your admixture results.

UPDATE: Now you can even get your HarappaWorld Oracle results after getting the admixture results, thanks to John.

HarappaWorld Oracle

Posted by Zack on May 11, 2012 17 comments

Here's the HarappaWorld Oracle to go with the HarappaWorld admixture results and DIYHarappaWorld.

It works similarly to the old Ref3 Harappa Oracle, with a couple of differences. First, there is no panasian switch since the Pan-Asian dataset is not included in this calculator.

Second, I have added an optional mincount argument. It picks only those groups where the number of individuals is equal to or more than mincount for the Oracle calculation. By default mincount is 2, so only those groups which have 2 or more samples are used to compute your Oracle results.
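The filtering rule can be sketched in a few lines of R. This is not the Oracle's own code: the groups table, its column names, and the filter_groups function are hypothetical and only show the "equal to or more than mincount" rule.

```r
# Hypothetical group table; names and sample counts are made up for illustration.
groups <- data.frame(
  name = c("group-a", "group-b", "group-c", "group-d"),
  n    = c(1, 2, 7, 25),
  stringsAsFactors = FALSE
)

# Keep only groups with at least `mincount` individuals.
filter_groups <- function(groups, mincount = 2) {
  groups[groups$n >= mincount, , drop = FALSE]
}

print(filter_groups(groups)$name)                # default mincount = 2
print(filter_groups(groups, mincount = 4)$name)  # stricter cutoff
```

With the default, only the single-sample group is dropped; raising mincount to 4 also drops the two-sample group.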

Let's look at my top 20 Oracle results in mixed mode, excluding population groups with fewer than 4 individuals.

HarappaOracle(c(26.46,36.82,14.22,4.78,0.00,1.32,0.86,0.04,0.19,0.06,3.63,8.07,0.00,2.44,0.43,0.67),k=20,mincount=4,mixedmode=T)

[,1] [,2]
[1,] "18.1% egyptian_behar_12 + 81.9% punjabi-arain_xing_25" "2.3361"
[2,] "18.1% egypt_henn2012_19 + 81.9% punjabi-arain_xing_25" "2.5615"
[3,] "80.7% punjabi-arain_xing_25 + 19.3% yemenese_behar_8" "2.8388"
[4,] "18.4% palestinian_hgdp_46 + 81.6% punjabi-arain_xing_25" "2.9944"
[5,] "84.7% punjabi-arain_xing_25 + 15.3% yemen-jew_behar_15" "3.0923"
[6,] "19.1% jordanian_behar_20 + 80.9% punjabi-arain_xing_25" "3.1877"
[7,] "18% egypt_henn2012_19 + 82% sindhi_hgdp_24" "3.4814"
[8,] "17.9% egyptian_behar_12 + 82.1% sindhi_hgdp_24" "3.5554"
[9,] "20.3% jordanian_behar_20 + 79.7% punjabi_harappa_7" "3.6161"
[10,] "18.9% egyptian_behar_12 + 81.1% punjabi_harappa_7" "3.6587"
[11,] "19.5% palestinian_hgdp_46 + 80.5% punjabi_harappa_7" "3.7079"
[12,] "19% egypt_henn2012_19 + 81% punjabi_harappa_7" "3.8303"
[13,] "18.3% palestinian_hgdp_46 + 81.7% sindhi_hgdp_24" "3.8762"
[14,] "80.4% punjabi-arain_xing_25 + 19.6% syrian_behar_16" "3.8908"
[15,] "19% lebanese_behar_7 + 81% punjabi-arain_xing_25" "4.0494"
[16,] "18.9% jordanian_behar_20 + 81.1% sindhi_hgdp_24" "4.078"
[17,] "79.9% punjabi_harappa_7 + 20.1% yemenese_behar_8" "4.1222"
[18,] "15.1% bedouin_hgdp_46 + 84.9% punjabi-arain_xing_25" "4.1522"
[19,] "85.3% punjabi-arain_xing_25 + 14.7% saudi_behar_20" "4.2014"
[20,] "79.1% punjabi_harappa_7 + 20.9% syrian_behar_16" "4.2191"

These results are closer to my actual reported ancestry than the ones from the Reference 3 Oracle.

Harappa Oracle

Posted by Zack on March 23, 2012 15 comments

Based on the Dodecad Oracle, here is Harappa Oracle using reference 3 admixture results.

I am using Dienekes' code with a couple of changes. One of them is using weighted distance based on Fst divergences between ancestral components. Because of that it is several times slower than DodecadOracle. I plan to offer an option soon to switch between Euclidean distance and Fst-weighted distance.
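To show why weighting by divergence matters, here is a rough R sketch comparing plain Euclidean distance with one possible divergence-weighted distance. The exact weighting used inside Harappa Oracle is not spelled out in this post, so the quadratic form below is an assumption on my part, and the small matrix is a toy example, not real Fst values.

```r
# Toy divergence matrix between three ancestral components (NOT real Fst values):
# components 1 and 2 are close, component 3 is far from both.
fst <- matrix(c(0, 1, 9,
                1, 0, 4,
                9, 4, 0), nrow = 3, byrow = TRUE)

# Plain Euclidean distance between two admixture vectors.
euclid_dist <- function(x, y) sqrt(sum((x - y)^2))

# Assumed weighted form: sqrt(-1/2 * d' F d), with d = x - y.
# x and y must sum to the same total so that d sums to zero.
weighted_dist <- function(x, y, fst) {
  d <- x - y
  q <- -0.5 * as.numeric(t(d) %*% fst %*% d)
  sqrt(max(q, 0))  # guard against small negative values
}

me   <- c(100, 0, 0)
near <- c(0, 100, 0)  # all ancestry in the closely related component
far  <- c(0, 0, 100)  # all ancestry in the divergent component

print(c(euclid_dist(me, near), euclid_dist(me, far)))
print(c(weighted_dist(me, near, fst), weighted_dist(me, far, fst)))
```

Euclidean distance treats both comparisons as equally far, while the weighted version puts the profile in the divergent component three times further away. The matrix products are also a plausible reason a weighted distance runs slower than the Euclidean one.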

You need to install R to use it. Then unzip the Oracle zip file. Double-click on the file or use the following in R:

load('HarappaOracleR3fst.RData')

In R, you can look at the 385 populations included by typing:

X[,1]

To use it to find your closest populations, you need your Harappa Reference 3 admixture results. Use them separated by commas like this (for me):

HarappaOracle(c(44,12,0,24,14,1,2,0,0,1,2))

You will get a result, with the first column showing the closest populations and the 2nd column their distance to you.

[,1] [,2]
[1,] "balochi" "8.0242"
[2,] "bene-israel" "9.2843"
[3,] "brahui" "9.5158"
[4,] "pathan" "9.7034"
[5,] "makrani" "10.1014"
[6,] "sindhi" "10.9236"
[7,] "Bhatia" "11.8441"
[8,] "Sindhi" "12.1704"
[9,] "Kashmiri" "13.4229"
[10,] "punjabi-arain" "13.9192"
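Conceptually, this single-population mode is a nearest-neighbour lookup. Here is a minimal R sketch of the idea, using plain Euclidean distance instead of the Fst-weighted one for brevity. The ref table, its population names and percentages, and the nearest_pops function are all invented for illustration and are not part of the Oracle file.

```r
# Hypothetical reference table: one row per population, one column per component.
# Names and percentages are invented for illustration only.
ref <- rbind(
  "pop-a" = c(50, 30, 20),
  "pop-b" = c(20, 50, 30),
  "pop-c" = c(45, 35, 20)
)

# Return the k populations closest to `x`, nearest first.
nearest_pops <- function(x, ref, k = 10) {
  d <- sqrt(rowSums(sweep(ref, 2, x)^2))    # Euclidean distance to each row
  d <- sort(d)[seq_len(min(k, length(d)))]  # keep the k smallest
  cbind(names(d), as.character(round(d, 4)))
}

res <- nearest_pops(c(48, 32, 20), ref, k = 2)
print(res)
```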

You can also find out the closest populations to one of the reference populations:

HarappaOracle("punjabi-arain")

By default, the Oracle shows the 10 closest populations. You can change that:

HarappaOracle("punjabi-arain",k=20)

Also, by default, the Oracle excludes the Pan-Asian dataset since the overlap is only 5,400 SNPs. You can include Pan-Asian populations:

HarappaOracle("punjabi-arain",panasian=T)

There is also a mixed mode where the individual (or mean reference population) is compared against all pairs of populations as ancestors.

HarappaOracle("Haryana Jatt",mixedmode=T)

which has the following output:

[1,] "Haryana Jatt" "0"
[2,] "15.4% lithuanians + 84.6% Punjabi Brahmin" "1.9553"
[3,] "10.6% russian + 89.4% Rajasthani Brahmin" "2.0626"
[4,] "14.7% finnish + 85.3% Punjabi Brahmin" "2.0863"
[5,] "9.2% finnish + 90.8% Rajasthani Brahmin" "2.1142"
[6,] "89.4% Rajasthani Brahmin + 10.6% mordovians" "2.1727"
[7,] "9.6% lithuanians + 90.4% Rajasthani Brahmin" "2.1989"
[8,] "10.1% belorussian + 89.9% Rajasthani Brahmin" "2.2938"
[9,] "16.8% russian + 83.2% Punjabi Brahmin" "2.3015"
[10,] "16.2% belorussian + 83.8% Punjabi Brahmin" "2.3656"

You can of course combine any or all of the options.
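For those curious about how the mixed mode search works, here is a simplified R sketch. For every pair of reference populations it finds the mixing proportion that brings the pair's weighted average closest to the individual, then ranks the pairs by the remaining distance. It uses plain Euclidean distance for brevity, not the Fst-weighted distance, and the function names and the toy ref table are hypothetical.

```r
# Best mixing weight for x ~ w*a + (1-w)*b under Euclidean distance:
# project x onto the segment between a and b and clamp w to [0, 1].
fit_pair <- function(x, a, b) {
  ab <- a - b
  w <- if (sum(ab^2) == 0) 1 else sum((x - b) * ab) / sum(ab^2)
  w <- min(max(w, 0), 1)
  list(w = w, dist = sqrt(sum((x - (w * a + (1 - w) * b))^2)))
}

# Try every pair of reference populations and return the k best fits.
mixed_mode <- function(x, ref, k = 10) {
  fits <- combn(rownames(ref), 2, simplify = FALSE, FUN = function(p) {
    f <- fit_pair(x, ref[p[1], ], ref[p[2], ])
    data.frame(mix = sprintf("%.1f%% %s + %.1f%% %s",
                             100 * f$w, p[1], 100 * (1 - f$w), p[2]),
               dist = f$dist, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, fits)
  head(out[order(out$dist), ], k)
}

# Toy reference populations (invented), each made of a single component.
ref <- rbind("pop-a" = c(100, 0, 0),
             "pop-b" = c(0, 100, 0),
             "pop-c" = c(0, 0, 100))

res <- mixed_mode(c(80, 20, 0), ref)
print(res)
```

A pair can only fit well if the individual lies close to the line between the two population averages. That is why one component being a little higher than a population's average can push a whole family of pairs out of the top results.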

Think of Harappa Oracle as a tool to help you interpret your admixture results by comparing who you are closest to. Do not think of it as giving you your real ancestry.
