Tag Archives: harappaworld

HarappaWorld HRP0289-HRP0297

I have added the HarappaWorld Admixture results for HRP0289-HRP0297 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.

I also updated the results for HRP0274 using FTDNA Family Finder data instead of the Genographic 2.0 (Geno2) data that was originally submitted. As the Geno2 data has only 14,000 SNPs in common with my HarappaWorld calculator, it's interesting to see how HRP0274's admixture results change:

Component          Geno2    FTDNA
South Indian      48.68%   46.00%
Baloch            34.22%   32.99%
Caucasian          4.33%    5.02%
Northeast Euro     3.89%    3.57%
Southeast Asian    2.75%    1.06%
Siberian           1.25%    1.87%
Northeast Asian    1.16%    1.69%
Papuan             1.14%    1.85%
American           0.87%    1.23%
Beringian          0.01%    1.23%
Mediterranean      0.00%    0.39%
Southwest Asian    1.69%    3.10%
San                0.00%    0.00%
East African       0.00%    0.00%
Pygmy              0.00%    0.00%
West African       0.00%    0.00%

The only differences greater than 1% are South Indian (2.68%), Southeast Asian (1.69%), Southwest Asian (1.41%), Baloch (1.23%), and Beringian (1.22%). It's remarkable that only 14,000 SNPs could provide us a decent result.
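For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the Geno2 vs. FTDNA differences from the table values and lists the ones above 1%. The function name and the 1% threshold parameter are my own choices for illustration, not part of the HarappaWorld calculator.

```python
# Values copied from the HRP0274 table: component -> (Geno2 %, FTDNA %)
results = {
    "South Indian": (48.68, 46.00),
    "Baloch": (34.22, 32.99),
    "Caucasian": (4.33, 5.02),
    "Northeast Euro": (3.89, 3.57),
    "Southeast Asian": (2.75, 1.06),
    "Siberian": (1.25, 1.87),
    "Northeast Asian": (1.16, 1.69),
    "Papuan": (1.14, 1.85),
    "American": (0.87, 1.23),
    "Beringian": (0.01, 1.23),
    "Mediterranean": (0.00, 0.39),
    "Southwest Asian": (1.69, 3.10),
    "San": (0.00, 0.00),
    "East African": (0.00, 0.00),
    "Pygmy": (0.00, 0.00),
    "West African": (0.00, 0.00),
}


def large_differences(table, threshold=1.0):
    """Return (component, absolute difference) pairs above threshold, largest first."""
    diffs = {name: round(abs(geno2 - ftdna), 2) for name, (geno2, ftdna) in table.items()}
    big = [(name, diff) for name, diff in diffs.items() if diff > threshold]
    return sorted(big, key=lambda pair: pair[1], reverse=True)


for name, diff in large_differences(results):
    print(f"{name}: {diff:.2f}%")
```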

We have two new Gujarati participants. HRP0292, a Gujarati Jain, has results closer to those of more southern populations. HRP0294, a Gujarati Sunni Vohra, is more north-oriented, with results somewhat similar to HRP0265 (Gujarati Patel Muslim). Therefore, I have created a separate ethnic category for Gujarati Muslims in my ethnic spreadsheet. Its averages will appear the next time I compute group averages.

We have two Indian adoptee participants as well. HRP0297 has results which match well with the Bengalis (other than the Brahmins) in this project. HRP0290's results are somewhat harder to figure out. The closest groups, though not very close, are probably Tharu from Uttarakhand and Satnami from Chhattisgarh (Reich et al dataset). A ChromoPainter analysis would be more useful here.

HarappaWorld HRP0284-HRP0288

I have added the HarappaWorld Admixture results for HRP0284-HRP0288 to the individual spreadsheet.

I have also updated the group averages (weighted) spreadsheet.

HarappaWorld HRP0273-HRP0283

I have added the HarappaWorld Admixture results for HRP0273-HRP0283 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

We got our first Pashtun participants, one Afghan and one Pakistani. Both have very similar results, and their South Indian component is not much different from the HGDP Pathan sample average.

HRP0278, a Bengali (mostly), has more of the East Asian components than any other Bengali participant (including my friend Razib).

HarappaWorld HRP0253-HRP0272

I have added the HarappaWorld Admixture results for HRP0253-HRP0272 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

HarappaWorld HRP0250-HRP0252

I have added the HarappaWorld Admixture results for HRP0250-HRP0252 to the individual spreadsheet.

However, I have not recomputed the weighted averages for the Kashmiris or Bengali Brahmins. Also, I am not sure about Tamil Gounder. Wikipedia says they are Vellalars, but I don't know whether I should report separate Gounder results or include them in the Tamil Vellalar average.

HarappaWorld HRP0245-HRP0249

I have added the HarappaWorld Admixture results for HRP0245-HRP0249 to the individual spreadsheet.

I have also recomputed the weighted averages for Kurds (from 6 to 10 now).

Let's look at the Kurdish results from Yunusbayev (prefix: kurd), Xing (prefix: F) and Harappa (prefix: HRP). Do note that the Xing results were computed with a smaller number of SNPs and thus might be noisy.

Pagani East African Dataset

Pagani et al analyzed Ethiopian genetics in their paper "Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool". Their dataset, consisting of Ethiopians and a few other East African populations, is available online.

I have analyzed the Pagani dataset with my HarappaWorld admixture calculator and included the results in my regular spreadsheet.

The group (weighted mean) results are also shown in the usual interactive bar chart below. You can click on the component labels to sort by that ancestral component.

The East African component as computed in HarappaWorld peaks among the Maasai, yet several of the Pagani dataset populations have a higher percentage of that component than the Maasai do. We should therefore be a bit careful in interpreting the HarappaWorld results for the Pagani groups. I'll likely include them in my next iteration of the admixture calculator.

HarappaWorld Ancestral South Indian

Using the same method as I used for reference 3 admixture, I decided to guesstimate the Ancestral South Indian proportions, as given by Reich et al, for my HarappaWorld admixture run.

Basically, I used 92 of the 96 samples that Reich et al used to find population averages for the South Indian component. Then, I ran a linear regression of Reich et al's estimate of Ancestral South Indian (ASI) ancestry on the South Indian component average. Since Reich et al actually list Ancestral North Indian (ANI) percentages in their paper but their model is a two-ancestry ANI+ASI one, I simply calculated the ASI percentages as 100% minus ANI.

The correlation between Reich et al ASI and my HarappaWorld South Indian component for the relevant populations turns out to be 0.99277086.

And the linear regression fit for the data is:

ASI = 2.5218942 + 0.8104836 * S_INDIAN

where both ASI (Reich et al) and S_INDIAN (HarappaWorld) are given in percentages.

Of the individuals in HarappaWorld, I kept only those who had a South Indian component of at least 20% for computing the ASI proportions.
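As a rough sketch of how the fit and the 20% cutoff combine, the snippet below applies the regression coefficients quoted above to a South Indian percentage and skips anyone under the threshold. The function and constant names are mine, chosen for illustration; the spreadsheet itself was not necessarily produced this way.

```python
# Coefficients from the regression fit quoted in the post.
INTERCEPT = 2.5218942
SLOPE = 0.8104836
# Individuals below this South Indian percentage get no ASI estimate.
MIN_SOUTH_INDIAN = 20.0


def estimate_asi(south_indian_pct):
    """Estimate ASI (%) from the HarappaWorld South Indian component (%).

    Returns None when the component is under the 20% threshold.
    """
    if south_indian_pct < MIN_SOUTH_INDIAN:
        return None
    return INTERCEPT + SLOPE * south_indian_pct


print(estimate_asi(50.0))  # roughly 43.05
print(estimate_asi(6.5))   # None, below the threshold
```

Note that the lowest estimate this can produce is about 18.7% (at exactly 20% South Indian), which is why group averages computed only from samples above the threshold never come out much below that.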

The resulting ASI percentages can be seen in a spreadsheet.

Please note that in the Group sheet, the averages are based on the samples which met the 20% South Indian component threshold. Thus, the 20% ASI in the Romanians is the average of the two Romanians who met the threshold out of a total of 16 Romanian samples.

The individual results are available in the Individual sheet. These results are a little different from the estimates using reference 3. Thus, I would point out that these should be taken only as a rough estimate.

HarappaOracle Limitations

While HarappaOracle is a great tool, it has its limitations.

First of all, do not think of the mixed mode results as showing which populations you are descended from. Use HarappaOracle to get an idea of which populations are similar to you in their admixture results. This function is especially important since admixture results should be understood in relative terms, as I have been stressing.

For mixed-race people, the Oracle might sometimes provide a correct result, like it does for me. For others, the known ancestral mix might not show up.

There is also the fact that the Oracle calculator is sensitive to your admixture percentages and sometimes small changes can change the Oracle mixed mode results radically.

Let's look at three siblings as an example. (My thanks to them for letting me use their results for this post.) Here are their admixture results:

Component       Sibling 1   Sibling 2   Sibling 3
NE Euro             43.8%       43.1%       43.9%
Mediterranean       27.6%       26.8%       27.0%
Baloch              11.2%       11.1%       12.2%
Caucasian            9.4%       10.8%        8.6%
S Indian             5.3%        6.5%        7.0%
SW Asian             1.5%        1.0%        0.5%
American             0.7%        0.3%        0.7%
NE Asian             0.4%        0.1%        0.1%
Beringian            0.0%        0.3%        0.0%
San                  0.0%        0.1%        0.0%

Their admixture results are broadly similar, as expected. Some of you might think that a difference of 1% or so is very significant, but do consider what we know of DNA inheritance and the error margins in ADMIXTURE.
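To put a number on "broadly similar", here is a small sketch that finds the largest per-component gap between each pair of siblings, using the percentages from the table. The helper name is hypothetical and this is only an illustration, not part of the Oracle or ADMIXTURE.

```python
from itertools import combinations

components = ["NE Euro", "Mediterranean", "Baloch", "Caucasian", "S Indian",
              "SW Asian", "American", "NE Asian", "Beringian", "San"]

# Percentages copied from the sibling table, in the component order above.
siblings = {
    "Sibling 1": [43.8, 27.6, 11.2, 9.4, 5.3, 1.5, 0.7, 0.4, 0.0, 0.0],
    "Sibling 2": [43.1, 26.8, 11.1, 10.8, 6.5, 1.0, 0.3, 0.1, 0.3, 0.1],
    "Sibling 3": [43.9, 27.0, 12.2, 8.6, 7.0, 0.5, 0.7, 0.1, 0.0, 0.0],
}


def largest_gap(a, b):
    """Return (component name, gap in percentage points) for the biggest difference."""
    gaps = [abs(x - y) for x, y in zip(a, b)]
    index = max(range(len(gaps)), key=gaps.__getitem__)
    return components[index], round(gaps[index], 1)


for first, second in combinations(siblings, 2):
    name, gap = largest_gap(siblings[first], siblings[second])
    print(f"{first} vs {second}: {name} differs by {gap} points")
```

Even between full siblings, the largest gap in any single component is around 2 percentage points.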

Now let's see their HarappaWorld Oracle results.

Sibling 1            Sibling 2            Sibling 3
romany        3.16   romany        2.85   romany        4.45
hungarian     9.47   hungarian     9.65   utahn-white  10.74
utahn-white   9.88   french       11.05   n-european   10.79
n-european     9.9   slovenian    11.09   hungarian    10.97
french        9.95   n-european   11.38   utahn-white  11.35
utahn-white  10.62   utahn-white  11.54   french       11.54
slovenian    10.95   utahn-white  12.21   british      11.94
british      11.29   british      12.99   slovenian    12.21
orcadian     13.17   orcadian     14.86   orcadian     13.51
ukranian     17.73   romanian     17.14   ukranian     18.24

Again, not unexpected. The top 10 population matches are not too different for the siblings. There are some differences, but nothing extraordinary.

Finally let's look at mixed mode Oracle, where we try to find the 10 closest matches (based on admixture results) assuming that these individuals are mixed from two populations.

Sibling 1 Sibling 2 Sibling 3
91.3% romany + 8.7% lithuanian 1.58 93.3% romany + 6.7% lithuanian 1.93 82.8% utahn-white + 17.2% bene-israel 1.99
78.4% romany + 21.6% n-european 1.67 95.4% romany + 4.6% finnish 1.99 83.7% n-european + 16.3% bene-israel 2.37
79.7% romany + 20.3% utahn-white 1.69 91.9% romany + 8.1% belorussian 2.01 83.8% utahn-white + 16.2% bene-israel 2.47
83.3% romany + 16.7% orcadian 1.76 92.4% romany + 7.6% russian 2.02 84.8% n-european + 15.2% cochin-jew 2.52
94.2% romany + 5.8% finnish 1.88 92.0% romany + 8.0% mordovian 2.06 86.0% n-european + 14.0% kerala-christian 2.93
79.8% romany + 20.2% utahn-white 1.99 90.4% romany + 9.6% ukranian 2.14 85.0% utahn-white + 15.0% cochin-jew 2.98
90.2% romany + 9.8% belorussian 2.00 88.7% romany + 11.3% slovenian 2.5 85.9% n-european + 14.1% ap-hyderabad 2.99
82.1% romany + 17.9% british 2.03 89.2% romany + 10.8% n-european 2.52 85.1% n-european + 14.9% up 3.02
91.4% romany + 8.6% russian 2.06 95.7% romany + 4.3% chuvash 2.58 85.5% n-european + 14.5% tn-brahmin 3.13
85.0% n-european + 15.0% bene-israel 2.16 90.7% romany + 9.3% utahn-white 2.58 85.5% n-european + 14.5% brahmin-tamil-nadu 3.15

Sibling 1 and Sibling 2 are again not too different from each other: mostly Romany with some European. However, Sibling 3 is getting vastly different results. Why? No, Sibling 3 wasn't adopted! The reason is simple. Sibling 3 has a higher South Indian component than the average Romany in our dataset. This means that Sibling 3 cannot be represented as a mix of Romany and a European ethnicity without a large error. Instead, a mix of mostly northwest European and a little bit of Indian, especially Indian Jewish, comes closest to Sibling 3's results. However, this does not make Sibling 3 Jewish or Indian Jewish (Indian Jews being quite mixed with the local Indian populations).
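The effect can be shown with a toy version of a two-population fit. This is only a sketch under assumptions: I am assuming a plain Euclidean distance between admixture vectors, which is not necessarily what the Oracle uses, and the three-component "populations" below are made-up numbers for illustration, not real group averages.

```python
import math


def best_two_way_mix(target, pop_a, pop_b):
    """Return (share of pop_a, distance) for the closest mix of pop_a and pop_b.

    The share is the least-squares solution, clipped to the range 0..1.
    Distance is Euclidean (an assumption, see the text above).
    """
    direction = [a - b for a, b in zip(pop_a, pop_b)]
    offset = [t - b for t, b in zip(target, pop_b)]
    denom = sum(d * d for d in direction)
    share = 0.0 if denom == 0 else sum(o * d for o, d in zip(offset, direction)) / denom
    share = min(1.0, max(0.0, share))
    mix = [share * a + (1 - share) * b for a, b in zip(pop_a, pop_b)]
    return share, math.dist(mix, target)


# Hypothetical averages as [S Indian, NE Euro, everything else], in percent.
toy_source = [6.0, 40.0, 54.0]    # made-up group with some South Indian
toy_european = [0.0, 50.0, 50.0]  # made-up group with none

# Lies exactly between the two groups, so the fit is perfect.
person_within = [3.0, 45.0, 52.0]
# Has more South Indian than either group, so no mix can reach it.
person_beyond = [7.0, 44.0, 49.0]

print(best_two_way_mix(person_within, toy_source, toy_european))
print(best_two_way_mix(person_beyond, toy_source, toy_european))
```

A weighted mix can never have more of a component than the higher of its two sources, so the second person is left with a sizeable distance for this pair, and a different pair of populations may end up as the better match.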

HarappaWorld HRP0240-HRP0244

From now on, instead of waiting till I have a batch of 10 new participants to compute their Admixture results, I'll run admixture at the start of the month for those who submitted their data during the previous month.

So I have added the HarappaWorld Admixture results for HRP0240-HRP0244 to the individual spreadsheet.

I have also recomputed the weighted averages for Bengalis (from 3 to 5 now), Kerala Muslims (from 1 to 2), and Georgians (from 3 to 4) while adding a new one for our first North Ossetian participant.
