Tag Archives: harappa

HarappaWorld HRP0289-HRP0297

I have added the HarappaWorld Admixture results for HRP0289-HRP0297 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.

I also updated the results for HRP0274 using FTDNA Family Finder data instead of the Genographic 2.0 data that was originally submitted. As the Geno2 data has only 14,000 SNPs in common with my HarappaWorld calculator, it's interesting to see how HRP0274's admixture results change:

Component          Geno2     FTDNA
South Indian       48.68%    46.00%
Baloch             34.22%    32.99%
Caucasian           4.33%     5.02%
Northeast Euro      3.89%     3.57%
Southeast Asian     2.75%     1.06%
Siberian            1.25%     1.87%
Northeast Asian     1.16%     1.69%
Papuan              1.14%     1.85%
American            0.87%     1.23%
Beringian           0.01%     1.23%
Mediterranean       0.00%     0.39%
Southwest Asian     1.69%     3.10%
San                 0.00%     0.00%
East African        0.00%     0.00%
Pygmy               0.00%     0.00%
West African        0.00%     0.00%

The only differences greater than 1% are South Indian (2.68%), Southeast Asian (1.69%), Southwest Asian (1.41%), Baloch (1.23%), and Beringian (1.22%). It's remarkable that only 14,000 SNPs could provide us a decent result.
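The comparison above can be reproduced in a few lines of R. This is only a minimal sketch using the percentages from the table; the data frame and its column names (results, geno2, ftdna, abs_diff) are names chosen for illustration and are not part of any HarappaWorld tool.

```r
# Component percentages for HRP0274, copied from the table above.
results <- data.frame(
  component = c("South Indian", "Baloch", "Caucasian", "Northeast Euro",
                "Southeast Asian", "Siberian", "Northeast Asian", "Papuan",
                "American", "Beringian", "Mediterranean", "Southwest Asian",
                "San", "East African", "Pygmy", "West African"),
  geno2 = c(48.68, 34.22, 4.33, 3.89, 2.75, 1.25, 1.16, 1.14,
            0.87, 0.01, 0.00, 1.69, 0.00, 0.00, 0.00, 0.00),
  ftdna = c(46.00, 32.99, 5.02, 3.57, 1.06, 1.87, 1.69, 1.85,
            1.23, 1.23, 0.39, 3.10, 0.00, 0.00, 0.00, 0.00),
  stringsAsFactors = FALSE
)

# Absolute difference in percentage points, rounded to two decimals
# so that floating-point noise does not affect the comparison.
results$abs_diff <- round(abs(results$geno2 - results$ftdna), 2)

# Keep the components that moved by more than 1 point, largest first.
changed <- results[results$abs_diff > 1, ]
changed <- changed[order(-changed$abs_diff), ]
print(changed[, c("component", "abs_diff")], row.names = FALSE)
```

Run on the table's numbers, this lists the same five components named above, in the same order.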

We have two new Gujarati participants. HRP0292, a Gujarati Jain, seems more similar to somewhat more southern populations. HRP0294, a Gujarati Sunni Vohra, has results somewhat similar to HRP0265 (Gujarati Patel Muslim) and is more north-oriented. Therefore, I have created a separate ethnic category for Gujarati Muslims in my ethnic spreadsheet. I'll add its averages the next time I compute them.

We have two Indian adoptee participants as well. HRP0297 has results which match well with the Bengalis (other than the Brahmins) in this project. HRP0290's results are somewhat harder to figure out. The closest groups, though not very close, are probably Tharu from Uttarakhand and Satnami from Chhattisgarh (Reich et al. dataset). A ChromoPainter analysis would be more useful here.

HarappaWorld HRP0284-HRP0288

I have added the HarappaWorld Admixture results for HRP0284-HRP0288 to the individual spreadsheet.

I have also updated the group averages (weighted) spreadsheet.

HarappaWorld HRP0273-HRP0283

I have added the HarappaWorld Admixture results for HRP0273-HRP0283 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

We got our first Pashtun participants, one Afghan and one Pakistani. Both have very similar results and are not much different than the HGDP Pathan sample average in their South Indian component.

HRP0278, a Bengali (mostly), has more of the East Asian components than any other Bengali participant (including my friend Razib).

HarappaWorld HRP0253-HRP0272

I have added the HarappaWorld Admixture results for HRP0253-HRP0272 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

HarappaWorld HRP0250-HRP0252

I have added the HarappaWorld Admixture results for HRP0250-HRP0252 to the individual spreadsheet.

However, I have not recomputed the weighted averages for the Kashmiris or Bengali Brahmins. Also, I am not sure about Tamil Gounder. Wikipedia says they are Vellalars, but I don't know whether I should report separate Gounder results or include them in the Tamil Vellalar average.

HarappaWorld HRP0245-HRP0249

I have added the HarappaWorld Admixture results for HRP0245-HRP0249 to the individual spreadsheet.

I have also recomputed the weighted averages for Kurds (from 6 to 10 now).

Let's look at the Kurdish results from Yunusbayev (prefix: kurd), Xing (prefix: F) and Harappa (prefix: HRP). Do note that the Xing results were computed with a smaller number of SNPs and thus might be noisy.

HarappaWorld HRP0240-HRP0244

From now on, instead of waiting till I have a batch of 10 new participants to compute their Admixture results, I'll run admixture at the start of the month for those who submitted their data during the previous month.

So I have added the HarappaWorld Admixture results for HRP0240-HRP0244 to the individual spreadsheet.

I have also recomputed the weighted averages for Bengalis (from 3 to 5 now), Kerala Muslims (from 1 to 2), and Georgians (from 3 to 4) while adding a new one for our first North Ossetian participant.

HarappaWorld Oracle

Here's the HarappaWorld Oracle to go with the HarappaWorld admixture results and DIYHarappaWorld.

It works similarly to the old Ref3 Harappa Oracle, with a couple of differences. One, there is no panasian switch since the Pan-Asian dataset is not included in this calculator.

I have added an optional mincount argument. It picks only those groups where the number of individuals is equal to or more than mincount for the Oracle calculation. By default mincount is 2, so only those groups which have 2 or more samples are used to compute your Oracle results.
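The effect of mincount can be pictured as a simple filter on group size. The sketch below is not the Oracle's actual code; the groups table, its name and n columns, and the filter_groups helper are all hypothetical and exist only to illustrate the rule described above.

```r
# Hypothetical reference table: one row per group, with its sample count.
groups <- data.frame(
  name = c("group_a", "group_b", "group_c", "group_d"),
  n    = c(1, 2, 7, 25),
  stringsAsFactors = FALSE
)

# Keep only the groups that have at least `mincount` individuals.
# The default of 2 mirrors the default described in the text.
filter_groups <- function(groups, mincount = 2) {
  groups[groups$n >= mincount, , drop = FALSE]
}

filter_groups(groups)$name                # default threshold
filter_groups(groups, mincount = 4)$name  # stricter threshold
```

With the default, single-sample groups drop out; raising mincount to 4 leaves only the larger groups in the comparison.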

Let's look at my top 20 Oracle results in mixed mode, excluding population groups with fewer than 4 individuals.

HarappaOracle(c(26.46,36.82,14.22,4.78,0.00,1.32,0.86,0.04,0.19,0.06,3.63,8.07,0.00,2.44,0.43,0.67),k=20,mincount=4,mixedmode=T)

[,1] [,2]
[1,] "18.1% egyptian_behar_12 + 81.9% punjabi-arain_xing_25" "2.3361"
[2,] "18.1% egypt_henn2012_19 + 81.9% punjabi-arain_xing_25" "2.5615"
[3,] "80.7% punjabi-arain_xing_25 + 19.3% yemenese_behar_8" "2.8388"
[4,] "18.4% palestinian_hgdp_46 + 81.6% punjabi-arain_xing_25" "2.9944"
[5,] "84.7% punjabi-arain_xing_25 + 15.3% yemen-jew_behar_15" "3.0923"
[6,] "19.1% jordanian_behar_20 + 80.9% punjabi-arain_xing_25" "3.1877"
[7,] "18% egypt_henn2012_19 + 82% sindhi_hgdp_24" "3.4814"
[8,] "17.9% egyptian_behar_12 + 82.1% sindhi_hgdp_24" "3.5554"
[9,] "20.3% jordanian_behar_20 + 79.7% punjabi_harappa_7" "3.6161"
[10,] "18.9% egyptian_behar_12 + 81.1% punjabi_harappa_7" "3.6587"
[11,] "19.5% palestinian_hgdp_46 + 80.5% punjabi_harappa_7" "3.7079"
[12,] "19% egypt_henn2012_19 + 81% punjabi_harappa_7" "3.8303"
[13,] "18.3% palestinian_hgdp_46 + 81.7% sindhi_hgdp_24" "3.8762"
[14,] "80.4% punjabi-arain_xing_25 + 19.6% syrian_behar_16" "3.8908"
[15,] "19% lebanese_behar_7 + 81% punjabi-arain_xing_25" "4.0494"
[16,] "18.9% jordanian_behar_20 + 81.1% sindhi_hgdp_24" "4.078"
[17,] "79.9% punjabi_harappa_7 + 20.1% yemenese_behar_8" "4.1222"
[18,] "15.1% bedouin_hgdp_46 + 84.9% punjabi-arain_xing_25" "4.1522"
[19,] "85.3% punjabi-arain_xing_25 + 14.7% saudi_behar_20" "4.2014"
[20,] "79.1% punjabi_harappa_7 + 20.9% syrian_behar_16" "4.2191"

These results are closer to my actual reported ancestry than the ones from the Reference 3 Oracle.

HarappaWorld Admixture

Here is a new admixture calculator. This uses populations from all over the world, and I got the best results (i.e., lowest cross-validation error) at K=16.

You can see the admixture results for different ethnic groups as well as results for individual (founder-only) project participants.

UPDATE: The population results have been calculated using weighted means.

The group results are also shown in the usual interactive bar chart below. You can click on the component labels to sort by that ancestral component.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations.

I used 188,173 SNPs for this run. However, the results reported above for the Henn2011 (181,223 SNPs for Hadza, Sandawe and San, 26,494 SNPs for other groups), Henn2012 (26,494 SNPs), Reich (48,967 SNPs) and Xing (18,986 SNPs) datasets were calculated using a smaller number of common SNPs. Hence caution should be exercised in interpreting those results.

You can also see the Fst distances between the ancestral components.

I should have HarappaWorldOracle and DIYHarappaWorld calculators out in the next few days.

Also, I am working on another calculator which will focus more closely on South Asia.

Harappa Oracle

Based on the Dodecad Oracle, here is Harappa Oracle using reference 3 admixture results.

I am using Dienekes' code with a couple of changes. One of them is using weighted distance based on Fst divergences between ancestral components. Because of that it is several times slower than DodecadOracle. I plan to offer an option soon to switch between Euclidean distance and Fst-weighted distance.

You need to install R to use it. Then unzip the Oracle zip file. Double-click on the file or use the following in R:

load('HarappaOracleR3fst.RData')

In R, you can look at the 385 populations included by typing:

X[,1]

To use it to find your closest populations, you need your Harappa Reference 3 admixture results. Use them separated by commas like this (for me):

HarappaOracle(c(44,12,0,24,14,1,2,0,0,1,2))

You will get a result, with the first column showing the closest populations and the 2nd column their distance to you.

[,1] [,2]
[1,] "balochi" "8.0242"
[2,] "bene-israel" "9.2843"
[3,] "brahui" "9.5158"
[4,] "pathan" "9.7034"
[5,] "makrani" "10.1014"
[6,] "sindhi" "10.9236"
[7,] "Bhatia" "11.8441"
[8,] "Sindhi" "12.1704"
[9,] "Kashmiri" "13.4229"
[10,] "punjabi-arain" "13.9192"
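The basic idea behind this single-population mode can be sketched as a nearest-neighbour search over admixture vectors. The example below assumes plain Euclidean distance, not the Fst-weighted distance that Harappa Oracle uses, so it will not reproduce the distances shown above. The ref table, its made-up numbers and the nearest_populations helper are hypothetical.

```r
# Toy reference table (made-up numbers): rows are populations,
# columns are admixture components in percent.
ref <- rbind(
  pop_a = c(60, 30, 10),
  pop_b = c(20, 50, 30),
  pop_c = c(45, 40, 15)
)

# Rank reference populations by Euclidean distance to a query vector
# and return the k closest ones (name -> distance).
nearest_populations <- function(query, ref, k = 10) {
  stopifnot(length(query) == ncol(ref))
  d <- sqrt(rowSums(sweep(ref, 2, query)^2))  # distance from each row to the query
  head(sort(d), k)
}

nearest_populations(c(50, 38, 12), ref, k = 2)
```

The k argument here plays the same role as k in the HarappaOracle calls: it only limits how many of the closest populations are shown.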

You can also find out the closest populations to one of the reference populations:

HarappaOracle("punjabi-arain")

By default, the Oracle shows the 10 closest populations. You can change that:

HarappaOracle("punjabi-arain",k=20)

Also, by default, the Oracle excludes the Pan-Asian dataset since the overlap is only 5,400 SNPs. You can include Pan-Asian populations:

HarappaOracle("punjabi-arain",panasian=T)

There is also a mixed mode where the individual (or mean reference population) is compared against all pairs of populations as ancestors.

HarappaOracle("Haryana Jatt",mixedmode=T)

which has the following output:

[1,] "Haryana Jatt" "0"
[2,] "15.4% lithuanians + 84.6% Punjabi Brahmin" "1.9553"
[3,] "10.6% russian + 89.4% Rajasthani Brahmin" "2.0626"
[4,] "14.7% finnish + 85.3% Punjabi Brahmin" "2.0863"
[5,] "9.2% finnish + 90.8% Rajasthani Brahmin" "2.1142"
[6,] "89.4% Rajasthani Brahmin + 10.6% mordovians" "2.1727"
[7,] "9.6% lithuanians + 90.4% Rajasthani Brahmin" "2.1989"
[8,] "10.1% belorussian + 89.9% Rajasthani Brahmin" "2.2938"
[9,] "16.8% russian + 83.2% Punjabi Brahmin" "2.3015"
[10,] "16.2% belorussian + 83.8% Punjabi Brahmin" "2.3656"
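One way to picture mixed mode is as a least-squares fit of the query against every pair of reference populations. The sketch below is an assumption about how such a search could work, again with plain Euclidean distance, not the Fst-weighted distance. It is not Dienekes' or Harappa Oracle's actual code, and the ref table, fit_pair and mixed_mode are hypothetical names with made-up numbers.

```r
# Toy reference table (made-up numbers): rows are populations,
# columns are admixture components in percent.
ref <- rbind(
  pop_a = c(60, 30, 10),
  pop_b = c(20, 50, 30),
  pop_c = c(45, 40, 15)
)

# Best two-way mix w * a + (1 - w) * b for a query, by least squares.
fit_pair <- function(query, a, b) {
  ab <- a - b
  w <- sum((query - b) * ab) / sum(ab^2)  # unconstrained weight on a
  w <- min(max(w, 0), 1)                   # clamp to a valid proportion
  fitted <- w * a + (1 - w) * b
  c(weight_first = w, distance = sqrt(sum((query - fitted)^2)))
}

# Fit every pair of populations and sort the pairs by distance.
mixed_mode <- function(query, ref) {
  pairs <- t(combn(nrow(ref), 2))  # one row per pair of row indices
  fits <- t(apply(pairs, 1, function(p) {
    fit_pair(query, ref[p[1], ], ref[p[2], ])
  }))
  out <- data.frame(
    first  = rownames(ref)[pairs[, 1]],
    second = rownames(ref)[pairs[, 2]],
    fits,
    stringsAsFactors = FALSE
  )
  out[order(out$distance), ]
}

# This query is exactly 75% pop_a + 25% pop_b, so that pair should rank first.
mixed_mode(c(50, 35, 15), ref)
```

Because every pair has to be fitted, the work grows roughly with the square of the number of reference populations, which is one reason a mixed-mode search is slower than the single-population mode.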

You can of course combine any or all of the options.

Think of Harappa Oracle as a tool to help you interpret your admixture results by comparing who you are closest to. Do not think of it as giving you your real ancestry.
