Category Archives: Admixture

HarappaWorld HRP0385-HRP0419

Posted by Zack on May 3, 2014 Comments Off

I have been working on a new admixture calculator whenever I have found some time from real life pursuits. However, that's still not ready and I have a lot of submissions. So I am posting the HarappaWorld results for them.

I have added the HarappaWorld Admixture results for HRP0385-HRP0419 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.

I have now updated the group averages.

I got the first submissions of 23andMe v4 data. The v4 data has a missing SNP rate of about 12-13% against HarappaWorld. So I don't expect major noise problems, though noise will be higher than for 23andMe v3 data.
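As a rough illustration of what a missing SNP rate means, here is a minimal sketch that counts how many reference SNP IDs are absent from a sample file. The function name and the toy IDs are hypothetical, and this is not necessarily how the figure above was computed.

```python
def missing_rate(reference_snps, sample_snps):
    """Return the fraction of reference SNP IDs that are absent from the sample."""
    reference = set(reference_snps)
    if not reference:
        raise ValueError("reference SNP list is empty")
    missing = reference - set(sample_snps)
    return len(missing) / len(reference)


# Toy IDs for illustration only; these are not real SNPs from either dataset.
reference = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6", "snp7", "snp8"]
# The sample lacks snp8 and carries one SNP the reference does not use.
sample = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6", "snp7", "snp99"]

rate = missing_rate(reference, sample)
print(f"Missing SNP rate: {rate:.1%}")
```

SNPs present only in the sample are ignored, since only the reference set matters for the calculator.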

Afghan Dataset

Posted by Zack on January 16, 2014 190 comments

A paper on the genetics of the people of Afghanistan, "Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge" by Julie Di Cristofaro, Erwan Pennarun, Stéphane Mazières, Natalie M. Myres, Alice A. Lin, Shah Aga Temori, Mait Metspalu, Ene Metspalu, Michael Witzel, Roy J. King, Peter A. Underhill, Richard Villems and Jacques Chiaroni, was published in PLoS One.

Thanks to Mait Metspalu, the data is available online. It consists of:

5 Hazara
5 Pashtun
5 Tajik
4 Turkmen
5 Uzbek

Here are the HarappaWorld Admixture results for the samples in this dataset.

You can check the spreadsheet too.

Tadjik1_44Af and Pashtun2_6Af seem to be outliers and there's a possibility they are mislabeled. I would like to look into these two samples further before I calculate group averages.

You can compare these Pashtun results to HGDP Pathan and HAP Pashtun results.

HarappaWorld HRP0375-HRP0384

Posted by Zack on December 3, 2013 118 comments

I have added the HarappaWorld Admixture results for HRP0375-HRP0384 to the individual spreadsheet.

I have also updated the group averages.

HarappaWorld HRP0352-HRP0374

Posted by Zack on November 1, 2013 44 comments

I have added the HarappaWorld Admixture results for HRP0352-HRP0374 to the individual spreadsheet.

I have also updated the group averages.

Burusho Kalash HarappaWorld Admixture

Posted by Zack on October 29, 2013 45 comments

Someone asked for the individual HarappaWorld Admixture results for the Burusho and Kalash from HGDP.

In the chart below as well as in the spreadsheet, the IDs starting with "b" belong to the Burusho and those starting with "k" belong to the Kalash individuals.

You can check the spreadsheet too.

HarappaWorld HRP0328-HRP0351

Posted by Zack on August 28, 2013 95 comments

I have added the HarappaWorld Admixture results for HRP0328-HRP0351 to the individual spreadsheet.

I have also updated the group averages.

Since 10 of the 24 new participants are Punjabis (with 8 being Punjabi Jatts), I now have 33 Punjabis in HAP. Therefore, I will try to write about the Punjabi samples and their results next week.

Bengalis

Posted by Zack on August 7, 2013 16 comments

Let's take a look at the Bengali participants of the Harappa Ancestry Project.

I have added a suffix to the IDs where B = Brahmin, V = Vaidya and M = Muslim.

Here are the HarappaWorld Admixture results for the Bengalis which you can also see in a spreadsheet.

It's easy to see the difference between the Brahmins and others.

Razib wanted to know the origin of the East Asian ancestry among the Bengalis. So I ran a supervised ADMIXTURE with the following populations set as ancestral:

Altaian
Burmanese
Buryat
Cambodian
Chukchi
Dai
Daur
Dolgan
Evenki
Georgian
Gujarati-A
Han
Han-NChina
Hezhen
Japanese
Ket
Kinh
Koryak
Lahu
Miao
Mongola
Mongolian
Naxi
Nganassan
Oroqen
Selkup
She
Singapore-Malay
Tibet
Tu
Tujia
Tuvinian
Xibo
Yakut
Yi
Yukaghir

While most of these populations are various East Asian groups, I used the Gujarati-A as the South Asian group since it has the most South Indian + Baloch components without any East Asian influence. I used the Georgians as a proxy for West Asian ancestry.

Since it's K=36, I ran ADMIXTURE 10 times with different seeds and computed the average percentages for the Bengali participants. The number of SNPs was about 85,565. I did a similar analysis at K=35 after excluding the Tibetans, which got me 263,000 SNPs. The results were broadly similar.
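The averaging over seeds can be sketched as follows. This is a minimal illustration, assuming each run has already been loaded as a list of rows (one per individual) with the component columns in the same order in every run. The function name and the toy numbers are hypothetical.

```python
from statistics import mean


def average_runs(runs):
    """Average each individual's component fractions across several runs.

    `runs` is a list of runs. Each run is a list of rows (one per individual),
    and each row is a list of K component fractions in a fixed column order.
    """
    n_individuals = len(runs[0])
    k = len(runs[0][0])
    for run in runs:
        if len(run) != n_individuals or any(len(row) != k for row in run):
            raise ValueError("all runs must have the same shape")
    return [
        [mean(run[i][j] for run in runs) for j in range(k)]
        for i in range(n_individuals)
    ]


# Toy data: 2 runs, 2 individuals, K=3 (not real results).
runs = [
    [[0.70, 0.20, 0.10], [0.50, 0.40, 0.10]],
    [[0.72, 0.18, 0.10], [0.48, 0.42, 0.10]],
]
averaged = average_runs(runs)
for row in averaged:
    print([round(value, 3) for value in row])
```

Averaging column by column only makes sense when the columns mean the same thing in every run, which is the assumption here.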

I am showing only the first 12 ancestral components since all the rest were less than 0.5% for all the Bengalis (Spreadsheet).

Please do remember that in supervised ADMIXTURE, I assign the ancestral populations and the algorithm has to find the best fit using those populations. So it's not showing actual ancestry but broad affinity. Also, the exact percentages are not important and can vary when I change the parameters of the analysis. Just look at the broad trends.

The general pattern is that Bengali Brahmins have the least Eastern Eurasian and the most West Asian. The Eastern Eurasian ethnicity most closely related to Bengalis is Burmese.

Interestingly, there is a pattern of a small amount of Siberian ancestry among these Bengalis. Let's add all the Siberian and Russian Far East groups.

ID	Ethnicity	Siberian
HRP0244	West Bengal Rajput	5.07%
HRP0077B	Bengali Brahmin	5.01%
HRP0049	Bengali	4.45%
HRP0252B	Bengali Brahmin	4.01%
HRP0268B	Bengali Brahmin	3.90%
HRP0023M	Bengali Muslim	3.54%
HRP0316B	Bengali Brahmin	3.45%
HRP0054B	Bengali Brahmin	3.41%
HRP0300M	Bengali Muslim	2.95%
HRP0240V	Bengali Vaidya	1.78%
HRP0293B	Bengali Brahmin	1.02%
HRP0291V	Bengali Vaidya	0.99%
HRP0317M	Bengali Muslim	0.89%
HRP0321M	Bengali Muslim	0.58%
HRP0322M	Bengali Muslim	0.41%
HRP0022M	Bengali Muslim	0.37%
HRP0091B	Bengali Brahmin	0.01%

I am not sure of the pattern here, but at least the first few are above noise thresholds.
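Adding up the Siberian components for each participant can be sketched like this. The set of populations counted as Siberian or Russian Far East is a hypothetical choice for illustration, and the IDs and percentages are toy values, not the results in the table.

```python
# Hypothetical choice of reference populations counted as Siberian or
# Russian Far East; adjust to whichever groups are actually meant.
SIBERIAN = {"Chukchi", "Koryak", "Yakut", "Evenki"}


def siberian_total(components):
    """Sum the percentages of the components that belong to SIBERIAN."""
    return sum(value for name, value in components.items() if name in SIBERIAN)


# Toy per-individual results in percent (not real participants).
results = {
    "ID_A": {"Gujarati-A": 80.0, "Burmanese": 12.5, "Yakut": 1.5, "Evenki": 1.0, "Georgian": 5.0},
    "ID_B": {"Gujarati-A": 85.0, "Burmanese": 9.5, "Koryak": 0.5, "Georgian": 5.0},
}

totals = {ind: siberian_total(comps) for ind, comps in results.items()}
# Sort from highest to lowest, the same way the table is ordered.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for ind, total in ranked:
    print(f"{ind}\t{total:.2f}%")
```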

An Analysis of HAP by Razib

Posted by Zack on August 6, 2013 5 comments

Razib Khan looked over the HarappaWorld Admixture results and posted his analysis on his GNXP blog. Go read it.

Haber et al Lebanon Data

Posted by Zack on August 3, 2013 45 comments

Haber et al. published a paper, "Genome-Wide Diversity in the Levant Reveals Recent Structuring by Culture", in PLoS Genetics. Here's their abstract:

The Levant is a region in the Near East with an impressive record of continuous human existence and major cultural developments since the Paleolithic period. Genetic and archeological studies present solid evidence placing the Middle East and the Arabian Peninsula as the first stepping-stone outside Africa. There is, however, little understanding of demographic changes in the Middle East, particularly the Levant, after the first Out-of-Africa expansion and how the Levantine peoples relate genetically to each other and to their neighbors. In this study we analyze more than 500,000 genome-wide SNPs in 1,341 new samples from the Levant and compare them to samples from 48 populations worldwide. Our results show recent genetic stratifications in the Levant are driven by the religious affiliations of the populations within the region. Cultural changes within the last two millennia appear to have facilitated/maintained admixture between culturally similar populations from the Levant, Arabian Peninsula, and Africa. The same cultural changes seem to have resulted in genetic isolation of other groups by limiting admixture with culturally different neighboring populations. Consequently, Levant populations today fall into two main groups: one sharing more genetic characteristics with modern-day Europeans and Central Asians, and the other with closer genetic affinities to other Middle Easterners and Africans. Finally, we identify a putative Levantine ancestral component that diverged from other Middle Easterners ~23,700–15,500 years ago during the last glacial period, and diverged from Europeans ~15,900–9,100 years ago between the last glacial warming and the start of the Neolithic.

They also released their data consisting of 75 Lebanese from different regions of the country, with 25 samples each for Muslims, Druze and Christians.

Here are the HarappaWorld admixture results for the Lebanese.

You can check the spreadsheet too.

As the authors mention in the summary:

Population stratification caused by nonrandom mating between groups of the same species is often due to geographical distances leading to physical separation followed by genetic drift of allele frequencies in each group. In humans, population structures are also often driven by geographical barriers or distances; however, humans might also be structured by abstract factors such as culture, a consequence of their reasoning and self-awareness. Religion in particular, is one of the unusual conceptual factors that can drive human population structures. This study explores the Levant, a region flanked by the Middle East and Europe, where individual and population relationships are still strongly influenced by religion. We show that religious affiliation had a strong impact on the genomes of the Levantines. In particular, conversion of the region's populations to Islam appears to have introduced major rearrangements in populations' relations through admixture with culturally similar but geographically remote populations, leading to genetic similarities between remarkably distant populations like Jordanians, Moroccans, and Yemenis. Conversely, other populations, like Christians and Druze, became genetically isolated in the new cultural environment. We reconstructed the genetic structure of the Levantines and found that a pre-Islamic expansion Levant was more genetically similar to Europeans than to Middle Easterners.

So the Lebanese can be grouped better by religion than by region. That's why I am using group averages by religion.
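Group averages of this kind can be computed along these lines. This is a minimal sketch with hypothetical component names and toy percentages, not the actual Lebanese results.

```python
from collections import defaultdict


def group_averages(samples):
    """Average component percentages per group.

    `samples` is a list of (group, {component: percent}) pairs. A component
    missing from a sample counts as 0 for that sample.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for group, components in samples:
        counts[group] += 1
        for name, value in components.items():
            sums[group][name] += value
    return {
        group: {name: total / counts[group] for name, total in components.items()}
        for group, components in sums.items()
    }


# Toy samples; "Component1" and "Component2" are placeholder names.
samples = [
    ("Druze", {"Component1": 40.0, "Component2": 60.0}),
    ("Druze", {"Component1": 50.0, "Component2": 50.0}),
    ("Christian", {"Component1": 30.0, "Component2": 70.0}),
]
averages = group_averages(samples)
for group, components in averages.items():
    print(group, components)
```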

HarappaWorld HRP0312-HRP0327

Posted by Zack on July 28, 2013 50 comments

I have added the HarappaWorld Admixture results for HRP0312-HRP0327 to the individual spreadsheet.

I have also updated the group averages.

I got a participant from the Geno 2.0 Project: HRP0326, an Afghan Pashtun. While I have calculated their HarappaWorld Admixture results, please note that Geno 2.0 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

Harappa Ancestry Project

Genetics and South Asia
