Let's take a look at the Bengali participants of the Harappa Ancestry Project.

I have added a suffix to the IDs where B = Brahmin, V = Vaidya and M = Muslim.

Here are the HarappaWorld Admixture results for the Bengalis, which you can also see in a spreadsheet.

It's easy to see the difference between the Brahmins and others.

Razib wanted to know the origin of the East Asian ancestry among the Bengalis. So I ran a supervised ADMIXTURE with the following populations set as ancestral:

  • Altaian
  • Burmanese
  • Buryat
  • Cambodian
  • Chukchi
  • Dai
  • Daur
  • Dolgan
  • Evenki
  • Georgian
  • Gujarati-A
  • Han
  • Han-NChina
  • Hezhen
  • Japanese
  • Ket
  • Kinh
  • Koryak
  • Lahu
  • Miao
  • Mongola
  • Mongolian
  • Naxi
  • Nganassan
  • Oroqen
  • Selkup
  • She
  • Singapore-Malay
  • Tibet
  • Tu
  • Tujia
  • Tuvinian
  • Xibo
  • Yakut
  • Yi
  • Yukaghir

While most of these populations are various East Asian groups, I used the Gujarati-A as the South Asian group since it has the most South Indian + Baloch components without any East Asian influence. I used the Georgians as a proxy for West Asian ancestry.

Since this is a K=36 run (one component for each of the 36 ancestral populations), I ran ADMIXTURE 10 times with different seeds and averaged the percentages for the Bengali participants. This run used 85,565 SNPs. I did a similar analysis at K=35 after excluding the Tibetans, which raised the SNP count to about 263,000. The results were broadly similar.
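The averaging step can be sketched roughly as follows. This is a minimal illustration, not the script used for this post. It assumes each run produced a Q matrix (one row per individual, one column per ancestral component) and that the columns are in the same order in every run, which should hold when the same population assignments are used for each seed. The function names and the toy numbers are made up for the example.

```python
from statistics import mean


def parse_q(text):
    """Parse Q-matrix text: one row per individual, whitespace-separated fractions."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def average_runs(runs):
    """Element-wise mean of several Q matrices that share the same shape."""
    n_ind = len(runs[0])
    k = len(runs[0][0])
    for q in runs:
        if len(q) != n_ind or any(len(row) != k for row in q):
            raise ValueError("all runs must have the same number of rows and columns")
    return [[mean(q[i][j] for q in runs) for j in range(k)] for i in range(n_ind)]


# Toy data (NOT real results): two runs, two individuals, K=3.
run_a = parse_q("0.80 0.15 0.05\n0.60 0.30 0.10\n")
run_b = parse_q("0.78 0.17 0.05\n0.62 0.28 0.10\n")

avg = average_runs([run_a, run_b])
for row in avg:
    print("  ".join(f"{100 * v:.2f}%" for v in row))
```

Averaging only makes sense column by column when the components are comparable across runs. In an unsupervised analysis the components would first have to be matched up between runs.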

I am showing only the first 12 ancestral components since all the rest were less than 0.5% for all the Bengalis (Spreadsheet).

Please remember that in a supervised ADMIXTURE run, I assign the ancestral populations and the algorithm has to find the best fit using only those populations. So the results show broad affinity to the reference groups, not actual ancestry. Also, the exact percentages are not important and can vary when I change the parameters of the analysis. Just look at the broad trends.
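To make the "I assign the ancestral populations" part concrete: as far as I understand ADMIXTURE's supervised mode, it reads a .pop file with one line per sample, in the same order as the genotype file, holding a population label for reference samples and "-" for samples whose ancestry should be estimated. Here is a small sketch that builds those lines. The helper name and the sample order are hypothetical.

```python
def pop_lines(sample_pops, reference_pops):
    """Return one .pop line per sample, in input order.

    Reference samples keep their population label; every other sample
    gets "-" so that its ancestry is estimated rather than fixed.
    """
    return [pop if pop in reference_pops else "-" for pop in sample_pops]


# Hypothetical sample order; the real order must match the genotype file.
samples = ["Georgian", "Gujarati-A", "Bengali", "Han", "Bengali"]
references = {"Georgian", "Gujarati-A", "Han"}

lines = pop_lines(samples, references)
print("\n".join(lines))
```

Because the Bengalis are marked as unknown, all of their ancestry has to be expressed in terms of the labelled groups, which is why the output is best read as affinity.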

The general pattern is that Bengali Brahmins have the least Eastern Eurasian and the most West Asian ancestry. Among the Eastern Eurasian reference populations, the one closest to the Bengalis is the Burmese.

Interestingly, there is a pattern of a small amount of Siberian ancestry among these Bengalis. Let's add up the percentages for all the Siberian and Russian Far East groups.

ID        Ethnicity           Siberian
HRP0244   West Bengal Rajput  5.07%
HRP0077B  Bengali Brahmin     5.01%
HRP0049   Bengali             4.45%
HRP0252B  Bengali Brahmin     4.01%
HRP0268B  Bengali Brahmin     3.90%
HRP0023M  Bengali Muslim      3.54%
HRP0316B  Bengali Brahmin     3.45%
HRP0054B  Bengali Brahmin     3.41%
HRP0300M  Bengali Muslim      2.95%
HRP0240V  Bengali Vaidya      1.78%
HRP0293B  Bengali Brahmin     1.02%
HRP0291V  Bengali Vaidya      0.99%
HRP0317M  Bengali Muslim      0.89%
HRP0321M  Bengali Muslim      0.58%
HRP0322M  Bengali Muslim      0.41%
HRP0022M  Bengali Muslim      0.37%
HRP0091B  Bengali Brahmin     0.01%

I am not sure what the pattern is here, but at least the first few values are high enough that they are unlikely to be just noise.
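The Siberian column is just a sum over a subset of the ancestral components. A rough sketch of that step is below. Which populations count as "Siberian and Russian Far East" is my ASSUMPTION here (the post does not list them), and the per-sample values are toy numbers, not the actual results.

```python
# ASSUMPTION: this grouping is a guess at the Siberian / Russian Far East set.
SIBERIAN = {
    "Altaian", "Buryat", "Chukchi", "Dolgan", "Evenki", "Ket",
    "Koryak", "Nganassan", "Selkup", "Tuvinian", "Yakut", "Yukaghir",
}


def siberian_total(components):
    """Sum the fractions of all components that belong to the Siberian group."""
    return sum(value for pop, value in components.items() if pop in SIBERIAN)


def rank_by_siberian(table):
    """Return (sample_id, siberian_total) pairs, highest total first."""
    totals = ((sample_id, siberian_total(c)) for sample_id, c in table.items())
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Toy data (NOT real results); sample IDs are placeholders.
toy = {
    "SAMPLE_A": {"Gujarati-A": 0.80, "Burmanese": 0.12, "Yakut": 0.02, "Ket": 0.01, "Georgian": 0.05},
    "SAMPLE_B": {"Gujarati-A": 0.85, "Burmanese": 0.14, "Evenki": 0.01},
}

ranked = rank_by_siberian(toy)
for sample_id, total in ranked:
    print(f"{sample_id} {100 * total:.2f}%")
```

Changing the membership of the set will shift the totals, so the grouping should be stated alongside any table built this way.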

Gujaratis HarappaWorld Admixture

Someone asked for the individual HarappaWorld Admixture results for the Gujarati B from HapMap.

To refresh your memory, the Gujarati B are those individuals who fall outside the big, tightly clustered group of Gujaratis.

I decided to include the HGDP Sindhis as well as the Gujaratis, Rajasthanis, Maharashtrians, etc. from the Harappa Ancestry Project in the list so that the Gujaratis can be compared to the people of neighboring regions.

You can check the spreadsheet too.

UPDATE: I have added the Thathai Bhatia and Halai Bhatia participants that I had forgotten.

Xing et al Data

The data for Xing et al's paper "Toward a more uniform sampling of human genetic diversity: a survey of worldwide populations by high-density genotyping" is available online.

This dataset consists of 850 individuals, but 259 of them overlap with HapMap. Another 15 samples had to be removed because they were too similar to others. I also removed the Native American samples. This leaves us with 529 samples.

Ethnic group            Count
Slovenian               25
Punjabi Arain           25
N. European             25
Nepalese                25
Kyrgyzstani             25
Iban                    25
Buryat                  25
Bambaran                25
Andhra Pradesh Brahmin  25
Kurd                    24
Dogon                   24
Irula                   23
Thai                    22
Pygmy                   22
Urkarah                 18
Tamil Nadu Brahmin      14
Hema                    14
Tongan                  13
Tamil Nadu Dalit        13
Samoan                  13
!Kung                   13
Japanese                13
Andhra Pradesh Mala     11
Pedi                    10
Andhra Pradesh Madiga   10
Alur                    10
Nguni                   9
Sotho/Tswana            8
Vietnamese              7
Stalskoe                5
Chinese                 5
Khmer Cambodian         3
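
As a quick sanity check on the numbers above, the counts in the table can be summed and compared with the filtering steps described earlier. The number of Native American samples is not stated in the post; the figure below is only what the arithmetic implies.

```python
# Counts copied from the table above.
counts = {
    "Slovenian": 25, "Punjabi Arain": 25, "N. European": 25, "Nepalese": 25,
    "Kyrgyzstani": 25, "Iban": 25, "Buryat": 25, "Bambaran": 25,
    "Andhra Pradesh Brahmin": 25, "Kurd": 24, "Dogon": 24, "Irula": 23,
    "Thai": 22, "Pygmy": 22, "Urkarah": 18, "Tamil Nadu Brahmin": 14,
    "Hema": 14, "Tongan": 13, "Tamil Nadu Dalit": 13, "Samoan": 13,
    "!Kung": 13, "Japanese": 13, "Andhra Pradesh Mala": 11, "Pedi": 10,
    "Andhra Pradesh Madiga": 10, "Alur": 10, "Nguni": 9, "Sotho/Tswana": 8,
    "Vietnamese": 7, "Stalskoe": 5, "Chinese": 5, "Khmer Cambodian": 3,
}

total_kept = sum(counts.values())           # samples left after all filtering
after_overlap_and_dups = 850 - 259 - 15     # minus HapMap overlap and near-duplicates
implied_native_american = after_overlap_and_dups - total_kept  # inferred, not stated

print(total_kept, after_overlap_and_dups, implied_native_american)  # 529 576 47
```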

This dataset is valuable because it contains several South Asian, Central Asian, Southeast Asian and Caucasus groups. However, it does not have a good SNP overlap with 23andMe and the other datasets. It has only about 29,000 SNPs in common with 23andMe v2 data. Combining HapMap, HGDP, SGVP, Behar et al and Xing et al with 23andMe data leaves us with 25,000 SNPs. Because of that, I'll be using the Xing et al data for only a few analyses.
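The reason the SNP count keeps dropping is that a merged dataset can only use markers present in every source, so each extra dataset can only shrink the shared set. A toy sketch of that intersection, with made-up marker IDs and hypothetical dataset contents:

```python
from functools import reduce


def shared_snps(*datasets):
    """Return the set of SNP IDs present in every dataset."""
    if not datasets:
        return set()
    return reduce(set.intersection, (set(d) for d in datasets))


# Made-up marker IDs, purely for illustration.
chip_a = ["rs1", "rs2", "rs3", "rs4", "rs5"]
chip_b = ["rs2", "rs3", "rs4", "rs6"]
chip_c = ["rs3", "rs4", "rs7"]

print(len(shared_snps(chip_a, chip_b)))          # overlap of two datasets
print(len(shared_snps(chip_a, chip_b, chip_c)))  # adding a third can only shrink it
```

In practice the IDs would come from each dataset's marker list, and strand or allele mismatches would remove a few more SNPs beyond the plain ID overlap.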