

Let's take a look at the Bengali participants of the Harappa Ancestry Project.

I have added a suffix to the IDs where B = Brahmin, V = Vaidya and M = Muslim.

Here are the HarappaWorld Admixture results for the Bengalis, which you can also see in a spreadsheet.

It's easy to see the difference between the Brahmins and others.

Razib wanted to know the origin of the East Asian ancestry among the Bengalis. So I ran a supervised ADMIXTURE analysis with the following populations set as ancestral:

  • Altaian
  • Burmese
  • Buryat
  • Cambodian
  • Chukchi
  • Dai
  • Daur
  • Dolgan
  • Evenki
  • Georgian
  • Gujarati-A
  • Han
  • Han-NChina
  • Hezhen
  • Japanese
  • Ket
  • Kinh
  • Koryak
  • Lahu
  • Miao
  • Mongola
  • Mongolian
  • Naxi
  • Nganassan
  • Oroqen
  • Selkup
  • She
  • Singapore-Malay
  • Tibet
  • Tu
  • Tujia
  • Tuvinian
  • Xibo
  • Yakut
  • Yi
  • Yukaghir

While most of these populations are various East Asian groups, I used the Gujarati-A as the South Asian group since it has the most South Indian + Baloch components without any East Asian influence. I used the Georgians as a proxy for West Asian ancestry.
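For anyone who wants to try a similar run, here is a minimal sketch of how the supervised setup could be prepared. It assumes ADMIXTURE's supervised mode as described in its manual: a `.pop` file with the same prefix as the `.bed` file, one line per individual in `.fam` order, holding either the reference population label or `-` for individuals whose ancestry should be estimated. The helper names and sample IDs are hypothetical, and the `-s` (random seed) flag should be checked against the manual for your ADMIXTURE version.

```python
from pathlib import Path


def make_pop_lines(samples, reference_pops):
    """Build the lines of a supervised-ADMIXTURE .pop file.

    samples: (sample_id, population_label) pairs in the same order as the .fam file.
    reference_pops: labels to fix as ancestral populations.
    Every other sample gets "-", meaning its ancestry is estimated.
    """
    return [pop if pop in reference_pops else "-" for _, pop in samples]


def admixture_commands(bed_file, k, seeds):
    """One supervised ADMIXTURE command line (as an argv list) per random seed."""
    return [
        ["admixture", "--supervised", "-s", str(seed), bed_file, str(k)]
        for seed in seeds
    ]


def write_pop_file(bed_file, pop_lines):
    """Write the .pop file next to the .bed file, using the same prefix."""
    Path(bed_file).with_suffix(".pop").write_text("\n".join(pop_lines) + "\n")
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`. ADMIXTURE names its output after the input prefix and K (for example `data.36.Q`), not the seed, so each run's `.Q` file would need to be renamed before the next run overwrites it.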

Since K=36 is a large number of ancestral populations, I ran ADMIXTURE 10 times with different random seeds and averaged the percentages for the Bengali participants. The analysis used 85,565 SNPs. I did a similar analysis at K=35 after excluding the Tibetans, which gave me 263,000 SNPs. The results were broadly similar.
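Averaging over runs is straightforward once the `.Q` files are read in. This is a minimal sketch, assuming each run's `.Q` file was saved under its own name and that the columns refer to the same reference populations in every run. Supervised mode ties each cluster to a labelled reference population, so the label switching seen between unsupervised runs should not arise, but it is worth checking. The function names are illustrative.

```python
def parse_q(text):
    """Parse an ADMIXTURE .Q file: one row per individual, K whitespace-separated fractions."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def average_q(runs):
    """Element-wise mean of several Q matrices with identical shape and column order."""
    n_runs = len(runs)
    n_rows, n_cols = len(runs[0]), len(runs[0][0])
    for q in runs:
        if len(q) != n_rows or any(len(row) != n_cols for row in q):
            raise ValueError("all runs must have the same shape")
    return [
        [sum(q[i][j] for q in runs) / n_runs for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

With ten saved runs, the call would look like `average_q([parse_q(open(name).read()) for name in run_files])`, where `run_files` is a hypothetical list of the renamed `.Q` files.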

I am showing only the first 12 ancestral components, since all the rest were below 0.5% for every Bengali participant (see the spreadsheet).
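The display rule amounts to a simple filter: keep a component if at least one individual reaches the 0.5% cut-off. This sketch assumes an averaged Q matrix (rows are individuals, columns are ancestry fractions) and a matching list of column labels; both are hypothetical inputs.

```python
def components_to_show(q, labels, threshold=0.005):
    """Labels of the components that reach `threshold` in at least one individual.

    q: rows are individuals, columns are ancestry fractions in the order of `labels`.
    """
    return [
        label
        for j, label in enumerate(labels)
        if max(row[j] for row in q) >= threshold
    ]
```

A component that stays below 0.5% in everyone is dropped, which matches the cut described above.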

Please do remember that in supervised ADMIXTURE, I assign the ancestral populations and the algorithm has to find the best fit using those populations. So it's not showing actual ancestry but broad affinity. Also, the exact percentages are not important and can vary when I change the parameters of the analysis. Just look at the broad trends.

The general pattern is that Bengali Brahmins have the least Eastern Eurasian and the most West Asian ancestry. The Eastern Eurasian population with the closest affinity to Bengalis is the Burmese.

Interestingly, there is a pattern of a small amount of Siberian ancestry among these Bengalis. Let's add up all the Siberian and Russian Far East components.

ID        Ethnicity           Siberian (total)
HRP0244   West Bengal Rajput  5.07%
HRP0077B  Bengali Brahmin     5.01%
HRP0049   Bengali             4.45%
HRP0252B  Bengali Brahmin     4.01%
HRP0268B  Bengali Brahmin     3.90%
HRP0023M  Bengali Muslim      3.54%
HRP0316B  Bengali Brahmin     3.45%
HRP0054B  Bengali Brahmin     3.41%
HRP0300M  Bengali Muslim      2.95%
HRP0240V  Bengali Vaidya      1.78%
HRP0293B  Bengali Brahmin     1.02%
HRP0291V  Bengali Vaidya      0.99%
HRP0317M  Bengali Muslim      0.89%
HRP0321M  Bengali Muslim      0.58%
HRP0322M  Bengali Muslim      0.41%
HRP0022M  Bengali Muslim      0.37%
HRP0091B  Bengali Brahmin     0.01%

I am not sure what the pattern is here, but at least the first few values are above the noise level.
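The Siberian column in the table above is a sum over several reference components. Here is a minimal sketch of that step. Which of the 36 reference populations count as "Siberian and Russian Far East" is an assumption here, since the post does not list them, so the `SIBERIAN` set below is hypothetical.

```python
# Hypothetical grouping of reference populations from Siberia and the Russian
# Far East; the exact set used for the table is not stated in the post.
SIBERIAN = {
    "Altaian", "Buryat", "Chukchi", "Dolgan", "Evenki", "Ket",
    "Koryak", "Nganassan", "Selkup", "Tuvinian", "Yakut", "Yukaghir",
}


def group_totals(q, ids, labels, group=SIBERIAN):
    """Sum each individual's fractions over the columns whose label is in `group`.

    Returns (sample_id, total) pairs sorted from highest to lowest total.
    """
    cols = [j for j, label in enumerate(labels) if label in group]
    totals = [(sample_id, sum(row[j] for j in cols)) for sample_id, row in zip(ids, q)]
    return sorted(totals, key=lambda item: item[1], reverse=True)
```

Sorting in descending order reproduces the ordering of the table.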

East Asian Admixture

Let's look at the East Asian admixture among South Asians and neighboring populations from a previous ADMIXTURE run (K=12).

I have listed the different kinds of East Asian admixture components among selected populations. The three relevant components are:

  1. Southeast Asian: Highest among the Dai, Cambodians, Lahu and Malay, this is the most common East Asian component among South Asians.
  2. Northeast Asian: Highest among the Naga, Nysha, Japanese and northern Han.
  3. Siberian: Highest among the Nganassans and Evenkis, this is the lowest among South Asians overall. While it is not exactly a Turkic component, it is the one most closely related to Turkic peoples.

Let's look at the total East Asian percentage among South Asians.

As expected, the eastern part of South Asia is where we see most of the East Asian admixture.

Now instead of looking at the absolute percentages of Southeast Asian admixture, let's look at the Southeast Asian component as a percentage of total East Asian component.
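The quantity mapped here is each component divided by the sum of the three East Asian components. A small sketch of that arithmetic, assuming the three components are given as fractions for one population; the function name and the example numbers are illustrative, not values from the maps.

```python
def east_asian_shares(southeast, northeast, siberian):
    """Total East Asian admixture and each component's share of that total.

    Inputs are admixture fractions for one population. Shares are None when the
    total is zero, since the ratio is undefined there.
    """
    total = southeast + northeast + siberian
    if total == 0:
        return 0.0, None
    shares = {
        "Southeast Asian": southeast / total,
        "Northeast Asian": northeast / total,
        "Siberian": siberian / total,
    }
    return total, shares


# Illustrative numbers only: 12% Southeast Asian, 4% Northeast Asian, 1% Siberian.
total, shares = east_asian_shares(0.12, 0.04, 0.01)
```

The shares always add up to 1, so the three proportion maps are complementary views of the same total.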

In South and East India, the East Asian admixture appears to be mostly Southeast Asian.

Now the same map for Northeast Asian as a proportion of total East Asian:

The Northeast Asian component dominates along the northern border of South Asia.

Finally the Siberian:

Compared to the other two, the Siberian component is fairly low among South Asians, so it's difficult to separate noise from real admixture here. Most of the peaks you see are in populations that have little East Asian admixture overall.
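Low-East-Asian populations produce these peaks because a ratio with a small denominator amplifies noise. The sketch below uses illustrative numbers, not values from the maps: it shifts the Siberian fraction by a fixed amount of noise and reports how far its share of total East Asian admixture can move, assuming the other East Asian components are measured exactly.

```python
def share_range(component, total, noise):
    """Range of `component / total` if `component` is off by up to +/- `noise`.

    `total` includes `component`; the other components (total - component) are
    assumed to be exact, and the component is not allowed to go below zero.
    """
    rest = total - component

    def share(x):
        return x / (rest + x) if rest + x > 0 else 0.0

    low = max(component - noise, 0.0)
    high = component + noise
    return share(low), share(high)


# 0.3% Siberian out of 1% total East Asian, versus out of 20%, with +/-0.3% noise.
low_total = share_range(0.003, 0.01, 0.003)
high_total = share_range(0.003, 0.20, 0.003)
```

With a 1% total, the Siberian share can land anywhere from 0% to about 46%; with a 20% total, it stays below 3%.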

Chinese Samples

Mithra asked:

Almost all the Chinese are now around 50% SE Asian, didn’t see this before is it right.

So I decided to look at the Chinese samples in the Reference I dataset.

I ran ADMIXTURE on the whole Reference I dataset with K=10 ancestral populations. The green component is what I call Southeast Asian, blue is Northeast Asian (highest among the Japanese) and violet is Siberian (highest among the Yakut).

Here is the plot for the 106 HapMap Chinese samples from Denver (label: us chinese):

HapMap US Chinese

For the 137 HapMap samples from Beijing, China (label: han chinese):

HapMap Han Chinese

For the 34 HGDP Han samples (label: han):


For the 10 HGDP Han samples from North China (label: han-nchina):

HGDP Han North China

As you can see, the "Southeast Asian" component decreases from the first group to the last, as expected.

I wasn't satisfied with these results, so I decided to run ADMIXTURE on the East Asian samples in Reference I separately.

East Asian Admixture K=3

At K=3, the results are about the same as at K=10 for the whole Reference I dataset. All the Han have a significant amount of the blue component, which is highest among the Southeast Asians.

East Asian Admixture K=4

At K=4, we get a Chinese ("East Asian") component. So we have Japanese, Chinese, Yakut and Southeast Asian components. This is what most of you were probably expecting.

Why did the Japanese become the modal population for the Northeast Asian component? To see how the different populations relate to each other, I ran a PCA on the East Asian data. Note that eigenvector 1 explains 1.49 times the variance of eigenvector 2 and 1.9 times the variance of eigenvector 3, so eigenvector 2 explains about 1.28 times the variance of eigenvector 3.
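Writing λ1, λ2 and λ3 for the variances explained by eigenvectors 1 to 3, the last ratio follows from the first two because λ1 cancels:

```latex
\frac{\lambda_2}{\lambda_3}
  = \frac{\lambda_1/\lambda_3}{\lambda_1/\lambda_2}
  = \frac{1.9}{1.49}
  \approx 1.28
```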

East Asian PCA eig1 vs eig2

East Asian PCA eig1 vs eig3

East Asian PCA eig2 vs eig3

As you can see, the Yakut are the farthest away, but the Japanese are also fairly well separated from the Chinese populations.

If I didn't have the 141 Japanese samples in my reference dataset, the Northeast Asian component would most likely be centered on the Han, as is the case for Dodecad.

I think this shows that it is not correct to think of the ancestral components inferred by ADMIXTURE as pure ancestral populations.