Category Archives: Admixture - Page 7

Of literality and metaphor in the war between Arya and Dasa

Over at Brown Pundits Zach Latif brings up the point that the Indian bias for light skin may date back to the Aryans. And it does seem that such a bias manifests in the earliest texts. But as someone not able to read the original languages, I can assess the arguments about the import of these passages only at a remove, at second and third hand. Some scholars have suggested that the racialized interpretations of the interactions between the Aryans and the natives of India are just a look at the past through the lens of the present. They argue that color terms in the Vedas are metaphors. In contrast, there are others who seem to be arguing for a more straightforward and "literal" reading of the source text. As a non-philologist there's little I can add. But I didn't dismiss those who argued for a metaphorical reading, because I know from the literature in the area of Biblical scholarship that straightforward and "literal" readings are quite often deceptive and require their own interpretation (it is difficult to transfer idioms and metaphors properly across languages, and translators are tempted to render them in the manner most congenial to their own broader theses).

To be frank the information being uncovered by Zack and others makes me think that there was a racial aspect to these conflicts; that the literal reading has some truth. The tribal folk of India are genetically distinct from the caste populations, especially the higher castes. Though all South Asians are a mix to varying degrees of an exogenous West Eurasian element and a South Eurasian indigenous component, the non-Austro-Asiatic tribal populations seem to be a relatively simple combination. The genetic complexity of the structure of other groups suggests to me that there were several later West Eurasian intrusions after the arrival of the "Ancestral North Indians" (ANI), and their hybridization event with the "Ancestral South Indians" (ASI). Racialized language in the older Hindu scriptures may then be conceived of as a conflict between the latest arrivals and the older long established groups which were a stabilized ANI-ASI compound.

One can imagine that this process recapitulated itself in the late medieval and early modern period with the arrival of the Muslims. The ruling Islamic elites who were of Persian and Turk stock viewed the native Hindus through a racialized lens, and were at pains (and to some extent still are!) to distinguish between Muslims of foreign provenance who were "white" and converts from the native populations who were "black." The physical differences are evident when you compare the Emperor Akbar with his grandson Shah Jahan, whose other three grandparents were Rajputs.

Caste is not ancestrally arbitrary

First, thanks to Zack for the opportunity to blog here. More importantly, thanks to Zack for the Harappa Ancestry Project! I've learned a lot from him in terms of the optimal way to go about "genome blogging," and have been able to benefit from his experiences in my own African Ancestry Project. It's really great that in 2011 we don't have to wait for academic researchers to explore the topics which interest us at the intersection of genetics and history.

Prior to being interested in South Asian genetics on such a fine-grained level I had read works such as Nicholas B. Dirks' Castes of Mind. To give you a sense of Dirks' argument, here's the summary from Library Journal:

Is India's caste system the remnant of ancient India's social practices or the result of the historical relationship between India and British colonial rule? Dirks (history and anthropology, Columbia Univ.) elects to support the latter view. Adhering to the school of Orientalist thought promulgated by Edward Said and Bernard Cohn, Dirks argues that British colonial control of India for 200 years pivoted on its manipulation of the caste system. He hypothesizes that caste was used to organize India's diverse social groups for the benefit of British control. His thesis embraces substantial and powerfully argued evidence. It suffers, however, from its restricted focus to mainly southern India and its near polemic and obsessive assertions. Authors with differing views on India's ethnology suffer near-peremptory dismissal....

One of the inferences which people draw from this model, perhaps unfairly, is that the endogamy and biological separation of caste groups is relatively new, and that genetic variation is likely to be arbitrarily distributed across caste groups. The most extreme interpretations almost seem to turn the British into the culture-creators of all that is Indian. In any case, genetics can obviously test the power of this thesis in relation to ancestry.

First up, below I have taken all the HAP samples where N >= 2. I've done some semantic shifting, so that "Tamil Iyer" becomes "Tamil Brahmin." I know that some of you have more information about the samples than is listed in Zack's spreadsheet, but I've been conservative. I will also use the word "community" sometimes instead of "caste" in future posts, because I don't know what the proper word for Syrian Christians or Bihari Muslims would be. To me the distinction makes little difference. I want to focus on groups with caste/religious labels intersected with a specific region here. The bar plot below is not going to be a surprise, and you see the clusters in Zack's dendrograms, but I thought it would still be useful.


Admixture (Ref3 K=11) HRP0111-HRP0120

Here are the admixture results using Reference 3 for Harappa participants HRP0111 to HRP0120.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

Indian Cline III

I have been working on creating 100% ASI (Ancestral South Indian) samples recently. So it was really interesting that Dienekes did similar experiments:

I am going about creating the "pure" allele frequencies somewhat differently, so that would be a useful exercise.

Anyway, I thought you guys would be itching for some new results. So here's a PCA plot:

This used the same Principal Component Analysis as the one here using the 96 Indian Cline samples, Utahn Whites and Onge. However, I projected three extra "populations" on this plot.

These three populations are simulated genetic data of 25 individuals using the allele frequencies from Reference 3 Admixture results.

  1. Onge11 is generated from the Onge (C2) component from K=11 admixture for Reference 3.
  2. SA11 is generated from the South Asian (C1) component from the same K=11 admixture.
  3. SA12 is generated from the South Asian (C1) component from the K=12 admixture.
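The post does not say how the simulated individuals were generated. One common way to do it, shown here only as a minimal sketch, is to draw each diploid genotype as a Binomial(2, p) count from the component's allele frequency p at each SNP, which assumes Hardy-Weinberg equilibrium and independent SNPs. The function name `simulate_genotypes` and the example frequencies are hypothetical.

```python
import numpy as np

def simulate_genotypes(allele_freqs, n_individuals=25, seed=0):
    """Draw diploid genotypes (0, 1 or 2 allele copies) for each SNP.

    Assumption: each genotype is an independent Binomial(2, p) draw,
    i.e. Hardy-Weinberg equilibrium and no linkage between SNPs.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(allele_freqs, dtype=float)
    if np.any((p < 0) | (p > 1)):
        raise ValueError("allele frequencies must lie in [0, 1]")
    # Rows are individuals, columns are SNPs.
    return rng.binomial(2, p, size=(n_individuals, p.size))

# Hypothetical frequencies for five SNPs, for illustration only.
example_freqs = [0.05, 0.30, 0.50, 0.80, 1.00]
genotypes = simulate_genotypes(example_freqs)
print(genotypes.shape)
```

A real run would use the per-SNP frequencies that ADMIXTURE estimates for the chosen component rather than made-up values.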

As you can see, the SA12 population lies between 100% ASI and the Indian Cline samples.

The Onge11 generated samples are a bit beyond 100% ASI on the first principal component, but they are also shifted towards the real Onge on pc2.
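Projecting extra samples onto an existing PCA means the axes are computed from the reference samples alone and the new samples are only placed on them. The sketch below shows the idea with plain NumPy. It is not the software used for the plot, it skips the per-SNP normalization that population-genetics PCA tools usually apply, and all names and the random toy data are hypothetical.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Return the SNP means and principal axes of the reference samples."""
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes fitted elsewhere; the axes are not refitted."""
    return (np.asarray(samples, dtype=float) - mean) @ axes.T

# Toy genotype matrices (individuals x SNPs), random and for illustration only.
rng = np.random.default_rng(1)
reference = rng.binomial(2, 0.3, size=(40, 200))
simulated = rng.binomial(2, 0.6, size=(25, 200))

mean, axes = fit_pca(reference)
ref_scores = project(reference, mean, axes)
sim_scores = project(simulated, mean, axes)
print(ref_scores.shape, sim_scores.shape)
```

Because the simulated samples do not influence the axes, their position reflects only how they relate to the variation already present in the reference set.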

Misuse of Correlation

I have been misusing correlation in computing Ancestral South Indian percentages from PCA/ADMIXTURE and Reich et al population-level averages.

I have tried to make it clear that just looking at the correlation is not enough, that an admixture component is not similar to ASI just because it correlates well with Reich et al's ASI averages for the 18 Indian cline populations. Even when the correlation is higher than 0.99. To illustrate what I mean, let's look at the Ref4C admixture runs.

I calculated the mean for each admixture component from the K=2 to K=12 runs for the 18 Indian cline populations and then computed the correlation between that and the Reich et al results. Let's take a look:

K    Component         Correlation
2    C1 Euro-Afro      -0.9941887
3    C2 East Asian      0.9955347
4    C3 European       -0.993933
5    C3 European       -0.993277
6    C1 South Asian     0.9675099
7    C1 South Asian     0.993081
8    C1 South Asian     0.9932762
9    C1 South Asian     0.9914145
10   C1 South Asian     0.9918095
11   C1 South Asian     0.9919097
12   C1 South Asian     0.9918594

Where do you see the highest correlation? At K=3 ancestral populations, the East Asian component is very highly correlated with ASI for the Indian cline populations. Does that mean that we could use that to compute ASI? No, not at all. While it is expected that at K=3, ASI would be a little closer to East Asian than to European, East Asian is not a good proxy for ASI at all since we cannot extrapolate to other individuals and populations.
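One way to see why a high correlation is not enough: Pearson correlation ignores scale and offset, so a component can track ASI perfectly across populations while its values are nowhere near the ASI fractions. The numbers below are made up for illustration and are not taken from any of the runs.

```python
import numpy as np

# Hypothetical population-level ASI fractions (not Reich et al's values).
asi = np.array([0.30, 0.40, 0.50, 0.60, 0.70])

# A hypothetical admixture component that is an exact linear function of ASI
# but much smaller in magnitude.
component = 0.05 + 0.2 * asi

r = np.corrcoef(asi, component)[0, 1]
max_gap = np.max(np.abs(component - asi))

# A component that falls as ASI rises gives a correlation of -1 instead.
r_inverse = np.corrcoef(asi, 1.0 - asi)[0, 1]

print(round(r, 6), round(max_gap, 2), round(r_inverse, 6))
```

The correlation is perfect in both directions, yet reading the component as an ASI percentage would be off by up to half of the genome in this toy case.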

Indian Cline

I had used linear regression to estimate the Ancestral South Indian (ASI) component from the Reference 3 K=11 admixture run. Now here are a couple more exercises along the same lines, but much simpler.

Just using the 96 Indian cline samples from Reich et al to compute PCA or admixture doesn't work as the Chenchu separate out in both analyses from the rest. So I added the Utahn White (CEU) samples from HapMap and the Onge from Reich et al.

First, I ran supervised admixture with two ancestral components, Utahn Whites and Onge. Here's the Onge component plotted against Reich et al's ASI estimate along with a linear regression estimate. The correlation between the two is 0.9908.
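For readers who want to repeat this kind of fit, here is a minimal sketch of regressing a published ASI estimate on a component's population averages and reporting the correlation. The arrays are hypothetical placeholders, not the values behind the plot, and `fit_line` is an illustrative name.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical population averages, for illustration only.
onge_component = np.array([0.12, 0.17, 0.21, 0.25, 0.29])
asi_estimate = np.array([0.35, 0.45, 0.50, 0.60, 0.65])

slope, intercept, r = fit_line(onge_component, asi_estimate)

# Predicted ASI for a new sample with a 20% Onge component. This is only
# meaningful for samples that lie on the Indian cline.
predicted = slope * 0.20 + intercept
print(slope, intercept, r, predicted)
```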

Second, I ran Principal Component Analysis (PCA) on the Indian cline samples plus Utahn Whites and Onge. Here are the first two PCA dimensions plotted. The first eigenvector explains 4.04% of the total variation and the 2nd explains 1.94%.

The first principal component is mostly along the Indian cline while the second one basically separates the Onge from everyone else.
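The share of variation explained by each principal component can be read off the singular values of the centered genotype matrix. The sketch below uses plain NumPy on random toy data. It is not the tool used for the figures quoted here, and dedicated packages normalize each SNP before the decomposition, so their percentages would differ.

```python
import numpy as np

def explained_variance_ratio(genotypes):
    """Fraction of total variance carried by each principal component."""
    X = np.asarray(genotypes, dtype=float)
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Toy genotype matrix (individuals x SNPs), random and for illustration only.
rng = np.random.default_rng(2)
toy = rng.binomial(2, 0.4, size=(30, 100))
ratios = explained_variance_ratio(toy)
print(ratios[:2])
```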

Using the 1st principal component to estimate ASI, here's the plot with Reich et al's ASI estimate along with a regression line. The correlation between pc1 and ASI is 0.9929.

Note that both these methods work only if the samples are on the Indian cline, i.e., they don't have any other admixture.

And now for comparison, here's the linear regression for the Reference 3 K=11 admixture Onge component and ASI. The correlation here is 0.9949. Note that this is a little different than my previous analysis since I calculated the population averages using only the 96 samples recommended by Reich et al.

Here's a spreadsheet containing the data for these three runs.

I have a couple more tricks to try in order to figure out some things regarding Ancestral South Indian admixture. Let's hope they provide us some insight.

East Asian Admixture

Let's look at the East Asian admixture among South Asians and other surrounding populations from a previous admixture run (K=12).

I have listed the different kinds of East Asian admixture components among selected populations. The three relevant components are:

  1. Southeast Asian: Highest among the Dai, Cambodians, Lahu and Malay, this is the most common East Asian component among South Asians.
  2. Northeast Asian: Highest among the Naga, Nysha, Japanese and north Han.
  3. Siberian: Highest among the Nganasans and Evenkis, this is lowest among South Asians overall. While this is not quite Turkic, it is the one most related to them.

Let's look at the total East Asian percentage among South Asians.

As expected, the eastern part of South Asia is where we see most of the East Asian admixture.

Now instead of looking at the absolute percentages of Southeast Asian admixture, let's look at the Southeast Asian component as a percentage of total East Asian component.

In South and East India the East Asian admixture seems to be mostly Southeast Asian.

Now the same map for Northeast Asian as a proportion of total East Asian:

The Northeast Asian component dominates along the northern border of South Asia.

Finally the Siberian:

Compared to the other two, the Siberian component is fairly low among South Asians, so it's difficult to separate the noise from real admixture here. Most of the peaks you see are among populations that have low East Asian admixture.
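The proportions mapped in this post are each component divided by the total East Asian percentage. A minimal sketch of that calculation follows. The `min_total` cutoff is a hypothetical safeguard, not something used for the maps: it simply refuses to report shares when the total is so small that the ratio would mostly be noise.

```python
def component_shares(southeast, northeast, siberian, min_total=0.02):
    """Each East Asian component as a fraction of total East Asian admixture.

    Returns None when the total is below min_total (a hypothetical cutoff),
    because ratios of very small numbers are dominated by noise.
    """
    total = southeast + northeast + siberian
    if total < min_total:
        return None
    return {
        "southeast": southeast / total,
        "northeast": northeast / total,
        "siberian": siberian / total,
    }

# Hypothetical population: 6% Southeast Asian, 3% Northeast Asian, 1% Siberian.
print(component_shares(0.06, 0.03, 0.01))

# Hypothetical population with only 1% East Asian admixture in total.
print(component_shares(0.005, 0.0, 0.005))
```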

Ref4C Admixture

I removed the Gujarati-A samples from the previous set of runs and ran admixture on the resulting dataset.

Nothing new pops out except that the Siberian component splits into Turkic/Tungusic and Nganasan components at K=12.

The admixture results are in a spreadsheet as usual.

K=11 & 12 have the lowest errors.

At K=13, Chenchu split off as their own cluster.

More Reference Admixture Runs

In addition to the removals and changes in the previous set of runs, I removed the Onge, Great Andamanese and Kalash for this set.

The admixture results of this dataset are in a spreadsheet as usual and the bar chart is below.

K=10, 11, 12 are the ones with the lowest cross-validation error.
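If the runs were made with ADMIXTURE's `--cv` option, each log contains a line of the form `CV error (K=11): 0.52700`, and ranking the K values is a matter of collecting those lines. The sketch below assumes that log format. The error values in the sample text are invented for illustration and are not the results of these runs.

```python
import re

# Assumed log line format from ADMIXTURE run with --cv.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def lowest_cv(log_text, top=3):
    """Return the K values with the lowest cross-validation error, best first."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    return sorted(errors, key=errors.get)[:top]

# Invented numbers, for illustration only.
sample_log = """\
CV error (K=9): 0.5300
CV error (K=10): 0.5280
CV error (K=11): 0.5270
CV error (K=12): 0.5275
CV error (K=13): 0.5290
"""
print(lowest_cv(sample_log))
```

The differences between neighbouring K values are often small, so the ranking is a guide rather than a decision rule.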

I wonder if anyone is going to mind my calling C2 at K=9 Pakistani instead of Balochistan/Caucasus? 😉

I like K=12 here and K=12 or 13 in the previous run. So the question is which one of all these K runs with two different datasets should I use to replace the old reference I K=12 admixture runs?

Admixture (Ref3 K=11) HRP0101-HRP0110

Here are the admixture results using Reference 3 for Harappa participants HRP0101 to HRP0110.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

HRP0101 has 1/8th Gujarati Patel ancestry but has 0% Onge component while the expected value would be about 3%. Also, his/her being 1/2 Romany is not reflected in a 70% European percentage.
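The expected value here is simple dilution arithmetic: the fraction of ancestry from a group times that group's average for the component, assuming the rest of the ancestry contributes none of it. The sketch below makes the arithmetic explicit. The 24% group average is not stated in the post; it is only what an expected value of about 3% at 1/8th ancestry implies.

```python
def expected_component(ancestry_fraction, group_mean):
    """Expected component share if only this ancestry contributes to it."""
    return ancestry_fraction * group_mean

def implied_group_mean(ancestry_fraction, expected):
    """Group average implied by a stated expectation (inverse of the above)."""
    return expected / ancestry_fraction

# 1/8th ancestry and an expected value of about 3%, as stated in the post,
# imply a group average of roughly 24% (a back-calculation, not a measured value).
implied = implied_group_mean(1 / 8, 0.03)
print(round(implied, 2))
print(round(expected_component(1 / 8, implied), 3))
```

A component of a few percent is also within the range where ADMIXTURE estimates for a single individual can be noisy, so a result of 0% against an expectation of 3% is a mild discrepancy rather than a contradiction.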

HRP0105, an Iranian Kurd, is similar to the Iraqi Kurd HRP0059.

HRP0108, a Halai Bhatia, looks mostly like Punjabis and Sindhis in the admixture results.

HRP0110, Mexican/Jewish, is half Native American and likely a quarter Jewish.