ANI-ASI Admixture Dating

Following up on an earlier conference poster, Priya Moorjani et al. of the Reich Lab have another poster at SMBE. Here's the abstract:

Estimating a date of mixture of ancestral South Asian populations
Linguistic and genetic studies have demonstrated that almost all groups in South Asia today descend from a mixture of two highly divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners and Europeans, and Ancestral South Indians (ASI) not related to any populations outside the Indian subcontinent. ANI and ASI have been estimated to have diverged from a common ancestor as much as 60,000 years ago, but the date of the ANI-ASI mixture is unknown. Here we analyze data from about 60 South Asian groups to estimate that major ANI-ASI mixture occurred 1,200-4,000 years ago. Some mixture may also be older—beyond the time we can query using admixture linkage disequilibrium—since it is universal throughout the subcontinent: present in every group speaking Indo-European or Dravidian languages, in all caste levels, and in primitive tribes. After the ANI-ASI mixture that occurred within the last four thousand years, a cultural shift led to widespread endogamy, decreasing the rate of additional mixture.

I bolded the portion which seems new compared to the previous abstract.

ANI/ASI Admixture Dating

Via Razib, here's an interesting abstract from the International Congress of Human Genetics by David Reich's group:

Estimating a date of mixture of ancestral South Asian populations.

Linguistic and genetic studies have shown that most Indian groups have ancestry from two genetically divergent populations, Ancestral North Indians (ANI) and Ancestral South Indians (ASI). However, the date of mixture still remains unknown. We analyze genome-wide data from about 60 South Asian groups using a newly developed method that utilizes information related to admixture linkage disequilibrium to estimate mixture dates. Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent. These results suggest that this formative period of Indian history was accompanied by mixtures between two highly diverged populations, although our results do not rule out other, older ANI-ASI admixture events. A cultural shift subsequently led to widespread endogamy, which decreased the rate of additional population mixtures.

I would be very interested in reading that paper. I also wonder how many new samples they genotyped beyond the ones in Reich et al.'s Reconstructing Indian Population History, and whether I could get my hands on the new data.

I have a feeling that ANI (Ancestral North Indian) captures a number of different migrations, conquests, etc., so I am not sure it can be equated with the spread of Indo-European languages.

I wonder if I can use HAPMIX or StepPCO to get similar admixture dating.
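For intuition about why admixture linkage disequilibrium carries date information, here is a minimal sketch. It is not the Reich group's method, and it is not HAPMIX or StepPCO. It only assumes a single pulse of mixture, in which case admixture LD between markers separated by d Morgans decays roughly as exp(-t * d) after t generations. The decay curve below is synthetic, the function name is made up for illustration, and the 29 years per generation figure is an assumption.

```python
import math

def fit_generations(distances_morgans, ld_values):
    """Estimate t from LD(d) = A * exp(-t * d) with a log-linear least-squares fit."""
    x = list(distances_morgans)
    y = [math.log(v) for v in ld_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # The slope of log(LD) against distance is -t.
    return -sxy / sxx

# Synthetic, noise-free decay curve generated with t = 100 generations.
TRUE_GENERATIONS = 100
distances = [0.005 * k for k in range(1, 21)]  # 0.5 cM to 10 cM, in Morgans
ld_curve = [0.02 * math.exp(-TRUE_GENERATIONS * d) for d in distances]

t_hat = fit_generations(distances, ld_curve)

YEARS_PER_GENERATION = 29  # assumption, not taken from the abstract
years_ago = t_hat * YEARS_PER_GENERATION
```

On real data the curve is noisy, very short distances are contaminated by background LD, and more than one wave of mixture will not follow a single exponential, so a fit like this is only a starting point.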

Of sensible semantics

One of my pet peeves is the confusion which some ethno-linguistic terms can cause. For example, the fact that there were Iranian language speakers on the plains of Ukraine ~2,000 years ago naturally indicates to people that Scythian nomads issued out of Iran northwards. Similarly, the existence of Indo-Aryan Mitanni in what is today Syria also suggests to people that there was a migration of Indians which traversed much of West Asia in a drive toward the Mediterranean from the Indus. Of course our perception of the center of gravity of these ethno-linguistic groups today is a function of historical contingency. If we didn't know much more about Antique and Medieval European history we might posit that the Celtic Galatians of ancient Anatolia were originally from Ireland, based on the contemporary distribution of Celtic languages!

This issue is now cropping up in South Asian archaeogenetics. In my opinion the paper Reconstructing Indian population history is probably the most important contribution to the field in a generation. The authors explain technically why a "South Asian" ancestral component falls out of ancestry inference algorithms at the heart of ADMIXTURE, STRUCTURE, or frappe. In short, when you have a population which is a hybrid, but where the hybridization event is very distant in the past, recombination breaks up the signatures of that event (a decay of the linkage disequilibrium between two putative ancestral populations). Additionally, in the Indian case there doesn't seem to be a "pure" population of one of the two ancestral groups, what they termed "Ancestral South Indians" (ASI). The closest reference they found were Onge Andaman Islanders, whose last common ancestor with ASI was on the order of tens of thousands of years in the past. They do have excellent proxies for the other population, "Ancestral North Indians" (ANI). Compared to ASI all West Eurasians can be used as reasonable proxies for ANI.

Indian Cline III

I have been working on creating 100% ASI (Ancestral South Indian) samples recently. So it was really interesting that Dienekes did similar experiments:

I am going about creating the "pure" allele frequencies somewhat differently, so that would be a useful exercise.

Anyway, I thought you guys would be itching for some new results. So here's a PCA plot:

This used the same Principal Component Analysis as the one here using the 96 Indian Cline samples, Utahn Whites and Onge. However, I projected three extra "populations" on this plot.

These three populations are simulated genetic data of 25 individuals using the allele frequencies from Reference 3 Admixture results.

  1. Onge11 is generated from the Onge (C2) component from K=11 admixture for Reference 3.
  2. SA11 is generated from the South Asian (C1) component from the same K=11 admixture.
  3. SA12 is generated from the South Asian (C1) component from the K=12 admixture.
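Generating individuals from a component's allele frequencies can be sketched roughly as follows. This is not necessarily how the samples above were produced. It assumes Hardy-Weinberg proportions and independent SNPs (no linkage), the function name is made up, and the frequencies are placeholders rather than real Admixture output.

```python
import random

def simulate_genotypes(allele_freqs, n_individuals, seed=None):
    """Draw diploid genotypes (0, 1 or 2 copies of the counted allele) per SNP.

    Each of the two allele copies is an independent draw with probability p,
    i.e. Hardy-Weinberg proportions with no linkage between SNPs.
    """
    rng = random.Random(seed)
    individuals = []
    for _ in range(n_individuals):
        genotype = [int(rng.random() < p) + int(rng.random() < p)
                    for p in allele_freqs]
        individuals.append(genotype)
    return individuals

# Placeholder component frequencies for six SNPs (not real data).
component_freqs = [0.0, 0.05, 0.30, 0.50, 0.85, 1.0]
simulated = simulate_genotypes(component_freqs, n_individuals=25, seed=1)
```

Because every simulated individual is drawn from the same frequencies, samples like these cluster tightly on a PCA plot, and ignoring linkage means they are not realistic haplotypes.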

As you can see, the SA12 population lies between 100% ASI and the Indian Cline samples.

The Onge11 generated samples are a bit beyond 100% ASI on the first principal component, but they are also shifted towards the real Onge on pc2.

Indian Cline II

One thing I forgot in the post yesterday about the Indian cline was to try to extrapolate from the PCA results to 100% ANI (Ancestral North Indian) and 100% ASI (Ancestral South Indian).

This is a simple linear extrapolation which should be okay since PCA is linear.

The "N" denotes the extrapolated position of ANI and "S" denotes the ASI. The points to the left of "N" are all Utahn Whites while the Onge are on the bottom right of the graph.

As you can see, the ASI is about the same as Onge in terms of eigenvector 1 (which represents the Indian cline approximately), but ASI is far from Onge on the 2nd eigenvector. That is expected since the Onge have been separated from the mainland populations for a long time.

The more interesting thing is that the extrapolated position of ANI is a little to the right of all the Utahn Whites.

We'll need a similar analysis of the Indian cline with more populations to see which one the ANI is closest to.

PS. I should point out that I am using correlation between a limited number of population statistics to find a relationship between the 1st principal component and Reich et al's ASI estimate. This has a number of drawbacks. It would be much better to compute ASI directly.
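The extrapolation described in this post amounts to fitting a line between population-level ASI estimates and PC1 positions, then reading off where ASI would be 0% (pure ANI) and 100%. Here is a minimal sketch of that idea. The numbers are made up for illustration and are not the values behind the plot.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical population means: ASI fraction and PC1 coordinate.
asi_fraction = [0.25, 0.40, 0.50, 0.60]
pc1_mean = [-0.02, 0.01, 0.03, 0.05]

slope, intercept = fit_line(asi_fraction, pc1_mean)

pc1_at_ani = intercept          # ASI = 0, i.e. 100% ANI ("N" on the plot)
pc1_at_asi = slope + intercept  # ASI = 1, i.e. 100% ASI ("S" on the plot)
```

The same fit can be repeated for the second eigenvector to place "N" and "S" in two dimensions. Both endpoints lie outside the range of the observed populations, so they are only as good as the assumption that the relationship stays linear.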

Dienekes on ANI/ASI

Dienekes has a word of caution about choosing reference populations and admixture results.

Consider a sample of 25 Mexicans from the HapMap and 25 Yoruba from the HapMap, 25 Iberian Spanish from the 1000 Genomes Project, and 25 Pima from the HGDP as parental populations. We obtain for our Mexican sample:

  • 59.7% European
  • 36.9% "Native American"
  • 3.4% African

Let's run a final experiment with just the Mexicans, Spanish, and Yoruba, i.e., with no Native American samples. At K=3 we obtain:

  • 70% "Native American"
  • 29.7% European
  • 0.4% African

The "Native American" component has increased again! The explanation is simple: as we exclude less admixed Native American groups, Mexicans appear (comparatively) more Native American. The "Native American pole" has shifted, and so has the relative position of populations between them.

In other terms, what is labeled "Native American" in the three experiments is not the same: in the first one it is anchored on the more unadmixed Pima, in the last one on the more admixed Mexicans.

Thus, it seems that unadmixed reference samples are much more useful in getting good results from Admixture.

Then he runs Admixture on the Reich et al dataset for South Asians and tries to estimate the relationship between the Ancestral North Indian percentage computed by Reich et al and his K=2 admixture results on the same data.

Dienekes then included South Asian Dodecad participants in the analysis and ran a K=4 admixture analysis on Reich et al + Dodecad South Asian data, including Yoruba and Beijing Chinese from the HapMap to catch any African or East Asian ancestry.

Here are the admixture results for the reference populations:

The R² between the West Eurasian admixture component and the Reich et al. ANI estimate is 0.98, which is good. His regression equation comes out to:

ANI = 0.779*WestEurasian + 39.674
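Applying that line is simple arithmetic, sketched below with both quantities in percent. The function names are made up. Treating ASI as whatever is left after ANI, East Eurasian and African is my assumption about how the split works, and the 46.5% West Eurasian input is an illustrative value, not a reported one.

```python
def ani_from_west_eurasian(west_eurasian_pct):
    """Apply the fitted line ANI = 0.779 * WestEurasian + 39.674 (percent)."""
    return 0.779 * west_eurasian_pct + 39.674

def ani_asi_split(west_eurasian_pct, east_eurasian_pct, african_pct):
    ani = ani_from_west_eurasian(west_eurasian_pct)
    # Assumption: ASI is the remainder once the other components are removed.
    asi = 100.0 - ani - east_eurasian_pct - african_pct
    return ani, asi

ani, asi = ani_asi_split(west_eurasian_pct=46.5,
                         east_eurasian_pct=0.0,
                         african_pct=3.5)
print(round(ani, 1), round(asi, 1))  # prints: 75.9 20.6
```

Note that the intercept gives about 39.7% ANI even at 0% West Eurasian, and inputs above roughly 77% would push ANI past 100%, so the line only makes sense within the range of the populations it was fitted on.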

Using this relationship, he calculates the ANI and ASI (Ancestral South Indian) components for Dodecad project members. My results (DOD128) are as follows:

East Eurasian 0.0%
African 3.5%
Ancestral North Indian 75.9%
Ancestral South Indian 20.6%

I should point out that, because of my recent Egyptian ancestry, my ANI result is wrong: it also absorbs all of the non-African part of my Egyptian ancestry.

Also, in the case of Razib, I don't think his East Asian 14.4% should be separated out from his ANI-ASI like that. At least some of it should form part of his ASI percentage in my opinion.

Otherwise, this seems like a very good exercise by Dienekes.

