Tag Archives: asi

ANI/ASI Admixture Dating

Via Razib, here's an interesting abstract from the International Congress of Human Genetics by David Reich's group:

Estimating a date of mixture of ancestral South Asian populations.

Linguistic and genetic studies have shown that most Indian groups have ancestry from two genetically divergent populations, Ancestral North Indians (ANI) and Ancestral South Indians (ASI). However, the date of mixture still remains unknown. We analyze genome-wide data from about 60 South Asian groups using a newly developed method that utilizes information related to admixture linkage disequilibrium to estimate mixture dates. Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent. These results suggest that this formative period of Indian history was accompanied by mixtures between two highly diverged populations, although our results do not rule out other, older ANI-ASI admixture events. A cultural shift subsequently led to widespread endogamy, which decreased the rate of additional population mixtures.

I would be very interested in reading that paper. I also wonder how many new samples they genotyped beyond the ones in Reich et al's Reconstructing Indian Population History, and whether I could get my hands on the new data.

I have a feeling that ANI (Ancestral North Indian) captures a bunch of different migrations and conquests etc, so I am not sure if it can be equated to Indo-European language movement.

I wonder if I can use HAPMIX or StepPCO to get similar admixture dating.
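
As a rough illustration of the idea behind dating with admixture linkage disequilibrium, here is a minimal sketch. It assumes the standard model in which admixture LD decays as exp(-n*d), where n is the number of generations since mixture and d is the genetic distance in Morgans. The decay curve below is synthetic, and the 29-year generation time is my assumption, not a figure from the abstract. This is not the Reich group's method, just the basic curve fit.

```python
from math import exp, log

def fit_generations(distances_morgans, ld_values):
    """Fit log(LD) against genetic distance; generations = -slope."""
    ys = [log(v) for v in ld_values]
    n = len(distances_morgans)
    mx = sum(distances_morgans) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances_morgans, ys))
    sxx = sum((x - mx) ** 2 for x in distances_morgans)
    return -sxy / sxx

# Synthetic, noise-free decay curve (hypothetical values, not real data).
true_generations = 100
distances = [0.001 * k for k in range(1, 51)]   # 0.1 cM to 5 cM, in Morgans
ld = [0.02 * exp(-true_generations * d) for d in distances]

estimated_generations = fit_generations(distances, ld)
YEARS_PER_GENERATION = 29  # assumed generation time
print(estimated_generations, estimated_generations * YEARS_PER_GENERATION)
```

With real data the LD values are noisy and several mixture pulses can overlap, so a single exponential is only a first approximation.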


Of sensible semantics

One of my pet peeves is the confusion which some ethno-linguistic terms can cause. For example, the fact that there were Iranian language speakers on the plains of Ukraine ~2,000 years ago naturally indicates to people that Scythian nomads issued out of Iran northwards. Similarly, the existence of Indo-Aryan Mitanni in what is today Syria also suggests to people that there was a migration of Indians which traversed much of West Asia in a drive toward the Mediterranean from the Indus. Of course our perception of the center of gravity of these ethno-linguistic groups today is a function of historical contingency. If we didn't know much more about Antique and Medieval European history we might posit that the Celtic Galatians of ancient Anatolia were originally from Ireland, based on the contemporary distribution of Celtic languages!

This issue is now cropping up in South Asian archaeogenetics. In my opinion the paper Reconstructing Indian population history is probably the most important contribution to the field in a generation. The authors explain technically why a "South Asian" ancestral component falls out of the ancestry inference algorithms at the heart of ADMIXTURE, STRUCTURE, or frappe. In short, when you have a population which is a hybrid, but where the hybridization event is very distant in the past, recombination breaks up the signatures of that event (a decay of the linkage disequilibrium between two putative ancestral populations). Additionally, in the Indian case there doesn't seem to be a "pure" population of one of the two ancestral groups, what they termed "Ancestral South Indians" (ASI). The closest reference they found was the Onge Andaman Islanders, whose last common ancestors with ASI lived on the order of tens of thousands of years in the past. They do have excellent proxies for the other population, "Ancestral North Indians" (ANI). Compared to ASI, all West Eurasians can be used as reasonable proxies for ANI.


Indian Cline III

I have been working on creating 100% ASI (Ancestral South Indian) samples recently. So it was really interesting that Dienekes did similar experiments:

I am going about creating the "pure" allele frequencies somewhat differently, so that would be a useful exercise.

Anyway, I thought you guys would be itching for some new results. So here's a PCA plot:

This used the same Principal Component Analysis as the one here using the 96 Indian Cline samples, Utahn Whites and Onge. However, I projected three extra "populations" on this plot.

These three populations are simulated genetic data of 25 individuals using the allele frequencies from Reference 3 Admixture results.

  1. Onge11 is generated from the Onge (C2) component from K=11 admixture for Reference 3.
  2. SA11 is generated from the South Asian (C1) component from the same K=11 admixture.
  3. SA12 is generated from the South Asian (C1) component from the K=12 admixture.

As you can see, the SA12 population lies between 100% ASI and the Indian Cline samples.

The Onge11 generated samples are a bit beyond 100% ASI on the first principal component, but they are also shifted towards the real Onge on pc2.
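
For anyone curious how such simulated "populations" can be generated, here is a minimal sketch. It assumes you have one allele frequency per SNP for an ancestral component (ADMIXTURE writes these to its .P file) and draws each individual's two alleles independently, i.e. Hardy-Weinberg proportions with no linkage between SNPs. The frequencies below are made-up placeholders, and the function name is hypothetical; this is not necessarily the exact procedure used for the plot.

```python
import random

def simulate_population(allele_freqs, n_individuals=25, seed=0):
    """Return genotypes (0, 1 or 2 copies of the counted allele) per SNP.

    Assumes Hardy-Weinberg proportions and independent SNPs (no linkage).
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n_individuals):
        genotype = [
            int(rng.random() < p) + int(rng.random() < p)  # two allele draws
            for p in allele_freqs
        ]
        population.append(genotype)
    return population

# Hypothetical allele frequencies for one admixture component.
component_freqs = [0.05, 0.30, 0.50, 0.80, 0.95]
simulated = simulate_population(component_freqs, n_individuals=25, seed=42)
print(len(simulated), "individuals x", len(simulated[0]), "SNPs")
```

Because linkage is ignored, samples generated this way are only useful for frequency-based analyses such as PCA projection, not for haplotype-based methods.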


Misuse of Correlation

I have been misusing correlation in computing Ancestral South Indian percentages from PCA/ADMIXTURE and Reich et al population-level averages.

I have tried to make it clear that just looking at the correlation is not enough: an admixture component is not similar to ASI just because it correlates well with Reich et al's ASI averages for the 18 Indian cline populations, even when the correlation is higher than 0.99. To illustrate what I mean, let's look at the Ref4C admixture runs.

I calculated the mean for each admixture component from the K=2 to K=12 runs for the 18 Indian cline populations and then computed the correlation between that and the Reich et al results. Let's take a look:

K    Component         Correlation
2    C1 Euro-Afro      -0.9941887
3    C2 East Asian      0.9955347
4    C3 European       -0.993933
5    C3 European       -0.993277
6    C1 South Asian     0.9675099
7    C1 South Asian     0.993081
8    C1 South Asian     0.9932762
9    C1 South Asian     0.9914145
10   C1 South Asian     0.9918095
11   C1 South Asian     0.9919097
12   C1 South Asian     0.9918594

Where do you see the highest correlation? At K=3 ancestral populations, the East Asian component is very highly correlated with ASI for the Indian cline populations. Does that mean that we could use that to compute ASI? No, not at all. While it is expected that at K=3, ASI would be a little closer to East Asian than to European, East Asian is not a good proxy for ASI at all since we cannot extrapolate to other individuals and populations.
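
To make the point concrete, here is a small sketch with hypothetical numbers (not taken from any admixture run). Pearson correlation is unchanged by rescaling and shifting, so a component can correlate perfectly with ASI while its values are nowhere near the ASI percentages.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical ASI percentages for six populations.
asi = [60.0, 55.0, 50.0, 40.0, 30.0, 25.0]
# Hypothetical component that is just a rescaled, shifted copy of ASI.
component = [0.1 * a + 2.0 for a in asi]

r = pearson(asi, component)
largest_gap = max(abs(a - c) for a, c in zip(asi, component))
print(r, largest_gap)
```

The correlation only says the two quantities move together across these populations. It says nothing about whether the component equals ASI, or whether the relationship holds for samples off the Indian cline.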


Indian Cline II

One thing I forgot in the post yesterday about the Indian cline was to try to extrapolate from the PCA results to 100% ANI (Ancestral North Indian) and 100% ASI (Ancestral South Indian).

This is a simple linear extrapolation which should be okay since PCA is linear.

The "N" denotes the extrapolated position of ANI and "S" denotes the ASI. The points to the left of "N" are all Utahn Whites while the Onge are on the bottom right of the graph.

As you can see, the ASI is about the same as Onge in terms of eigenvector 1 (which represents the Indian cline approximately), but ASI is far from Onge on the 2nd eigenvector. That is expected since the Onge have been separated from the mainland populations for a long time.

The more interesting thing is that the extrapolated position of ANI is a little to the right of all the Utahn Whites.

We'll need a similar analysis of the Indian cline with more populations to see which one the ANI is closest to.

PS. I should point out that I am using correlation between a limited number of population statistics to find a relationship between the 1st principal component and Reich et al's ASI estimate. This has a number of drawbacks. It would be much better to compute ASI directly.
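
The extrapolation described in this post can be sketched as follows, assuming a straight-line fit of population-mean PC1 against Reich et al's ASI percentage, evaluated at 0% ASI (the "N" point) and 100% ASI (the "S" point). The PC1 values below are hypothetical placeholders rather than the real eigenvector coordinates, and the same fit would be repeated for the 2nd eigenvector.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Hypothetical population averages: ASI % and mean PC1 coordinate.
asi_pct = [25.0, 40.0, 50.0, 60.0]
pc1_mean = [-0.02, 0.01, 0.03, 0.05]

slope, intercept = fit_line(asi_pct, pc1_mean)
pc1_ani = intercept                  # extrapolated position at 0% ASI ("N")
pc1_asi = slope * 100.0 + intercept  # extrapolated position at 100% ASI ("S")
print(pc1_ani, pc1_asi)
```

Both end points lie well outside the range of the fitted populations, which is one of the drawbacks mentioned in the PS.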


Indian Cline

I had used linear regression to estimate the Ancestral South Indian (ASI) component from the Reference 3 K=11 admixture run. Now here are a couple more exercises along the same lines but much simpler.

Just using the 96 Indian cline samples from Reich et al to compute PCA or admixture doesn't work as the Chenchu separate out in both analyses from the rest. So I added the Utahn White (CEU) samples from HapMap and the Onge from Reich et al.

First, I ran supervised admixture with two ancestral components, Utahn Whites and Onge. Here's the Onge component plotted against Reich et al's ASI estimate along with a linear regression estimate. The correlation between the two is 0.9908.

Second, I ran Principal Component Analysis (PCA) on the Indian cline samples plus Utahn Whites and Onge. Here are the first two PCA dimensions plotted. The first eigenvector explains 4.04% of the total variation and the 2nd explains 1.94%.

The first principal component is mostly along the Indian cline while the second one basically separates the Onge from everyone else.
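
For readers who want to see what the PCA step amounts to, here is a minimal sketch using NumPy's SVD on a centered genotype matrix. The genotypes are simulated from two made-up sets of allele frequencies, and the per-SNP normalization that tools like smartpca apply is left out, so this is an illustration rather than the exact pipeline used here.

```python
import numpy as np

def pca(genotypes, n_components=2):
    """Return sample scores and the fraction of variance per component."""
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)                     # center each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Two hypothetical populations with different allele frequencies.
rng = np.random.default_rng(0)
n_snps = 200
freqs_a = rng.uniform(0.1, 0.9, n_snps)
freqs_b = np.clip(freqs_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
geno_a = rng.binomial(2, freqs_a, size=(20, n_snps))
geno_b = rng.binomial(2, freqs_b, size=(20, n_snps))

scores, explained = pca(np.vstack([geno_a, geno_b]))
print("variance explained by pc1, pc2:", explained)
```

The "explained" values correspond to the percentages quoted above for the first two eigenvectors.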

Using the 1st principal component to estimate ASI, here's the plot with Reich et al's ASI estimate along with a regression line. The correlation between pc1 and ASI is 0.9929.

Note that both these methods work only if the samples are on the Indian cline, i.e., they don't have any other admixture.

And now for comparison, here's the linear regression for the Reference 3 K=11 admixture Onge component and ASI. The correlation here is 0.9949. Note that this is a little different than my previous analysis since I calculated the population averages using only the 96 samples recommended by Reich et al.

Here's a spreadsheet containing the data for these three runs.

I have a couple more tricks to try for figuring out some things regarding Ancestral South Indian admixture. Let's hope they provide us some insight.


Harappa (1-90) K=11 Admixture Ref3

Here's my first admixture run using Reference 3 for Harappa participants. Since K=11 was the run with the Onge-ASI connection, I ran admixture at K=11 with all 90 Harappa participants.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

Using the comparison between the Onge component here and Reich et al's Ancestral South Indian one, I get the following linear regression.

The correlation is 0.9949 which is probably as high as it can get. So let's calculate the ASI percentage for all the Harappa participants.

Note that I didn't calculate the ASI percentage for those who had a really low Onge component since the linear regression above would not be valid outside the range we have in our original data.

You can see the percentages in a spreadsheet too.

Let's compare with the Dodecad ANI-ASI results. I have 22.5% ASI here while it was 20.6% in the Dodecad analysis. Overall, it seems like my technique results in about 2 percentage points more ASI than Dodecad's, with a few exceptions, such as Razib, who jumps from 34.3% to 43.3% (averaging his parents, who are very close).


Admixture Onge Component Map

Since the Onge component on my K=11 admixture run was very strongly correlated with Reich et al's Ancestral South Indian (ASI) estimates, Simranjit has been kind enough to let me share his map of the Onge component in South Asia.

He also has maps of the K=12 admixture run.


Reference 3 Admixture K=11

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=11.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

You don't know how excited I am to see the Onge (C2) component. Let's compare the Onge component with Reich et al's ASI (Ancestral South Indian):

Population        Reich ASI %   Onge Component %
Mala              61.2          39.9
Madiga            59.4          37.9
Chenchu           59.3          38.6
Bhil              57.1          37.5
Satnami           57            36.4
Kurumba           56.8          39.5
Kamsali           55.5          35.5
Vysya             53.8          34.4
Lodi              50.1          31.8
Naidu             49.9          32.1
Tharu             49            32.2
Velama            45.3          28.9
Srivastava        43.6          27.8
Meghawal          39.7          25.4
Vaish             37.4          23.8
Kashmiri-Pandit   29.4          17.6
Sindhi            26.3          13.4
Pathan            23.1          10.6

Let's plot that with a linear regression:

How do you like that?

Now let's take all the reference populations with an Onge component between 10% and 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian percentage than any of the Reich et al groups, with Paniya being the highest at 67.4%.

Here are the Fst divergences between estimated populations for K=11 in the form of an MDS plot.
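
As a sketch of the calculation, the following fits a least-squares line to the 18 populations in the table above (ASI as a function of the Onge component) and only returns an estimate when the Onge component falls within the 10% to 50% range. The coefficients come from this refit and may differ slightly from the equation on the plot; the function names are mine.

```python
reich_asi = [61.2, 59.4, 59.3, 57.1, 57.0, 56.8, 55.5, 53.8, 50.1,
             49.9, 49.0, 45.3, 43.6, 39.7, 37.4, 29.4, 26.3, 23.1]
onge_pct = [39.9, 37.9, 38.6, 37.5, 36.4, 39.5, 35.5, 34.4, 31.8,
            32.1, 32.2, 28.9, 27.8, 25.4, 23.8, 17.6, 13.4, 10.6]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

slope, intercept = fit_line(onge_pct, reich_asi)

def estimate_asi(onge, low=10.0, high=50.0):
    """Estimated ASI %, or None when the Onge % is outside the usable range."""
    if not low <= onge <= high:
        return None
    return slope * onge + intercept

print(slope, intercept, estimate_asi(30.0), estimate_asi(5.0))
```

Returning None outside the range keeps the regression from being applied to samples unlike the ones it was fitted on.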

I guess you might want to see the Fst dendrogram too. Just remember it's not a phylogeny.

And the numbers:

      C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C2    0.165
C3    0.121  0.122
C4    0.090  0.161  0.152
C5    0.071  0.152  0.137  0.048
C6    0.134  0.144  0.067  0.163  0.143
C7    0.184  0.224  0.216  0.179  0.186  0.232
C8    0.210  0.209  0.205  0.235  0.223  0.228  0.286
C9    0.175  0.207  0.139  0.208  0.178  0.141  0.281  0.290
C10   0.261  0.304  0.294  0.257  0.261  0.311  0.123  0.367  0.364
C11   0.150  0.195  0.187  0.143  0.148  0.203  0.059  0.260  0.252  0.133
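
If you want to reproduce something like the MDS plot from these numbers, here is a minimal sketch using classical (Torgerson) MDS with NumPy. Treating Fst as a plain distance and using this particular MDS variant are my assumptions, so the coordinates may not match the plot exactly.

```python
import numpy as np

labels = [f"C{i}" for i in range(1, 12)]
# Lower triangle of the Fst table above, one row per component C1..C11.
lower = [
    [],
    [0.165],
    [0.121, 0.122],
    [0.090, 0.161, 0.152],
    [0.071, 0.152, 0.137, 0.048],
    [0.134, 0.144, 0.067, 0.163, 0.143],
    [0.184, 0.224, 0.216, 0.179, 0.186, 0.232],
    [0.210, 0.209, 0.205, 0.235, 0.223, 0.228, 0.286],
    [0.175, 0.207, 0.139, 0.208, 0.178, 0.141, 0.281, 0.290],
    [0.261, 0.304, 0.294, 0.257, 0.261, 0.311, 0.123, 0.367, 0.364],
    [0.150, 0.195, 0.187, 0.143, 0.148, 0.203, 0.059, 0.260, 0.252, 0.133],
]

n = len(labels)
fst = np.zeros((n, n))
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        fst[i, j] = fst[j, i] = value          # fill both triangles

def classical_mds(dist, k=2):
    """Classical MDS: double-center squared distances, keep top-k axes."""
    size = dist.shape[0]
    centering = np.eye(size) - np.ones((size, size)) / size
    gram = -0.5 * centering @ (dist**2) @ centering
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:k]
    top = np.clip(evals[order], 0.0, None)     # guard against tiny negatives
    return evecs[:, order] * np.sqrt(top)

coords = classical_mds(fst)
for label, (x, y) in zip(labels, coords):
    print(f"{label}: {x:.3f} {y:.3f}")
```

The signs and orientation of MDS axes are arbitrary, so a plot made this way may be flipped or rotated relative to another one.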


Dienekes on ANI/ASI

Dienekes has a word of caution about choosing reference populations and admixture results.

Consider a sample of 25 Mexicans from the HapMap and 25 Yoruba from the Hapmap, 25 Iberian Spanish from the 1000 Genomes Project, and 25 Pima from the HGDP as parental populations. We obtain for our Mexican sample:

  • 59.7% European
  • 36.9% "Native American"
  • 3.4% African

Let's run a final experiment with just the Mexicans, Spanish, and Yoruba, i.e., with no Native American samples. At K=3 we obtain:

  • 70% "Native American"
  • 29.7% European
  • 0.4% African

The "Native American" component has increased again! The explanation is simple: as we exclude less admixed Native American groups, Mexicans appear (comparatively) more Native American. The "Native American pole" has shifted, and so has the relative position of populations between them.

In other terms, what is labeled "Native American" in the three experiments is not the same: in the first one it is anchored on the more unadmixed Pima, in the last one in the more admixed Mexicans.

Thus, it seems that unadmixed reference samples are much more useful in getting good results from Admixture.

Then he runs Admixture on the Reich et al dataset for South Asians and tries to estimate the relationship between the Ancestral North Indian percentage computed by Reich et al and his K=2 admixture results on the same data.

Dienekes then included South Asian Dodecad participants in the analysis and ran a K=4 admixture analysis on Reich et al + Dodecad South Asian data, including Yoruba and Beijing Chinese from the HapMap to catch any African or East Asian ancestry.

Here are the admixture results for the reference populations:

The R2 correlation between the West Eurasian admixture component and the Reich et al ANI component is 0.98 which is good. His relationship equation comes out to:

ANI = 0.779*WestEurasian + 39.674

Using this relationship, he calculates the ANI and ASI (Ancestral South Indian) components for Dodecad project members. My results (DOD128) are as follows:

East Eurasian 0.0%
African 3.5%
Ancestral North Indian 75.9%
Ancestral South Indian 20.6%
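
As a worked example of how these numbers fit together, here is a small sketch. It applies the equation above and assumes ASI is whatever remains after ANI, African and East Eurasian are subtracted from 100%, which is consistent with the four figures summing to 100 but is my reading rather than something Dienekes states. The West Eurasian input of 46.5% is back-calculated from the reported 75.9% ANI and is not a published value.

```python
def ani_from_west_eurasian(west_eurasian_pct):
    """Dienekes' fitted relationship: ANI = 0.779 * WestEurasian + 39.674."""
    return 0.779 * west_eurasian_pct + 39.674

def ani_asi_split(west_eurasian_pct, african_pct, east_eurasian_pct):
    """Return (ANI, ASI), assuming ASI is the remainder of 100%."""
    ani = ani_from_west_eurasian(west_eurasian_pct)
    asi = 100.0 - ani - african_pct - east_eurasian_pct
    return ani, asi

# 46.5 is a back-calculated, hypothetical West Eurasian percentage.
ani, asi = ani_asi_split(46.5, african_pct=3.5, east_eurasian_pct=0.0)
print(round(ani, 1), round(asi, 1))
```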

I should point out that due to my recent Egyptian ancestry, my ANI result is wrong since it's absorbing all of my non-African Egyptian ancestry too.

Also, in the case of Razib, I don't think his East Asian 14.4% should be separated out from his ANI-ASI like that. At least some of it should form part of his ASI percentage in my opinion.

Otherwise, this seems like a very good exercise by Dienekes.
