## Genetic Evidence for Recent Population Mixture in India

Finally, the paper I had been waiting for ever since the conference presentations on ANI-ASI admixture dating by Moorjani et al at the Reich Lab is out:

Moorjani et al., Genetic Evidence for Recent Population Mixture in India, The American Journal of Human Genetics (2013), http://dx.doi.org/10.1016/j.ajhg.2013.07.006

Here's the abstract:

Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. These results show that India experienced a demographic transformation several thousand years ago, from a region in which major population mixture was common to one in which mixture even between closely related groups became rare because of a shift to endogamy.

In this paper, Moorjani et al calculate ANI (Ancestral North Indian) percentage as:

Compared with Reich et al, they changed the outgroup from Papuan to Yoruba and the population grouped in a clade with ANI from CEU (Utahn Whites) to Georgians. I think both are much better choices. Looking at the D-statistics in Table S2, Georgians are definitely an appropriate choice for forming a clade with ANI.
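For readers who have not met D-statistics, here is a minimal sketch of how one can be computed from allele frequencies, assuming the usual frequency-based form of the statistic. The function name, the population labels W, X, Y, Z and the frequencies are all made up for illustration; this is not the paper's data or pipeline, and real analyses also attach a standard error (typically from a block jackknife), which is left out here.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in four populations.

    A value near zero is what you expect if (W, X) and (Y, Z) form
    clean clades with respect to each other.
    """
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return num / den


# Hypothetical allele frequencies at four SNPs (illustration only).
pop_w = [0.10, 0.40, 0.80, 0.30]
pop_x = [0.20, 0.35, 0.60, 0.50]
pop_y = [0.15, 0.50, 0.70, 0.45]
pop_z = [0.30, 0.45, 0.50, 0.70]

print(d_statistic(pop_w, pop_x, pop_y, pop_z))
```

Swapping W and X flips the sign of the statistic, and a population compared with itself gives exactly zero.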

Another important result from the paper is the difference in the date of admixture between Dravidian speakers (108 generations, or 3,132 years) and Indo-European speakers (72 generations, or 2,088 years).
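The year figures follow from the generation counts at 29 years per generation, which is the rate implied by the numbers quoted here (3,132 / 108 = 2,088 / 72 = 29). A quick sketch of the conversion, with a hypothetical helper name:

```python
# Generation length implied by the figures quoted in the post:
# 3,132 / 108 = 2,088 / 72 = 29 years.
YEARS_PER_GENERATION = 29


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation


for label, generations in [("Dravidian", 108), ("Indo-European", 72)]:
    print(label, generations_to_years(generations))
```

A different assumed generation length shifts both dates proportionally, so the gap between the two groups stays at a third of the older date.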

When they test for multiple waves of admixture, they find that more than one wave is more likely in upper-caste and middle-caste Indo-European speakers, which suggests that the admixture history of a lot of Indian groups is more complex.

Similar to an earlier conference poster, Reich Lab's Priya Moorjani et al have another poster at SMBE. Here's the abstract:

Estimating a date of mixture of ancestral South Asian populations
Linguistic and genetic studies have demonstrated that almost all groups in South Asia today descend from a mixture of two highly divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners and Europeans, and Ancestral South Indians (ASI) not related to any populations outside the Indian subcontinent. ANI and ASI have been estimated to have diverged from a common ancestor as much as 60,000 years ago, but the date of the ANI-ASI mixture is unknown. Here we analyze data from about 60 South Asian groups to estimate that major ANI-ASI mixture occurred 1,200-4,000 years ago. Some mixture may also be older, beyond the time we can query using admixture linkage disequilibrium, since it is universal throughout the subcontinent: present in every group speaking Indo-European or Dravidian languages, in all caste levels, and in primitive tribes. After the ANI-ASI mixture that occurred within the last four thousand years, a cultural shift led to widespread endogamy, decreasing the rate of additional mixture.

I bolded the portion which seems new compared to the previous abstract.

## Genetic Affinities of the Central Indian Tribal Populations

Genetic Affinities of the Central Indian Tribal Populations by Gunjan Sharma, Rakesh Tamang, Ruchira Chaudhary, Vipin Kumar Singh, Anish M. Shah, Sharath Anugula, Deepa Selvi Rani, Alla G. Reddy, Muthukrishnan Eaaswarkhanth, Gyaneshwer Chaubey, Lalji Singh, Kumarasamy Thangaraj:

Background
The central Indian state Madhya Pradesh is often called as ‘heart of India’ and has always been an important region functioning as a trinexus belt for three major language families (Indo-European, Dravidian and Austroasiatic). There are less detailed genetic studies on the populations inhabited in this region. Therefore, this study is an attempt for extensive characterization of genetic ancestries of three tribal populations, namely; Bharia, Bhil and Sahariya, inhabiting this region using haploid and diploid DNA markers.

Methodology/Principal Findings
Mitochondrial DNA analysis showed high diversity, including some of the older sublineages of M haplogroup and prominent R lineages in all the three tribes. Y-chromosomal biallelic markers revealed high frequency of Austroasiatic-specific M95-O2a haplogroup in Bharia and Sahariya, M82-H1a in Bhil and M17-R1a in Bhil and Sahariya. The results obtained by haploid as well as diploid genetic markers revealed strong genetic affinity of Bharia (a Dravidian speaking tribe) with the Austroasiatic (Munda) group. The gene flow from Austroasiatic group is further confirmed by their Y-STRs haplotype sharing analysis, where we determined their founder haplotype from the North Munda speaking tribe, while, autosomal analysis was largely in concordant with the haploid DNA results.

Conclusions/Significance
Bhil exhibited largely Indo-European specific ancestry, while Sahariya and Bharia showed admixed genetic package of Indo-European and Austroasiatic populations. Hence, in a landscape like India, linguistic label doesn't unequivocally follow the genetic footprints.

Did they seriously use only 48 AIMs (ancestry informative markers) for their autosomal analysis?

UPDATE: Here is their autosomal analysis using STRUCTURE on 48 AIMs.

Can't say I am impressed. It is very noisy. They have the African component varying from 6.2% to 13.2% in populations that should have none. They also have Bhil at 10.8% East Asian (I got 0%), Sahariya at 15.8% (me at 12%), and Gond at 9.2% (I got 7%).

In short, using 48 AIMs instead of 118,000 SNPs leads to really noisy results.
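To get a feel for why the marker count matters, here is a rough simulation sketch. It is not STRUCTURE and not the paper's method: it assumes two source populations with uniformly random allele frequencies, simulates one admixed diploid genome, and recovers the ancestry proportion with a simple least-squares estimator. The function names and parameters are hypothetical. Real AIMs are picked for large frequency differences, so 48 of them do better than 48 random markers, but the scaling with marker count is the same idea.

```python
import random
import statistics


def estimate_alpha(n_markers, true_alpha, rng):
    """Least-squares ancestry estimate from one simulated diploid genome."""
    num = 0.0
    den = 0.0
    for _ in range(n_markers):
        p_a = rng.random()  # allele frequency in source A (assumed uniform)
        p_b = rng.random()  # allele frequency in source B (assumed uniform)
        p_mix = true_alpha * p_a + (1 - true_alpha) * p_b
        # Diploid genotype: two independent allele draws at frequency p_mix.
        genotype = (rng.random() < p_mix) + (rng.random() < p_mix)
        diff = p_a - p_b
        num += (genotype / 2 - p_b) * diff
        den += diff * diff
    return num / den


def spread(n_markers, true_alpha=0.1, reps=20, seed=1):
    """Standard deviation of the estimate across simulated individuals."""
    rng = random.Random(seed)
    estimates = [estimate_alpha(n_markers, true_alpha, rng) for _ in range(reps)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (48, 118_000):
        print(n, "markers: spread of estimate =", round(spread(n, reps=10), 4))
```

In this toy setup the spread shrinks roughly with the square root of the number of markers, which is why a 10% component estimated from a few dozen markers can easily be off by several percentage points.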

## Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia

Metspalu et al have a new paper in American Journal of Human Genetics about South Asian genetics. Here's the abstract:

South Asia harbors one of the highest levels genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.

I'll have some comments later today.

Via Razib, here's an interesting abstract from the International Congress of Human Genetics by David Reich's group:

Estimating a date of mixture of ancestral South Asian populations.

Linguistic and genetic studies have shown that most Indian groups have ancestry from two genetically divergent populations, Ancestral North Indians (ANI) and Ancestral South Indians (ASI). However, the date of mixture still remains unknown. We analyze genome-wide data from about 60 South Asian groups using a newly developed method that utilizes information related to admixture linkage disequilibrium to estimate mixture dates. Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent. These results suggest that this formative period of Indian history was accompanied by mixtures between two highly diverged populations, although our results do not rule other, older ANI-ASI admixture events. A cultural shift subsequently led to widespread endogamy, which decreased the rate of additional population mixtures.

I would be very interested in reading that paper. Also, I wonder how many new samples they genotyped beyond the ones in Reich et al's Reconstructing Indian Population History, and whether I could get my hands on the new data.

I have a feeling that ANI (Ancestral North Indian) captures a bunch of different migrations and conquests etc, so I am not sure if it can be equated to Indo-European language movement.

I wonder if I can use HAPMIX or StepPCO to get similar admixture dating.
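The basic idea behind dating admixture from linkage disequilibrium can be sketched in a few lines, assuming a single pulse of mixture: admixture LD between two markers decays roughly as exp(-n d), where d is the genetic distance in Morgans and n is the number of generations since mixture, so the slope of log(LD) against distance gives the date. The sketch below fits that slope on synthetic data. It is not HAPMIX, StepPCO, or the Reich Lab's method; the function name, amplitude, and noise level are made up for illustration.

```python
import math
import random


def fit_generations(distances_morgans, ld_values):
    """Estimate generations since admixture as minus the least-squares
    slope of log(LD) against genetic distance (single-pulse assumption)."""
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic decay curve: 108 generations, distances from 0.5 to 20 cM.
rng = random.Random(0)
true_generations = 108
distances = [0.005 * k for k in range(1, 41)]  # in Morgans
ld = [
    0.02 * math.exp(-true_generations * d) * math.exp(rng.gauss(0, 0.05))
    for d in distances
]

estimate = fit_generations(distances, ld)
print("estimated generations since admixture:", round(estimate, 1))
```

With real data the curve would come from weighted LD between pairs of SNPs binned by genetic distance, and multiple waves of mixture would show up as a poor fit to a single exponential.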