Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia

Metspalu et al. have a new paper in the American Journal of Human Genetics about South Asian genetics. Here's the abstract:

South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.
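Much of the abstract's argument rests on haplotype diversity. The paper's actual procedure (measuring diversity on haplotypes associated with particular ancestry components, then modeling it to estimate how old those components are) is not reproduced here, and the abstract doesn't say which statistic was used. As a minimal sketch under that assumption, this is the standard unbiased haplotype diversity of Nei, H = n/(n-1) × (1 − Σ p_i²), computed for one genomic window of phased haplotypes. The function name and the toy haplotypes are hypothetical, not data from the study.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity for one genomic window.

    H = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the frequency of
    the i-th distinct haplotype among the n sampled chromosomes.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor removes the small-sample bias.
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy haplotypes (alleles at 4 SNPs in one window), not real data.
diverse = ["0101", "1100", "0011", "1010", "0110", "1001"]
uniform = ["0101", "0101", "0101", "0101", "1100", "0101"]
print(round(haplotype_diversity(diverse), 3))  # 1.0
print(round(haplotype_diversity(uniform), 3))  # 0.333
```

Higher values mean the haplotypes in a window are more varied. All else being equal, a component carried by an older or larger population is expected to show higher values, which is roughly the intuition behind the abstract's dating argument.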

I'll have some comments later today.

21 Comments.

  1. An exciting paper, with fairly good sampling from Uttar Pradesh, too. Zack, any idea when you might be able to get hold of the data? In supplemental figure 12, the Uttaranchal Brahmins occupy a very odd position relative to other Indian populations: they don't fall along the basic Indian cline running from Pakistan to South India. What's up with that?

    http://download.cell.com/AJHG/mmcs/journals/0002-9297/PIIS0002929711004885.mmc1.pdf

    • I have the data, thanks to Mait Metspalu.

      • They, as well as other Terai/Nepal Brahmans, have Tibetan admixture. I think I have a very slight Tibetan component too, due to ancestry from the Terai.

      • yeah, i assumed it was tibetan.

      • They used only one Uttaranchal Brahmin individual in the study, and I certainly wouldn't draw conclusions from n=1. The total East Eurasian admixture in Dienekes' analysis of the data is around 27%, which seems far too high to be plausible, or to be the norm for all Uttaranchal Brahmins. I know a few Uttaranchal Brahmins, and if phenotype is anything to go by, they do not look particularly East Asian-influenced. They tend to have a hilly or pahadi look, morphologically Caucasoid if you will, and are rather light by South Asian standards. If anyone has a free kit to give out, I can get one of the males tested.

  2. yeah, aside from the UP coverage, the main thing is the haplotype diversity. the ADMIXTURE and PCA results we've all seen here and at dodecad, and you can get them even from your own runs on less robust data sets. the main concern i have is that they don't strongly address the reich lab's point that 'pure' ADMIXTURE components could simply be well mixed and old. e.g., reich et al. concluded that the most 'pure' south asians, like south indian tribals, are still an ANI/ASI mix, which makes sense when you note that markers like slc24a5 and r1a1 are still found in these populations (west eurasian tells).

    • Normally, for high enough K, ASI is 84% of the South Asian component. Can one interpret this to mean that 84% of the South Asian component is ASI-like and the other 16% is West Asian-like? Also, what are the prospects of the European component being mixed, and by how much, if that is the case?

  3. "ASI is 84% of the South Asian component. Can one interpret this to mean that 84% of the South Asian component is ASI-like and the other 16% is West Asian-like? Also, what are the prospects of the European component being mixed, and by how much, if that is the case?"

    no. reich et al. estimate that the highest ASI values in south asia are 65%; the average is closer to 50%. when you see modal south asian components >80%, or even ~100%, that's probably a "mixed" component which is shaking out as a distinctive element. this isn't totally crazy: if you go back far enough, all ancestry is mixed and reticulated. reich et al. have a paper in the pipeline suggesting that europeans too are mixed in the same way as south asians.

    • While I think that the "South Asian" ADMIXTURE component is composite, having an indigenous "Paleoindian" and a non-indigenous "ANI" component (http://dienekes.blogspot.com/2011/03/note-of-caution-on-admixture-estimates.html), I wouldn't bet the farm that the relative contributions of the two are exactly as Reich et al. inferred.

      As evidenced here, Reich et al. were wrong in inferring that the ANI component was European-like rather than Caucasus-like (http://dienekes.blogspot.com/2011/05/beware-of-sample-sizes-why-ancestral.html), probably because of a subtle sample-size effect. Given that the Onge sample is much smaller than the CEU sample used in their inference, I suspect that their "Indian Cline" may be shifted towards the "ANI" side for the same reason.

      • So Dienekes, are you suggesting that Reich et al.'s 'ANI' might be overestimated because of the uneven sample sizes of CEU and Onge? Can anyone infer ANI independently of Reich et al., preferably using a West Asian-like group with a smaller sample size?

        • Yes, although I would not wager on how big the effect is without actually implementing their method. What I am suggesting is that we should not treat that paper as the final word on the topic. I do think it is an ingenious attempt to resurrect an ancient population, but it already has one flaw (the assertion that CEU and ANI form a clade relative to Adygei), whereas I, HAP, and now Metspalu et al. (2011) all agree that ANI is more Caucasus-like and less European-like.

          The actual "ASI genome" can be reconstructed using methods similar to those used by the Taino Project, i.e., stitching together ASI segments, since there are no extant ASI individuals in the world today.

          • I have to tentatively agree with Dienekes. The Reich et al. method is ingenious and they are likely in the right ballpark, but I am a little hesitant about their exact numbers.

            It would be great to replicate their results. If only each day had 48 hours!

          • It would be great to replicate their results.

            Perhaps as a Christmas gift to all your participants? Please? :)

          • Unfortunately (for you), I have vacation plans for Christmas.

          • Ah, no worries. Perhaps as a New Year gift in that case. Happy Holidays!

      • They sampled 9 Ongee from a total of ~90 Ongee.

        Why do you think that the Caucasus-like (another name could be Indus) component is non-indigenous to South Asia? Is it because you are seeing higher diversity for that component in the Caucasus?

  4. A question regarding one of the conclusions of the paper. To quote:

    Mean pairwise FST values[29] within and among continental regions (Figure 1) reveal that the South Asian autosomal gene pool falls into a distinct geographic cluster, characterized internally, like other continental regions, by short interpopulation genetic distances (<0.01). At the interregional scale, the South Asian cluster shows somewhat shorter genetic distances with West Eurasian (average FST = 0.042) than with East Asian (average FST = 0.051) populations. Importantly, the Pakistani (Indus Valley) populations differ substantially from most of the Indian populations and show comparably low genetic differentiation (within the FST range of 0.008–0.020) from European, Near Eastern, Caucasian, and Indian populations (Figure 1 and Figures S1 and S11). In agreement with previous Y-chromosome studies,[41,42] the Brahmin and Kshatriya from Uttar Pradesh stand out by being closer to Pakistani (FST = 0.006 on average) and West Eurasian populations (FST = 0.030) than to other Indian populations (average FSTs 0.017 and 0.046, respectively) from the same geographic area (Figures S1 and S11).

    Does this imply that the U.P. Brahmin and Kshatriya groups, and in turn the northwestern South Asian groups from Pakistan, are closer to other West Eurasians than they are to other Indians (elsewhere in Uttar Pradesh, South India, etc.)?

  5. Here is a quotation from the paper.

    "We found no regional diversity differences associated with k5 at K = 8. Thus, regardless of where this component was from (the Caucasus, Near East, Indus Valley, or Central Asia), its spread to other regions must have occurred well before our detection limits at 12,500 years. Accordingly, the introduction of k5 to South Asia cannot be explained by recent gene flow, such as the hypothetical Indo-Aryan migration."

    k5 is the component corresponding to ANI.

    • Plus they also derive support for an ancient presence for k5 from the observed clines:

      "within India, there is only a very weak correlation (r = 0.4) between probability of membership in this cluster and distance from its closest core area in Baluchistan ...
      Instead, a more steady cline (correlation r = 0.7 with distance from Baluchistan) of decrease of probability for ancestry in the k5 light green ancestral population can be observed as one moves from Baluchistan toward north (north Pakistan and Central Asia) and west (Iran, the Caucasus, and, finally, the Near East and Europe). If the k5 light green ancestry component (Figure 2B) originated from a recent gene flow event (for example by a demic diffusion model) with a single center of dispersal where the underlying alleles emerged, then one would expect different levels of associated haplotypic diversity to suggest the point of origin of the migration."

  6. A few short points.
    1. The ANI component can't simply be assumed to be the result of the Indo-Aryan migration, as the research indicates that its presence in India is at least 12,500 years old (roughly 10,500 BC).
    2. The paper again supports the "idea" that the Indian subcontinent was once home to a variety of quite ancient communities, which later mixed naturally in a process that is still ongoing.
    3. The paper also again points to major gene flow from the subcontinent into the rest of Eurasia, but not in the 'likely' opposite direction.
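The mean pairwise FST values quoted in comment 4 (for example 0.042 between South Asia and West Eurasia versus 0.051 between South Asia and East Asia) summarize allele-frequency differences between populations. The quoted excerpt cites its FST method only by reference number, so as an assumption this sketch uses Hudson's estimator, combining SNPs as a ratio of averages; the paper may well have used a different estimator. The function name and the toy allele frequencies are hypothetical, not data from the study.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST between two populations, combined across SNPs as a ratio of averages.

    p1, p2: sample frequencies of the same allele at each SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes (2 x diploid individuals) per population.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have the same length")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, minus the expected finite-sample noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Hypothetical frequencies at three SNPs, 50 diploid individuals per population.
print(hudson_fst([0.1, 0.4, 0.7], [0.2, 0.5, 0.6], 100, 100))
```

Because of the sampling correction, two populations with identical allele frequencies get a slightly negative estimate; that is expected behavior for an unbiased estimator, not a bug.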