Monthly Archives: June 2011

Changes to 1000 Genomes South Asians

Looks like there have been some changes to the populations in the 1000 Genomes:

At least we'll be able to answer questions about the origin of the Sinhalese soon enough. I'm a little bummed that the Indian populations in Maharashtra and West Bengal disappeared. Did the Permit Raj strike again?

Every South Asian "Arab" a descendant of Muhammad!

Y chromosomes of self-identified Syeds from the Indian subcontinent show evidence of elevated Arab ancestry but not of a recent common patrilineal origin:

Several cultural or religious groups claim descent from a common ancestor. The extent to which this claimed ancestry is real or socially constructed can be assessed by means of genetic studies. Syed is a common honorific title given to male Muslims belonging to certain families claiming descent from the Prophet Muhammad through his grandsons Hassan and Hussein, who lived 1,400 years ago and were the sons of the Prophet’s daughter Fatima. If all Syeds really are in direct descent from Hassan and Hussein, we would expect the Y chromosomes of Syeds to be less diverse than those of non-Syeds. Outside the Arab world, we would also expect to find that Syeds share Y chromosomes with Arab populations to a greater extent than they do with their non-Syed geographic neighbours. In this study, we found that the Y chromosomes of self-identified Syeds from India and Pakistan are no less diverse than those non-Syeds from the same regions, suggesting that there is no biological basis to the belief that self-identified Syeds in this part of the world share a recent common ancestry. In addition to Syeds, we also considered members of other hereditary Muslim lineages, which either claim descent from the tribe or family of Muhammad or from the residents of Medinah. Here, we found that these lineages showed greater affinity to geographically distant Arab populations, than to their neighbours from the Indian subcontinent, who do not belong to an Islamic honorific lineage.

The results are pretty simple. First:

1) The Syed lineages don't exhibit a "Syed modal haplotype." What you should see is a single Syed haplotype at a frequency of ~50%, and then a range of other lineages which introgressed through people lying about their origins or women being unfaithful to their husbands. Instead there is a wide range of haplotypes. Being Syed is an honorific.

2) I don't think that they really prove higher Arab ancestry as such. They include really diverse populations, from Algerians to Israeli Arabs to Sudanese. The Islamic Honorific Lineages are somewhat closer to these groups, but that could be generic West Asian ancestry. For example, Persian. Or perhaps more African ancestry in cosmopolitan Syed lineages. Or, perhaps Syeds are just former high caste Hindus, who have more West Asian affinities.
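The expectation in point 1 can be made concrete with a small sketch. It computes the share of the most common haplotype and Nei's haplotype diversity for two made-up samples. The haplotype labels and sample sizes are hypothetical and are not taken from the paper, and this is not necessarily the statistic the authors used.

```python
from collections import Counter


def modal_share(haplotypes):
    """Return (most common haplotype, its fraction of the sample)."""
    counts = Counter(haplotypes)
    hap, n = counts.most_common(1)[0]
    return hap, n / len(haplotypes)


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical toy samples (labels are placeholders, not real haplotypes).
# A lineage with a recent common founder: one haplotype at ~50%.
founder_like = ["A"] * 5 + ["B", "C", "D", "E", "F"]
# A lineage with no dominant haplotype: every chromosome is different.
diverse = list("ABCDEFGHIJ")

founder_modal = modal_share(founder_like)
founder_div = haplotype_diversity(founder_like)
diverse_div = haplotype_diversity(diverse)

print(founder_modal, round(founder_div, 3), round(diverse_div, 3))
```

A group with a real common patrilineal founder should look like the first sample (a dominant haplotype, lower diversity). The Syed samples in the paper look more like the second.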

Below is the PCA and list of Y chromosomal haplogroups. The paper is free at the link above.


Of sensible semantics

One of my pet peeves is the confusion which some ethno-linguistic terms can cause. For example, the fact that there were Iranian language speakers on the plains of Ukraine ~2,000 years ago naturally indicates to people that Scythian nomads issued out of Iran northwards. Similarly, the existence of Indo-Aryan Mitanni in what is today Syria also suggests to people that there was a migration of Indians which traversed much of West Asia in a drive toward the Mediterranean from the Indus. Of course our perception of the center of gravity of these ethno-linguistic groups today is a function of historical contingency. If we didn't know much more about Antique and Medieval European history we might posit that the Celtic Galatians of ancient Anatolia were originally from Ireland, based on the contemporary distribution of Celtic languages!

This issue is now cropping up in South Asian archaeogenetics. In my opinion the paper Reconstructing Indian population history is probably the most important contribution to the field in a generation. The authors explain technically why a "South Asian" ancestral component falls out of ancestry inference algorithms at the heart of ADMIXTURE, STRUCTURE, or frappe. In short, when you have a population which is a hybrid, but where the hybridization event is very distant in the past, recombination breaks up the signatures of that event (a decay of the linkage disequilibrium between two putative ancestral populations). Additionally, in the Indian case there doesn't seem to be a "pure" population of one of the two ancestral groups, the one they termed "Ancestral South Indians" (ASI). The closest reference they found was the Onge Andaman Islanders, whose last common ancestor with the ASI lived on the order of tens of thousands of years in the past. They do have excellent proxies for the other population, "Ancestral North Indians" (ANI). Compared to ASI all West Eurasians can be used as reasonable proxies for ANI.
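To give a rough feel for why an old hybridization event is hard to detect, here is a minimal sketch under a textbook single-pulse admixture model. In that model the admixture-induced linkage disequilibrium between two loci shrinks by a factor of about exp(-g * d) after g generations, where d is the genetic distance in Morgans. The function name and the generation counts are illustrative assumptions, not values from the paper.

```python
import math


def admixture_ld_remaining(generations, distance_morgans):
    """Fraction of the initial admixture LD left between two loci.

    Assumes a single pulse of admixture followed by random mating,
    so the LD decays approximately as exp(-generations * distance).
    """
    return math.exp(-generations * distance_morgans)


ONE_CENTIMORGAN = 0.01  # genetic distance in Morgans

# Hypothetical comparison: a recent event vs. an old one, loci 1 cM apart.
recent = admixture_ld_remaining(10, ONE_CENTIMORGAN)
ancient = admixture_ld_remaining(200, ONE_CENTIMORGAN)

print(f"10 generations:  {recent:.3f} of the signal remains")
print(f"200 generations: {ancient:.3f} of the signal remains")
```

The older the event, the shorter the stretches of the genome that still carry a recognizable signal from either ancestral population.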


Of literality and metaphor in the war between Arya and Dasa

Over at Brown Pundits Zach Latif brings up the point that the Indian bias for light skin may date back to the Aryans. And it does seem that such a bias manifests in the earliest texts. But as someone not able to read the original languages I can ascertain the arguments as to the import of these passages only at a remove, second and third hand. Some scholars have suggested that the racialized interpretations of the treatments of the interactions between the Aryans and the natives of India are just a look at the past through the lens of the present. They argue that color terms in the Vedas are metaphors. In contrast, there are others who seem to be arguing for a more straightforward and "literal" reading of the source text. As a non-philologist there's little I can add. But I didn't dismiss those who argued for a metaphorical reading because I know from the literature in the area of Biblical scholarship that straightforward and "literal" readings are quite often deceptive and require their own interpretation (it is difficult to transfer idioms and metaphors properly across languages, and the translators are tempted to render them in the most congenial manner to their own broader theses).

To be frank the information being uncovered by Zack and others makes me think that there was a racial aspect to these conflicts; that the literal reading has some truth. The tribal folk of India are genetically distinct from the caste populations, especially the higher castes. Though all South Asians are a mix to varying degrees of an exogenous West Eurasian element and a South Eurasian indigenous component, the non-Austro-Asiatic tribal populations seem to be a relatively simple combination. The genetic complexity of the structure of other groups suggests to me that there were several later West Eurasian intrusions after the arrival of the "Ancestral North Indians" (ANI), and their hybridization event with the "Ancestral South Indians" (ASI). Racialized language in the older Hindu scriptures may then be conceived of as a conflict between the latest arrivals and the older long established groups which were a stabilized ANI-ASI compound.

One can imagine that this process recapitulated itself in the late medieval and early modern period with the arrival of the Muslims. The ruling Islamic elites who were of Persian and Turk stock viewed the native Hindus through a racialized lens, and were at pains (and to some extent still are!) to distinguish between Muslims of foreign provenance who were "white" and converts from the native populations who were "black." The physical differences are evident when you compare the Emperor Akbar with his grandson Shah Jahan, whose other three grandparents were Rajputs.

Links

According to CeCe Moore of Your Genetic Genealogist, there are 3,400 customers of 23andme who identify as South Asian. Now if only 10-20% of them would participate in the Harappa Ancestry Project!

Also, Konrad Karczewski and others have created a very interesting personal genome analysis website, Interpretome.

Caste is not ancestrally arbitrary

First, thanks to Zack for the opportunity to blog here. More importantly, thanks to Zack for the Harappa Ancestry Project! I've learned a lot from him in terms of the optimal way to go about "genome blogging," and have been able to benefit from his experiences in my own African Ancestry Project. It's really great that in 2011 we don't have to wait for academic researchers to explore the topics which interest us at the intersection of genetics and history.

Prior to being interested in South Asian genetics on such a fine-grained level I had read works such as Nicholas B. Dirks' Castes of Mind. To give you a sense of Dirks' argument, here's the summary from Library Journal:

Is India's caste system the remnant of ancient India's social practices or the result of the historical relationship between India and British colonial rule? Dirks (history and anthropology, Columbia Univ.) elects to support the latter view. Adhering to the school of Orientalist thought promulgated by Edward Said and Bernard Cohn, Dirks argues that British colonial control of India for 200 years pivoted on its manipulation of the caste system. He hypothesizes that caste was used to organize India's diverse social groups for the benefit of British control. His thesis embraces substantial and powerfully argued evidence. It suffers, however, from its restricted focus to mainly southern India and its near polemic and obsessive assertions. Authors with differing views on India's ethnology suffer near-peremptory dismissal....

One of the inferences which people draw from this model, perhaps unfairly, is that the endogamy and biological separation of caste groups is relatively new, and that genetic variation is likely to be arbitrarily distributed across caste groups. The most extreme interpretations almost seem to turn the British into the culture-creators of all that is Indian. In any case, genetics can obviously test the power of this thesis in relation to ancestry.

First up, below I have taken all the HAP samples where N >= 2. I've done some semantic shifting, so that "Tamil Iyer" becomes "Tamil Brahmin." I know that some of you have more information about the samples than is listed in Zack's spreadsheet, but I've been conservative. I will also use the word "community" sometimes instead of "caste" in future posts, because I don't know what the proper word for Syrian Christians or Bihari Muslims would be. But really, same difference to me. I want to focus on groups with caste/religious labels intersected with a specific region here. The bar plot below is not going to be a surprise, and you can see the clusters in Zack's dendrograms, but I thought it would still be useful.


Vacation

I am going on vacation. Unfortunately, the rush of work before a vacation means that I haven't been able to finish the analysis of ASI (Ancestral South Indian) that I wanted to present before going.

Since the only computer I am taking with me is my daughter's netbook, I don't expect to work much on the Harappa Ancestry Project during that time.

I will have Internet access, so I might write some. However, there will not be any new analyses of participants. Also, I am more likely to post a travelogue on my regular blog.

In the meantime, I have Razib lined up as guest blogger here. Hopefully, you guys will have some good discussions.

I encourage people to keep sending me their 23andme or FTDNA data which I'll analyze as soon as I am back.

I expect to be back working on the project at the start of July.

Reference 3 + HAP PCA

I had run PCA on the Reference 3 dataset before. Now I included Harappa participants in it as well.

Here's the dendrogram based on the Euclidean distance between Harappa participants in their PCA results.

Since no one liked the IBS nearest neighbors lists, I thought making a spreadsheet with every participant's closest 100 neighbors in PCA space might be more fruitful.

Note that for the Harappa participants, the median distance to their nearest neighbor is 0.2064. So if your nearest neighbor is more than let's say 0.3 away, then you are not close to anyone.
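For anyone who wants to reproduce this kind of list, here is a minimal sketch of the idea: Euclidean distance between participants in PCA space, a sorted neighbor list per participant, the median nearest-neighbor distance, and a flag for anyone whose nearest neighbor is beyond the 0.3 cutoff. The IDs and coordinates are made up, and the function names are hypothetical. How the principal components were scaled in the actual analysis is not shown here.

```python
import math
from statistics import median


def nearest_neighbors(coords, k=100):
    """Map each ID to its k closest other IDs as (distance, id) pairs.

    coords: dict of id -> tuple of principal-component coordinates.
    """
    result = {}
    for pid, p in coords.items():
        dists = sorted(
            (math.dist(p, q), other)
            for other, q in coords.items()
            if other != pid
        )
        result[pid] = dists[:k]
    return result


def isolated(neighbors, threshold=0.3):
    """IDs whose nearest neighbor is farther away than the threshold."""
    return sorted(
        pid for pid, nn in neighbors.items() if nn and nn[0][0] > threshold
    )


# Hypothetical participants with two made-up PCA coordinates each.
coords = {
    "P1": (0.0, 0.0),
    "P2": (0.1, 0.0),
    "P3": (0.0, 0.2),
    "P4": (1.0, 1.0),
}

neighbors = nearest_neighbors(coords, k=100)
median_nn = median(nn[0][0] for nn in neighbors.values())
loners = isolated(neighbors, threshold=0.3)

print("median nearest-neighbor distance:", round(median_nn, 4))
print("not close to anyone:", loners)
```

The all-pairs loop is quadratic in the number of participants, which is fine at this scale.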

June Update

I have a total of 123 participants in the project right now who have sent me their raw data. Six of those have relatives participating and thus have to be filtered out for most analyses, other than individual admixture percentages and the like, where I divide participants into small groups.

The following groups are represented:

  • South Asian: 90
    • Tamil: 15
    • Punjab: 13
    • Bengal: 9
    • Karnataka: 7
    • Andhra Pradesh: 5
    • Uttar Pradesh: 5
    • Kerala: 5
    • Bihar: 5
    • Gujarati: 4
    • Sindhi: 4
    • Maharashtra: 3
    • Sri Lankan: 3
    • Caribbean Indian: 2
    • Kashmir: 2
    • Romani: 2
    • Goa: 1
    • Rajasthan: 1
    • Baloch: 1
    • Orissa: 1
    • Anglo-Indian: 1
    • Unknown: 1
  • Others: 33
    • Iran: 8
    • Assyrian: 3
    • Kurd: 2
    • Mexican: 2
    • Ashkenazi: 2
    • Northwest European: 2
    • Iraqi Arab: 2
    • Georgian: 1
    • Azeri: 1
    • Kazakh: 1
    • Brazilian: 1
    • Yemen: 1
    • Irish: 1
    • Egypt: 1
    • Gagauz Turk: 1
    • Afro-Belizean: 1
    • Iraqi Mandaean: 1
    • Egyptian/Iraqi Jew: 1
    • French/Madagascar/Indian: 1

Most are 23andme data while 4 are from FTDNA.

We are getting close to 100 South Asian participants.

Admixture (Ref3 K=11) HRP0111-HRP0120

Here are the admixture results using Reference 3 for Harappa participants HRP0111 to HRP0120.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.