I got back yesterday. Due to jetlag and things that accumulated during my absence, it'll take me a few days to get back to regular posting here.

I got about 12 data submissions during my vacation. I'll send everyone their Harappa IDs by tomorrow. If you haven't received an ID from me by Thursday morning, drop me an email to remind me.

I have 2 batches of ten to process for Admixture results. I hope to post those by the end of the week but can't make any promises.

If you sent me an email in the last month that required a response, I will try to reply soon. If you haven't heard from me by July 11, please remind me.

Thanks, Razib, for your guest blogging.

Brahui are something old, not new

From Wikipedia:

The ethnonym "Brahui" is a very old term and a purely Dravidian one. The fact that other Dravidian languages only exist further south in India has led to several speculations about the origins of the Brahui. There are three hypotheses regarding the Brahui that have been proposed by academics. One theory is that the Brahui are a relic population of Dravidians, surrounded by speakers of Indo-Iranian languages, remaining from a time when Dravidian was more widespread. Another theory is that they migrated to Baluchistan from inner India during the early Muslim period of the 13th or 14th centuries. A more established theory says the Brahui migrated to Balochistan from central India after 1000 CE. The absence of any older Iranian (Avestan) influence in Brahui supports this hypothesis. The main Iranian contributor to Brahui vocabulary is a western Iranian language like Kurdish.

A lot of ADMIXTURE plots I've seen are more consistent with the first (indigenous) than the latter two (exogenous) models. Here's a result for K = 9 with ~90,000 markers:

Turks and Pathans

Of interest to readers on this weblog: Pathan parahistory.

The Pakistan genome

Every South Asian "Arab" a descendant of Muhammad!

Y chromosomes of self-identified Syeds from the Indian subcontinent show evidence of elevated Arab ancestry but not of a recent common patrilineal origin:

Several cultural or religious groups claim descent from a common ancestor. The extent to which this claimed ancestry is real or socially constructed can be assessed by means of genetic studies. Syed is a common honorific title given to male Muslims belonging to certain families claiming descent from the Prophet Muhammad through his grandsons Hassan and Hussein, who lived 1,400 years ago and were the sons of the Prophet’s daughter Fatima. If all Syeds really are in direct descent from Hassan and Hussein, we would expect the Y chromosomes of Syeds to be less diverse than those of non-Syeds. Outside the Arab world, we would also expect to find that Syeds share Y chromosomes with Arab populations to a greater extent than they do with their non-Syed geographic neighbours. In this study, we found that the Y chromosomes of self-identified Syeds from India and Pakistan are no less diverse than those of non-Syeds from the same regions, suggesting that there is no biological basis to the belief that self-identified Syeds in this part of the world share a recent common ancestry. In addition to Syeds, we also considered members of other hereditary Muslim lineages, which either claim descent from the tribe or family of Muhammad or from the residents of Medinah. Here, we found that these lineages showed greater affinity to geographically distant Arab populations than to their neighbours from the Indian subcontinent who do not belong to an Islamic honorific lineage.

The results are pretty simple. First:

1) The Syed lineages don't exhibit a "Syed modal haplotype." If the claimed descent were real, you should see a single Syed haplotype at a frequency of ~50%, and then a range of other lineages which introgressed through people lying about their origins or women being unfaithful to their husbands. Instead there is a wide range of haplotypes. Being Syed is an honorific.

2) I don't think that they really prove higher Arab ancestry as such. Their Arab samples include really diverse populations, from Algerians to Israeli Arabs to Sudanese. The Islamic Honorific Lineages are somewhat closer to these groups, but that could reflect generic West Asian ancestry (Persian, for example), or more African ancestry in cosmopolitan Syed lineages. Or perhaps Syeds are just former high caste Hindus, who have more West Asian affinities.
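To make the expectation in point 1 concrete, here is a rough Python sketch of how one might compare a "founder-like" sample against a diverse one. It uses Nei's haplotype diversity, a standard measure, but I am not claiming it is the exact statistic the paper used. The function names and the toy haplotype labels are made up for illustration and are not data from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def modal_frequency(haplotypes):
    """Frequency of the single most common haplotype in the sample."""
    counts = Counter(haplotypes)
    return counts.most_common(1)[0][1] / len(haplotypes)


# Hypothetical toy samples (NOT from the paper).
# A lineage with a real common founder: one haplotype at ~50%,
# the rest scattered singletons from introgression.
founder_like = ["H1"] * 10 + [f"H{i}" for i in range(2, 12)]
# A purely honorific group: every sampled man has a different haplotype.
diverse = [f"H{i}" for i in range(1, 21)]

for name, sample in [("founder-like", founder_like), ("diverse", diverse)]:
    print(name, modal_frequency(sample), round(haplotype_diversity(sample), 3))
```

A real founder effect shows up as a high modal frequency and a noticeably lower diversity value. The paper's finding corresponds to the Syed sample looking like the second case.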

Below are the PCA plot and the list of Y chromosomal haplogroups. The paper is free at the link above.

According to Cece Moore of Your Genetic Genealogist, there are 3,400 customers of 23andme who identify as South Asian. Now if only 10-20% of them would participate in the Harappa Ancestry Project!

Also, Konrad Karczewski and others have created a very interesting personal genome analysis website, Interpretome.


I am going on vacation. Unfortunately, the rush of work before a vacation means that I haven't been able to finish the analysis of ASI (Ancestral South Indian) that I wanted to present before going.

Since the only computer I am taking with me is my daughter's netbook, I don't expect to work much on the Harappa Ancestry Project during that time.

I will have Internet access, so I might write some. However, there will not be any new analyses of participants. Also, I am more likely to post a travelogue on my regular blog.

In the meantime, I have Razib lined up as guest blogger here. Hopefully, you guys will have some good discussions.

I encourage people to keep sending me their 23andme or FTDNA data which I'll analyze as soon as I am back.

I expect to be back working on the project at the start of July.

Participants' Help Needed

A journalist from Times of India has contacted me to talk about the Harappa Ancestry Project. She is interested in talking to some Indian participants about it.

If any of you are interested, let me know in comments or by email and I'll forward your contact information to the journalist.

UPDATE: I have sent the contact info of the six people who volunteered.

Taking Suggestions

What would you want from this project? What sort of analyses would you like me to do?

I know several of you want regional admixture/PCA analyses and those are coming starting next week.

In addition to that, is there something specific you would like to be investigated?

For example, is there some specific supervised admixture you would like me to run? A specific PCA/MDS analysis?

Or do you want me to try to synthesize all the results we have gotten into some sort of coherent theory instead of throwing out the numbers like I have been doing?

Harappa Nearest IBS Presentation

Since Dodecad posted nearest IBS (identity by state) neighbors, I have had requests to do the same for Harappa participants.

I have the data ready but I am not sure how to present it. I don't want to post an R object since I suspect most of you don't have R installed.

The idea is to give you a list of your closest IBS neighbors as well as your match percentage with them. How would you present that for 90 people, each of whom might match any of several hundred (thousand?) reference samples? Give me some ideas.
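One option, sketched here in Python rather than R, is a "long" table with one row per participant and neighbor, keeping only the top few matches, saved as a CSV that opens in any spreadsheet. This is only an illustration of the layout. The IDs, reference names, similarity values, and function names below are all made up, and the input is assumed to be a simple participant-to-reference similarity mapping on a 0 to 1 scale.

```python
import csv
import io


def nearest_neighbors(ibs, top_n=3):
    """Return (participant, rank, neighbor, ibs_percent) rows, best match first."""
    rows = []
    for participant in sorted(ibs):
        # Sort by similarity descending, breaking ties by reference name.
        ranked = sorted(ibs[participant].items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (ref, sim) in enumerate(ranked[:top_n], start=1):
            rows.append((participant, rank, ref, round(sim * 100, 2)))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["participant", "rank", "neighbor", "ibs_percent"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical toy input: IBS similarity of each participant to each reference sample.
toy_ibs = {
    "HRP0001": {"ref_A": 0.75, "ref_B": 0.625, "ref_C": 0.5},
    "HRP0002": {"ref_A": 0.5, "ref_B": 0.75, "ref_C": 0.625},
}

rows = nearest_neighbors(toy_ibs, top_n=2)
print(to_csv(rows))
```

With 90 participants and the top 10 neighbors each, that is a 900-row table, which is small enough to post as a single spreadsheet that people can filter by their own ID.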