
Gujaratis HarappaWorld Admixture

Someone asked for the individual HarappaWorld Admixture results for the Gujarati B from HapMap.

To refresh your memory, the Gujarati B are those individuals who fall outside the large, tightly grouped Gujarati cluster.

I decided to include the HGDP Sindhis as well as Gujaratis, Rajasthanis, Maharashtrians, etc. from the Harappa Ancestry Project in the list so that the Gujaratis can be compared to the people of neighboring regions.

You can check the spreadsheet too.

UPDATE: I have added the Thathai Bhatia and Halai Bhatia participants that I had forgotten.

June Update

I have a total of 123 participants in the project right now who have sent me their raw data. Six of those have relatives participating and thus have to be filtered out of most analyses other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

Most are 23andme data while 4 are from FTDNA.

We are getting close to 100 South Asian participants.

April Update

I have a total of 97 participants in the project right now who have sent me their raw data. Six of those have relatives participating and thus have to be filtered out of most analyses other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

Let's try to get to a hundred soon.

And yes, I am accepting FTDNA Family Finder (new Illumina chip) now.

End of March Update

I have a total of 67 participants in the project right now who have sent me their raw data. This does not count those who have relatives participating and thus have to be filtered out of most analyses other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

I need to post analyses of Tamils, Bengalis and Punjabis soon.

HapMap Gujaratis

Razib is wondering what's going on with the HapMap Houston Gujaratis.

As you can see, the Chinese simply do not vary much, and are a tight cluster. But, there is a somewhat equivalent Gujarati cluster too! The HapMap sample was collected from Gujaratis in Houston. To me, it looks like that Houston population can be divided into two groups: one of the tight cluster, and the rest of the population, which is all over the place. [...] What’s more interesting is to try and understanding what’s going on with Houston Gujaratis. Anyone in the audience know?

And his 3-dimensional PCA plot: (Those on the right are Gujaratis)
PCA Plot of Gujaratis and Chinese

So I thought I would share the admixture results for the Gujaratis for K=8. Here's the spreadsheet of the admixture proportions for Gujaratis. And here is the plot:

Gujaratis Admixture K=8

The ancestral components and their statistics are as follows:

Component            Range    Mean    Median
C1 South Asian       64-89%   81.9%   85.8%
C2 West Asian        0-13%    2.3%    1.6%
C3 European          2-22%    7.6%    5.0%
C4 Southeast Asian   0-9%     4.9%    5.0%
C5 Austronesian      1-6%     2.8%    2.9%
C6 Northeast Asian   0-3%     0.4%    0.0%
C7 West African      0-1%     0.0%    0.0%
C8 East African      0-0%     0.0%    0.0%
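For anyone who wants to reproduce this kind of summary from a spreadsheet of per-sample admixture proportions, here is a minimal sketch. It is not the code used for this post: the `summarize` helper is hypothetical, and the three-sample toy matrix is made up for illustration and does not contain the real Gujarati values.

```python
from statistics import mean, median

def summarize(proportions, labels):
    """Return {label: (min, max, mean, median)} for each ancestral component.

    `proportions` is a list of per-sample rows, one fraction per component.
    """
    summary = {}
    for i, label in enumerate(labels):
        column = [row[i] for row in proportions]
        summary[label] = (min(column), max(column), mean(column), median(column))
    return summary

# Toy data for illustration only (three made-up samples, two components).
toy = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.70, 0.30],
]
for label, (lo, hi, avg, med) in summarize(toy, ["C1", "C2"]).items():
    print(f"{label}: {lo:.0%}-{hi:.0%}, mean {avg:.1%}, median {med:.1%}")
```

A mean well below the median, as with the South Asian component here, is what you would expect when a minority of samples have much lower values than the rest.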

It looks like a majority of the Gujarati samples are mostly of the South Asian ancestral component, with small amounts of West Asian, European and Southeast Asian, but some Gujarati samples have much larger West Asian and/or European ancestral components.

HapMap

I am using several datasets in the public domain for my reference population samples. HapMap is one of those datasets.

According to its website,

The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The HapMap is expected to be a key resource for researchers to use to find genes affecting health, disease, and responses to drugs and environmental factors. The information produced by the Project will be made freely available.

In the first phase, it genotyped

30 Yoruba adult-and-both-parents trios from Ibadan, Nigeria, 30 trios of U.S. (Utah) residents of northern and western European ancestry, 44 unrelated individuals from Tokyo, Japan and 45 unrelated Han Chinese individuals from Beijing, China.

In their HapMap phase 3 release #3 (NCBI build 36, dbSNP b126), there are 1,397 samples with about 1,457,897 SNPs each.

I removed related individuals as well as individuals whose genomes were too similar. This left me with a total of 1,149 samples with about 474,606 SNPs in common with 23andme's version 2 data.
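Finding the SNPs shared between a reference dataset and 23andme data comes down to intersecting two sets of SNP IDs. Here is a minimal sketch, assuming the reference markers are in a Plink .bim file (SNP ID in the second column) and the 23andme raw data is the usual tab-separated text with `#` comment lines. The function names and the tiny inputs are made up for illustration, and this is not necessarily how the filtering for this project was done.

```python
import io

def bim_snp_ids(handle):
    """Collect SNP IDs from a Plink .bim file (second whitespace-separated column)."""
    return {line.split()[1] for line in handle if line.strip()}

def raw_23andme_snp_ids(handle):
    """Collect SNP IDs from 23andme raw data, skipping '#' comment lines."""
    ids = set()
    for line in handle:
        if line.strip() and not line.startswith("#"):
            ids.add(line.split()[0])
    return ids

# Tiny made-up inputs standing in for real files.
bim = io.StringIO(
    "1\trs1\t0\t1000\tA\tG\n"
    "1\trs2\t0\t2000\tC\tT\n"
    "2\trs3\t0\t3000\tG\tA\n"
)
raw = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs2\t1\t2000\tCT\n"
    "rs3\t2\t3000\tGG\n"
    "rs9\t3\t500\tAA\n"
)

common = bim_snp_ids(bim) & raw_23andme_snp_ids(raw)
print(sorted(common))
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` calls on the .bim file and the raw data download.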

Since we are not interested in Native American ancestry, I also removed 58 Mexican samples, thus leaving me with 1,091 samples.

Here are the samples I am using from the HapMap data:

Ethnicity                     Region        Count
African Americans             Africa        48
European Americans (Utahns)   Europe        111
Han Chinese                   East Asia     137
US Chinese                    East Asia     106
Gujaratis                     South Asia    98
Japanese                      East Asia     113
Kenyan Luhya                  East Africa   101
Maasai                        East Africa   135
Tuscans                       Europe        102
Yoruba                        West Africa   140

The region assignments are mine. They aid me in the analysis by letting me include or exclude samples by region, or aggregate results by region to look for patterns.
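As a small illustration of aggregating by region, the sketch below totals the sample counts from the table above per region and checks that they add up to the 1,091 samples left after removing the Mexican samples (1,149 - 58). The `counts_by_region` helper is just a name chosen for this example.

```python
from collections import defaultdict

# (ethnicity, region, count) rows copied from the table above.
SAMPLES = [
    ("African Americans", "Africa", 48),
    ("European Americans (Utahns)", "Europe", 111),
    ("Han Chinese", "East Asia", 137),
    ("US Chinese", "East Asia", 106),
    ("Gujaratis", "South Asia", 98),
    ("Japanese", "East Asia", 113),
    ("Kenyan Luhya", "East Africa", 101),
    ("Maasai", "East Africa", 135),
    ("Tuscans", "Europe", 102),
    ("Yoruba", "West Africa", 140),
]

def counts_by_region(rows):
    """Sum the sample counts for each region."""
    totals = defaultdict(int)
    for _ethnicity, region, count in rows:
        totals[region] += count
    return dict(totals)

region_totals = counts_by_region(SAMPLES)
total = sum(region_totals.values())
for region, n in sorted(region_totals.items()):
    print(f"{region}: {n}")
print("Total:", total)
```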

It was easiest to use the HapMap data since it's available for download in Plink format.