Monthly Archives: March 2012

South Asian fineStructure Ref3 Admixture

Posted by Zack on March 27, 2012 1 comment

I was wondering what the admixture patterns of the clusters fineSTRUCTURE computed were for my South Asian run. So I computed the average admixture for each cluster (total: 89) using reference 3 admixture results.
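A minimal sketch of that per-cluster averaging, assuming a table with one row per individual, a cluster label, and one column per ancestral component. The data frame `admix`, its column names, and all values below are made up for illustration (three components instead of the real eleven).

```r
# Hypothetical toy data: one row per individual, one column per
# ancestral component (three here; the real run has K = 11).
admix <- data.frame(
  cluster  = c("c1", "c1", "c2", "c2", "c2"),
  S.Asian  = c(80, 84, 2, 0, 1),
  E.Asian  = c(15, 11, 83, 85, 84),
  Siberian = c(5, 5, 15, 15, 15)
)

# Mean of every component column within each cluster.
cluster_means <- aggregate(. ~ cluster, data = admix, FUN = mean)
print(cluster_means)
```

Each row of `cluster_means` is then the average admixture profile of one cluster.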

By default, the clusters are ordered so that closer clusters stay next to each other.

Harappa Oracle

Posted by Zack on March 23, 2012 15 comments

Based on the Dodecad Oracle, here is Harappa Oracle using reference 3 admixture results.

I am using Dienekes' code with a couple of changes. One of them is a weighted distance based on Fst divergences between ancestral components, which makes it several times slower than DodecadOracle. I plan to offer an option soon to switch between Euclidean distance and Fst-weighted distance.
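To make the difference between the two distances concrete, here is a rough sketch. The exact weighting inside the Oracle is not spelled out here, so the quadratic form below is only one plausible choice and an assumption on my part. The matrix `fst` and the function names are hypothetical.

```r
# Hypothetical Fst matrix between three ancestral components
# (symmetric, zero diagonal); real values would come from the admixture run.
fst <- matrix(c(0.00, 0.10, 0.20,
                0.10, 0.00, 0.15,
                0.20, 0.15, 0.00), nrow = 3, byrow = TRUE)

# Plain Euclidean distance between two admixture vectors.
euclid_dist <- function(a, b) sqrt(sum((a - b)^2))

# One possible Fst weighting (an assumption, not necessarily the Oracle's):
# the quadratic form -1/2 * d' F d, where d is the difference in proportions.
fst_dist <- function(a, b, fst) {
  d <- a - b
  q <- -0.5 * as.numeric(t(d) %*% fst %*% d)
  sqrt(max(q, 0))  # clamp negative values, possible if Fst is not Euclidean-like
}

a <- c(0.6, 0.3, 0.1)
b <- c(0.2, 0.3, 0.5)
euclid_dist(a, b)
fst_dist(a, b, fst)
```

With this form, two individuals who are each 100% of a different component end up at a squared distance equal to the Fst between those components, so a difference between closely related components counts for less than one between divergent components.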

You need to install R to use it. Then unzip the Oracle zip file. Double-click on the file or use the following in R:

load('HarappaOracleR3fst.RData')

In R, you can look at the 385 populations included by typing:

X[,1]

To find your closest populations, you need your Harappa Reference 3 admixture results. Enter them separated by commas, like this (these are my own results):

HarappaOracle(c(44,12,0,24,14,1,2,0,0,1,2))

You will get a result, with the first column showing the closest populations and the 2nd column their distance to you.

[,1] [,2]
[1,] "balochi" "8.0242"
[2,] "bene-israel" "9.2843"
[3,] "brahui" "9.5158"
[4,] "pathan" "9.7034"
[5,] "makrani" "10.1014"
[6,] "sindhi" "10.9236"
[7,] "Bhatia" "11.8441"
[8,] "Sindhi" "12.1704"
[9,] "Kashmiri" "13.4229"
[10,] "punjabi-arain" "13.9192"
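For readers curious what such a lookup involves, here is a simplified sketch using plain Euclidean distance rather than the Fst-weighted one. The table `ref`, the population names, and the function `closest_pops` are hypothetical and are not the Oracle's actual internals.

```r
# Hypothetical reference table: rows are population means, columns are
# three made-up components (the real table has 11 components).
ref <- rbind(
  popA = c(50, 30, 20),
  popB = c(45, 35, 20),
  popC = c(10, 20, 70)
)

# Return the k reference populations closest to `target`,
# using plain Euclidean distance.
closest_pops <- function(target, ref, k = 10) {
  d <- sqrt(rowSums(sweep(ref, 2, target)^2))  # distance to each row
  head(sort(d), k)                             # smallest distances first
}

closest_pops(c(48, 32, 20), ref, k = 2)
```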

You can also find out the closest populations to one of the reference populations:

HarappaOracle("punjabi-arain")

By default, the Oracle shows the 10 closest populations. You can change that:

HarappaOracle("punjabi-arain",k=20)

Also, by default, the Oracle excludes the Pan-Asian dataset since the overlap is only 5,400 SNPs. You can include Pan-Asian populations:

HarappaOracle("punjabi-arain",panasian=T)

There is also a mixed mode where the individual (or mean reference population) is compared against all pairs of populations as ancestors.

HarappaOracle("Haryana Jatt",mixedmode=T)

which has the following output:

[1,] "Haryana Jatt" "0"
[2,] "15.4% lithuanians + 84.6% Punjabi Brahmin" "1.9553"
[3,] "10.6% russian + 89.4% Rajasthani Brahmin" "2.0626"
[4,] "14.7% finnish + 85.3% Punjabi Brahmin" "2.0863"
[5,] "9.2% finnish + 90.8% Rajasthani Brahmin" "2.1142"
[6,] "89.4% Rajasthani Brahmin + 10.6% mordovians" "2.1727"
[7,] "9.6% lithuanians + 90.4% Rajasthani Brahmin" "2.1989"
[8,] "10.1% belorussian + 89.9% Rajasthani Brahmin" "2.2938"
[9,] "16.8% russian + 83.2% Punjabi Brahmin" "2.3015"
[10,] "16.2% belorussian + 83.8% Punjabi Brahmin" "2.3656"
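The idea behind mixed mode can be sketched as follows: for every pair of reference populations, find the mixing proportion that brings the blend closest to the target, then rank the pairs by that distance. This is only an illustration under my own assumptions (Euclidean distance, a closed-form projection for the proportion). The names `best_mix`, `mixed_mode` and `toy_ref` are hypothetical.

```r
# Best weight w in [0, 1] so that w*p + (1-w)*q is closest to target.
best_mix <- function(target, p, q) {
  v <- p - q
  w <- sum((target - q) * v) / sum(v^2)  # project target onto the line q -> p
  w <- min(max(w, 0), 1)                 # keep the proportion valid
  list(w = w, dist = sqrt(sum((target - (w * p + (1 - w) * q))^2)))
}

# Scan all pairs of reference populations and rank them by distance.
mixed_mode <- function(target, ref, k = 10) {
  pairs <- combn(nrow(ref), 2)
  res <- apply(pairs, 2, function(ij) {
    m <- best_mix(target, ref[ij[1], ], ref[ij[2], ])
    c(i = ij[1], j = ij[2], w = m$w, dist = m$dist)
  })
  res <- as.data.frame(t(res))
  res$label <- sprintf("%.1f%% %s + %.1f%% %s",
                       100 * res$w, rownames(ref)[res$i],
                       100 * (1 - res$w), rownames(ref)[res$j])
  head(res[order(res$dist), c("label", "dist")], k)
}

# Made-up reference populations, each 100% of one component.
toy_ref <- rbind(popA = c(100, 0, 0),
                 popB = c(0, 100, 0),
                 popC = c(0, 0, 100))
mixed_mode(c(30, 70, 0), toy_ref, k = 3)
```

Because every pair is scanned, the work grows with the square of the number of reference populations, which is why this kind of search is noticeably slower than the single-population lookup.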

You can of course combine any or all of the options.

Think of Harappa Oracle as a tool to help you interpret your admixture results by comparing who you are closest to. Do not think of it as giving you your real ancestry.

Ref3 Admixture Dendrograms

Posted by Zack on March 19, 2012 5 comments

I have posted the reference 3 K=11 admixture results for all populations and datasets. Here are the relevant links:

So let's try a dendrogram of all these populations' average admixture results. Instead of using regular Euclidean distance, I used some weighting based on Fst distances between admixture components, very similar to what Palisto did.

Here's a dendrogram of all datasets using complete linkage.
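A dendrogram of this kind can be produced in R roughly as below. This is a toy sketch with made-up population means and plain Euclidean distance. An Fst-weighted distance matrix could be passed through `as.dist()` instead. The matrix `pop_means` is hypothetical.

```r
# Hypothetical population-by-component means (three made-up components).
pop_means <- rbind(
  popA = c(80, 15, 5),
  popB = c(78, 16, 6),
  popC = c(2, 84, 14),
  popD = c(1, 80, 19)
)

# Pairwise distances between populations; dist() is Euclidean by default.
d <- dist(pop_means)

# Agglomerative clustering with complete linkage, then plot the tree.
tree <- hclust(d, method = "complete")
plot(tree, main = "Toy dendrogram (complete linkage)")
```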

Since the Pan-Asian dataset had only 5,400 SNPs common with reference 3, we need to be careful interpreting the tree above. Just to make sure, here's the dendrogram excluding Pan-Asian populations.

Henn Ref3 K=11 Admixture

Posted by Zack on March 16, 2012 5 comments

There have been two Henn et al. papers since I started this project.

Hunter-gatherer genomic diversity suggests a southern African origin for modern humans by Brenna M. Henn, Christopher R. Gignoux, Matthew Jobin, Julie M. Granka, J. M. Macpherson, Jeffrey M. Kidd, Laura Rodríguez-Botigué, Sohini Ramachandran, Lawrence Hon, Abra Brisbin, Alice A. Lin, Peter A. Underhill, David Comas, Kenneth K. Kidd, Paul J. Norman, Peter Parham, Carlos D. Bustamante, Joanna L. Mountain, and Marcus W. Feldman
Genomic Ancestry of North Africans Supports Back-to-Africa Migrations by Brenna M. Henn, Laura R. Botigué, Simon Gravel, Wei Wang, Abra Brisbin, Jake K. Byrnes, Karima Fadhlaoui-Zid, Pierre A. Zalloua, Andres Moreno-Estrada, Jaume Bertranpetit, Carlos D. Bustamante, David Comas

The data for both is available online:

I ran reference 3 K=11 admixture on these datasets using about 48,000 SNPs.

Here is the spreadsheet with the Henn group averages for reference 3 admixture at K=11 ancestral components.

Note that the Sandawe, Hadza and San from Henn2011 were already included in Reference 3 and are not listed here.

Pan-Asian Ref3 K=11 Admixture

Posted by Zack on March 13, 2012 9 comments

The HUGO Pan-Asian dataset covers South and East Asia with the following South Asian populations:

23 Andhra Pradesh & Karnataka
10 Bengali
23 Bhil (Rajasthan)
20 Haryana
23 Kashmir Spiti
12 Marathi
12 Rajasthani
30 Singapore Indian
20 Uttaranchal
13 Uttar Pradesh

Unfortunately, they do not specify ethnic or caste background for most Indian groups. Instead, their focus is on Mongoloid/Caucasoid/Australoid etc.

Also, the SNP overlap with other datasets is really small. Therefore, this reference 3 admixture run was done using only 5,400 SNPs. I recommend a big bucket of salt when interpreting these results.

Here is the spreadsheet with the Pan-Asian group averages for reference 3 admixture at K=11 ancestral components.

Xing Ref3 Admixture South Asians

Posted by Zack on March 10, 2012 44 comments

As per AV's comment, here are the individual results for Xing et al South Asians.

Simonson Tibet Dataset

Posted by Zack on March 7, 2012 6 comments

Recently, I discovered that the paper Genetic Evidence for High-Altitude Adaptation in Tibet by Tatum S. Simonson, Yingzhong Yang, Chad D. Huff, Haixia Yun, Ga Qin, David J. Witherspoon, Zhenzhong Bai, Felipe R. Lorenzo, Jinchuan Xing, Lynn B. Jorde, Josef T. Prchal, RiLi Ge has its genotyping data online.

It contains 31 Tibetans from Madou county in Qinghai province. The chip is Affymetrix and there are 868,146 SNPs, which means it has a good overlap with Reich et al and Xing et al and also with my reference 3.

I ran reference 3 K=11 admixture on this dataset. Here are the individual results:

The average is as follows:

S Asian	E Asian	Siberian
1%	84%	14%

Dodecad South Asian ChromoPainter

Posted by Zack on March 4, 2012 17 comments

Dienekes ran ChromoPainter/fineSTRUCTURE analysis of South Asians along with some West Eurasian populations, something I had neglected to do in my own South Asian run.

Using Dienekes' data, I was trying to figure out which South Asian populations had more DNA chunks in common with other groups when I ran into something strange. Looking at the chunkcount spreadsheet, if we focus on a recipient population (i.e., one row), we can see which populations contributed more "chunks". For most populations, the results are as expected: the top donor is either the same population or a closely related one. For example, let's look at the top 5 matches for Velamas_M:

	Velamas_M	Pulliyar_M	North_Kannadi	Chamar_M	Piramalai_Kallars_M
Velamas_M	1265.77	1259.38	1256.06	1255.6	1254.74

However, when we do the same for Pathans, Sindhis, Uttar Pradesh Brahmins, Kshatriyas and Muslims, we get strange results.

	Chamar_M	Velamas_M	UP_Scheduled_Caste_M	Piramalai_Kallars_M	Muslim_M
Pathan	1229.91	1229.56	1229.53	1229.32	1229.27

Do Pathans match Chamar the best? Pathans don't show up as a donor till #11.

	Chamar_M	Piramalai_Kallars_M	Pulliyar_M	Velamas_M	North_Kannadi
Sindhi	1234.09	1234.08	1233.85	1233.6	1233.55

Again, Sindhis as donors are #12.

	Pulliyar_M	Chamar_M	North_Kannadi	Kol_M	Piramalai_Kallars_M
Brahmins_UP_M	1244.6	1244.53	1243.44	1242.88	1241.94

The same Brahmins_UP_M are #13 as donors.

	Pulliyar_M	Chamar_M	North_Kannadi	Kol_M	Piramalai_Kallars_M
Kshatriya_M	1247.72	1247.36	1246.42	1244.98	1244.56

And #12.

	Pulliyar_M	Chamar_M	North_Kannadi	Kol_M	Piramalai_Kallars_M
Muslim_M	1255.96	1255.36	1253.96	1251.74	1250.86

Muslim_M are #8 as donors.

There is a pattern here among the top donors for these populations. The same populations show up time and again.
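The lookups above can be sketched in R as follows, assuming the chunk counts are loaded as a matrix with recipients in rows and donors in columns. The matrix `chunks`, its population names and its numbers are made up for illustration.

```r
# Hypothetical chunk-count matrix: rows are recipients, columns are donors.
chunks <- matrix(
  c(1265, 1259, 1256,
    1230, 1228, 1229,
    1234, 1233, 1231),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("popA", "popB", "popC"), c("popA", "popB", "popC"))
)

# Top-n donors for one recipient row, highest chunk count first.
top_donors <- function(chunks, recipient, n = 5) {
  head(sort(chunks[recipient, ], decreasing = TRUE), n)
}

# Rank of a population as a donor to itself (1 = best match).
self_rank <- function(chunks, pop) {
  ranked <- names(sort(chunks[pop, ], decreasing = TRUE))
  match(pop, ranked)
}

top_donors(chunks, "popA", n = 2)
self_rank(chunks, "popB")
```

A self-rank far from 1, as with the toy `popB` here, is the kind of oddity described above.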

Now compare this to my results (with a larger South Asian dataset). The top 10 matches for Pathans are:

pathan
punjabi-jatt
bhatia
haryana-jatt
rajasthani-brahmin
punjabi
balochi
kashmiri
punjabi-brahmin
sindhi

For Sindhis,

sindhi
bhatia
balochi
makrani
brahui
punjabi-jatt
haryana-jatt
meghawal
pathan
punjabi

For Brahmins from Uttar Pradesh,

bihari-brahmin
haryana-jatt
brahmin-uttar-pradesh
punjabi-jatt
kurmi
sourastrian
bengali-brahmin
bihari-kayastha
bhatia
up-brahmin

For Kshatriyas,

bihari-brahmin
kurmi
meena
kshatriya
rajasthani-brahmin
haryana-jatt
punjabi-jatt
bengali-brahmin
kerala-muslim
sourastrian

For Muslims,

muslim
chamar
kol
oriya
uttar-pradesh-scheduled-caste
bihari-muslim
sourastrian
brahmin-uttaranchal
dusadh
bihari-brahmin

If Dienekes can post a chunkcount file for the clusters computed by fineSTRUCTURE, maybe we can try to figure out what happened.

Genetic Affinities of the Central Indian Tribal Populations

Posted by Zack on March 1, 2012 2 comments

Genetic Affinities of the Central Indian Tribal Populations by Gunjan Sharma, Rakesh Tamang, Ruchira Chaudhary, Vipin Kumar Singh, Anish M. Shah, Sharath Anugula, Deepa Selvi Rani, Alla G. Reddy, Muthukrishnan Eaaswarkhanth, Gyaneshwer Chaubey, Lalji Singh, Kumarasamy Thangaraj:

Background
The central Indian state Madhya Pradesh is often called as ‘heart of India’ and has always been an important region functioning as a trinexus belt for three major language families (Indo-European, Dravidian and Austroasiatic). There are less detailed genetic studies on the populations inhabited in this region. Therefore, this study is an attempt for extensive characterization of genetic ancestries of three tribal populations, namely; Bharia, Bhil and Sahariya, inhabiting this region using haploid and diploid DNA markers.

Methodology/Principal Findings
Mitochondrial DNA analysis showed high diversity, including some of the older sublineages of M haplogroup and prominent R lineages in all the three tribes. Y-chromosomal biallelic markers revealed high frequency of Austroasiatic-specific M95-O2a haplogroup in Bharia and Sahariya, M82-H1a in Bhil and M17-R1a in Bhil and Sahariya. The results obtained by haploid as well as diploid genetic markers revealed strong genetic affinity of Bharia (a Dravidian speaking tribe) with the Austroasiatic (Munda) group. The gene flow from Austroasiatic group is further confirmed by their Y-STRs haplotype sharing analysis, where we determined their founder haplotype from the North Munda speaking tribe, while, autosomal analysis was largely in concordant with the haploid DNA results.

Conclusions/Significance
Bhil exhibited largely Indo-European specific ancestry, while Sahariya and Bharia showed admixed genetic package of Indo-European and Austroasiatic populations. Hence, in a landscape like India, linguistic label doesn't unequivocally follow the genetic footprints.

Did they seriously use only 48 AIMs (ancestrally informative markers) for their autosomal analysis?

UPDATE: Here is their autosomal analysis using STRUCTURE on 48 AIMs.

Can't say I am impressed. It is very noisy. They have the African component varying from 6.2% to 13.2% in populations that should have none. They also have Bhil at 10.8% East Asian (I got 0%), Sahariya at 15.8% (I got 12%), and Gond at 9.2% (I got 7%).

In short, using 48 AIMs instead of 118,000 SNPs leads to really noisy results.

Harappa Ancestry Project

Genetics and South Asia
