Tag Archives: dodecad

Dodecad South Asian ChromoPainter

Posted by Zack on March 4, 2012 17 comments

Dienekes ran a ChromoPainter/fineSTRUCTURE analysis of South Asians along with some West Eurasian populations, something I had neglected to do in my own South Asian run.

Using Dienekes' data, I was trying to figure out which South Asian populations had more DNA chunks in common with other groups when I ran into something strange. Looking at the chunkcount spreadsheet, if we focus on a recipient population (i.e., one row), we can see which populations contributed the most "chunks". For most populations, the results are as expected: the top donor is either the same population or a closely related one. For example, here are the top 5 matches for Velamas_M:

	Velamas_M	Pulliyar_M	North_Kannadi	Chamar_M	Piramalai_Kallars_M
Velamas_M	1265.77	1259.38	1256.06	1255.6	1254.74

However, when we do the same for Pathans, Sindhis, Uttar Pradesh Brahmins, Kshatriyas and Muslims, we get strange results.

	Chamar_M	Velamas_M	UP_Scheduled_Caste_M	Piramalai_Kallars_M	Muslim_M
Pathan	1229.91	1229.56	1229.53	1229.32	1229.27

Do Pathans really match Chamar best? Pathans themselves don't show up as a donor until #11.

	Chamar_M	Piramalai_Kallars_M	Pulliyar_M	Velamas_M	North_Kannadi
Sindhi	1234.09	1234.08	1233.85	1233.6	1233.55

Again, Sindhis as donors are #12.

	Pulliyar_M	Chamar_M	North_Kannadi	Kol_M	Piramalai_Kallars_M
Brahmins_UP_M	1244.6	1244.53	1243.44	1242.88	1241.94

The same Brahmins_UP_M are #13 as donors.

	Pulliyar_M	Chamar_M	North_Kannadi	Kol_M	Piramalai_Kallars_M
Kshatriya_M	1247.72	1247.36	1246.42	1244.98	1244.56

Kshatriya_M are #12 as donors.

	Pulliyar_M	Chamar_M	North_Kannadi	Kol_M	Piramalai_Kallars_M
Muslim_M	1255.96	1255.36	1253.96	1251.74	1250.86

Muslim_M are #8 as donors.

There is a pattern here among the top donors for these populations. The same populations show up time and again.
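The row-wise lookup used above can be sketched in a few lines of Python. This is only an illustration: the function names `top_donors` and `donor_rank` are hypothetical, and the two rows hold just the five values quoted in this post rather than the full chunkcount rows, so a population's rank as its own donor can only be computed when its value is actually present.

```python
def top_donors(row, n=5):
    """Return the n (donor, chunk count) pairs with the highest counts."""
    return sorted(row.items(), key=lambda kv: kv[1], reverse=True)[:n]


def donor_rank(row, donor):
    """Return the 1-based rank of `donor` in the row, or None if it is absent."""
    ranked = [name for name, _ in top_donors(row, n=len(row))]
    return ranked.index(donor) + 1 if donor in ranked else None


# Partial recipient rows: only the five values quoted in the post.
velamas = {
    "Velamas_M": 1265.77,
    "Pulliyar_M": 1259.38,
    "North_Kannadi": 1256.06,
    "Chamar_M": 1255.6,
    "Piramalai_Kallars_M": 1254.74,
}
pathan = {
    "Chamar_M": 1229.91,
    "Velamas_M": 1229.56,
    "UP_Scheduled_Caste_M": 1229.53,
    "Piramalai_Kallars_M": 1229.32,
    "Muslim_M": 1229.27,
}

for label, row in (("Velamas_M", velamas), ("Pathan", pathan)):
    best, count = top_donors(row, n=1)[0]
    print(label, "top donor:", best, count, "| self rank:", donor_rank(row, label))
```

With a full chunkcount row loaded into the dictionary, `donor_rank(row, recipient)` would give the "#11" or "#12" style self-rank directly.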

Now compare these with my results (from a larger South Asian dataset). The top 10 matches for Pathans are:

pathan
punjabi-jatt
bhatia
haryana-jatt
rajasthani-brahmin
punjabi
balochi
kashmiri
punjabi-brahmin
sindhi

For Sindhis,

sindhi
bhatia
balochi
makrani
brahui
punjabi-jatt
haryana-jatt
meghawal
pathan
punjabi

For Brahmins from Uttar Pradesh,

bihari-brahmin
haryana-jatt
brahmin-uttar-pradesh
punjabi-jatt
kurmi
sourastrian
bengali-brahmin
bihari-kayastha
bhatia
up-brahmin

For Kshatriyas,

bihari-brahmin
kurmi
meena
kshatriya
rajasthani-brahmin
haryana-jatt
punjabi-jatt
bengali-brahmin
kerala-muslim
sourastrian

For Muslims,

muslim
chamar
kol
oriya
uttar-pradesh-scheduled-caste
bihari-muslim
sourastrian
brahmin-uttaranchal
dusadh
bihari-brahmin

If Dienekes can post a chunkcount file for the clusters computed by fineSTRUCTURE, maybe we can try to figure out what happened.

Dienekes on ANI/ASI

Posted by Zack on March 22, 2011 9 comments

Dienekes has a word of caution about how the choice of reference populations affects admixture results:

Consider a sample of 25 Mexicans from the HapMap and 25 Yoruba from the Hapmap, 25 Iberian Spanish from the 1000 Genomes Project, and 25 Pima from the HGDP as parental populations. We obtain for our Mexican sample:

59.7% European

36.9% "Native American"

3.4% African

Let's run a final experiment with just the Mexicans, Spanish, and Yoruba, i.e., with no Native American samples. At K=3 we obtain:

70% "Native American"

29.7% European

0.4% African

The "Native American" component has increased again! The explanation is simple: as we exclude less admixed Native American groups, Mexicans appear (comparatively) more Native American. The "Native American pole" has shifted, and so has the relative position of populations between them.

In other terms, what is labeled "Native American" in the three experiments is not the same: in the first one it is anchored on the more unadmixed Pima, in the last one in the more admixed Mexicans.

Thus, it seems that unadmixed reference samples are much more useful in getting good results from Admixture.

Then he runs Admixture on the Reich et al dataset for South Asians and tries to estimate the relationship between the Ancestral North Indian percentage computed by Reich et al and his K=2 admixture results on the same data.

Dienekes then included South Asian Dodecad participants in the analysis and ran a K=4 admixture analysis on Reich et al + Dodecad South Asian data, including Yoruba and Beijing Chinese from the HapMap to catch any African or East Asian ancestry.

Here are the admixture results for the reference populations:

The R² between the West Eurasian admixture component and the Reich et al ANI component is 0.98, which is good. His relationship equation comes out to:

ANI = 0.779*WestEurasian + 39.674

Using this relationship, he calculates the ANI and ASI (Ancestral South Indian) components for Dodecad project members. My results (DOD128) are as follows:

East Eurasian	0.0%
African	3.5%
Ancestral North Indian	75.9%
Ancestral South Indian	20.6%
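As a rough check on how these numbers fit together, here is a small Python sketch of the regression. The slope and intercept come from the equation above. Everything else is my assumption: the function names are made up for illustration, and treating ASI as whatever is left after ANI, African and East Eurasian is inferred only from the four figures above adding up to 100%, not from a description of Dienekes' procedure.

```python
SLOPE = 0.779       # from the fitted equation in the post
INTERCEPT = 39.674  # from the fitted equation in the post


def ani_from_west_eurasian(west_eurasian_pct):
    """Apply ANI = 0.779 * WestEurasian + 39.674 (all values in percent)."""
    return SLOPE * west_eurasian_pct + INTERCEPT


def west_eurasian_from_ani(ani_pct):
    """Invert the fitted line to get the West Eurasian value it implies."""
    return (ani_pct - INTERCEPT) / SLOPE


def asi_as_remainder(ani_pct, african_pct, east_eurasian_pct):
    """ASSUMPTION: ASI is the remainder once the other components are removed."""
    return 100.0 - ani_pct - african_pct - east_eurasian_pct


# Figures posted for DOD128.
ani, african, east_eurasian = 75.9, 3.5, 0.0

# Implied by the fit, not a reported value.
print("implied West Eurasian:", round(west_eurasian_from_ani(ani), 1))
print("ASI as remainder:", round(asi_as_remainder(ani, african, east_eurasian), 1))
```

Note that the intercept means a sample with 0% West Eurasian would still be assigned an ANI of 39.674% by this line.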

I should point out that, due to my recent Egyptian ancestry, my ANI result is wrong because it also absorbs all of my non-African Egyptian ancestry.

Also, in the case of Razib, I don't think his East Asian 14.4% should be separated out from his ANI-ASI like that. At least some of it should form part of his ASI percentage in my opinion.

Otherwise, this seems like a very good exercise by Dienekes.

Dodecad vs Harappa

Posted by Zack on February 18, 2011 9 comments

Some participants in the Harappa Ancestry Project have also submitted their data to the Dodecad Project, and they were curious how the ancestry components here line up with the Dodecad ones.

So I decided to compare the two. I took the ancestral component percentages for the reference populations from the Dodecad population spreadsheet K=10 and the Harappa Reference I spreadsheet K=9.

I selected the 36 populations that are present in both. Some of these are still not strictly comparable, because the two projects may have selected different samples from the same population for their reference datasets. Since we are using mean values, though, we can compare them barring any big outliers.

I decided to find a solution to linear equations of the form:

C1 = a_{1,1}*D1 + a_{1,2}*D2 + a_{1,3}*D3 + a_{1,4}*D4 + a_{1,5}*D5 + a_{1,6}*D6 + a_{1,7}*D7 + a_{1,8}*D8 + a_{1,9}*D9 + a_{1,10}*D10
C2 = a_{2,1}*D1 + a_{2,2}*D2 + a_{2,3}*D3 + a_{2,4}*D4 + a_{2,5}*D5 + a_{2,6}*D6 + a_{2,7}*D7 + a_{2,8}*D8 + a_{2,9}*D9 + a_{2,10}*D10
C3 = a_{3,1}*D1 + a_{3,2}*D2 + a_{3,3}*D3 + a_{3,4}*D4 + a_{3,5}*D5 + a_{3,6}*D6 + a_{3,7}*D7 + a_{3,8}*D8 + a_{3,9}*D9 + a_{3,10}*D10
C4 = a_{4,1}*D1 + a_{4,2}*D2 + a_{4,3}*D3 + a_{4,4}*D4 + a_{4,5}*D5 + a_{4,6}*D6 + a_{4,7}*D7 + a_{4,8}*D8 + a_{4,9}*D9 + a_{4,10}*D10
C5 = a_{5,1}*D1 + a_{5,2}*D2 + a_{5,3}*D3 + a_{5,4}*D4 + a_{5,5}*D5 + a_{5,6}*D6 + a_{5,7}*D7 + a_{5,8}*D8 + a_{5,9}*D9 + a_{5,10}*D10
C6 = a_{6,1}*D1 + a_{6,2}*D2 + a_{6,3}*D3 + a_{6,4}*D4 + a_{6,5}*D5 + a_{6,6}*D6 + a_{6,7}*D7 + a_{6,8}*D8 + a_{6,9}*D9 + a_{6,10}*D10
C7 = a_{7,1}*D1 + a_{7,2}*D2 + a_{7,3}*D3 + a_{7,4}*D4 + a_{7,5}*D5 + a_{7,6}*D6 + a_{7,7}*D7 + a_{7,8}*D8 + a_{7,9}*D9 + a_{7,10}*D10
C8 = a_{8,1}*D1 + a_{8,2}*D2 + a_{8,3}*D3 + a_{8,4}*D4 + a_{8,5}*D5 + a_{8,6}*D6 + a_{8,7}*D7 + a_{8,8}*D8 + a_{8,9}*D9 + a_{8,10}*D10
C9 = a_{9,1}*D1 + a_{9,2}*D2 + a_{9,3}*D3 + a_{9,4}*D4 + a_{9,5}*D5 + a_{9,6}*D6 + a_{9,7}*D7 + a_{9,8}*D8 + a_{9,9}*D9 + a_{9,10}*D10

For each of the 36 populations, we'll have these 9 equations where C1 through C9 are the ancestral component percentages of that population in Harappa Project and D1 through D10 are the ancestral percentages in Dodecad Project.

The unknowns are the coefficients "a", and there are 9*10=90 of them. Since we have 36 populations, the number of equations is 36*9=324. Therefore, this is an overdetermined system of linear equations and we can find a least squares solution to it.
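For anyone who wants to try this kind of fit, here is a minimal Python sketch. It relies on the fact that each Harappa component only involves its own row of coefficients, so the 324-equation system splits into nine independent fits of 36 equations and 10 unknowns each. The sketch solves each fit through the normal equations in plain Python. That is one standard way to get a least squares solution and not necessarily the tool behind the table in this post. The data is synthetic (random numbers with a known coefficient matrix), and all function and variable names are hypothetical.

```python
import random


def solve_linear(m, b):
    """Solve the square system m x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def least_squares(D, C):
    """Fit A (one row per C component, one column per D component).

    D: one row per population with its D components.
    C: one row per population with its C components.
    Each row of A solves the normal equations (D^T D) a_i = D^T c_i.
    """
    n_pop, n_d, n_c = len(D), len(D[0]), len(C[0])
    DtD = [[sum(D[p][j] * D[p][k] for p in range(n_pop)) for k in range(n_d)]
           for j in range(n_d)]
    A = []
    for i in range(n_c):
        Dtc = [sum(D[p][j] * C[p][i] for p in range(n_pop)) for j in range(n_d)]
        A.append(solve_linear(DtD, Dtc))
    return A


# Synthetic stand-in data: 36 populations, 10 "D" and 9 "C" components.
rng = random.Random(0)
D = [[rng.random() for _ in range(10)] for _ in range(36)]
A_true = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(9)]
C = [[sum(a * d for a, d in zip(A_true[i], row)) for i in range(9)] for row in D]

A_fit = least_squares(D, C)
max_err = max(abs(A_fit[i][j] - A_true[i][j]) for i in range(9) for j in range(10))
print("largest coefficient error:", max_err)
```

On real spreadsheet data, where components can be strongly correlated, the normal equations may be poorly conditioned. A library routine such as `numpy.linalg.lstsq` would be the more robust choice there.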

Here is the solution:

	D1 W Asian	D2 NW African	D3 S Euro	D4 NE Asian	D5 SW Asian	D6 E Asian	D7 N Euro	D8 W African	D9 E African	D10 S Asian
C1 S Asian	0	0	0	0	0	0	0	0	0	0.92
C2 Kalash	0.54	0	-0.05	0.12	0.07	0	0.2	0	0	0.1
C3 SW Asian	0.46	0.56	0.44	0	0.9	0	-0.09	0	0.09	-0.07
C4 SE Asian	0	0	0	0	0	0.6	0	0	0	0
C5 Euro	0	0.19	0.6	0.05	-0.05	0	0.88	0	0	0
C6 Papuan	0	0	0	0	0	0	0	0	0	0
C7 NE Asian	0	0	0	0.85	0	0.4	0	0	0	0
C8 W African	0	0.12	0	0	0	0	0	1	0	0
C9 E African	0	0.12	0	0	0.05	0	0	0	0.89	0

Don't take the exact values to heart but this shows the general relationship between the Dodecad and Harappa (K=9) ancestral components.

The South Asian components are about the same in both projects.

The Kalash component is a mix but is primarily Dodecad West Asian.

The Harappa Southwest Asian component has contributions from the Dodecad Northwest African, West Asian and South European components in addition to the Dodecad Southwest Asian component.

The Southeast Asian component corresponds partially to the Dodecad East Asian component.

The Harappa European component is more Dodecad North European than South European.
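To see how the coefficient table above can be read, here is a short Python sketch that multiplies a Dodecad K=10 result by the table to get an approximate Harappa K=9 result. The coefficients are copied from the table. The function name `dodecad_to_harappa` and the example input (a made-up sample that is 100% Dodecad South Asian) are assumptions for illustration only, and per the caveat above the output should not be taken as exact.

```python
DODECAD = ["W Asian", "NW African", "S Euro", "NE Asian", "SW Asian",
           "E Asian", "N Euro", "W African", "E African", "S Asian"]
HARAPPA = ["S Asian", "Kalash", "SW Asian", "SE Asian", "Euro",
           "Papuan", "NE Asian", "W African", "E African"]

# Rows follow HARAPPA (C1..C9), columns follow DODECAD (D1..D10).
A = [
    [0,    0,    0,     0,    0,     0,   0,     0, 0,    0.92],
    [0.54, 0,    -0.05, 0.12, 0.07,  0,   0.2,   0, 0,    0.1],
    [0.46, 0.56, 0.44,  0,    0.9,   0,   -0.09, 0, 0.09, -0.07],
    [0,    0,    0,     0,    0,     0.6, 0,     0, 0,    0],
    [0,    0.19, 0.6,   0.05, -0.05, 0,   0.88,  0, 0,    0],
    [0,    0,    0,     0,    0,     0,   0,     0, 0,    0],
    [0,    0,    0,     0.85, 0,     0.4, 0,     0, 0,    0],
    [0,    0.12, 0,     0,    0,     0,   0,     1, 0,    0],
    [0,    0.12, 0,     0,    0.05,  0,   0,     0, 0.89, 0],
]


def dodecad_to_harappa(d):
    """Map Dodecad percentages (in DODECAD order) to approximate Harappa ones."""
    if len(d) != len(DODECAD):
        raise ValueError("expected one value per Dodecad component")
    return {name: sum(a * x for a, x in zip(row, d))
            for name, row in zip(HARAPPA, A)}


# Hypothetical sample: 100% Dodecad South Asian.
pure_s_asian = [0] * 9 + [100]
approx = dodecad_to_harappa(pure_s_asian)
for name, value in approx.items():
    print(name, round(value, 2))
```

Because the fit is unconstrained, the mapped values can be negative and need not add up to 100, which is one more reason to treat them as a rough guide.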

If enough Harappa-Dodecad participants are willing to let me know their IDs for both projects, I can do a similar analysis using individual data.

Harappa Ancestry Project

Genetics and South Asia
