Dodecad vs Harappa

Posted by Zack on February 18, 2011

We know that some participants in the Harappa Ancestry Project have also submitted their data to the Dodecad Project, and they were curious how the ancestry components here line up with the Dodecad ones.

So I decided to compare the two. I took the ancestral component percentages for the reference populations from the Dodecad population spreadsheet (K=10) and the Harappa Reference I spreadsheet (K=9).

I selected the 36 populations that are present in both. Some of these are still not strictly comparable, because Dodecad and Harappa may have included different samples from the same population in their reference datasets. But since we are using mean values, we can compare them as long as there are no big outliers.

I decided to find a solution to linear equations of the form:

C1 = a₁₁*D1 + a₁₂*D2 + a₁₃*D3 + a₁₄*D4 + a₁₅*D5 + a₁₆*D6 + a₁₇*D7 + a₁₈*D8 + a₁₉*D9 + a_1A*D10
C2 = a₂₁*D1 + a₂₂*D2 + a₂₃*D3 + a₂₄*D4 + a₂₅*D5 + a₂₆*D6 + a₂₇*D7 + a₂₈*D8 + a₂₉*D9 + a_2A*D10
C3 = a₃₁*D1 + a₃₂*D2 + a₃₃*D3 + a₃₄*D4 + a₃₅*D5 + a₃₆*D6 + a₃₇*D7 + a₃₈*D8 + a₃₉*D9 + a_3A*D10
C4 = a₄₁*D1 + a₄₂*D2 + a₄₃*D3 + a₄₄*D4 + a₄₅*D5 + a₄₆*D6 + a₄₇*D7 + a₄₈*D8 + a₄₉*D9 + a_4A*D10
C5 = a₅₁*D1 + a₅₂*D2 + a₅₃*D3 + a₅₄*D4 + a₅₅*D5 + a₅₆*D6 + a₅₇*D7 + a₅₈*D8 + a₅₉*D9 + a_5A*D10
C6 = a₆₁*D1 + a₆₂*D2 + a₆₃*D3 + a₆₄*D4 + a₆₅*D5 + a₆₆*D6 + a₆₇*D7 + a₆₈*D8 + a₆₉*D9 + a_6A*D10
C7 = a₇₁*D1 + a₇₂*D2 + a₇₃*D3 + a₇₄*D4 + a₇₅*D5 + a₇₆*D6 + a₇₇*D7 + a₇₈*D8 + a₇₉*D9 + a_7A*D10
C8 = a₈₁*D1 + a₈₂*D2 + a₈₃*D3 + a₈₄*D4 + a₈₅*D5 + a₈₆*D6 + a₈₇*D7 + a₈₈*D8 + a₈₉*D9 + a_8A*D10
C9 = a₉₁*D1 + a₉₂*D2 + a₉₃*D3 + a₉₄*D4 + a₉₅*D5 + a₉₆*D6 + a₉₇*D7 + a₉₈*D8 + a₉₉*D9 + a_9A*D10

For each of the 36 populations, we'll have these 9 equations where C1 through C9 are the ancestral component percentages of that population in Harappa Project and D1 through D10 are the ancestral percentages in Dodecad Project.

The unknowns are the coefficients "a". There are 90 of them (9 Harappa components times 10 Dodecad components). Since we have 36 populations, the number of equations is 36*9=324. Therefore, this is an overdetermined system of linear equations and we can find a least squares solution to it.
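As a rough sketch of how such a fit can be set up, here is one way to do it with NumPy's `np.linalg.lstsq`. This is not necessarily the exact procedure used for this post. The function name `fit_component_map` is hypothetical, and the demo matrices are random placeholders rather than the real spreadsheet values; in practice `D` would hold the 36 x 10 Dodecad percentages and `C` the 36 x 9 Harappa percentages.

```python
import numpy as np

def fit_component_map(D, C):
    """Least-squares fit of A (9 x 10) so that C is approximately D @ A.T.

    D: (n_pops, 10) Dodecad percentages, one row per population.
    C: (n_pops, 9)  Harappa percentages for the same populations.
    Returns A and the root-mean-square error per Harappa component.
    """
    D = np.asarray(D, dtype=float)
    C = np.asarray(C, dtype=float)
    # lstsq solves D @ X = C for X (10 x 9); each column of X is an
    # independent fit for one Harappa component.
    X, _, _, _ = np.linalg.lstsq(D, C, rcond=None)
    residuals = C - D @ X
    rmse = np.sqrt((residuals ** 2).mean(axis=0))
    return X.T, rmse

# Placeholder data only (NOT the real reference populations):
# 36 random profiles that sum to 100, mapped through a known matrix.
rng = np.random.default_rng(0)
D_demo = rng.dirichlet(np.ones(10), size=36) * 100
A_true = rng.uniform(0, 1, size=(9, 10))
C_demo = D_demo @ A_true.T

A_hat, rmse = fit_component_map(D_demo, C_demo)
print(A_hat.shape)  # (9, 10)
```

Note that the 324 equations split into 9 separate problems, one per Harappa component, each with 36 equations and 10 unknowns. The per-component `rmse` gives a quick view of which components are fit well and which are not.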

Here is the solution:

	D1 W Asian	D2 NW African	D3 S Euro	D4 NE Asian	D5 SW Asian	D6 E Asian	D7 N Euro	D8 W African	D9 E African	D10 S Asian
C1 S Asian	0	0	0	0	0	0	0	0	0	0.92
C2 Kalash	0.54	0	-0.05	0.12	0.07	0	0.2	0	0	0.1
C3 SW Asian	0.46	0.56	0.44	0	0.9	0	-0.09	0	0.09	-0.07
C4 SE Asian	0	0	0	0	0	0.6	0	0	0	0
C5 Euro	0	0.19	0.6	0.05	-0.05	0	0.88	0	0	0
C6 Papuan	0	0	0	0	0	0	0	0	0	0
C7 NE Asian	0	0	0	0.85	0	0.4	0	0	0	0
C8 W African	0	0.12	0	0	0	0	0	1	0	0
C9 E African	0	0.12	0	0	0.05	0	0	0	0.89	0

Don't take the exact values to heart but this shows the general relationship between the Dodecad and Harappa (K=9) ancestral components.
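To make the table concrete, it can be read as a 9 x 10 matrix that turns a set of Dodecad K=10 percentages into rough Harappa K=9 estimates. The sketch below copies the coefficients from the table above; the function name `dodecad_to_harappa` and the example profile are made up for illustration. Because this is an unconstrained fit, the estimates can come out negative and need not sum to 100.

```python
import numpy as np

# Column order D1..D10 and row order C1..C9, as in the table above.
DODECAD = ["W Asian", "NW African", "S Euro", "NE Asian", "SW Asian",
           "E Asian", "N Euro", "W African", "E African", "S Asian"]
HARAPPA = ["S Asian", "Kalash", "SW Asian", "SE Asian", "Euro",
           "Papuan", "NE Asian", "W African", "E African"]

A = np.array([
    [0,    0,    0,     0,    0,     0,   0,     0, 0,    0.92],   # C1 S Asian
    [0.54, 0,    -0.05, 0.12, 0.07,  0,   0.2,   0, 0,    0.1],    # C2 Kalash
    [0.46, 0.56, 0.44,  0,    0.9,   0,   -0.09, 0, 0.09, -0.07],  # C3 SW Asian
    [0,    0,    0,     0,    0,     0.6, 0,     0, 0,    0],      # C4 SE Asian
    [0,    0.19, 0.6,   0.05, -0.05, 0,   0.88,  0, 0,    0],      # C5 Euro
    [0,    0,    0,     0,    0,     0,   0,     0, 0,    0],      # C6 Papuan
    [0,    0,    0,     0.85, 0,     0.4, 0,     0, 0,    0],      # C7 NE Asian
    [0,    0.12, 0,     0,    0,     0,   0,     1, 0,    0],      # C8 W African
    [0,    0.12, 0,     0,    0.05,  0,   0,     0, 0.89, 0],      # C9 E African
])

def dodecad_to_harappa(dodecad_pct):
    """Map 10 Dodecad percentages (order of DODECAD) to Harappa estimates."""
    d = np.asarray(dodecad_pct, dtype=float)
    if d.shape != (len(DODECAD),):
        raise ValueError("expected 10 Dodecad percentages")
    return dict(zip(HARAPPA, A @ d))

# Hypothetical profile: 100% Dodecad South Asian, everything else zero.
profile = [0, 0, 0, 0, 0, 0, 0, 0, 0, 100]
estimate = dodecad_to_harappa(profile)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

For this made-up profile the estimate is mostly Harappa South Asian with a little Kalash and a small negative Southwest Asian value, which is a good reminder that the numbers only show the general relationship.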

The South Asian components are about the same in both projects.

The Kalash component is a mix but is primarily Dodecad West Asian.

The Harappa Southwest Asian component has contributions from Dodecad Northwest African, West Asian and South European in addition to the Dodecad Southwest Asian component.

The Southeast Asian component corresponds partially to the Dodecad East Asian component.

The Harappa European component is more Dodecad North European than South European.

If enough Harappa-Dodecad participants are willing to let me know their IDs for both projects, I can do a similar analysis using individual data.



9 Comments.

Dude February 18, 2011 at 8:18 am

HRP0029/DOD387
razib February 18, 2011 at 10:24 am

HRP002/DOD075
Paul Givargidze February 18, 2011 at 10:59 am

HRP0010/DOD134
sv February 18, 2011 at 12:12 pm

HRP0016/DOD327
RK February 18, 2011 at 1:50 pm

HRP0017/DOD331

Were the coefficients similar across all populations?
- Zack February 19, 2011 at 9:36 pm
  
  No. The error was highest for the Harappa European and Southwest Asian components. And the errors were in most cases concentrated among specific groups.
SA February 18, 2011 at 10:27 pm

HRP0013/DOD336
Zack February 19, 2011 at 9:54 pm

Thanks, guys, for the IDs. I'll do the analysis sometime in the coming week.
Sid February 21, 2011 at 10:52 am

HRP0024/DOD414
