Tag Archives: harappa - Page 2

Harappa Oracle

Posted by Zack on March 23, 2012

Based on the Dodecad Oracle, here is the Harappa Oracle, which uses the Reference 3 admixture results.

I am using Dienekes' code with a couple of changes. One of them is a weighted distance based on the Fst divergences between ancestral components. Because of that, it is several times slower than the Dodecad Oracle. I plan to add an option soon to switch between Euclidean distance and Fst-weighted distance.
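To make the difference between the two distances concrete, here is a minimal Python sketch (not the Oracle's actual R code) of a plain Euclidean distance next to a generic weighted distance of the form sqrt(dᵀWd). The weight matrix is an assumption: the post does not say exactly how the Fst divergences are turned into weights, so `weights` is a hypothetical placeholder. With the identity matrix, the weighted distance reduces to the Euclidean one.

```python
import math
from typing import Sequence

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two admixture vectors (in percent)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_distance(a, b, weights) -> float:
    """Quadratic-form distance sqrt(d^T W d), where d = a - b.

    `weights` is a hypothetical K x K symmetric matrix standing in for
    whatever the Oracle derives from the Fst divergences. It should be
    positive semi-definite so that the quantity under the root is >= 0.
    """
    d = [x - y for x, y in zip(a, b)]
    k = len(d)
    q = sum(d[i] * weights[i][j] * d[j] for i in range(k) for j in range(k))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding
```

The extra K×K double loop per comparison is one reason a weighted distance costs more than a plain Euclidean one.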

You need to install R to use it. Then unzip the Oracle zip file and either double-click on the .RData file or load it in R with:

load('HarappaOracleR3fst.RData')

In R, you can look at the 385 populations included by typing:

X[,1]

To find your closest populations, you need your Harappa Reference 3 admixture results. Enter them as a comma-separated vector like this (these are my own results):

HarappaOracle(c(44,12,0,24,14,1,2,0,0,1,2))

The output lists the closest populations in the first column and their distances from you in the second column.

[,1] [,2]
[1,] "balochi" "8.0242"
[2,] "bene-israel" "9.2843"
[3,] "brahui" "9.5158"
[4,] "pathan" "9.7034"
[5,] "makrani" "10.1014"
[6,] "sindhi" "10.9236"
[7,] "Bhatia" "11.8441"
[8,] "Sindhi" "12.1704"
[9,] "Kashmiri" "13.4229"
[10,] "punjabi-arain" "13.9192"

You can also find out the closest populations to one of the reference populations:

HarappaOracle("punjabi-arain")

By default, the Oracle shows the 10 closest populations. You can change that:

HarappaOracle("punjabi-arain",k=20)

Also, by default, the Oracle excludes the Pan-Asian dataset, since its overlap with the rest of the data is only 5,400 SNPs. You can include the Pan-Asian populations with:

HarappaOracle("punjabi-arain",panasian=T)
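The basic lookup and these options can be sketched as follows. This is a minimal Python illustration assuming Euclidean distance, not the Oracle's R implementation; the function name `oracle`, the `sources` dictionary that tags each population with its dataset, and all numbers are hypothetical. When the query is a population name, that population's own mean is used, so it shows up at distance 0.

```python
import math

def oracle(query, reference, sources, k=10, panasian=False):
    """Rank reference populations by Euclidean distance to `query`.

    query     -- an admixture vector, or the name of a reference population
                 (its mean vector is then used, so it appears at distance 0)
    reference -- dict: population name -> mean admixture vector (percent)
    sources   -- dict: population name -> dataset label (hypothetical field)
    k         -- number of closest populations to return
    panasian  -- if False, skip populations labelled "Pan-Asian"
    """
    target = reference[query] if isinstance(query, str) else list(query)
    rows = []
    for name, mean in reference.items():
        if not panasian and sources.get(name) == "Pan-Asian":
            continue  # excluded by default because of the small SNP overlap
        dist = math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mean)))
        rows.append((name, dist))
    rows.sort(key=lambda row: row[1])  # closest first
    return rows[:k]

# Hypothetical two-component reference data, for illustration only.
reference = {"pop_x": [50.0, 50.0], "pop_y": [60.0, 40.0], "pop_z": [52.0, 48.0]}
sources = {"pop_z": "Pan-Asian"}
for name, dist in oracle("pop_x", reference, sources, k=3, panasian=True):
    print(f"{name}\t{dist:.4f}")
```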

There is also a mixed mode, in which the individual (or the mean of a reference population) is compared against all pairs of populations as potential ancestors:

HarappaOracle("Haryana Jatt",mixedmode=T)

which has the following output:

[1,] "Haryana Jatt" "0"
[2,] "15.4% lithuanians + 84.6% Punjabi Brahmin" "1.9553"
[3,] "10.6% russian + 89.4% Rajasthani Brahmin" "2.0626"
[4,] "14.7% finnish + 85.3% Punjabi Brahmin" "2.0863"
[5,] "9.2% finnish + 90.8% Rajasthani Brahmin" "2.1142"
[6,] "89.4% Rajasthani Brahmin + 10.6% mordovians" "2.1727"
[7,] "9.6% lithuanians + 90.4% Rajasthani Brahmin" "2.1989"
[8,] "10.1% belorussian + 89.9% Rajasthani Brahmin" "2.2938"
[9,] "16.8% russian + 83.2% Punjabi Brahmin" "2.3015"
[10,] "16.2% belorussian + 83.8% Punjabi Brahmin" "2.3656"
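One way to implement a mixed mode is to try every pair of populations, fit the mixing fraction p that brings p·A + (1 − p)·B closest to the target, and rank the pairs by the remaining distance. The Python sketch below assumes Euclidean distance, where the best p has a closed form (the target is projected onto the segment between the two population means). This is an assumed approach: the Oracle itself uses the Fst-weighted distance and may search for the proportions differently. Population names and vectors are hypothetical.

```python
import itertools
import math

def best_mix(target, a, b):
    """Fraction p in [0, 1] minimizing |target - (p*a + (1-p)*b)|, plus that distance."""
    diff = [x - y for x, y in zip(a, b)]       # a - b
    rel = [t - y for t, y in zip(target, b)]   # target - b
    denom = sum(d * d for d in diff)
    p = 0.0 if denom == 0 else sum(r * d for r, d in zip(rel, diff)) / denom
    p = min(1.0, max(0.0, p))                  # proportions must stay in [0, 1]
    mix = [p * x + (1 - p) * y for x, y in zip(a, b)]
    dist = math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))
    return p, dist

def mixed_mode(target, reference, k=10):
    """Rank all two-population mixtures by their distance to `target`."""
    rows = []
    for (name_a, a), (name_b, b) in itertools.combinations(reference.items(), 2):
        p, dist = best_mix(target, a, b)
        label = f"{100 * p:.1f}% {name_a} + {100 * (1 - p):.1f}% {name_b}"
        rows.append((label, dist))
    rows.sort(key=lambda row: row[1])
    return rows[:k]
```

Clipping p keeps the proportions meaningful: if the target lies beyond one of the two populations, the best "mixture" is simply 100% of the nearer one.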

You can of course combine any or all of the options.

Think of the Harappa Oracle as a tool to help you interpret your admixture results by showing which populations you are closest to. Do not think of it as giving you your real ancestry.

Harappa Participant Admixture Group Averages

Posted by Zack on February 21, 2012

I have been reporting only individual admixture results for Harappa Project participants. I think it's way past time I posted some group averages too.

You can see the groups I have assigned participants to and the current count for each group.

The average admixture results for each group are in a spreadsheet. This is using Reference 3. You can compare with the reference population results.

Here's the bar chart of the participant group averages. Remember you can click on the legend or the table headers to sort.

Admixture (Ref3 K=11) HRP0211-HRP0220

Posted by Zack on February 13, 2012

Here are the admixture results using Reference 3 for Harappa participants HRP0211 to HRP0220.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

Do note that small percentages for your results can be noise.

HRP0211 seems like a typical Tamil Brahmin.

HRP0212 is half Fijian, half Indian/Pakistani/Afghan. His Fijian ancestry seems to show up mostly as Papuan and East Asian.

HRP0213 is a Gujarati Khoja whose results differ not just from the Gujarati Patels (Gujarati A) but also from HRP0130 (a Gujarati Ganchi) and from HapMap Gujarati B.

HRP0216 is an Iraqi Assyrian and is a little more European than the other Assyrians. The Onge, Papuan and American components are likely noise.

HRP0217 and HRP0218 are Kazakhs and fairly similar to the other Kazakhs in the project.

This will probably be the last admixture analysis using Reference 3.

Admixture (Ref3 K=11) HRP0201-HRP0210

Posted by Zack on January 17, 2012

Here are the admixture results using Reference 3 for Harappa participants HRP0201 to HRP0210.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

Do note that small percentages for your results can be noise.

South Asian PCA 3D Plot

Posted by Zack on January 12, 2012

Here's a 3-D plot of my South Asian PCA run, showing the first three principal components.

The principal components have been scaled according to their respective eigenvalues. The plot rotates about the first eigenvector, which is the vertical axis.

If the animation does not load, it can be viewed directly at http://www.harappadna.org/wp-content/uploads/2012/01/ref3_c_m_hap_sa.pcaplot.html.

You can find out your position on the plot by using the dropdown below the plot and selecting your Harappa ID.

South Asian PCA Plots

Posted by Zack on January 10, 2012

I did a South Asian PCA + Mclust analysis last month. Here are the PCA plots from that analysis.

Note that the eigenvectors in the plots are not scaled by their eigenvalues, so here is a table showing how much of the total variation each eigenvector explains.

Eigenvector	Percentage variation explained
1	1.134%
2	0.452%
3	0.351%
4	0.263%
5	0.254%
6	0.236%
7	0.228%
8	0.224%
9	0.215%
10	0.209%
11	0.207%
12	0.205%
13	0.203%
14	0.201%
15	0.198%
16	0.194%
17	0.191%
18	0.189%
19	0.189%
20	0.188%
21	0.188%
22	0.187%
23	0.186%
24	0.185%
25	0.184%
26	0.184%
27	0.183%
28	0.182%
29	0.180%
30	0.180%
31	0.179%
32	0.179%
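To compare how much each eigenvector is worth, the percentages in the table can be summed and set against each other. A minimal Python sketch using the table's values; the square-root axis scaling at the end is one common convention for scaling components by their eigenvalues and is an assumption here, not necessarily what was used for the plots.

```python
import math

# Percentage of variation explained by eigenvectors 1-32, from the table above.
pct = [1.134, 0.452, 0.351, 0.263, 0.254, 0.236, 0.228, 0.224, 0.215, 0.209,
       0.207, 0.205, 0.203, 0.201, 0.198, 0.194, 0.191, 0.189, 0.189, 0.188,
       0.188, 0.187, 0.186, 0.185, 0.184, 0.184, 0.183, 0.182, 0.180, 0.180,
       0.179, 0.179]

cumulative = sum(pct)          # total share captured by these 32 eigenvectors
ratio_1_2 = pct[0] / pct[1]    # how much more variance PC1 carries than PC2
# Assumed convention: stretch each axis by sqrt(eigenvalue), expressed
# relative to PC1 so that PC1 keeps unit scale.
axis_scale = [math.sqrt(p / pct[0]) for p in pct]

print(f"cumulative: {cumulative:.3f}%")
print(f"PC1/PC2 variance ratio: {ratio_1_2:.2f}")
print(f"PC2 axis scale relative to PC1: {axis_scale[1]:.3f}")
```

With these numbers, the 32 listed eigenvectors together explain about 7.7% of the variation, and PC1 carries roughly 2.5 times the variance of PC2.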

Eigenvector 1 looks like the Indian cline, but it's actually a West-East Eurasian cline. It's quite similar to Reich et al.'s Indian cline for their subset of populations (the correlation between PC1 and ASI is 0.998869), but since there are no East Asian samples here to separate out an East Asian component, we get a mix of East Asian and Ancestral South Indian (ASI) ancestry towards the right of the plot.
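The correlation quoted above is an ordinary Pearson coefficient. A minimal Python sketch; the per-population values below are invented for illustration and will not reproduce the 0.998869 figure, which was computed on the actual populations shared with Reich et al.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-population values: mean PC1 score and an ASI estimate.
pc1 = [-0.05, -0.02, 0.00, 0.03, 0.06]
asi = [0.30, 0.42, 0.50, 0.61, 0.73]
print(round(pearson(pc1, asi), 4))
```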

Eigenvector 2 separates Kalash from everyone else.

South Asian PCA + Mclust

Posted by Zack on December 21, 2011

I combined Reference 3 with the Metspalu et al. data and Harappa Ancestry Project participants (up to HRP0200). Then I kept only those individuals whose combined South Asian and Onge proportion in my Reference 3 admixture results was more than 50%.
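The selection rule is a simple threshold on the sum of two components. A minimal Python sketch; the sample IDs and percentages are hypothetical, and the component keys "S Asian" and "Onge" follow the labels used on this blog. Note the strict "more than 50%": a sample at exactly 50% is left out.

```python
def south_asian_subset(admixture, threshold=50.0):
    """Keep samples whose South Asian + Onge share exceeds `threshold` percent.

    `admixture` maps sample IDs to dicts of component -> percent.
    """
    return [
        sample for sample, comps in admixture.items()
        if comps.get("S Asian", 0.0) + comps.get("Onge", 0.0) > threshold
    ]

# Hypothetical samples, for illustration only.
samples = {
    "sample1": {"S Asian": 48.0, "Onge": 10.0, "SW Asian": 24.0},
    "sample2": {"S Asian": 30.0, "Onge": 5.0, "European": 40.0},
    "sample3": {"S Asian": 45.0, "Onge": 5.0, "European": 30.0},
}
print(south_asian_subset(samples))
```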

I ran PCA on these South Asian samples and kept 31 dimensions. Running Mclust on the PCA results gave me 37 clusters.

The clustering results are in a spreadsheet.

For an individual, the value under a specific cluster shows the probability of that person belonging to that cluster. For example, HRP0152 has a 58% probability of belonging to cluster CL8 and 42% probability of being in cluster CL14.

For the populations in the first sheet, I added up the probabilities of all the samples in that population to get the expected number of individuals of that ethnicity belonging to a specific cluster.

In the second sheet, I have listed all the individual samples' clustering results.

There are some outliers who didn't belong in any cluster: HRP0001 (me, of course), 7 (out of 18) Makranis, 4 (out of 23) Sindhis, 3 (all) Great Andamanese, 1 (out of 20) Balochi, 1 (out of 4) Madiga, and 1 (only) Onge.
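The population totals in the first sheet are sums of per-sample cluster probabilities, so each value is an expected number of individuals. A minimal Python sketch of that aggregation; the sample IDs and memberships are hypothetical, with "s1" mirroring the 58%/42% split described for HRP0152.

```python
from collections import defaultdict

def expected_cluster_counts(memberships, population_of):
    """Sum per-sample cluster probabilities within each population.

    memberships   -- sample ID -> {cluster: probability}, each row summing to ~1
    population_of -- sample ID -> population label
    Returns population -> {cluster: expected number of individuals}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for sample, probs in memberships.items():
        pop = population_of[sample]
        for cluster, p in probs.items():
            totals[pop][cluster] += p
    return {pop: dict(counts) for pop, counts in totals.items()}

# Hypothetical memberships, for illustration only.
memberships = {
    "s1": {"CL8": 0.58, "CL14": 0.42},
    "s2": {"CL8": 1.0},
    "s3": {"CL14": 0.9, "CL3": 0.1},
}
population_of = {"s1": "pop_x", "s2": "pop_x", "s3": "pop_y"}
print(expected_cluster_counts(memberships, population_of))
```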

Reference 3 + Yunusbayev + HAP PCA and Mclust

Posted by Zack on December 19, 2011

I ran Principal Component Analysis (PCA) on reference 3 along with Yunusbayev et al Caucasus dataset and Harappa Ancestry Project participants (up to HRP0200).

Then I ran mclust on the first 70 dimensions. The resulting 156 clusters can be seen in a spreadsheet.

For individuals belonging to Harappa Ancestry Project, the value in a column shows that person's probability of being in that cluster. So if there is a 1 in CL15 for example, then that person has a 100% probability of being in Cluster CL15.

For the reference population groups, I have added up the probabilities for all the individuals belonging to that group.

Admixture (Ref3 K=11) HRP0191-HRP0200

Posted by Zack on December 1, 2011

Here are the admixture results using Reference 3 for Harappa participants HRP0191 to HRP0200.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

HRP0193 is Georgian and has very similar results to HRP0138 and HRP0175.

HRP0200 is Kazakh and is closely related to HRP0089. The differences between them (the American, Onge and Papuan components) are therefore somewhat interesting, though not large enough to be sure they are not noise.

HRP0197 and HRP0198 are Somali. HRP0197 pointed out to me that 14S_R1, a Somali in the reference set, is an outlier who looks more like East African Bantu (e.g., Luhya) than the other reference Somalis. So in the table below, I have excluded 14S_R1 from the reference average.

Component	Reference Somali avg (%)	HRP0197 (%)	HRP0198 (%)
S Asian	0	2	2
Onge	4	0	1
E Asian	0	1	2
SW Asian	28	33	34
European	0	0	0
Siberian	0	2	1
W African	12	14	13
Papuan	0	0	0
American	0	0	1
San/Pygmy	2	3	2
E African	52	44	43

Interestingly, the two project participants are more Asian (mainly SW Asian) and less East African than the reference average.
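Dropping an outlier before averaging is a small filtering step ahead of a per-component mean. A minimal Python sketch; the sample IDs and numbers are hypothetical (in the real calculation the dropped sample was 14S_R1), and only two components are shown for brevity.

```python
def component_average(samples, exclude=()):
    """Mean admixture per component over samples, skipping IDs in `exclude`."""
    kept = [comps for sid, comps in samples.items() if sid not in exclude]
    if not kept:
        raise ValueError("no samples left after exclusion")
    components = kept[0].keys()  # assumes every sample lists the same components
    return {c: sum(s[c] for s in kept) / len(kept) for c in components}

# Hypothetical reference samples, for illustration only.
reference_somali = {
    "ref1": {"SW Asian": 30.0, "E African": 50.0},
    "ref2": {"SW Asian": 26.0, "E African": 54.0},
    "outlier": {"SW Asian": 5.0, "E African": 20.0},
}
avg = component_average(reference_somali, exclude={"outlier"})
print(avg)
```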

Ref3 + Yunusbayev Harappa Admixture Results

Posted by Zack on November 10, 2011

The ADMIXTURE results for the Harappa participants (up to HRP0180) for the Reference 3 + Yunusbayev dataset are in a spreadsheet and can also be seen in the bar charts below.

Do take a look at K=12 and K=17 (which have the lowest cross-validation errors) as well as K=15.

Harappa Ancestry Project

Genetics and South Asia