Tag Archives: onge

Indian Cline III

I have been working on creating 100% ASI (Ancestral South Indian) samples recently. So it was really interesting that Dienekes did similar experiments:

I am going about creating the "pure" allele frequencies somewhat differently, so that would be a useful exercise.

Anyway, I thought you guys would be itching for some new results. So here's a PCA plot:

This used the same Principal Component Analysis as the one here using the 96 Indian Cline samples, Utahn Whites and Onge. However, I projected three extra "populations" on this plot.

These three populations are simulated genetic data of 25 individuals using the allele frequencies from Reference 3 Admixture results.

  1. Onge11 is generated from the Onge (C2) component from K=11 admixture for Reference 3.
  2. SA11 is generated from the South Asian (C1) component from the same K=11 admixture.
  3. SA12 is generated from the South Asian (C1) component from the K=12 admixture.

As you can see, the SA12 population lies between 100% ASI and the Indian Cline samples.

The Onge11 generated samples are a bit beyond 100% ASI on the first principal component, but they are also shifted towards the real Onge on pc2.

Indian Cline

I had used linear regression to estimate Ancestral South Indian (ASI) component from Reference 3 K=11 admixture run. Now here are a couple more exercises along the same lines but much simpler.

Just using the 96 Indian cline samples from Reich et al to compute PCA or admixture doesn't work as the Chenchu separate out in both analyses from the rest. So I added the Utahn White (CEU) samples from HapMap and the Onge from Reich et al.

First, I ran supervised admixture with two ancestral components, Utahn Whites and Onge. Here's the Onge component plotted against Reich et al's ASI estimate along with a linear regression estimate. The correlation between the two is 0.9908.

Second, I ran Principal Component Analysis (PCA) on the Indian cline samples plus Utahn Whites and Onge. Here are the first two PCA dimensions plotted. The first eigenvector explains 4.04% of the total variation and the 2nd explains 1.94%.

The first principal component is mostly along the Indian cline while the second one basically separates the Onge from everyone else.

Using the 1st principal component to estimate ASI, here's the plot with Reich et al's ASI estimate along with a regression line. The correlation between pc1 and ASI is 0.9929.

Note that both these methods work only if the samples are on the Indian cline, i.e., they don't have any other admixture.

And now for comparison, here's the linear regression for the Reference 3 K=11 admixture Onge component and ASI. The correlation here is 0.9949. Note that this is a little different than my previous analysis since I calculated the population averages using only the 96 samples recommended by Reich et al.

Here's a spreadsheet containing the data for these three runs.

There are a couple more tricks I have to figure out some things regarding Ancestral South Indian admixture. Let's hope they provide us some insight.

Harappa (1-90) K=11 Admixture Ref3

Here's my first admixture run using Reference 3 for Harappa participants. Since K=11 was the run with the Onge-ASI connection, I ran admixture at K=11 with all the 90 Harappa participants.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

Using the comparison between the Onge component here and Reich et al's Ancestral South Indian one, I get the following linear regression.

The correlation is 0.9949 which is probably as high as it can get. So let's calculate the ASI percentage for all the Harappa participants.

Note that I didn't calculate the ASI percentage for those who had a really low Onge component since the linear regression above would not be valid outside the range we have in our original data.

You can see the percentages in a spreadsheet too.

Let's compare with the Dodecad ANI-ASI results. I have 22.5% ASI here while it was 20.6% in the Dodecad analysis. Overall, it seems like my technique results in about 2% more ASI than Dodecad's, with a few exceptions: Like Razib who jumps from 34.3% to 43.3% (averaging his parents who are very close).

Admixture Onge Component Map

Since the Onge component on my K=11 admixture run was very strongly correlated with Reich et al's Ancestral South Indian (r2Simranjit has been kind enough to let me share his map of the Onge component in South Asia.

He also has maps of the K=12 admixture run.

Reference 3 Admixture K=11

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=11.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

You don't know how excited I am to see the Onge (C2) component. Let's compare the Onge component with Reich et al's ASI (Ancestral South Indian):

Reich ASI % Onge Component %
Mala 61.2 39.9
Madiga 59.4 37.9
Chenchu 59.3 38.6
Bhil 57.1 37.5
Satnami 57 36.4
Kurumba 56.8 39.5
Kamsali 55.5 35.5
Vysya 53.8 34.4
Lodi 50.1 31.8
Naidu 49.9 32.1
Tharu 49 32.2
Velama 45.3 28.9
Srivastava 43.6 27.8
Meghawal 39.7 25.4
Vaish 37.4 23.8
Kashmiri-Pandit 29.4 17.6
Sindhi 26.3 13.4
Pathan 23.1 10.6

Let's plot that with a linear regression:

How do you like that?

Now let's take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.

Fst divergences between estimated populations for K=11 in the form of an MDS plot.

I guess you might want to see the Fst dendrogram too. Just remember it's not a phylogeny.

And the numbers:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133