Monthly Archives: December 2011

South Asian PCA + Mclust

Posted by Zack on December 21, 2011 9 comments

I combined reference 3 with Metspalu et al data and Harappa Ancestry Project participants (up to HRP0200). Then I kept only those individuals whose combined proportion of South Asian and Onge components on my reference 3 admixture results was more than 50%.
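The selection rule above (combined South Asian and Onge share above 50%) can be sketched as a simple filter. This is only an illustration, not the script used for the post: the sample IDs, the component values, and the function names are made up.

```python
# Hypothetical admixture rows: sample ID -> component percentages.
# IDs and numbers are invented for illustration only.
admixture = {
    "SAMPLE_A": {"S Asian": 48.0, "Onge": 9.0, "SW Asian": 20.0, "European": 23.0},
    "SAMPLE_B": {"S Asian": 30.0, "Onge": 2.0, "SW Asian": 40.0, "European": 28.0},
    "SAMPLE_C": {"S Asian": 50.0, "Onge": 0.0, "SW Asian": 25.0, "European": 25.0},
}


def south_asian_share(components):
    """Combined S Asian + Onge percentage for one individual."""
    return components.get("S Asian", 0.0) + components.get("Onge", 0.0)


def select_south_asians(rows, threshold=50.0):
    """Return sorted IDs whose combined share is strictly above the threshold."""
    return sorted(
        sample_id
        for sample_id, components in rows.items()
        if south_asian_share(components) > threshold
    )


kept = select_south_asians(admixture)
print(kept)
```

Note that the comparison is strict, so a sample at exactly 50% (like the hypothetical SAMPLE_C) is dropped.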

I ran PCA on these South Asian samples and kept 31 dimensions. Running Mclust on the PCA results gave me 37 clusters.

The clustering results are in a spreadsheet.

For an individual, the value under a specific cluster shows the probability of that person belonging to that cluster. For example, HRP0152 has a 58% probability of belonging to cluster CL8 and 42% probability of being in cluster CL14.

For the populations in the first sheet, I added up the probabilities of all the samples in that population to get the expected number of individuals of that ethnicity belonging to a specific cluster.
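As a rough sketch of how the population rows could be derived, the snippet below sums per-sample cluster probabilities within each population. The HRP0152 probabilities come from the example above; the population labels, the other sample IDs, and their values are hypothetical.

```python
from collections import defaultdict

# Each row: (sample ID, population label, {cluster: membership probability}).
# Only the HRP0152 probabilities are from the post; the rest is invented.
memberships = [
    ("HRP0152", "Participant", {"CL8": 0.58, "CL14": 0.42}),
    ("REF_X1", "ExamplePop", {"CL8": 1.0}),
    ("REF_X2", "ExamplePop", {"CL8": 0.25, "CL14": 0.75}),
]


def expected_counts(rows):
    """Sum membership probabilities per population and cluster.

    The result is the expected number of individuals from each
    population that fall in each cluster.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for _sample_id, population, probabilities in rows:
        for cluster, probability in probabilities.items():
            totals[population][cluster] += probability
    return {pop: dict(clusters) for pop, clusters in totals.items()}


counts = expected_counts(memberships)
print(counts["ExamplePop"])
```

Because each individual's probabilities sum to 1, the values in a population's row add up to the number of samples in that population.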

In the second sheet, I have listed all the individual samples' clustering results.

There are some outliers who didn't belong in any cluster: HRP0001 (me, of course), 7 (out of 18) Makranis, 4 (out of 23) Sindhis, 3 (all) Great Andamanese, 1 (out of 20) Balochi, 1 (out of 4) Madiga, and 1 (only) Onge.

Reference 3 + Yunusbayev + HAP PCA and Mclust

Posted by Zack on December 19, 2011 8 comments

I ran Principal Component Analysis (PCA) on reference 3 along with Yunusbayev et al Caucasus dataset and Harappa Ancestry Project participants (up to HRP0200).

Then I ran mclust on the first 70 dimensions. The resulting 156 clusters can be seen in a spreadsheet.

For individuals belonging to Harappa Ancestry Project, the value in a column shows that person's probability of being in that cluster. So if there is a 1 in CL15 for example, then that person has a 100% probability of being in Cluster CL15.

For the reference population groups, I have added up the probabilities for all the individuals belonging to that group.

Yunusbayev Ref3 Admixture Results

Posted by Zack on December 16, 2011 3 comments

I ran supervised admixture on the Yunusbayev et al dataset from the Caucasus using my reference 3 data to see how the Yunusbayev samples looked in my Ref3 admixture component space.

Here's the spreadsheet for Yunusbayev admixture results. You can compare with the reference 3 results.

Here's our bar chart for Yunusbayev results. Remember you can click on the legend or the table headers to sort.

Another 23andme Sale

Posted by Zack on December 13, 2011 1 comment

23andme is having another sale till December 31: $23 off per kit (from $99 up front). The code to take advantage of the sale price is TPHG6P.

UPDATE: Here is another link for a $23 discount for 23andme.

Metspalu Ref3 Admixture Results

Posted by Zack on December 12, 2011 22 comments

I ran supervised admixture on the Metspalu et al dataset using my reference 3 data. Here's the spreadsheet for Metspalu admixture results. You can compare with the reference 3 results.

Here's our bar chart for Metspalu results. Remember you can click on the legend or the table headers to sort.

~~These are very different from Dienekes for some reason.~~

UPDATE (Dec 13 10:04am): I found a major error. I had used the population info file I had downloaded from the paper instead of my reformatted one, so the population info was not matched to the correct IDs when I merged it with the admixture results. The previously posted results were therefore junk. I have fixed that now and the results are as expected.

Metspalu et al Data Relatedness

Posted by Zack on December 11, 2011 9 comments

I performed IBD analysis on the Metspalu dataset using plink and found the relatedness of the following samples to be too high.

ID1	Source1	Population1	ID2	Source2	Population2	IBD Estimate
Mawasi1	Metspalu	Mawasi	Mawasi1	Chaubey	Mawasi	100%
VELZ260	Metspalu	Velama	Velama_184_R2	Reich	Velama	99%
VELZ260	Metspalu	Velama	VELZ265	Metspalu	Velama	19%
VELZ265	Metspalu	Velama	Velama_184_R2	Reich	Velama	19%
D254	Metspalu	Tharu	Tharu_107_R1	Reich	Tharu	99%
D260	Metspalu	Tharu	Tharu_108_R1	Reich	Tharu	98%
evo_32	Metspalu	Kanjar	321e	Metspalu	Kol	53%
HA030	Metspalu	Dharkar	HA039	Metspalu	Dharkar	52%
A387	Metspalu	Dusadh	A388	Metspalu	Dusadh	52%
A394	Metspalu	Dusadh	A395	Metspalu	Dusadh	52%
A395	Metspalu	Dusadh	A393	Metspalu	Dusadh	46%
A394	Metspalu	Dusadh	A393	Metspalu	Dusadh	45%
A392	Metspalu	Dusadh	A393	Metspalu	Dusadh	32%
A392	Metspalu	Dusadh	A395	Metspalu	Dusadh	31%
A392	Metspalu	Dusadh	A394	Metspalu	Dusadh	28%
evo_37	Metspalu	Kanjar	HA023	Metspalu	Dharkar	27%
HA039	Metspalu	Dharkar	HA041	Metspalu	Dharkar	24%
HLKP245	Metspalu	Hakkipikki	Hallaki_137_R2	Reich	Hallaki	22%
PULD160	Metspalu	Pulliyar	PULD162	Metspalu	Pulliyar	20%

As you can see, three samples from Reich et al seem to be the same individuals as samples in Metspalu et al. In addition, two other Reich samples seem to be related to Metspalu samples.

There are some Metspalu samples who are likely related to one another. A value around 50% likely indicates a parent-child or sibling relationship. A 45-46% relatedness is most likely siblings in my opinion. An 18-19% estimate could be a 1st cousin relationship in an endogamous community. It could also just be the background relatedness in a small, bottlenecked and endogamous community.

It looks like about half of the Dusadh in the Metspalu dataset are related.

I am surprised at the close relationship of a Kanjar and a Kol in the dataset, though both are from Uttar Pradesh.
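For readers who want to build a similar list, here is a minimal sketch that filters pairwise IBD output by the PI_HAT column, assuming the whitespace-delimited layout that plink's `--genome` option writes. The rows below are made up, and the 0.18 cutoff is my assumption for illustration; the post does not state the threshold that was used.

```python
# Made-up rows in the column layout of a plink .genome file.
GENOME_TEXT = """\
 FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
 F1 S1 F2 S2 UN NA 0.0000 0.0000 1.0000 1.0000 -1 1.000000 1.0000 NA
 F1 S1 F3 S3 UN NA 0.2500 0.5000 0.2500 0.5000 -1 0.850000 1.0000 9.5
 F2 S2 F4 S4 UN NA 0.9600 0.0400 0.0000 0.0200 -1 0.720000 0.5000 2.0
"""


def related_pairs(text, min_pi_hat=0.18):
    """Return (IID1, IID2, PI_HAT) for pairs at or above the cutoff.

    Pairs are sorted from most to least related. The default cutoff
    is an illustrative assumption, not a recommended value.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    id1 = header.index("IID1")
    id2 = header.index("IID2")
    pi_hat = header.index("PI_HAT")
    pairs = [
        (row[id1], row[id2], float(row[pi_hat]))
        for row in rows
        if float(row[pi_hat]) >= min_pi_hat
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


pairs = related_pairs(GENOME_TEXT)
for first, second, estimate in pairs:
    print(f"{first}\t{second}\t{estimate:.0%}")
```

Looking up columns by header name rather than by position keeps the sketch working if the column order differs.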

Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia

Posted by Zack on December 8, 2011 21 comments

Metspalu et al have a new paper in American Journal of Human Genetics about South Asian genetics. Here's the abstract:

South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.

I'll have some comments later today.

Admixture (Ref3 K=11) HRP0191-HRP0200

Posted by Zack on December 1, 2011 7 comments

Here are the admixture results using Reference 3 for Harappa participants HRP0191 to HRP0200.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

HRP0193 is Georgian and has very similar results to HRP0138 and HRP0175.

HRP0200 is Kazakh and is closely related to HRP0089. Thus the difference between them (in the American, Onge & Papuan components) is somewhat interesting, though not large enough to be certain that it's not noise.

HRP0197 and HRP0198 are Somali. HRP0197 pointed out to me that 14S_R1, a Somali in the reference set, was an outlier who was more like East African Bantu (e.g., Luhya) than the other reference Somalis. So in the table below, I have excluded 14S_R1 from the reference average.

Component	RefAverage	HRP0197	HRP0198
S Asian	0	2	2
Onge	4	0	1
E Asian	0	1	2
SW Asian	28	33	34
European	0	0	0
Siberian	0	2	1
W African	12	14	13
Papuan	0	0	0
American	0	0	1
San/Pygmy	2	3	2
E African	52	44	43

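Interestingly, the two project participants are more Asian than the reference average: their S Asian, E Asian, SW Asian and Siberian components are all higher, though their Onge component is lower.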
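Excluding an outlier before averaging, as done for the reference column above, might look like the following sketch. The sample IDs and percentages here are invented placeholders and are not the actual reference Somali values.

```python
# Invented reference samples: ID -> component percentages.
# These are placeholders, not real Reference 3 results.
reference = {
    "REF_1": {"E African": 54.0, "SW Asian": 28.0, "W African": 10.0},
    "REF_2": {"E African": 50.0, "SW Asian": 30.0, "W African": 12.0},
    "REF_OUTLIER": {"E African": 30.0, "SW Asian": 6.0, "W African": 50.0},
}


def average_components(samples, exclude=()):
    """Average each component over all samples not listed in exclude."""
    kept = [
        components
        for sample_id, components in samples.items()
        if sample_id not in exclude
    ]
    if not kept:
        raise ValueError("no samples left to average")
    names = sorted({name for components in kept for name in components})
    return {
        name: sum(components.get(name, 0.0) for components in kept) / len(kept)
        for name in names
    }


with_outlier = average_components(reference)
without_outlier = average_components(reference, exclude={"REF_OUTLIER"})
print(without_outlier)
```

With so few samples, a single outlier shifts the average noticeably, which is why dropping it changes the comparison with the participants.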

Harappa Ancestry Project

Genetics and South Asia
