Participation Rate

Just thought I would show you how quickly I was getting data earlier in the year and how it has slowed down now. This shows the number of days it took to get 10 samples.

As you can see, data submission has picked up a little recently.

Admixture (Ref3 K=11) HRP0171-HRP0180

Here are the admixture results using Reference 3 for Harappa participants HRP0171 to HRP0180.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

HRP0171 is our second Tamil Vellalar from Sri Lanka and the results are similar to HRP0169.

HRP0172 has 1/16 Romani ancestry. The Onge component is about 0.4%, which could be noise or possibly evidence of a South Asian connection via the Romani.

HRP0174 and HRP0176 are Kerala Nairs.

HRP0175 is a Georgian Svan and pretty similar to HRP0138 (who is Georgian, though I am not sure from which local ethnic group).

HRP0177 (Azeri) is a bit more northern European than HRP0083.

HRP0178, our first Punjabi Khatri, has admixture results more like the Punjabi Jatts than Punjabi Brahmins.

HRP0179, who is 7/8 Turkish and 1/8 Kurd, has the highest Siberian component (5%) other than the Kazakh participant.

HRP0180 is our first Pashtun, even if he's only half Pathan (the other half being English). I have heard grumblings on the net that the HGDP Pathans are not representative of the Pashtun tribes. If we use the HGDP Pathan and 1000 Genomes British averages to estimate HRP0180's recent ancestry, we get 45.5% Pashtun and 54.5% British. So it seems that the HGDP Pathan samples are reasonable, at least for this individual.
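
For anyone who wants to try this kind of two-way estimate, here is a minimal sketch. It assumes the estimate is a least-squares fit of the individual's admixture proportions to a mix of two reference averages; the post does not say which method was actually used. The function name and the three-component vectors are made up for illustration and are not project data.

```python
def two_way_mix(individual, ref_a, ref_b):
    """Least-squares fraction of ref_a in a two-source mix, clipped to [0, 1].

    All three arguments are admixture proportion vectors of equal length.
    This is an assumed method, not necessarily the one used in the post.
    """
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference averages are identical")
    # Project (individual - ref_b) onto the direction (ref_a - ref_b).
    num = sum((x - b) * d for x, b, d in zip(individual, ref_b, diff))
    return min(1.0, max(0.0, num / denom))


# Made-up three-component averages, for illustration only.
ref_a = [0.60, 0.30, 0.10]
ref_b = [0.05, 0.15, 0.80]
# A synthetic individual built as 40% source A and 60% source B.
individual = [0.4 * a + 0.6 * b for a, b in zip(ref_a, ref_b)]

fraction_a = two_way_mix(individual, ref_a, ref_b)
print(f"{fraction_a:.1%} source A, {1 - fraction_a:.1%} source B")
```

With real data, the two reference vectors would be the K=11 population averages from the reference results spreadsheet, and a poor fit (large residual) would suggest that the two reference populations do not describe the individual well.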

Interactive Tree Generation

Does anyone know of any software to generate an interactive javascript (or something) tree/dendrogram for the web, i.e. one where branches can be expanded and collapsed and one can search for different nodes?

I want to use it to generate dendrograms including all Harappa participants and individual reference samples. So we are looking at more than 4,000 nodes on the tree.

Admixture (Ref3 K=11) HRP0161-HRP0170

Here are the admixture results using Reference 3 for Harappa participants HRP0161 to HRP0170.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

HRP0161 is my mom.

HRP0169 is our first 100% Sri Lankan Tamil. Admixture results are close to the other non-Brahmin Tamils.

HRP0170 is a Haryana Jatt whose results match the other Haryana/UP Jatt.

ANI/ASI Admixture Dating

Via Razib, here's an interesting abstract from the International Congress of Human Genetics by David Reich's group:

Estimating a date of mixture of ancestral South Asian populations.

Linguistic and genetic studies have shown that most Indian groups have ancestry from two genetically divergent populations, Ancestral North Indians (ANI) and Ancestral South Indians (ASI). However, the date of mixture still remains unknown. We analyze genome-wide data from about 60 South Asian groups using a newly developed method that utilizes information related to admixture linkage disequilibrium to estimate mixture dates. Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent. These results suggest that this formative period of Indian history was accompanied by mixtures between two highly diverged populations, although our results do not rule out other, older ANI-ASI admixture events. A cultural shift subsequently led to widespread endogamy, which decreased the rate of additional population mixtures.

I would be very interested in reading that paper. Also, I wonder how many new samples they genotyped beyond the ones in Reich et al's Reconstructing Indian Population History and whether I could get my hands on the new data.

I have a feeling that ANI (Ancestral North Indian) captures a bunch of different migrations and conquests etc, so I am not sure if it can be equated to Indo-European language movement.

I wonder if I can use HAPMIX or StepPCO to get similar admixture dating.

Dataset in Public

I get requests from time to time about sharing my Reference 3 dataset. I use a few datasets which I am not allowed to redistribute, but most of the others are actually public and the main issue is to convert them to plink format and merge them.

I have released code for the conversion already but to make the task even easier I am letting you guys know that I already released a subset of my dataset a long time ago. Razib wrote about it and added the detailed instructions on using that dataset.

So here's the link to the dataset which contains about 30,000 SNPs and almost 4,000 individuals from HapMap, HGDP, SGVP, Behar et al and Xing et al.

Admixture Ref3 Dendrogram HRP0001-HRP0160

I haven't done any admixture dendrograms in a while, so I thought you guys might be interested.

This uses admixture results using Reference 3. As usual, I used complete linkage for the hierarchical clustering.

Let's look at the dendrogram using the regular Euclidean distance measure between admixture results.

I also decided to use a chi-squared distance measure to do the clustering.
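
To make the difference between the two measures concrete, here is a small sketch. The post does not give the chi-squared formula used, so the version below (squared difference divided by the sum of the two proportions) is an assumption; it is one common definition. The clustering function is a naive complete-linkage implementation written for illustration, and the three sample vectors are made up.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chi_squared(p, q):
    # Assumed definition: sum of (a - b)^2 / (a + b).
    # Components where both values are zero contribute nothing.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def complete_linkage(points, dist):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is
                # the largest distance between any pair of their members.
                d = max(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Made-up admixture proportions for three samples and three components.
samples = [
    [0.50, 0.50, 0.00],  # sample 0
    [0.44, 0.56, 0.00],  # sample 1: differs only in the two large components
    [0.48, 0.47, 0.05],  # sample 2: has 5% of a component the others lack
]

for name, dist in (("Euclidean", euclidean), ("chi-squared", chi_squared)):
    first = complete_linkage(samples, dist)[0]
    print(name, "first merge:", first[0], first[1])
```

In this toy case Euclidean distance pairs samples 0 and 2 first, while the chi-squared distance pairs samples 0 and 1. The chi-squared version penalizes a difference in a small component much more heavily than the same absolute difference in a large one, which is one reason the two trees can disagree.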

PS. Any thoughts on the trees based on two different distance measures?

Admixture (Ref3 K=11) HRP0151-HRP0160

Here are the admixture results using Reference 3 for Harappa participants HRP0151 to HRP0160.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.

Here's our bar chart and table. Remember you can click on the legend or the table headers to sort.

If the above interactive charts are not working, here's a static bar graph.

There are several interesting participants here. HRP0151 is a quarter Nepalese and his/her results are actually quite odd. The East Asian ancestry shows up as Native American which is possible. I wonder if the quarter Chinese ancestry is not Han but rather some other Chinese ethnicity.

HRP0155 is Sri Lankan Sinhalese and has a lower Onge component than I expected.

HRP0158 is my Dad and has results similar to mine (HRP0001).

23andme $50 Off

I got an email from 23andme for a $50 off coupon. The coupon code is YCM48E. So you can use this coupon code to reduce the price of a 23andme test from $99 to $49.

Here's the email:

Want to prove that your parents are to blame for your sleeping-in gene? Or are you simply curious if your best friend is in fact a distant relative, which may explain your mutual love for jellybeans and basset hounds? 23andMe allows you to compare your DNA with friends and family so that you can make fun and interesting discoveries together.

Get your friends and family on board with this $50 coupon. Share it with as many people as you like, but remember that this coupon expires in 7 days (August 9, 2011).

Have fun!

The 23andMe Team

To use this coupon, visit our online store and add an order to your cart. Click "I have a discount code" and enter the code below.

$50 Off

Coupon code: YCM48E

Share with your friends!

(Valid for new customers only)

Again, the coupon code is YCM48E for $50 off until August 9, 2011.

Admixture: Supervised Zombies Vs Unsupervised

I wanted to see how the supervised ADMIXTURE using zombies performed compared to regular unsupervised ADMIXTURE. Zombies here refers to genomes created using the --simulate option of plink from allele frequencies.

Therefore, I used the allele frequencies computed by Admixture for K=11 ancestral components for Reference 3 to generate 25 zombie individuals per ancestral component.
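
The idea behind a zombie can be sketched in a few lines. This is not plink's implementation, just an illustration that assumes each genotype is two independent allele draws at the component's frequency for that SNP. The function name and the five frequencies are hypothetical.

```python
import random


def make_zombies(allele_freqs, n_individuals, rng):
    """Simulate genotypes (0, 1 or 2 copies of the allele at each SNP).

    Each genotype is the sum of two independent draws with success
    probability equal to the allele frequency (an assumption of this sketch).
    """
    return [
        [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical allele frequencies for one ancestral component at five SNPs.
component_freqs = [0.0, 0.1, 0.5, 0.9, 1.0]
zombies = make_zombies(component_freqs, 25, rng)

for genotype in zombies[:3]:
    print(genotype)
```

Repeating this for each of the 11 components with 25 individuals each gives the 275 zombies. In the real workflow the frequencies come from the Admixture run on Reference 3 and cover every SNP in the dataset.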

Treating each of these 275 zombie samples as belonging 100% to its own ancestral component, I ran Admixture in supervised mode on the Reference 3 dataset. You can see the population average results here (compare to unsupervised results).

Since I was interested in the difference between the supervised zombie admixture and the unsupervised results, here are the histograms for the difference between the two for all 3,886 samples and each ancestral component. The histogram bins are 0.5% wide.
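
For reference, binning such differences into 0.5%-wide bins can be sketched as follows. The numbers are made up, the function name is hypothetical, and the differences are assumed to be supervised minus unsupervised, in percentage points.

```python
import math
from collections import Counter


def histogram(differences, bin_width=0.5):
    """Count differences (in percentage points) per bin.

    Keys are the lower edges of the bins.
    """
    counts = Counter(math.floor(d / bin_width) * bin_width for d in differences)
    return dict(sorted(counts.items()))


# Made-up values of one ancestral component (in %) for four samples.
supervised = [10.2, 55.0, 0.0, 33.9]
unsupervised = [10.0, 54.1, 0.3, 33.9]
diffs = [s - u for s, u in zip(supervised, unsupervised)]

print(histogram(diffs))
```

One such histogram per ancestral component, over all samples, is what the plots here show.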

Most of the results are within the usual error margins, except for the C7 West African and C10 San/Pygmy components. Those two show larger differences between the unsupervised and supervised zombie approaches. Basically, individuals with West African or San/Pygmy ancestry get ~5-8% more of the West African component in the supervised zombie case, with a corresponding decrease in the San/Pygmy component.