My Genetic Journey

It all started on DNA Day 2010, when Razib tweeted about a $99 sale for the DNA test at 23andme. I ordered one immediately. Over the next few months, a lot of my free time was spent poring over and analyzing my genomic results.

While the health and physical traits information was interesting, I found the ancestry information that can be deduced from your genome to be fascinating. That might be because I was working on collecting together and digitizing our family tree at the time.

So to beat Razib's record of writing about his personal genome, I have started blogging about mine:

There's much more to come, including: What's wrong with my chromosome 9 and who did I get it from?; my results from Doug McDonald, Dodecad and Eurogenes; why do I have low similarity scores with everyone?; where exactly was my great-grandmother from?; and more.

PS. Since it's Valentine's Day, I should probably mention that my top match among the people (excluding my sibling of course) I am sharing genomes with on 23andme is my dear wife, Amber.

Admixture K=4,7,9, HRP0011 to HRP0020

We'll go to higher values of K (number of ancestral populations) for batch 1 later, but let's not keep the other batches waiting.

Here's the spreadsheet with their admixture results. And you can check their ethnic backgrounds.

You might also want to refer to the admixture analyses of reference dataset I for K=2-5 and K=6-9.

I did not run admixture for all values of K this time. So let's start with K=4. For quick reference,

C1 South Asian
C2 European
C3 East Asian
C4 African
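If you want to label your own numbers, here is a minimal Python sketch that attaches the K=4 names above to one row of admixture proportions. It assumes the row is whitespace-separated fractions in C1 to C4 order, which is the layout the ADMIXTURE program uses for its .Q output. The helper name label_row and the sample row are made up for illustration and are not any participant's results.

```python
# Component names for K=4, in C1..C4 order, as listed above.
K4_LABELS = ["South Asian", "European", "East Asian", "African"]


def label_row(line, labels=K4_LABELS):
    """Turn one whitespace-separated row of fractions into {label: percent}.

    Hypothetical helper: assumes one column per component, in C1..CK order.
    """
    fractions = [float(x) for x in line.split()]
    if len(fractions) != len(labels):
        raise ValueError(
            "expected %d columns, got %d" % (len(labels), len(fractions))
        )
    return {name: round(100 * f, 1) for name, f in zip(labels, fractions)}


# Made-up row for illustration only, not a real participant.
example = label_row("0.50 0.30 0.15 0.05")
for name, pct in example.items():
    print(name, pct)
```

The same helper works for K=7 or K=9 if you pass the matching list of names as labels, keeping in mind that the names are only mnemonics.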

Batch 2 Admixture K=4

Now, for K=7, the ancestral components are:

C1 South Asian
C2 European
C3 Southeast Asian
C4 Southwest Asian
C5 Papuan
C6 Northeast Asian
C7 African

Batch 2 Admixture K=7

And finally, here's K=9.

C1 South Asian
C2 Kalash
C3 Southwest Asian
C4 Southeast Asian
C5 European
C6 Papuan
C7 Northeast Asian
C8 West African
C9 East African

Batch 2 Admixture K=9

What do you guys think?

Higher values of K will be coming when admixture is done taking its sweet time to run. But more analysis and results are coming fast and furious now.

Admixture K=6-9, HRP0001 to HRP0010

Let's continue our admixture analysis of the first 10 Harappa Project participants.

Here are their ethnic backgrounds and their admixture analysis results.

You might want to refer to the admixture analysis of the reference dataset.

Let's look at K=6 ancestral components. As seen in the reference admixture results, we got a Papuan ancestral component (C5/blue).

Batch 1 Admixture K=6

You can see the increase in the C1/red South Asian component in all the participants. The Papuan component (C5/blue) is present in all except our Assyrian sample. It is lower among the Punjabis though.

The East Asian component (C3/green) is about the same as in the K=5 analysis. C6/magenta, the African component, is only present in HRP0001 (me), at the same proportion as in K=5. The Southwest/West Asian component (C4/cyan) is the same as C4 in K=5 with no changes.

The European component (C2/yellow) reduced in magnitude among the South Asian participants by about 14-19%. My guess about that is that the South Asian component became more "pure" for K=6 due to the separate Papuan component which was merged in the South Asian one in K=5. So it better represents the South Asians now compared to K=5, thus reducing the European proportion.

Batch 1 Admixture K=7

For K=7, C1 is South Asian, C2 European, C4 Southwest/West Asian, C5 Papuan and C7 African. These are all the same as before.

The East Asian component has split into two: C3 Southeast Asian and C6 Northeast Asian. For this batch of Harappa participants, most of their East Asian ancestry falls into the Southeast Asian component.

Batch 1 Admixture K=8

For K=8, C1 is South Asian, C4 is Southeast Asian, C5 is Papuan, C6 is Northeast Asian and these have stayed about the same.

The C2 (Southwest/West Asian) component has increased for most Harappa members, especially for HRP0010 (Assyrian Iranian). This change in the West Asian component is balanced a bit by a decrease in the C3 (European) component, but the main reason for the West Asian change is that the East African component has split from the Southwest/West Asian and the African components.

The African component has split into C7 West African and C8 East African. As usual, HRP0001 (me) is the only one with any West or East African component, though I have more of East African than West which makes sense due to my (part-)Egyptian ancestry.

Batch 1 Admixture K=9

For K=9, C1 is the South Asian component and it decreased in all project members except for South Indians and Bengalis. It even decreased in the Bihari sample (HRP0003) and almost disappeared from the Assyrian Iranian one (HRP0010).

The reason is the appearance of what I am calling the Kalash ancestral component (C2). This component is at 94% among the Kalash reference population, followed by 41% among Lezgin (a Caucasian group). It is also high among the Pakistani reference populations and other Caucasian populations. Among our first batch of Harappa participants, this Kalash component is high (27-31%) among the Punjabis and Assyrian Iranian.

C3 is the Southwest/West Asian component which hasn't changed a lot among the project members. The Southeast Asian component (C4) has decreased, as has C5 (European).

The Papuan component (C6) has remained small.

C7 (Northeast Asian), C8 (West African), and C9 (East African) have stayed the same.

I am running admixture for even higher values of K, but it takes a long time. While those are running, I am going to go ahead and start the 2nd batch (HRP0011 to HRP0020). For those, I am not going to run all K values. Instead I'll do only a few. If you have any suggestions on which specific K values I should focus on for the latter batches, please let me know.

PS. I have added the names of components to the spreadsheet for ease of use, but these should be thought of as useful mnemonics rather than these components representing some "pure" ancient population. Also remember that the South Asian (or other) component from one K value to the next might not be the same.
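To see why a name at one K does not pin down the same component at the next K, here is a rough Python sketch that matches each component of a new run to the most correlated component of the previous run, across individuals. This is not how the labels in the spreadsheet were assigned; the function names and both tables of proportions are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def columns(rows):
    """Transpose rows (one per individual) into columns (one per component)."""
    return [list(col) for col in zip(*rows)]


def best_matches(old_rows, new_rows):
    """For each new component: (index of closest old component, correlation)."""
    old_cols, new_cols = columns(old_rows), columns(new_rows)
    result = []
    for new in new_cols:
        scores = [pearson(new, old) for old in old_cols]
        best = max(range(len(scores)), key=scores.__getitem__)
        result.append((best, round(scores[best], 2)))
    return result


# Invented proportions for four individuals at K=2 and K=3 (not project data).
k2 = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
k3 = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2], [0.1, 0.2, 0.7], [0.05, 0.05, 0.9]]
matches = best_matches(k2, k3)
print(matches)
```

In this toy data the first and third K=3 components track an old component closely, while the second matches only weakly. That weakly matched one is the kind of component that deserves a fresh name rather than an inherited one.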

Admixture K=2-5, HRP0001 to HRP0010

Finally, it's time to analyze the genomes of project participants. Admixture analysis is going to be done in batches of ten so that the ancestral components are stable from one run to another.

My choice of calling them "ancestral components" is deliberate. Please do not think of them as pure ancestral populations.

First, the ethnic background of the participants in this batch. I'll give the ethnicity only if I have explicit permission from the participant to make such information public. By default, I assume it to be private. Here's the summary:

Ethnicity        Count
Punjab           5
Bengal           1
Bihar            1
Tamil            1
Andhra Pradesh   1
Iran             1

Since this is the first batch, I am running admixture for all values of K to get a better handle on how things shake out. With later batches, I will run only a few specific values of K since admixture takes a long time to run.
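For anyone curious what "running admixture for all values of K" looks like in practice, here is a small Python sketch that builds one command line per K. It assumes "admixture" is the ADMIXTURE command-line program, which takes a PLINK .bed file and K as arguments and accepts -jN for the number of threads. The file name batch1.bed, the K range, and the helper name are placeholders of mine, not the project's actual setup. Nothing is executed here; the commands are only assembled.

```python
def admixture_commands(bed_path, k_values, threads=1):
    """Build one argument list per K value; nothing is run here.

    Hypothetical helper. Each list could be handed to subprocess.run()
    on a machine where the admixture binary is installed.
    """
    commands = []
    for k in k_values:
        cmd = ["admixture"]
        if threads > 1:
            cmd.append("-j%d" % threads)  # ADMIXTURE's multithreading flag
        cmd.extend([bed_path, str(k)])
        commands.append(cmd)
    return commands


# Placeholder file name and K range, for illustration only.
all_k = admixture_commands("batch1.bed", range(2, 6))
few_k = admixture_commands("batch2.bed", [4, 7, 9], threads=4)
for cmd in all_k + few_k:
    print(" ".join(cmd))
```

Since each K is a separate run, limiting later batches to a handful of K values cuts the number of runs directly.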

The ancestral component percentages for project participants can be found in this spreadsheet.

It might be good to refer to the admixture runs for the reference (spreadsheet) to get a better idea of what the different ancestral components represent.

Let's start with K=2 ancestral components.

Batch 1 Admixture K=2

The cyan/African (C2) component varies from 29% to 51% among participants, which is about what you would expect from the results for South Asian reference populations.

With K=3 where the ancestral components roughly represent European (C1/red), East Asian (C2/green) and African (C3/blue), we see the following:

Batch 1 Admixture K=3

I am HRP0001 and my numbers for K=3 are 77% European, 18% Asian and 5% African. This contrasts with my 23andme ancestry painting of 91.22% European, 8.69% Asian and 0.09% African. However, HRP0002 has closer numbers:

HRP0002   European   Asian   African
HAP       55%        43%     1%
23andme   57%        43%     0%

We (HAP) are using a much more diverse reference population while 23andme ancestry painting is based on the basic three populations of HapMap. Also, since I am a quarter Egyptian, the likelihood of some African ancestry is high in my case.
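To put a single number on how far apart the two estimates are, one simple option is half the sum of the absolute differences across the three components, which is the share of ancestry that would have to be reassigned to turn one estimate into the other. The Python sketch below applies it to the K=3 and 23andme figures quoted above; the choice of measure and the function name are mine, not anything either analysis reports.

```python
def total_difference(a, b):
    """Half the sum of absolute differences, in percentage points."""
    return round(sum(abs(x - y) for x, y in zip(a, b)) / 2, 2)


# (European, Asian, African) percentages quoted in the post: HAP K=3 vs 23andme.
hrp0001 = total_difference((77, 18, 5), (91.22, 8.69, 0.09))
hrp0002 = total_difference((55, 43, 1), (57, 43, 0))
print(hrp0001, hrp0002)  # 14.22 1.5
```

So about 14 percentage points separate my two estimates, against 1.5 for HRP0002. The HAP figures are rounded to whole percents (HRP0002's add up to 99%), so treat the second decimal as noise.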

Note that the Asian (C2) percentages vary from 18% to 44% for the South Asians in this batch, but it's low (18-22%) in Punjabis and higher in southern and eastern South Asians. It's almost negligible in our Iranian Assyrian sample.

With K=4, we finally get our South Asian ancestral component (C1/red).

Batch 1 Admixture K=4

I (HRP0001) am the only one with any noticeable African component (C4/violet), while HRP0002 has some East Asian ancestry (C3/cyan). The two South Indians have a lower European component (C2/green), as does HRP0002, who is from East Bengal.

Finally, let's take a look at K=5 ancestral components.

Batch 1 Admixture K=5

The South Asian (C1/red), East Asian (C3/green) and African (C5/magenta) components are about the same as in K=4. The new component here is C4/blue, which is the Southwest/West Asian component. This is basically a split from the K=4 European (C2/yellow) component. Our Assyrian sample has the highest Southwest/West Asian component while I also have it higher than the South Asians due to my quarter Egyptian ancestry.

Let's continue with higher values of K next time.

Participants So Far

While I am analyzing the data, checking for errors and making sure the results I am getting are valid, here is some information about participants till now.

So far, 11 participants have sent me their raw data. Of these eleven, ten have some South Asian ancestry.

The regions/ethnicities they cover are:

  • Punjab
  • Bengal
  • Bihar
  • Tamil Nadu
  • Telugu
  • Anglo-Indian

Of these, Punjabis are the only ones I have multiple samples of. So I definitely need more samples of the other ethnicities. And there are lots of ethnicities/regions I haven't gotten any participants in.

It would be great for this project if we got a few participants from each state/province of India and Pakistan. So if you know someone who is from our target regions and has tested with 23andme, please spread the word.

If you tested with 23andme during their Christmas sale, I am hearing that results are going to start coming in today.