Admixture K=6-9, HRP0001 to HRP0010

Let's continue our admixture analysis of the first 10 Harappa Project participants.

Here are their ethnic backgrounds and their admixture analysis results.

You might want to refer to the admixture analysis of the reference dataset.

Let's look at K=6 ancestral components. As seen in the reference admixture results, we got a Papuan ancestral component (C5/blue).

Batch 1 Admixture K=6

The C1/red South Asian component has increased in all the participants. The Papuan component (C5/blue) is present in all except our Assyrian sample, though it is lower among the Punjabis.

The East Asian component (C3/green) is about the same as in the K=5 analysis. C6/magenta, the African component, is present only in HRP0001 (me), at the same proportion as in K=5. The Southwest/West Asian component (C4/cyan) is unchanged from C4 in K=5.

The European component (C2/yellow) decreased by about 14-19% among the South Asian participants. My guess is that the South Asian component became more "pure" at K=6 because the Papuan component, which was merged into the South Asian one at K=5, is now separate. The South Asian component therefore represents South Asians better than at K=5, which reduces their European proportion.

Batch 1 Admixture K=7

For K=7, C1 is South Asian, C2 European, C4 Southwest/West Asian, C5 Papuan and C7 African. These are all the same as before.

The East Asian component has split into two: C3 Southeast Asian and C6 Northeast Asian. For this batch of Harappa participants, most of their East Asian ancestry falls into the Southeast Asian component.

Batch 1 Admixture K=8

For K=8, C1 is South Asian, C4 Southeast Asian, C5 Papuan and C6 Northeast Asian; these have stayed about the same.

The C2 (Southwest/West Asian) component has increased for most Harappa members, especially for HRP0010 (Assyrian Iranian). This increase is partly balanced by a decrease in the C3 (European) component, but the main reason for the West Asian change is that an East African component has split off from the Southwest/West Asian and African components.

The African component has split into C7 West African and C8 East African. As usual, HRP0001 (me) is the only one with any West or East African component, though I have more East African than West African, which makes sense given my (part-)Egyptian ancestry.

Batch 1 Admixture K=9

For K=9, C1 is the South Asian component and it decreased in all project members except for South Indians and Bengalis. It even decreased in the Bihari sample (HRP0003) and almost disappeared from the Assyrian Iranian one (HRP0010).

The reason is the appearance of what I am calling the Kalash ancestral component (C2). This component is at 94% among the Kalash reference population, followed by 41% among the Lezgin (a Caucasian group). It is also high among the Pakistani reference populations and other Caucasian populations. Among our first batch of Harappa participants, this Kalash component is high (27-31%) among the Punjabis and the Assyrian Iranian.

C3 is the Southwest/West Asian component, which hasn't changed much among the project members. The Southeast Asian component (C4) has decreased, as has C5 (European).

The Papuan component (C6) has remained small.

C7 (Northeast Asian), C8 (West African), and C9 (East African) have stayed the same.

I am running admixture for even higher values of K, but it takes a long time. While those are running, I am going to go ahead and start the 2nd batch (HRP0011 to HRP0020). For that batch I won't run every K value, only a few. If you have any suggestions on which specific K values I should focus on for the later batches, please let me know.
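Choosing among K values can be made a bit less ad hoc by comparing model fit across runs. Below is a minimal Python sketch of one such comparison, the Bayesian information criterion (BIC), assuming you have the final log-likelihood from each ADMIXTURE run. The parameter count (K-1 free ancestry proportions per individual plus K allele frequencies per SNP) and the observation count (two alleles per individual per SNP) are my assumptions, and the function names `admixture_bic` and `best_k` are hypothetical helpers, not part of ADMIXTURE.

```python
import math
from typing import Dict


def admixture_bic(log_likelihood: float, k: int, n_individuals: int, n_snps: int) -> float:
    """BIC = -2 ln L + p ln n for one ADMIXTURE run (lower is better).

    Assumed parameter count: each individual has K ancestry proportions that
    sum to 1 (K - 1 free values) and each SNP has K allele frequencies.
    Assumed observation count: two alleles per individual per SNP.
    """
    n_params = n_individuals * (k - 1) + n_snps * k
    n_obs = 2 * n_individuals * n_snps
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_k(log_likelihoods: Dict[int, float], n_individuals: int, n_snps: int) -> int:
    """Return the K whose run has the lowest BIC."""
    scores = {
        k: admixture_bic(ll, k, n_individuals, n_snps)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)


# Toy log-likelihoods (made up for illustration) for a hypothetical
# dataset of 10 individuals and 1,000 SNPs.
toy = {2: -15000.0, 3: -9000.0, 4: -8500.0}
print(best_k(toy, n_individuals=10, n_snps=1000))  # prints 3
```

The log-likelihood values in the example are toy numbers, not results from this project. With hundreds of thousands of SNPs, the per-SNP allele frequency term dominates the penalty, so each extra K has to improve the log-likelihood substantially before BIC favors it.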

PS. I have added the names of the components to the spreadsheet for ease of use, but these names should be thought of as useful mnemonics, not as a claim that each component represents some "pure" ancient population. Also remember that the South Asian (or any other) component at one K value might not be the same as at the next.


  1. That Kalash component is interesting. Is the Kalash's near "purity" in that component a function of their relative genetic isolation? In any case, I find its high presence in Caucasian populations somewhat fascinating. To me, it recalls Dienekes' "Dagestan component", though obviously I am not suggesting the two are equivalent.

    When picking K values for the next batch, I would ask that you make sure to include K=9 where that fascinating Kalash component appears. Great work! I'm looking forward to your further results.

  2. That 94% Kalash component among the Kalash confirms the CSA or Indus Valley component seen here:

  3. Maybe you can run some likelihood ratio tests, or calculate the Bayesian information criterion (BIC) for each K to see if the added complexity is worth it.

    • I have some of that in the works and at least some values of K will be chosen based on that. I'll write about that in a few days.

      However, I was wondering if people consider any specific values of K to be more interesting.

      I cannot run admixture for all values of K for every batch as admixture takes a long time to run.

  4. Very interesting, thanks.

  5. Zack, could you update the ethnicity for #07 to Tamil (Nadar caste) and #09 to Andhra Pradesh (Reddy caste)?

    There might be differences between Tamil non-Brahmins and Brahmins, or other groups, that we could pick up later on.

  6. { Brown Pundits } » Harappa, K = 9, HRP001 to HRP010 - pingback on February 9, 2011 at 12:41 am
  7. Zack,
    As a mixed-race participant with roughly 25% South Indian and 75% European ancestry, I'm quite keen to look at the South Asian, Southwest/West Asian and (small) African components (which show up in Doug McDonald's and Davidski's analyses of my genome). Somewhat curiously, a small "Amerindian" component also showed up, and I'm wondering if anyone can suggest its origin in South Asia/India (if it's not just an artefact).

    • The Tibetan population is said to share a genetic component with some groups in NE Siberia and with Amerindians.

    • You are in the batch I am processing right now, so some results should be posted before the weekend.

      I can't say anything about the Amerindian component since I excluded all Native American groups from my analysis, but it can be spurious if you have some South Asian or East Asian ancestry. On the other hand, did any of your European ancestors spend time in the New World?

      • Thanks to both Yosemite Sam and Zack.
        I'm almost certain that none of my European ancestors spent any time in the New World, and I don't believe I have any East Asian ancestry. I can tell from Doug McDonald's chromosome map that the "Amerindian" segment is from my father who in turn could have quite diverse South Asian ancestry. This whole project is intriguing - I await the latest results with interest.

        • If you haven't seen it, this open access article from Investigative Genetics (2011, 2:1) might be of general interest. It includes a Tibetan population.

          It provides a broad view of global ancestry and admixture by looking at 128 ancestry-informative SNPs in 4871 people from 119 populations, including Khamba Tibetans (who seem to fall between the Yakut and Mongolians).

  8. All that Java is because I was trying to share an interactive chart with you, and here it is. Check it out: reference pop K9

  9. Reference Admixture Analysis K=6-9 | Harappa Ancestry Project - pingback on February 10, 2011 at 11:03 am
  10. Gene Expression » Personal genomics around the web - pingback on February 10, 2011 at 2:03 pm
  11. Admixture K=10-12, HRP0001 to HRP0010 | Harappa Ancestry Project - pingback on February 19, 2011 at 7:58 am
