Reference I Admixture Analysis K=16

Continuing with the Reference I admixture analysis, here is the results spreadsheet.

[Bar chart of the K=16 admixture results]

C1  South Asian
C2  Balochistan/Caucasus
C3  Kalash
C4  Southeast Asian
C5  Southwest Asian
C6  European
C7  Melanesian
C8  Naxi/Yi
C9  Japanese
C10 Papuan
C11 She
C12 Siberian
C13 Eastern Bantu
C14 Northwest African
C15 West African
C16 East African

Things are starting to break down now, with the East Asian components splitting into several pieces. The usefulness of higher values of K is doubtful. I am going to run K=17 on this dataset and then focus on more filtered data.

Here are the Fst divergences between the estimated populations for K=16:

     C1    C2    C3    C4    C5    C6    C7    C8    C9    C10   C11   C12   C13   C14   C15
C2   0.053
C3   0.064 0.060
C4   0.076 0.112 0.123
C5   0.073 0.056 0.085 0.130
C6   0.064 0.040 0.073 0.118 0.048
C7   0.164 0.200 0.215 0.165 0.217 0.206
C8   0.087 0.122 0.133 0.045 0.140 0.127 0.181
C9   0.081 0.117 0.128 0.036 0.135 0.122 0.172 0.021
C10  0.184 0.222 0.237 0.200 0.238 0.227 0.145 0.215 0.207
C11  0.083 0.119 0.130 0.023 0.137 0.125 0.171 0.025 0.017 0.209
C12  0.086 0.114 0.127 0.063 0.133 0.118 0.189 0.048 0.041 0.221 0.048
C13  0.145 0.153 0.177 0.181 0.156 0.162 0.257 0.192 0.186 0.275 0.188 0.191
C14  0.079 0.063 0.096 0.127 0.052 0.056 0.211 0.138 0.132 0.232 0.134 0.132 0.116
C15  0.153 0.162 0.186 0.189 0.166 0.172 0.265 0.201 0.195 0.283 0.197 0.200 0.013 0.122
C16  0.106 0.108 0.135 0.145 0.106 0.116 0.223 0.156 0.150 0.241 0.152 0.154 0.034 0.079 0.041
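The lower-triangular Fst table is easy to scan programmatically for the most similar components. Below is a minimal Python sketch that loads the K=16 values above and lists every component pair whose Fst falls below a cutoff. The function names `pairwise_fst` and `closest_pairs` and the 0.03 cutoff are illustrative choices, not part of the original analysis.

```python
# Fst between the K=16 ancestral components, copied from the table above.
# Row Ci lists Fst(Ci, C1), Fst(Ci, C2), ..., Fst(Ci, C{i-1}).
FST_ROWS = {
    2: "0.053",
    3: "0.064 0.060",
    4: "0.076 0.112 0.123",
    5: "0.073 0.056 0.085 0.130",
    6: "0.064 0.040 0.073 0.118 0.048",
    7: "0.164 0.200 0.215 0.165 0.217 0.206",
    8: "0.087 0.122 0.133 0.045 0.140 0.127 0.181",
    9: "0.081 0.117 0.128 0.036 0.135 0.122 0.172 0.021",
    10: "0.184 0.222 0.237 0.200 0.238 0.227 0.145 0.215 0.207",
    11: "0.083 0.119 0.130 0.023 0.137 0.125 0.171 0.025 0.017 0.209",
    12: "0.086 0.114 0.127 0.063 0.133 0.118 0.189 0.048 0.041 0.221 0.048",
    13: "0.145 0.153 0.177 0.181 0.156 0.162 0.257 0.192 0.186 0.275 0.188 0.191",
    14: "0.079 0.063 0.096 0.127 0.052 0.056 0.211 0.138 0.132 0.232 0.134 0.132 0.116",
    15: "0.153 0.162 0.186 0.189 0.166 0.172 0.265 0.201 0.195 0.283 0.197 0.200 0.013 0.122",
    16: "0.106 0.108 0.135 0.145 0.106 0.116 0.223 0.156 0.150 0.241 0.152 0.154 0.034 0.079 0.041",
}

# Component labels from the legend.
LABELS = {
    1: "South Asian", 2: "Balochistan/Caucasus", 3: "Kalash",
    4: "Southeast Asian", 5: "Southwest Asian", 6: "European",
    7: "Melanesian", 8: "Naxi/Yi", 9: "Japanese", 10: "Papuan",
    11: "She", 12: "Siberian", 13: "Eastern Bantu",
    14: "Northwest African", 15: "West African", 16: "East African",
}


def pairwise_fst():
    """Return {(i, j): Fst} for every pair of components with i < j."""
    pairs = {}
    for i, row in FST_ROWS.items():
        values = [float(v) for v in row.split()]
        # A lower-triangular row for Ci must hold exactly i - 1 entries.
        if len(values) != i - 1:
            raise ValueError(f"row C{i} has {len(values)} values, expected {i - 1}")
        for j, fst in enumerate(values, start=1):
            pairs[(j, i)] = fst
    return pairs


def closest_pairs(threshold):
    """List (Fst, i, j) for component pairs below the threshold, smallest first."""
    return sorted(
        (fst, i, j) for (i, j), fst in pairwise_fst().items() if fst < threshold
    )


if __name__ == "__main__":
    for fst, i, j in closest_pairs(0.03):
        print(f"C{i} {LABELS[i]} / C{j} {LABELS[j]}: {fst:.3f}")
```

Run as a script, it prints five pairs: C13/C15 (Eastern Bantu/West African, 0.013), followed by four pairs drawn from C4, C8, C9 and C11 (Southeast Asian, Naxi/Yi, Japanese, She), with values from 0.017 to 0.025. The tight East Asian cluster is consistent with the East Asian components splitting at this K.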

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.


  1. A random thought came to me on the "Pak-Caucasian" label.

    The highest % carriers are those that cluster on the Iranian plateau or periphery.

    Baluchistan and the Caucasus are arguably part of "Greater Iran" (I believe Baluchistan province, like NWFP, is on the Iranian plateau).

    Please note that I'm using Iranian plateau as a *geographic* not *ethnic* term in this case.

    • The peaks are actually at the peripheries: the Indus Valley (Makrani/Balochi/Brahui) and then the Caucasus (within Greater Iran, of course). Everything in between drops to the 40-50% range.

      • Good point, Simran, but here is a quick excerpt from Wikipedia:

        Balochistan is located at the south-eastern edge of the Iranian plateau. It strategically bridges the Middle East and Southwest Asia to Central Asia and South Asia, and forms the closest oceanic frontage for the land-locked countries of Central Asia.

        Baluchistan was also home to Indus Valley sites, but my thinking is this: of the top 15 Pak-Caucasian ethnicities, apart from the Turkish, Iraqi and Uzbek Jews, the rest are all on or just off the Iranian plateau.

        The 16th ethnicity is "Sindhi", which is a more conventional "Indus Valley" ethnicity (although apparently 40% of Sindhis descend from Baloch settlers; I wish there were a way to validate that).

        • Iranian plateau sounds like a good explanation, but as Simranjit pointed out, this component is higher on the periphery than in the center. Also, while it is lower in the Sindhis (39%) than in Iranians, Caucasians and the various Balochistan populations, that's about the same as in Pathans. And Pathans would seem to fall under your Greater Iran label.

          • The Pathans fall on the same cline as the Sindhis, as per the Reich paper. The Brahui and Baloch, on the other hand, were off the cline, probably due to their 'Iranian' component. Reich: "We found that 6 Pakistani groups (the Hazara, Kalash, Burusho, Makrani, Balochi and Brahui) were difficult to model as part of the Indian cline ... We identified 2 Pakistani groups (Pathan and Sindhi) as fully consistent with the Indian cline within the limits of our resolution." The periphery of Iran is probably more 'Iranian' than Iran itself.

            I suppose once the Pak/Cauc component is further resolved we may see a clearer pattern.

          • I'm leaving a short comment here. My only issue with the Pak-Caucasian label (I've queried this before) is that it doesn't give a good geographic sense; it also links South Asia and the Caucasus when it's clear the epicentre is most likely somewhere around the Iranian Plateau.

            Anyway, it's just something I'm pondering; I may elaborate a bit more on my personal blog because I find it fairly interesting.

          • Uh, I wrote a comment in the morning right as the site went down. I was saying that you are correct that Pakistani/Caucasian is not a good label due to the lower percentage among Sindhis, Punjabis and Pathans. A better one is Balochistan/Caucasus.

  2. Zack, would it be possible for you to calculate pairwise IBS and ASD values between the samples, instead of FST?

    Or is FST better?

    • What's ASD?

      The FST values I posted are calculated by Admixture for the ancestral populations it computes.

      Pairwise IBS distance values would be between individual samples, right? That would be a lot of data.

  3. Well, Polako said something like FST is not very reliable and that the error rate is too big.

    He said ASD (allele sharing distance) and IBS are much better.

    Maybe you could clarify: isn't FST the best method to calculate distances?

    • Fst has its issues, but I think it serves its purpose in this case. A better distance measure is the least of our worries while running admixture.
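The exchange above turns on two different levels of comparison. The Fst values in the post compare the ancestral populations that ADMIXTURE estimates, whereas IBS and ASD compare pairs of individual samples. The Python sketch below puts both side by side under stated assumptions. `hudson_fst` applies Hudson's estimator to per-SNP allele frequencies; it is chosen here for illustration, and ADMIXTURE's own Fst calculation may use a different formula. `allele_sharing_distance` and `ibs_similarity` assume biallelic SNPs coded 0/1/2, with `None` marking a missing call. All function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Hudson-style Fst between two populations from per-SNP allele frequencies.

    Computed as a ratio of sums over loci:
        sum((p1 - p2)^2) / sum(p1 * (1 - p2) + p2 * (1 - p1))
    No sample-size correction is applied, because the inputs are treated as
    population frequencies (for example, two components' estimated frequencies).
    """
    if len(p1) != len(p2) or len(p1) == 0:
        raise ValueError("need two equal-length, non-empty frequency vectors")
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    # den is 0 only if every locus is fixed for the same allele in both: no divergence.
    return num / den if den > 0 else 0.0


def allele_sharing_distance(
    g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]
) -> float:
    """ASD between two individuals with genotypes coded 0/1/2 (copies of one allele).

    At each locus the two individuals differ by |g1 - g2| of their 2 alleles;
    ASD is the mean of |g1 - g2| / 2 over loci typed in both (None = missing).
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have equal length")
    diffs = [abs(a - b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no loci typed in both individuals")
    return sum(diffs) / (2 * len(diffs))


def ibs_similarity(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Mean proportion of alleles shared identical by state, i.e. 1 - ASD."""
    return 1.0 - allele_sharing_distance(g1, g2)


if __name__ == "__main__":
    # Toy data for illustration only, not from the Harappa dataset.
    print(hudson_fst([0.1, 0.5, 0.9], [0.2, 0.5, 0.6]))
    sample_a = [0, 1, 2, 2, None]
    sample_b = [0, 2, 0, 2, 1]
    print(allele_sharing_distance(sample_a, sample_b))  # 0.375
    print(ibs_similarity(sample_a, sample_b))  # 0.625
```

Because the individual-level measures produce one value per pair of samples, n samples give n(n-1)/2 distances (124,750 for 500 samples), whereas the Fst table above has only 120 entries for 16 components.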

Trackbacks and Pingbacks:

  • Isopleths | Harappa Ancestry Project - pingback on March 17, 2011 at 9:37 am
  • Reference I Admixture Analysis K=17 | Harappa Ancestry Project - pingback on March 22, 2011 at 4:13 pm
  • Balochistan/Caucasian | Harappa Ancestry Project - pingback on April 13, 2011 at 1:30 pm