C1 South Asian C2 Balochistan/Caucasus
C3 Gujarati C4 Kalash
C5 Southeast Asian C6 European
C7 Mediterranean C8 Japanese
C9 Southwest Asian C10 Melanesian
C11 Siberian C12 Papuan
C13 Chinese C14 Eastern Bantu
C15 Northwest African C16 West African
C17 East African

The new ancestral component is the tightly clustered Gujarati one. It is made up of almost two-thirds of the Gujaratis sampled by HapMap in Houston, TX. So my question is: does anyone know which Gujarati communities are the biggest in Houston? I know that Patel is a very common name, probably the most common South Asian last name in the US, and most Patels I know have been from Gujarat. Are Patels a tightly knit community who are endogamous but likely don't marry close cousins? Are there different Patel subcommunities?

Here are the Fst divergences between the estimated populations for K=17:

      C1    C2    C3    C4    C5    C6    C7    C8    C9    C10   C11   C12
C2    0.072
C3    0.032 0.044
C4    0.076 0.061 0.062
C5    0.085 0.120 0.085 0.129
C6    0.076 0.045 0.059 0.072 0.123
C7    0.085 0.062 0.073 0.088 0.138 0.050
C8    0.084 0.119 0.084 0.128 0.035 0.122 0.138
C9    0.091 0.059 0.076 0.095 0.139 0.062 0.058 0.139
C10   0.168 0.203 0.168 0.215 0.171 0.206 0.220 0.172 0.221
C11   0.090 0.116 0.088 0.127 0.064 0.117 0.135 0.039 0.138 0.188
C12   0.188 0.225 0.189 0.237 0.209 0.228 0.242 0.207 0.243 0.145 0.220
C13   0.086 0.122 0.087 0.130 0.030 0.125 0.140 0.014 0.142 0.173 0.044 0.210
C14   0.151 0.155 0.146 0.177 0.186 0.163 0.164 0.186 0.152 0.257 0.190 0.275
C15   0.089 0.066 0.076 0.096 0.133 0.060 0.054 0.132 0.063 0.211 0.131 0.232
C16   0.160 0.164 0.155 0.186 0.194 0.173 0.173 0.195 0.162 0.265 0.199 0.283
C17   0.114 0.111 0.107 0.136 0.150 0.119 0.114 0.151 0.106 0.223 0.154 0.242

      C13   C14   C15   C16
C14   0.188
C15   0.135 0.115
C16   0.197 0.013 0.122
C17   0.153 0.034 0.079 0.041
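
For anyone who wants to work with these numbers, here is a minimal Python sketch of one way to expand a lower-triangular Fst listing like the one above into a symmetric lookup and rank the pairs from closest to most divergent. It only uses the C1 to C4 values copied from the table. The function names (to_symmetric, ranked_pairs) are made up for this illustration and are not part of Admixture.

```python
# Fst values for C1..C4 copied from the table above; the other
# components are left out to keep the sketch short.
labels = ["C1", "C2", "C3", "C4"]
lower_rows = [
    [0.072],                # C2 vs C1
    [0.032, 0.044],         # C3 vs C1, C2
    [0.076, 0.061, 0.062],  # C4 vs C1, C2, C3
]


def to_symmetric(labels, lower_rows):
    """Expand a lower-triangular listing into a symmetric dict of dicts."""
    fst = {a: {b: 0.0 for b in labels} for a in labels}
    for i, row in enumerate(lower_rows, start=1):
        for j, value in enumerate(row):
            fst[labels[i]][labels[j]] = value
            fst[labels[j]][labels[i]] = value
    return fst


def ranked_pairs(labels, fst):
    """Return (Fst, a, b) for every unordered pair, smallest Fst first."""
    pairs = [
        (fst[a][b], a, b)
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    ]
    return sorted(pairs)


fst = to_symmetric(labels, lower_rows)
for value, a, b in ranked_pairs(labels, fst):
    print(f"{a}-{b}: {value:.3f}")
```

Among these four components, the smallest divergence is between C1 (South Asian) and C3 (Gujarati) at 0.032, and the largest is between C1 and C4 (Kalash) at 0.076.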

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.


  1. i think non-south indian hindus avoid cousin marriage (village exogamy is common in the north).

  2. Is it possible to give a general phenotype for the Gujaratis (pics)? And what is this component: ANI, ASI, or ancient Indus?

  3. The Gujarati component is just 1% in the Brahui and minimal outside South Asia.

  4. Wow, this is extraordinary!

    Gujaratis, afaik, definitely comply with the clan and village exogamy system of NI Hindus (socially if you're from the same village and/or extended clan, you are brother and sister, so marriage is absolutely taboo). However, I think they are much more organized by sect and region (i.e. subregions within Gujarat) than any other Northern Hindus and tend to stick to marriages within those communities. Gujaratis also tend to immigrate in a very regional way, i.e. many people from the same village or subregion will immigrate to the same location. Did the Houston study identify what areas within Gujarat these folks may have come from? Patel is a titular surname and not really an ethnic one. I think Patidar may be the most common caste that uses Patel in Gujarat. BTW I was thinking earlier that Gujarat has had a tonne of other kinds of immigration (down from Sindh via Kutch, the Parsis, Rajputs, Gujjars - after whom the state is named, etc). If one group has somehow maintained community exclusivity while another set of castes "let go," it might explain why there is this divergence between the two Gujarati groups.

    • In eastern India we use a similar term, Pattidar, not as a caste but for a landholder who is part of a co-owned joint proprietorship or Pattidari, a system under which large parcels of land could be tilled without being broken up.

      The part of Gujarat that was called Gurjara in the earliest period that we have a record of was between the Narmada and the Tapti. The Gurjara-Chahamanas of Nandod and Broach were driven northwards after the Arab invasion of the region. A line of the Gurjaras, the Pratihars, came to occupy Kannauj. The Gurjaras have some connection to the Chalukyas (cf. Mandavya gotra, Mandvyapura). They are mentioned as a Chalukya feudatory in the Badami Chalukya inscriptions and clubbed with Lata and Malwa. These same folk - the Gurjaras, Chahamanas, Rastrakutas, Gohils, Chalukyas - are later called Rajputs.

  5. Hmmm ... there seem to be enough Leva Patils in Houston to organize their own association (Leva Patidar Samaj of Houston). On the other hand, Gujaratis are among the most institutionally organized people from South Asia (seriously), so for all I know there may be a dozen others for other groupings.

  6. Sorry, I am spamming you all, but it is interesting to see just how concentrated this component is. Besides Gujaratis, these groups have it at > 10%:
    bnei menashe jews
    cochin jews
    north kannadi
    sindhi (northward gujju leakage?)
    singapore indians (probably gujjus)

    Everybody else is 3% max, except *only* for paniya at 6%. North Kannadi rings a bell in that Jains came to dominate both North Karnataka and Gujarat at one point - this is probably irrelevant though. What's going on over here? Who are Sakillis BTW? Never heard of 'em.

    • Gujaratis don't form a large portion of singapore indians. The higher C3 is most likely by proxy, due to groups close to pathan and sindhi from the sikh religion (e.g. jats, khatris, tarkhans) that are definitely in the singapore indian pool.

      Sakkili are basically low caste tamils.

      • Here are the C3 Gujarati component statistics for some groups:

        Group              Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
        Gujarati-B         13.97  24.43    30.46   29.19  34.35    39.56
        Singapore Indians  14.56  21.79    24.40   24.56  26.56    37.78
        Sindhi             0.00   11.59    15.24   13.94  16.77    19.57
        Pathan             3.48   10.79    13.06   13.22  14.66    22.90

  7. I'm thinking C3 is actually most likely a north or central indian component, something that probably peaks in central indians of non-brahmin or tribal descent. Maybe even a gurjar component.

    Take a look at these quick maps i generated for the K17 reference 1.

    You will notice C3 peaks in guju a and is in low amounts in the pakistani populations, while almost non existent in the southern populations. C1 however peaks in the north kannadi and is virtually absent in guju a.

    Guju b is most likely from groups that have more pakistani/central asian influence, while guju a are gujratis proper?

    We will have to see the participants' k=17 results to see if this is true.



    • My first thought about the Gujarati component is that it is not a real ancestral component but an artifact of the closeness of the Gujarati samples we have and the way Admixture works. That guess might be modified later but that's what I would start with.

      • That could be, but we must consider that even the guju hrp participants are showing extremely high south asian (which i suspect is actually due to C3). We'll know more i guess with the participants' k=17 results.

        • It looks more like the Kalash. Neither of them is ancestral, but as Zack says an artifact of the way Admixture works - closeness of samples sets them apart and Admixture picks them as a non-admixed component.

          • In either case it would be interesting to see what the distribution of C3 is in the punjabi/bihari/up and rajasthani participants. Kalash are a unique case i think, as a small and truly isolated group, unlike gujaratis who should be exogamous.

          • I am going to run K=17 on the participants, mostly because of the Mediterranean component, but also because I am curious about where the Harappa Gujaratis fall.

          • south asians have elevated levels of runs of homozygosity, even among hindus. itz cuz u people breed only within jati or whatever (i'm excluding myself from this, since i know i have mixed ancestry for all 4 of my grandparents ).

          • "I am going to run K=17 on the participants"

            That's terrific! Looking forward to it.

      • i think zack is correct. this strange cluster shows up in *both* PCA and ADMIXTURE.

  8. Thank you for your great and enlightening blog.
    May I know your comments about the C2 component (Balochistan/Caucasus)? It is common to both Europeans and Asians. Could we interpret that as the cluster being introduced to Europe and India by the Indo-European migrations, as the result of a Neolithic-age migration of farmers (Colin Renfrew's model)?

  9. Zack, is it possible to get access to the admixture components for the individual Gujarati_b samples? The group admixture components are listed, but since the samples are so different from one another (in ASI levels), that does not tell us anything about the background of the donors.

    With component breakdown of individual samples, it may be possible to decipher whether the donor is a High Caste individual, or if the sample belongs to a person with foreign (non-South Asian) admixture.

  10. Hi Zack
    I sent you my raw data yesterday. In my email I stated that my dad and his family were from Lahore prior to partition but moved. Anyhow, I was kind of disappointed in the results from 23andme as it does not give any sort of south Asian background. I tried using your diy but I simply could not do it. Any help would be greatly appreciated!

