Dense South Asian ChromoPainter

I had previously run ChromoPainter/fineSTRUCTURE on 715 South Asians using only about 90,000 SNPs. I thought it would be a useful exercise to use more SNPs, and to do that I had to drop the Reich et al. dataset. That left me with 615 individuals and 418,854 SNPs.

The "chunkcounts" file has the donors in columns and recipients in rows. Here's a heat map of the same.
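For anyone who wants to work with the matrix directly, here is a minimal sketch of a reader for such a file. It assumes the combined chunkcounts layout as I understand it: an optional comment line starting with `#` (such as a `#Cfactor` line), then a header row whose first field labels the row-name column and whose remaining fields are donor names, then one row per recipient. The function name `read_chunkcounts` and the toy data are made up for illustration.

```python
import io


def read_chunkcounts(handle):
    """Parse a chunkcounts-style table (layout assumed, see text).

    Returns (donors, recipients, matrix), where matrix[i][j] is the
    chunk count that recipient i copies from donor j.
    """
    donors = None
    recipients = []
    matrix = []
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines such as "#Cfactor"
        if donors is None:
            donors = fields[1:]  # header row: donor names are the columns
            continue
        recipients.append(fields[0])  # first field of each row: recipient name
        matrix.append([float(x) for x in fields[1:]])
    return donors or [], recipients, matrix


# Toy three-individual example (hypothetical values, not real data).
example = """#Cfactor 0.25
Recipient A B C
A 0 10.5 4.5
B 9 0 6
C 5 7 0
"""
donors, recipients, matrix = read_chunkcounts(io.StringIO(example))
```

Each row of `matrix` can then be passed to whatever plotting library you prefer to draw the heat map.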

fineSTRUCTURE classified these 615 individuals into 89 clusters. I have named these clusters for convenience; however, the names do not imply that everyone in, say, the Punjab cluster is Punjabi.

I put the cluster tree at the top of the spreadsheet, but here's how the clusters are related.

The most interesting thing is how Gujarati A (likely Patels) are an out-group to everyone else. Another major grouping is that of the Baloch, Brahui and Makrani, along with 4 Sindhis (who might belong to one of the Baloch tribes of Sindh?).

The Punjabis, Sindhis and Pathans get better classification here than they did last time.

The Punjab cluster includes 3 Gujarati B, 4 Pathans, 2 Singapore Indians, Punjabis, Haryanvis, Kashmiris, and a Rajasthani Brahmin. Even using this method, HRP0036, who is half-Sri Lankan and half-German/Polish, was classified in the same cluster.

The Dharkar and Kanjar could not be separated at all here. According to Metspalu:

There are three second degree relatives groups in our sample: ..snip.. [Kanjar evo_37 and Dharkar HA023]. Again the last pair needs further explanation. The Dharkar and Kanjar practice a nomadic lifestyle and were living side by side at the time of sampling. As the ethnic border between the two is permeable we cannot rule out neither our error during sample collection and/or subsequent labelling nor shifted self-identity.

The inter-cluster heat map:

And you can see the chunkcounts donated from each cluster to recipient individuals in a spreadsheet.
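The per-cluster numbers are just the individual donor columns collapsed by cluster. Here is a rough sketch of that aggregation, assuming you already have the matrix in memory (recipients in rows, donors in columns) and a dictionary mapping each individual to a cluster name. The function names, the cluster labels `X` and `Y`, and the toy values are all hypothetical. Averaging recipients within a cluster to get an inter-cluster matrix is my choice for this sketch and not necessarily what the fineSTRUCTURE GUI displays.

```python
from collections import defaultdict


def donor_cluster_totals(donors, recipients, matrix, cluster_of):
    """For each recipient, sum the chunk counts over donors in each cluster."""
    clusters = sorted(set(cluster_of[d] for d in donors))
    index = {c: k for k, c in enumerate(clusters)}
    totals = {}
    for recipient, row in zip(recipients, matrix):
        sums = [0.0] * len(clusters)
        for donor, count in zip(donors, row):
            sums[index[cluster_of[donor]]] += count
        totals[recipient] = sums
    return clusters, totals


def inter_cluster_means(totals, cluster_of):
    """Average the per-recipient totals within each recipient cluster."""
    grouped = defaultdict(list)
    for recipient, sums in totals.items():
        grouped[cluster_of[recipient]].append(sums)
    return {
        cluster: [sum(col) / len(rows) for col in zip(*rows)]
        for cluster, rows in grouped.items()
    }


# Toy data: three individuals in two hypothetical clusters.
names = ["P1", "P2", "S1"]
counts = [
    [0.0, 8.0, 2.0],  # P1 as recipient
    [6.0, 0.0, 4.0],  # P2 as recipient
    [3.0, 5.0, 0.0],  # S1 as recipient
]
cluster_of = {"P1": "X", "P2": "X", "S1": "Y"}

clusters, totals = donor_cluster_totals(names, names, counts, cluster_of)
means = inter_cluster_means(totals, cluster_of)
```

In the toy example, `totals["P1"]` holds how much P1 copies from cluster X and from cluster Y, in the order given by `clusters`.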

The pairwise coincidence:
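As I understand it, the pairwise coincidence for two individuals is the fraction of MCMC samples in which they land in the same cluster. A minimal sketch of that calculation follows, assuming each sample has already been extracted as a dictionary from individual to cluster label. The function name and the toy samples are hypothetical, and this does not read fineSTRUCTURE's own output files.

```python
def pairwise_coincidence(samples):
    """Fraction of samples in which each pair shares a cluster.

    samples: list of dicts, one per MCMC sample, mapping
    individual name -> cluster label for that sample.
    """
    names = sorted(samples[0])
    n = len(names)
    together = [[0] * n for _ in range(n)]
    for assignment in samples:
        for i in range(n):
            for j in range(n):
                if assignment[names[i]] == assignment[names[j]]:
                    together[i][j] += 1
    # Convert counts to fractions of the total number of samples.
    total = len(samples)
    return names, [[count / total for count in row] for row in together]


# Toy example: four samples over three individuals.
toy_samples = [
    {"a": 1, "b": 1, "c": 2},
    {"a": 1, "b": 1, "c": 2},
    {"a": 1, "b": 2, "c": 2},
    {"a": 1, "b": 1, "c": 1},
]
names, coincidence = pairwise_coincidence(toy_samples)
```

A value near 1 means the pair is almost always clustered together; values in between show where the clustering is uncertain.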

And the PCA plots:


  1. Hi Zack. Out of curiosity - why wasn't HRP190 included in the analysis?

  2. My autosomal (HRP0212) doesn't really help since it's half Oceania and half Pakistani/Afghan/India. Which is really pushing me to get my great uncle's autosomal, since he's the only surviving child of my great grandfather, who was Pashtun and migrated to the Pacific in the early 1900s.

  3. re: gujarati-a. lots of genetically similar people with a large N. probably why they form the outgroup.

  4. Hi Zack,

    I'm HRP0155 (Sinhalese Govigama) and am in the Andhra cluster, together with the 2 Sri Lankan Tamil Vellalars (HRP0169 and HRP0171). There is another Sri Lankan (HRP0122) who is in the Bengal cluster.

    It will be very interesting to see where the 100 new Sri Lankan Tamil samples from 1000genomes you mentioned will end up. In the meantime need to put the word out to get more Sri Lankans in this project!

  5. Kanjar and other "Gypsy" DNA will be all over the place as Indian castes, especially the Nomadic castes, are only partially based on rigid endogamous racial/clan groups. Most Indian castes, especially those belonging to the Shakti religion [Mother Goddess worship] were created as sects/cults that accepted a specific holyman/Godman/baba/sadhu as their Shaman/Godman. For example, the Nath/Kalbeliya musician/dance caste were drawn from at least 9 clans. One can see the genetic diversity in their settlements- from Macedonian Green eyes to the facial structure of the hill tribes.
