Tag Archives: pakistan

Burusho Kalash HarappaWorld Admixture

Someone asked for the individual HarappaWorld Admixture results for the Burusho and Kalash from HGDP.

In the chart below as well as in the spreadsheet, the IDs starting with "b" belong to the Burusho and those starting with "k" belong to the Kalash individuals.

You can check the spreadsheet too.

Participation Changes

Now that I have DIY HarappaWorld out, I am changing the participation requirements a little bit with somewhat different requirements for South Asians compared to other regions.

If you have any real ancestry from a South Asian origin, you are eligible to participate. Partial South Asian ancestry is okay. The list of countries of origin I count as South Asian are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • India
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka

Note that 2-3% South Asian from Dr. McDonald's BGA or Dodecad Project does not count as South Asian ancestry.

If you have all four of your grandparents from one of the following countries or regions, you can also send me your data.

  • Burma
  • Tibet
  • Uyghur from Xinjiang, China
  • Tajikistan
  • Kyrgyzstan
  • Kazakhstan
  • Uzbekistan
  • Turkmenistan
  • Iran
  • Turkey
  • Azerbaijan
  • Armenia
  • Georgia
  • North Caucasian Federal District, Russia
  • Iraq
  • Syria
  • Lebanon
  • Jordan

Relatives will only be accepted when they are a better replacement for current participants. For example, replacing a participant by his/her parents or his maternal uncle and paternal aunt gets us two unrelated participants (assuming, of course, that the two sides of the family are not related by blood). Another example could be if a participant is of partial South Asian ancestry and they get replaced by a relative who has more South Asian ancestry.

Everyone else can use DIY HarappaWorld. It's fairly easy to use on both Windows and Linux. The only hard part right now is that you have to install R to standardize your genome file. I might look into creating an executable for that to make it easier.

Finally, please be honest.

Balochistan/Caucasian

There has been some discussion in the comments about the C2 ancestral component at K=12 admixture runs which I called Pakistani/Caucasian.

First of all, we should remember that these "names" of ancestral populations are just rough mnemonics. They are chosen based on the frequencies of the component among modern reference samples. So the names have nothing at all to do with history.

In the case of Pakistani/Caucasian component, I wanted to emphasize the peaks of the component in Pakistan and the Caucasus. As commenters pointed out, the component is also quite high among the Iranians.

However, I have realized that this name, Pakistani/Caucasian, is a hindrance rather than a help for understanding the Admixture results. Also, this component is lower among the Pathan, Sindhis, and Punjabis than it is for Iranians etc. Therefore, the Pakistani part of the name is a bit of a misnomer, considering that the Pakistani populations it is high among comprise only about 5% of the country's population.

On the other hand, I do not like the name "Iranian" for this component. While it was suggested based on the geographical Iranian plateau which extends from the Caucasus to Balochistan, it still is confusing and it doesn't emphasize the peak areas.

Thus, I have renamed "Pakistani/Caucasian" as "Balochistan/Caucasus". I didn't use the shorter Baloch as this component is equally high among the Baloch, Brahui and Makrani, all populations living in the province of Balochistan.

Introduction

I have become interested (some would say obsessed) with genetics recently. I wrote about getting my DNA test done and there's a lot more about my own results that I plan to bore you with.

One fun application of genetic testing is inferring ancestry: Which ancestral group are you descended from? Can we estimate the admixture of the different population groups you are descended from?

Most DNA testing companies provide information about ancestry and genetic genealogy has taken off. With several genome databases (HapMap, HGDP, etc) and software (like plink, admixture, Structure) publicly available, the days of the genome bloggers are here. And I am trying to be the latest one.

In starting this project, I have been inspired by the Dodecad Ancestry Project by Dienekes Pontikos and Eurogenes Ancestry Project by David Wesolowski. The catalyst for this project was my friend Razib who I bug whenever I need to talk genetics.

What is Harappa Ancestry Project?
It is a project to analyze (autosomal) genetic data of participants of South Asian origin for the purpose of providing detailed ancestry information. So the focus of the project is on South Asians: Indians, Pakistanis, Bangladeshis and Sri Lankans.

The project will collect 23andme raw genetic data from participants to better understand the ancestry relationships of different South Asian ethnicities.

I have named it after Harappa, an archaeological site of the Indus Valley Civilization in Punjab, Pakistan.

Participation
People of South Asian origin, or from neighboring countries, are eligible to participate. The list of countries of origin I am accepting are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

Right now, I am only accepting raw data samples from people who have tested with 23andme.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.

Data Privacy
The raw genetic data and ancestry information that you send me will not be shared with anyone.

Your data will be used only for ancestry analysis. No analysis of physical or health/medical traits will be performed.

The individual ancestry analysis published on this blog will be done using an ID of the form HRPnnnn known to only you and me.

What do you get?
All results of ancestry analysis (individual and group) will be posted on this blog under the Harappa Ancestry Project category. This will include admixture analysis as well as clustering into population groups etc.

I suggest you read about Dienekes' analysis on South Asians for an idea about what to expect.

You can access all blog posts related to this project from the Harappa Ancestry Project link on the navigation menu on every page of my website. You can also subscribe to the project feed.