Afghan Dataset

Julie Di Cristofaro, Erwan Pennarun, Stéphane Mazières, Natalie M. Myres, Alice A. Lin, Shah Aga Temori, Mait Metspalu, Ene Metspalu, Michael Witzel, Roy J. King, Peter A. Underhill, Richard Villems and Jacques Chiaroni have published a paper in PLoS ONE on the genetics of the people of Afghanistan: "Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge".

Thanks to Mait Metspalu, the data is available online. It consists of:

  • 5 Hazara
  • 5 Pashtun
  • 5 Tajik
  • 4 Turkmen
  • 5 Uzbek

Here are the HarappaWorld Admixture results for the samples in this dataset.

You can check the spreadsheet too.

Tadjik1_44Af and Pashtun2_6Af appear to be outliers and may be mislabeled. I would like to look into these two samples further before calculating group averages.
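Holding suspected mislabeled samples out of the group averages can be sketched as follows. This is a minimal Python illustration, not the project's actual script: it assumes per-sample results are stored as a mapping from sample ID to a group label plus a dict of component percentages, and the function name `group_averages` is hypothetical.

```python
from collections import defaultdict


def group_averages(samples, exclude=()):
    """Average admixture percentages per group, skipping excluded sample IDs.

    samples: dict mapping sample ID -> (group label, {component: percent}).
             Assumes every sample lists the same set of components.
    exclude: sample IDs to leave out, e.g. suspected mislabeled outliers.
    Returns {group: {component: mean percent}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    skip = set(exclude)
    for sample_id, (group, components) in samples.items():
        if sample_id in skip:
            continue  # leave outliers out of the averages entirely
        counts[group] += 1
        for name, pct in components.items():
            sums[group][name] += pct
    return {
        group: {name: total / counts[group] for name, total in comps.items()}
        for group, comps in sums.items()
    }
```

For this dataset the call would look like `group_averages(results, exclude={"Tadjik1_44Af", "Pashtun2_6Af"})`, where `results` is whatever structure holds the spreadsheet rows.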

You can compare these Pashtun results to HGDP Pathan and HAP Pashtun results.

HarappaWorld HRP0312-HRP0327

I have added the HarappaWorld Admixture results for HRP0312-HRP0327 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not say anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%, so it is entirely possible that your 1% exotic admixture result is just noise.
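To make the 1% caveat concrete, here is a hedged sketch that lists the components too small to tell apart from noise. It assumes a uniform standard error of about 1 percentage point (the figure quoted above) and a cutoff of two standard errors; both the cutoff and the helper name `likely_noise` are illustrative choices, not part of HarappaWorld.

```python
def likely_noise(components, std_err=1.0, z=2.0):
    """Return the components whose estimate lies within z standard errors of zero.

    components: {component name: estimated percent}.
    std_err: assumed standard error in percentage points (about 1% here).
    z: number of standard errors required before treating a value as signal.
    """
    cutoff = z * std_err
    return sorted(name for name, pct in components.items() if pct < cutoff)
```

With these defaults, a 1% "exotic" component falls below the 2% cutoff and would be listed as likely noise.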

I have also updated the group averages.

I got a participant, HRP0326, an Afghan Pashtun, who tested with the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld, so these results are very noisy.
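The overlap matters because only markers present both in the participant's file and in the reference can inform the admixture estimate, and fewer shared markers means noisier proportions. A minimal sketch of measuring that overlap, assuming both sides are available as collections of rsIDs (the function name `snp_overlap` is hypothetical):

```python
def snp_overlap(sample_ids, reference_ids):
    """Return the shared SNP IDs and the fraction of the reference they cover.

    sample_ids: iterable of rsIDs genotyped for the participant.
    reference_ids: iterable of rsIDs used by the reference panel.
    """
    sample = set(sample_ids)
    reference = set(reference_ids)
    shared = sample & reference
    coverage = len(shared) / len(reference) if reference else 0.0
    return shared, coverage
```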

HarappaWorld HRP0273-HRP0283

I have added the HarappaWorld Admixture results for HRP0273-HRP0283 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

We got our first Pashtun participants, one Afghan and one Pakistani. Both have very similar results, and their South Indian component is not much different from the HGDP Pathan sample average.

HRP0278, who is mostly Bengali, has more East Asian components than any other Bengali participant (including my friend Razib).

Participation Changes

Now that I have released DIY HarappaWorld, I am changing the participation requirements a little, with somewhat different requirements for South Asians than for other regions.

If you have any real ancestry of South Asian origin, you are eligible to participate. Partial South Asian ancestry is okay. The countries of origin I count as South Asian are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • India
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka

Note that 2-3% South Asian from Dr. McDonald's BGA or Dodecad Project does not count as South Asian ancestry.

If you have all four of your grandparents from one of the following countries or regions, you can also send me your data.

  • Burma
  • Tibet
  • Uyghur from Xinjiang, China
  • Tajikistan
  • Kyrgyzstan
  • Kazakhstan
  • Uzbekistan
  • Turkmenistan
  • Iran
  • Turkey
  • Azerbaijan
  • Armenia
  • Georgia
  • North Caucasian Federal District, Russia
  • Iraq
  • Syria
  • Lebanon
  • Jordan

Relatives will only be accepted when they are a better replacement for current participants. For example, replacing a participant with his/her parents, or with his/her maternal uncle and paternal aunt, gets us two unrelated participants (assuming, of course, that the two sides of the family are not related by blood). Another example is a participant of partial South Asian ancestry being replaced by a relative who has more South Asian ancestry.

Everyone else can use DIY HarappaWorld. It's fairly easy to use on both Windows and Linux. The only hard part right now is that you have to install R to standardize your genome file. I might look into creating an executable for that to make it easier.
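DIY HarappaWorld does the standardization step in R, and its exact output format is not described here. As a rough Python illustration of what reading a raw file involves, the sketch below parses the tab-separated 23andme layout (rsid, chromosome, position, genotype, with `#` comment lines) and drops no-calls written as `--`. The function name `read_23andme` and the choice to return a dict are my assumptions.

```python
def read_23andme(lines):
    """Parse 23andme raw-data lines into {rsid: (chromosome, position, genotype)}.

    Skips '#' comment/header lines, blank lines, and no-calls ('--').
    """
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        if genotype == "--":
            continue  # no-call: genotype could not be determined
        calls[rsid] = (chromosome, int(position), genotype)
    return calls
```

Usage would be along the lines of `with open("genome.txt") as fh: calls = read_23andme(fh)`, where the file name is a placeholder.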

Finally, please be honest.

Admixture (Ref3 K=11) HRP0211-HRP0220

Here are the admixture results using Reference 3 for Harappa participants HRP0211 to HRP0220.

You can see the participant results in a spreadsheet as well as their ethnic breakdowns and the reference population results.
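For readers who run ADMIXTURE themselves: for K components it writes a `.Q` file with one whitespace-separated row of K proportions per individual, in the same order as the input `.fam` file. Assuming these results came from such a run, here is a minimal sketch that loads a `.Q` file and checks each row for K=11 columns and a sum close to 1. The tolerance and the function name `read_q_file` are assumptions.

```python
def read_q_file(lines, k=11, tol=1e-3):
    """Parse ADMIXTURE .Q rows into lists of K floats, validating each row.

    Raises ValueError if a row has the wrong number of columns or its
    proportions do not sum to 1 within tol (rounding in the text output).
    """
    rows = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        values = [float(x) for x in fields]
        if len(values) != k:
            raise ValueError(f"row {n}: expected {k} columns, got {len(values)}")
        total = sum(values)
        if abs(total - 1.0) > tol:
            raise ValueError(f"row {n}: proportions sum to {total:.4f}")
        rows.append(values)
    return rows
```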

Do note that small percentages for your results can be noise.

HRP0211 seems like a typical Tamil Brahmin.

HRP0212 is half Fijian and half Indian/Pakistani/Afghan. It looks like his Fijian ancestry shows up mostly as Papuan and East Asian.

HRP0213 is a Gujarati Khoja whose results differ not only from the Gujarati Patels (Gujarati A) but also from HRP0130, a Gujarati Ganchi, and from HapMap Gujarati B.

HRP0216 is an Iraqi Assyrian and is a little more European than the other Assyrians. The Onge, Papuan and American are likely noise.

HRP0217 and HRP0218 are Kazakhs and fairly similar to the other Kazakhs in the project.

This will probably be the last admixture analysis using Reference 3.

Accepting FTDNA Family Finder

In addition to 23andme data, I am now accepting the autosomal data from FTDNA Family Finder too.

This is due to FamilyTreeDNA's recent switch to an Illumina Omni chip, which has many more markers in common with the 23andme data.

Since FTDNA is retesting all its current customers on the new chip, even if you tested with them earlier, you should have autosomal data from the new chip which you can download and email to me at harappa@zackvision.com.

I am basically looking for participants who have at least some ancestry from the following countries/regions:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

But if you have ancestry from West or Central Asia or Caucasus, I am likely to accept your data too.

Details of participation are here.

References

Datasets

  1. Behar, Doron M., Bayazit Yunusbayev, Mait Metspalu, Ene Metspalu, Saharon Rosset, Juri Parik, Siiri Rootsi, et al. "The genome-wide structure of the Jewish people." Nature 466, no. 7303 (July 8, 2010): 238-242. Paper & Data.
  2. Botigué, Laura R., Brenna M. Henn, Simon Gravel, Brian K. Maples, Christopher R. Gignoux, Erik Corona, Gil Atzmon, et al. "Gene Flow from North Africa Contributes to Differential Human Genetic Diversity in Southern Europe." Proceedings of the National Academy of Sciences (June 3, 2013). doi:10.1073/pnas.1306223110. Paper & Data.
  3. Bryc, K., C. Velez, T. Karafet, A. Moreno-Estrada, A. Reynolds, A. Auton, M. Hammer, C. D. Bustamante, and H. Ostrer. "Colloquium Paper: Genome-wide patterns of population structure and admixture among Hispanic/Latino populations." Proceedings of the National Academy of Sciences 107, no. 2 (2010): 8954-8961. Paper & Data.
  4. Chaubey, Gyaneshwer, Mait Metspalu, Ying Choi, Reedik Mägi, Irene Gallego Romero, Pedro Soares, Mannis van Oven, et al. "Population Genetic Structure in Indian Austroasiatic Speakers: The Role of Landscape Barriers and Sex-Specific Admixture." Molecular Biology and Evolution 28, no. 2 (February 1, 2011): 1013-1024. Paper.
  5. The 1000 Genomes Project Consortium. "A Map of Human Genome Variation from Population-scale Sequencing." Nature 467, no. 7319 (October 27, 2010): 1061-1073. Paper & Data.
  6. Di Cristofaro J, Pennarun E, Mazières S, Myres NM, Lin AA, et al. (2013) Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge. PLoS ONE 8(10): e76748. doi:10.1371/journal.pone.0076748. Paper & Data
  7. Haber, Marc, Dominique Gauguier, Sonia Youhanna, Nick Patterson, Priya Moorjani, Laura R. Botigué, Daniel E. Platt, et al. "Genome-Wide Diversity in the Levant Reveals Recent Structuring by Culture." PLoS Genet 9, no. 2 (February 28, 2013): e1003316. doi:10.1371/journal.pgen.1003316. Paper & Data.
  8. Henn, Brenna M., Laura R. Botigué, Simon Gravel, Wei Wang, Abra Brisbin, Jake K. Byrnes, Karima Fadhlaoui-Zid, et al. "Genomic Ancestry of North Africans Supports Back-to-Africa Migrations." PLoS Genet 8, no. 1 (January 12, 2012): e1002397. Paper & Data.
  9. Henn, Brenna M., Christopher R. Gignoux, Matthew Jobin, Julie M. Granka, J. M. Macpherson, Jeffrey M. Kidd, Laura Rodríguez-Botigué, et al. "Hunter-gatherer genomic diversity suggests a southern African origin for modern humans." Proceedings of the National Academy of Sciences 108, no. 13 (March 29, 2011): 5154-5162. Paper & Data.
  10. Hodoğlugil, Uğur, and Robert W Mahley. "Turkish Population Structure and Genetic Ancestry Reveal Relatedness Among Eurasian Populations." Annals of Human Genetics 76, no. 2 (March 1, 2012): 128-141. Paper.
  11. Li, Jun Z., Devin M. Absher, Hua Tang, Audrey M. Southwick, Amanda M. Casto, Sohini Ramachandran, Howard M. Cann, et al. "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation." Science 319, no. 5866 (February 22, 2008): 1100-1104. Paper & Data.
  12. Metspalu, Mait, Irene Gallego Romero, Bayazit Yunusbayev, Gyaneshwer Chaubey, Chandana Basu Mallick, Georgi Hudjashov, Mari Nelis, et al. "Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia." The American Journal of Human Genetics 89, no. 6 (December 9, 2011): 731-744. Paper & Data.
  13. Pagani, Luca, Toomas Kivisild, Ayele Tarekegn, Rosemary Ekong, Chris Plaster, Irene Gallego Romero, Qasim Ayub, et al. "Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool." The American Journal of Human Genetics (n.d.). Paper & Data.
  14. Rasmussen, Morten, Yingrui Li, Stinus Lindgreen, Jakob Skou Pedersen, Anders Albrechtsen, Ida Moltke, Mait Metspalu, et al. "Ancient human genome sequence of an extinct Palaeo-Eskimo." Nature 463, no. 7282 (February 11, 2010): 757-762. Paper & Data.
  15. Reich, David, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh. "Reconstructing Indian population history." Nature 461, no. 7263 (2009): 489-494. Paper.
  16. Schlebusch, Carina M., Pontus Skoglund, Per Sjödin, Lucie M. Gattepaille, Dena Hernandez, Flora Jay, Sen Li, et al. "Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History." Science 338, no. 6105 (October 19, 2012): 374-379. doi:10.1126/science.1227721. Paper & Data.
  17. Simonson, Tatum S, Yingzhong Yang, Chad D Huff, Haixia Yun, Ga Qin, David J Witherspoon, Zhenzhong Bai, et al. "Genetic Evidence for High-Altitude Adaptation in Tibet." Science 329, no. 5987 (July 2, 2010): 72-75. Paper & Data.
  18. Teo, Yik-Ying, Xueling Sim, Rick T H Ong, Adrian K S Tan, Jieming Chen, Erwin Tantoso, Kerrin S Small, et al. "Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations." Genome Research 19, no. 11 (November 2009): 2154-2162. Paper & Data.
  19. The HUGO Pan-Asian SNP Consortium. "Mapping Human Genetic Diversity in Asia." Science 326, no. 5959 (December 11, 2009): 1541-1545. Paper.
  20. The International HapMap 3 Consortium. "Integrating common and rare genetic variation in diverse human populations." Nature 467, no. 7311 (2010): 52-58. Paper & Data.
  21. Xing, Jinchuan, W Scott Watkins, Adam Shlien, Erin Walker, Chad D Huff, David J Witherspoon, Yuhua Zhang, et al. "Toward a more uniform sampling of human genetic diversity: a survey of worldwide populations by high-density genotyping." Genomics 96, no. 4 (October 2010): 199-210. Paper & Data.
  22. Yunusbayev, Bayazit, Mait Metspalu, Mari Järve, Ildus Kutuev, Siiri Rootsi, Ene Metspalu, Doron M. Behar, et al. "The Caucasus as an Asymmetric Semipermeable Barrier to Ancient Human Migrations." Molecular Biology and Evolution (2011). Paper.

Analysis

  1. Alexander, David H., John Novembre, and Kenneth Lange. "Fast model-based estimation of ancestry in unrelated individuals." Genome Research 19, no. 9 (2009): 1655-1664. http://genome.cshlp.org/content/19/9/1655.abstract.
  2. Browning, Sharon R., and Brian L. Browning. "Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering." The American Journal of Human Genetics 81, no. 5 (November 1, 2007): 1084-1097. http://www.cell.com/AJHG/retrieve/pii/S0002929707638828.
  3. Delaneau, Olivier, Jonathan Marchini, and Jean-Francois Zagury. "A linear complexity phasing method for thousands of genomes." Nat Meth advance online publication (December 4, 2011). http://dx.doi.org/10.1038/nmeth.1785.
  4. Lawson, Daniel John, Garrett Hellenthal, Simon Myers, and Daniel Falush. "Inference of Population Structure using Dense Haplotype Data." PLoS Genet 8, no. 1 (January 26, 2012): e1002453. http://dx.doi.org/10.1371/journal.pgen.1002453.
  5. Manichaikul, Ani, Josyf C. Mychaleckyj, Stephen S. Rich, Kathy Daly, Michèle Sale, and Wei-Min Chen. "Robust Relationship Inference in Genome-wide Association Studies." Bioinformatics 26, no. 22 (November 15, 2010): 2867-2873. http://bioinformatics.oxfordjournals.org/content/26/22/2867.abstract.
  6. Patterson, Nick, Alkes L Price, and David Reich. "Population Structure and Eigenanalysis." PLoS Genet 2, no. 12 (December 22, 2006): e190. http://dx.plos.org/10.1371/journal.pgen.0020190.
  7. Patterson, Nick, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich. "Ancient Admixture in Human History." Genetics 192, no. 3 (November 1, 2012): 1065-1093. doi:10.1534/genetics.112.145037.
  8. Price, Alkes L, Nick J Patterson, Robert M Plenge, Michael E Weinblatt, Nancy A Shadick, and David Reich. "Principal components analysis corrects for stratification in genome-wide association studies." Nat Genet 38, no. 8 (2006): 904-909. http://dx.doi.org/10.1038/ng1847.
  9. Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel A. R. Ferreira, David Bender, Julian Maller, et al. "PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses." American Journal of Human Genetics 81, no. 3 (September 2007): 559-575. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950838/

Software

  1. Admixture
  2. Eigensoft
  3. ADMIXTOOLS
  4. Plink
  5. Plink2
  6. ChromoPainter/fineSTRUCTURE
  7. BEAGLE
  8. SHAPEIT
  9. KING
  10. R
  11. MClust
  12. Google Visualization API

Participation

There are two categories of people eligible to participate in the Harappa Ancestry Project.

First, if you have any real ancestry of South Asian origin (except for Romany/Roma/Gypsy ancestry), you are eligible to participate. Those with partial South Asian ancestry are also welcome. The countries of origin I count as South Asian are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • India
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka

Note that 2-3% South Asian from Dr. McDonald's BGA or Dodecad Project does not count as South Asian ancestry.

Secondly, if you have all four of your grandparents from one of the following countries or regions, you can also send me your data.

  • Burma
  • Tibet
  • Uyghur from Xinjiang, China
  • Tajikistan
  • Kyrgyzstan
  • Kazakhstan
  • Uzbekistan
  • Turkmenistan
  • Iran
  • Turkey
  • Azerbaijan
  • Armenia
  • Georgia
  • North Caucasian Federal District, Russia
  • Iraq
  • Syria
  • Lebanon
  • Jordan

Everyone else can use DIY HarappaWorld to estimate their admixture results.

Right now, I am accepting raw data samples from people who have tested with 23andme, FTDNA Family Finder, or ancestry.com DNA.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.
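Checking the 2nd-cousin rule against genotype data is usually done with a relationship-inference tool such as KING (listed under Software). The sketch below is an illustration under stated assumptions, not KING itself: it takes a pairwise kinship coefficient φ, whose expected value for a degree-d relationship is 2^-(d+1) (0.25 for parent/child, 1/16 for first cousins, 1/64 for second cousins), and assigns a degree using cutoffs at 2^-(d+1.5), the same spacing KING uses for duplicates through 3rd degree. Extending those cutoffs to 4th and 5th degree is my assumption, and estimates that distant are noisy.

```python
from typing import Optional


def kinship_degree(phi: float, max_degree: int = 5) -> Optional[int]:
    """Infer a relationship degree from a kinship coefficient phi.

    Degree 0 = duplicate/identical twin, 1 = parent/child or full sibling,
    3 = first cousin, 5 = second cousin. Expected phi for degree d is
    2**-(d + 1); the cutoff between degree d and d + 1 is 2**-(d + 1.5).
    Returns None if phi falls below the cutoff for max_degree.
    """
    for d in range(max_degree + 1):
        if phi > 2 ** -(d + 1.5):
            return d
    return None


def is_close_relative(phi: float) -> bool:
    """True for 2nd cousins or closer, i.e. an inferred degree of 5 or less."""
    return kinship_degree(phi) is not None
```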

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme or FTDNA to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.
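Any archive tool works for zipping the raw file. As one cross-platform option, here is a small sketch using Python's standard-library `zipfile` module; the function name `zip_raw_file` and the default of writing the archive next to the original file are my assumptions.

```python
import zipfile
from pathlib import Path


def zip_raw_file(raw_path, zip_path=None):
    """Compress a raw-data text file into a .zip archive.

    By default the archive is written next to the original file with the
    same name and a .zip extension. Returns the path of the archive.
    """
    raw = Path(raw_path)
    target = Path(zip_path) if zip_path else raw.with_suffix(".zip")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(raw, arcname=raw.name)  # store just the file name, not the full path
    return target
```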

Here's my privacy policy.

Introduction

I have recently become interested in (some would say obsessed with) genetics. I wrote about getting my DNA test done, and there's a lot more about my own results that I plan to bore you with.

One fun application of genetic testing is inferring ancestry: Which ancestral group are you descended from? Can we estimate the admixture of the different population groups you are descended from?

Most DNA testing companies provide information about ancestry, and genetic genealogy has taken off. With several genome databases (HapMap, HGDP, etc.) and software packages (like plink, admixture and Structure) publicly available, the days of the genome bloggers are here. And I am trying to be the latest one.

In starting this project, I have been inspired by the Dodecad Ancestry Project by Dienekes Pontikos and the Eurogenes Ancestry Project by David Wesolowski. The catalyst for this project was my friend Razib, whom I bug whenever I need to talk genetics.

What is Harappa Ancestry Project?
It is a project to analyze (autosomal) genetic data of participants of South Asian origin for the purpose of providing detailed ancestry information. So the focus of the project is on South Asians: Indians, Pakistanis, Bangladeshis and Sri Lankans.

The project will collect 23andme raw genetic data from participants to better understand the ancestry relationships of different South Asian ethnicities.

I have named it after Harappa, an archaeological site of the Indus Valley Civilization in Punjab, Pakistan.

Participation
People of South Asian origin, or from neighboring countries, are eligible to participate. The list of countries of origin I am accepting are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

Right now, I am only accepting raw data samples from people who have tested with 23andme.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.

Data Privacy
The raw genetic data and ancestry information that you send me will not be shared with anyone.

Your data will be used only for ancestry analysis. No analysis of physical or health/medical traits will be performed.

The individual ancestry analysis published on this blog will be done using an ID of the form HRPnnnn known to only you and me.
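Posts refer to participants by these IDs and by ranges such as HRP0312-HRP0327. A minimal sketch for formatting and expanding them, assuming "nnnn" means exactly four zero-padded digits; the helper names `format_id` and `expand_range` are hypothetical.

```python
import re

# Assumes IDs are "HRP" followed by exactly four digits.
ID_PATTERN = re.compile(r"HRP(\d{4})")


def format_id(n):
    """Format a participant number as an HRPnnnn ID (zero-padded to 4 digits)."""
    return f"HRP{n:04d}"


def expand_range(text):
    """Expand a range such as 'HRP0312-HRP0327' into the list of IDs it covers."""
    start, end = (ID_PATTERN.fullmatch(part.strip()) for part in text.split("-"))
    if start is None or end is None:
        raise ValueError(f"not an HRPnnnn range: {text!r}")
    first, last = int(start.group(1)), int(end.group(1))
    return [format_id(n) for n in range(first, last + 1)]
```

For example, `expand_range("HRP0312-HRP0327")` yields the 16 IDs from HRP0312 through HRP0327.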

What do you get?
All results of ancestry analysis (individual and group) will be posted on this blog under the Harappa Ancestry Project category. This will include admixture analysis as well as clustering into population groups etc.

I suggest you read about Dienekes' analysis on South Asians for an idea about what to expect.

You can access all blog posts related to this project from the Harappa Ancestry Project link on the navigation menu on every page of my website. You can also subscribe to the project feed.