HapMap

Posted by Zack on January 24, 2011

I am using several publicly available datasets for my reference population samples. HapMap is one of them.

According to its website,

The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The HapMap is expected to be a key resource for researchers to use to find genes affecting health, disease, and responses to drugs and environmental factors. The information produced by the Project will be made freely available.

In the first phase, it genotyped

30 Yoruba adult-and-both-parents trios from Ibadan, Nigeria, 30 trios of U.S. (Utah) residents of northern and western European ancestry, 44 unrelated individuals from Tokyo, Japan and 45 unrelated Han Chinese individuals from Beijing, China.

HapMap phase 3 release #3 (NCBI build 36, dbSNP b126) contains 1,397 samples with about 1,457,897 SNPs each.

I removed related individuals as well as individuals whose genomes were too similar. This left me with 1,149 samples and about 474,606 SNPs in common with 23andMe's version 2 data.
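Finding the SNPs shared with another dataset comes down to a set intersection of SNP IDs. The following is only a minimal sketch, not the code used for this project. It assumes a PLINK `.map` file (chromosome, SNP ID, genetic distance, position on each line) and a 23andMe raw-data export (tab-separated rsid, chromosome, position, genotype, with `#` comment lines). The function names and the example lines are made up for illustration.

```python
def map_snp_ids(lines):
    """Collect SNP IDs (second column) from the lines of a PLINK .map file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            ids.add(fields[1])
    return ids


def raw_23andme_snp_ids(lines):
    """Collect rsids (first column) from 23andMe raw data, skipping '#' comments."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def common_snps(map_lines, raw_lines):
    """Return the sorted SNP IDs present in both inputs."""
    return sorted(map_snp_ids(map_lines) & raw_23andme_snp_ids(raw_lines))


# Made-up example lines, for illustration only.
example_map = [
    "1\trs0001\t0\t1000",
    "1\trs0002\t0\t2000",
    "2\trs0003\t0\t3000",
]
example_raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0002\t1\t2000\tAG",
    "rs0003\t2\t3000\tCC",
    "rs0004\t2\t4000\tTT",
]
shared = common_snps(example_map, example_raw)
print(shared)
```

Writing the shared IDs to a file, one per line, gives a list that PLINK's `--extract` option can use to keep only those SNPs.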

Since we are not interested in Native American ancestry, I also removed 58 Mexican samples, thus leaving me with 1,091 samples.
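Dropping a whole population can be sketched as building a list of family and individual IDs to exclude. This is an illustrative sketch under assumptions: the sample tuples are invented, the population label `MEX` is assumed to be the one HapMap uses for the Mexican samples, and the helper names are hypothetical.

```python
def build_remove_list(samples, excluded_pops):
    """Return (family_id, individual_id) pairs for samples in excluded populations.

    `samples` is an iterable of (family_id, individual_id, population) tuples.
    """
    excluded = set(excluded_pops)
    return [(fid, iid) for fid, iid, pop in samples if pop in excluded]


def format_remove_file(pairs):
    """Format pairs as 'FID IID' lines, one sample per line."""
    return "".join(f"{fid} {iid}\n" for fid, iid in pairs)


# Invented sample records, for illustration only.
example_samples = [
    ("F1", "S1", "GIH"),
    ("F2", "S2", "MEX"),
    ("F3", "S3", "YRI"),
    ("F4", "S4", "MEX"),
]
to_remove = build_remove_list(example_samples, ["MEX"])
remove_text = format_remove_file(to_remove)
remaining = len(example_samples) - len(to_remove)

# The same bookkeeping with the counts from this post.
assert 1149 - 58 == 1091
```

A file in this two-column format can be passed to PLINK's `--remove` option to drop those individuals from a dataset.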

Here are the samples I am using from the HapMap data:

Ethnicity	Region	Count
African Americans	Africa	48
European Americans (Utahns)	Europe	111
Han Chinese	East Asia	137
US Chinese	East Asia	106
Gujaratis	South Asia	98
Japanese	East Asia	113
Kenyan Luhya	East Africa	101
Maasai	East Africa	135
Tuscans	Europe	102
Yoruba	West Africa	140

The region assignments are my own. They help in the analysis by letting me include or exclude samples by region, or aggregate results by region to look for patterns.
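As a small illustration of how the region labels can be used, the sketch below totals the sample counts from the table by region and filters rows by region. The counts and labels are copied from the table above; the function names are hypothetical.

```python
from collections import defaultdict

# (ethnicity, region, count) rows copied from the table above.
SAMPLES = [
    ("African Americans", "Africa", 48),
    ("European Americans (Utahns)", "Europe", 111),
    ("Han Chinese", "East Asia", 137),
    ("US Chinese", "East Asia", 106),
    ("Gujaratis", "South Asia", 98),
    ("Japanese", "East Asia", 113),
    ("Kenyan Luhya", "East Africa", 101),
    ("Maasai", "East Africa", 135),
    ("Tuscans", "Europe", 102),
    ("Yoruba", "West Africa", 140),
]


def counts_by_region(rows):
    """Sum the sample counts for each region."""
    totals = defaultdict(int)
    for _ethnicity, region, count in rows:
        totals[region] += count
    return dict(totals)


def select_regions(rows, include):
    """Keep only the rows whose region is in `include`."""
    wanted = set(include)
    return [row for row in rows if row[1] in wanted]


region_totals = counts_by_region(SAMPLES)
for region, total in sorted(region_totals.items()):
    print(f"{region}\t{total}")

# The per-region totals add up to the 1,091 samples in use.
assert sum(region_totals.values()) == 1091
```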

The HapMap data was the easiest to use since it is available for download in PLINK format.



Harappa Ancestry Project

Genetics and South Asia
