HapMap Redo

Posted by Zack on April 2, 2011

As part of my effort to create one big reference dataset for my own use, I have been going over all the datasets I have to make sure there are no duplicates, relatives, or other strange things that could cause issues with my analysis.

So I went back to HapMap, which you can download from their website. I am using HapMap 3 public release #3 from May 28, 2010.

I found one pair of duplicates: NA21344 is identical to NA21737. There were also a whole bunch of pairs with high identity-by-descent (IBD) values, which I calculated using Plink. You can see the pairs with PI_HAT greater than 0.5 in this spreadsheet. PI_HAT is the proportion of IBD estimated by Plink. Notice that all these pairs also have high IBS similarity (the DSC column), more than 85% in fact.
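For anyone wanting to repeat this kind of check, here is a minimal sketch of the filtering step. It assumes the pairwise estimates came from something like `plink --bfile hapmap3 --genome` (the `hapmap3` file name is hypothetical), which writes a whitespace-delimited `.genome` file with `IID1`, `IID2`, `PI_HAT` and `DST` columns. `DST` is Plink's name for the IBS distance, which is presumably what the DSC column in the spreadsheet holds. The sample IDs and values below are made up for illustration and are not HapMap results.

```python
import io

def related_pairs(genome_file, pi_hat_min=0.5):
    """Return (IID1, IID2, PI_HAT, DST) for pairs whose PI_HAT exceeds pi_hat_min."""
    # The first line of a .genome file names the columns.
    header = genome_file.readline().split()
    col = {name: i for i, name in enumerate(header)}
    pairs = []
    for line in genome_file:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pi_hat = float(fields[col["PI_HAT"]])
        if pi_hat > pi_hat_min:
            pairs.append((
                fields[col["IID1"]],
                fields[col["IID2"]],
                pi_hat,
                float(fields[col["DST"]]),
            ))
    return pairs

# Made-up rows in the .genome column layout (illustrative values only).
SAMPLE = """\
 FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
 F1 S1 F2 S2 UN NA 0.0000 0.0000 1.0000 1.0000 -1 1.000000 1.0000 NA
 F1 S1 F3 S3 UN NA 0.9000 0.1000 0.0000 0.0500 -1 0.720000 0.5000 2.0000
 F2 S2 F4 S4 UN NA 0.0000 0.9800 0.0200 0.5100 -1 0.860000 1.0000 NA
"""

pairs = related_pairs(io.StringIO(SAMPLE))
for iid1, iid2, pi_hat, dst in pairs:
    print(iid1, iid2, pi_hat, dst)
```

With a real run you would pass `open("plink.genome")` instead of the in-memory sample. A PI_HAT at or near 1 points to a duplicate (or identical twin), while values around 0.5 are what you expect from first-degree relatives.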

All 41 samples I removed as a result are listed in this spreadsheet.
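The post does not say how the samples to drop were picked from each related pair, so the following is only one possible approach, not necessarily the one used here. It is a greedy sketch: repeatedly remove the sample that appears in the most remaining related pairs, so that as few samples as possible are lost. The function name and the sample IDs are hypothetical.

```python
from collections import Counter

def samples_to_remove(pairs):
    """Greedily choose samples to drop so that no related pair remains."""
    remaining = [tuple(p) for p in pairs]
    removed = []
    while remaining:
        # Count how many remaining related pairs each sample takes part in.
        counts = Counter(sample for pair in remaining for sample in pair)
        # Drop the most-connected sample; ties are broken by ID for reproducibility.
        worst = min(counts, key=lambda s: (-counts[s], s))
        removed.append(worst)
        remaining = [p for p in remaining if worst not in p]
    return removed

# Made-up related pairs: S2 is related to both S1 and S4.
example_pairs = [("S1", "S2"), ("S2", "S4"), ("S5", "S6")]
to_remove = samples_to_remove(example_pairs)
print(to_remove)
```

In this toy example, dropping S2 resolves two pairs at once, so only two samples are removed instead of three. In practice you might prefer other tie-breakers, such as keeping the sample with the better genotyping call rate.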

Category: Dataset

Tags: duplicate, hapmap, ibd, reference
