Pan-Asian Dataset Duplicates and Relatives

As part of my effort to create one big reference dataset for my own use, I have been going over all the datasets I have to make sure there are no duplicates, relatives, or other oddities that could cause issues with my analysis.

Looking at the Pan-Asian dataset, I found 3 pairs of duplicate samples and 82 pairs that could be closely related. I have removed 64 samples from the dataset.

You can see the IBD results from plink as well as the list of sample IDs I removed in a spreadsheet.
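For anyone who wants to run a similar check, here is a minimal sketch of how pairs could be flagged from a plink `.genome` file (the pairwise IBD output of `plink --genome`). The PI_HAT cutoffs, the function names, and the greedy removal rule are illustrative assumptions, not the exact criteria used for the Pan-Asian dataset.

```python
# Sketch only: cutoffs below are hypothetical, not the ones used in this post.
DUPLICATE_PI_HAT = 0.9   # assumed: at or above this, treat the pair as duplicates
RELATED_PI_HAT = 0.2     # assumed: at or above this, treat the pair as close relatives


def classify_pairs(lines, dup_cutoff=DUPLICATE_PI_HAT, rel_cutoff=RELATED_PI_HAT):
    """Split the pairs in a plink .genome listing into duplicates and relatives.

    `lines` is any iterable of text lines (an open file works), with the
    usual whitespace-delimited header containing IID1, IID2 and PI_HAT.
    """
    lines = iter(lines)
    header = next(lines).split()
    col = {name: i for i, name in enumerate(header)}

    duplicates, relatives = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pair = (fields[col["IID1"]], fields[col["IID2"]])
        pi_hat = float(fields[col["PI_HAT"]])
        if pi_hat >= dup_cutoff:
            duplicates.append(pair)
        elif pi_hat >= rel_cutoff:
            relatives.append(pair)
    return duplicates, relatives


def samples_to_remove(pairs):
    """Greedy rule: drop the second sample of a pair unless one member is already dropped."""
    removed = set()
    for first, second in pairs:
        if first not in removed and second not in removed:
            removed.add(second)
    return removed
```

Because one sample can show up in several related pairs, the number of samples removed can be smaller than the number of flagged pairs. A smarter rule might keep the sample with the better call rate; the greedy version here is just the simplest thing that works.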

UPDATE: I found 4 Melanesians in the Pan-Asian dataset who are the same individuals as those in HGDP, so I have removed them as well and added them to the list in the spreadsheet.
