Two Steps Forward, Two Steps Back

Posted by Zack on March 29, 2011

I got my daughter a netbook, so now my computer is doing Harappa Project work 24x7.

Also, Simranjit was nice enough to offer me the use of a server. For privacy reasons, I am not going to upload any of the participants' data there, but it is much faster than my machine and hence very useful for running Admixture on the reference data (especially with cross-validation).
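Cross-validation is what makes those reference runs expensive: ADMIXTURE's `--cv` option has to be run once per candidate K (e.g. `admixture --cv data.bed K`), and the usual rule of thumb is to prefer the K with the lowest CV error. The sketch below only handles the bookkeeping step of picking K from the logs. It assumes each run's log contains a line of the form `CV error (K=3): 0.52`, modeled on ADMIXTURE's documented output; the helper names are hypothetical and the numbers in the usage line are made up, not Harappa results.

```python
import re
from typing import Dict

# Assumed log line format, modeled on ADMIXTURE's --cv output,
# e.g. "CV error (K=3): 0.52". Adjust the pattern if your logs differ.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text: str) -> Dict[int, float]:
    """Collect a K -> cross-validation error mapping from concatenated logs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv: Dict[int, float]) -> int:
    """Return the K with the lowest cross-validation error."""
    return min(cv, key=cv.get)


# Usage with made-up numbers for illustration only.
logs = "CV error (K=2): 0.55\nCV error (K=3): 0.52\nCV error (K=4): 0.53\n"
print(best_k(parse_cv_errors(logs)))  # 3
```

In practice the CV error curve often flattens out, so it is worth looking at the whole mapping rather than only the minimum.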

As for steps back, I downloaded the current 1000genomes data (1,212 samples, 2.4 million SNPs). It's in VCF format. Using vcftools to convert it to PED format will take about 3 weeks. Yes, you heard that right. BTW, the good stuff from a South Asian point of view will come later this year with 100 Assamese Ahom, 100 Kayastha from Calcutta, 100 Reddys from Hyderabad, 100 Maratha from Bombay and 100 Lahori Punjabis.
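Part of what makes this conversion awkward is that the two formats are laid out in opposite directions: VCF has one line per variant with all samples across the columns, while PED has one line per individual with all of that person's genotypes across the columns (plus a MAP file listing the SNPs). Converting is therefore essentially a transpose. Below is a minimal sketch of a single-pass conversion, assuming biallelic or multiallelic sites with diploid calls, `GT` as the first FORMAT field, and `.` for missing alleles. It is not vcftools' code or Dienekes' code, and the function name and the `FAM` family ID are hypothetical. It keeps everything in Python lists, which is fine for small files but would need far too much memory at the full 1,212 x 2.4 million scale.

```python
import io
from typing import Dict, List, TextIO, Tuple


def vcf_to_ped_map(vcf: TextIO, family_id: str = "FAM") -> Tuple[List[str], List[str]]:
    """Convert a VCF text stream into PED lines and MAP lines.

    Assumptions: diploid genotypes, GT is the first FORMAT field,
    '.' marks a missing allele. Reads the VCF once, accumulating each
    sample's alleles, then emits one PED line per sample (the transpose).
    """
    samples: List[str] = []
    alleles: Dict[str, List[str]] = {}
    map_lines: List[str] = []
    for line in vcf:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            alleles = {s: [] for s in samples}
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        bases = [ref] + alt.split(",")  # allele index 0 = REF, 1.. = ALT
        snp_id = vid if vid != "." else f"{chrom}:{pos}"
        # MAP columns: chromosome, SNP ID, genetic distance (0 = unknown), bp position
        map_lines.append(f"{chrom}\t{snp_id}\t0\t{pos}")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0].replace("|", "/")  # treat phased as unphased
            for a in gt.split("/"):
                alleles[sample].append("0" if a == "." else bases[int(a)])
    # PED columns: family ID, individual ID, father, mother, sex (0 = unknown),
    # phenotype (-9 = missing), then two alleles per SNP
    ped_lines = [
        " ".join([family_id, s, "0", "0", "0", "-9"] + alleles[s]) for s in samples
    ]
    return ped_lines, map_lines


# Usage on a tiny two-sample, two-SNP example.
demo = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.\n"
)
ped, mp = vcf_to_ped_map(demo)
print(ped[0])  # FAM S1 0 0 0 -9 A G C C
```

A faster real-world converter would stream into a compact per-sample array (or write the transposed data in chunks) instead of growing Python lists of strings.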

Also, I spent most of Sunday evening and night in the ER and got a diagnosis of ureterolithiasis for my efforts. All I can say is: Three cheers for Percocet!!

UPDATE: Dienekes was kind enough to send me his conversion code, which, judging from the source, should run really fast.

I am still puzzled as to why the vcftools conversion code is so slow. Maybe I should look at its source code.

10 Comments.

Parasar March 29, 2011 at 12:43 pm

Best wishes on a speedy recovery.
razib March 29, 2011 at 1:25 pm

get better man! we need you! 🙂
The limits of computational power – shades of 1982 | Gene Expression | Discover Magazine - pingback on March 29, 2011 at 3:45 pm
sv March 29, 2011 at 6:49 pm

You have my best wishes as well. I hope you recover soon.

Also, that is very cool of Dienekes to help you out like that!
The limits of computational power – shades of 1982 | Biology News by Biologged - pingback on March 29, 2011 at 7:32 pm
Hrp019 March 30, 2011 at 2:31 am

Feel better soon!
1000genomes | Harappa Ancestry Project - pingback on April 10, 2011 at 9:29 am
DMXX April 10, 2011 at 10:47 am

Get well soon, Zack!
- Zack April 12, 2011 at 7:27 am
  
  Thanks.
Vasishta April 11, 2011 at 3:32 am

Didn't notice this post at all - hope you have a speedy recovery Zack :).

Harappa Ancestry Project

Genetics and South Asia