Pan-Asian to PED Conversion

Even though the Pan-Asian dataset is not public, I was asked for the script I use to convert the data to PLINK's PED format.

Here is how I convert the Pan-Asian data to PLINK's transposed file format (a TPED/TFAM pair), which PLINK can read directly or recode to PED.

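#!/usr/bin/perl
use strict;
use warnings;

# Tab-delimited input: a header line whose columns 6 onward are sample IDs,
# then one line per SNP.
my $file = "Genotypes_All.txt";

open(my $in,   "<", $file)           or die "Cannot read $file: $!\n";
open(my $tfam, ">", "panasian.tfam") or die "Cannot write panasian.tfam: $!\n";
open(my $tped, ">", "panasian.tped") or die "Cannot write panasian.tped: $!\n";

# TFAM: one line per sample (family, sample, father, mother, sex, phenotype).
my $header = <$in>;
die "$file is empty\n" unless defined $header;
$header =~ s/\r?\n\z//;
my @first = split(/\t/, $header);
foreach my $sample (5..$#first) {
        print $tfam "0 $first[$sample] 0 0 0 -9\n";
}

# TPED: one line per SNP (chromosome, SNP ID, genetic distance, position),
# followed by two alleles per sample.
while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;
        next unless length $line;
        my @fields = split(/\t/, $line, -1);
        my ($major, $minor) = split('/', $fields[4]);
        # Genotype codes 0, 1, 2 map to allele pairs; anything else is missing.
        # Matching as strings keeps a non-numeric code such as "NA" from being
        # read as 0.
        my %calls = (
                '0' => "$major $major",
                '1' => "$major $minor",
                '2' => "$minor $minor",
        );
        print $tped "$fields[2] $fields[1] 0 $fields[3]";
        # Loop over the header's sample columns so every row has the same width.
        foreach my $snp (5..$#first) {
                my $code = defined $fields[$snp] ? $fields[$snp] : '';
                my $alleles = exists $calls{$code} ? $calls{$code} : "0 0";
                print $tped " $alleles";
        }
        print $tped "\n";
}

close($in);
close($tfam) or die "Error closing panasian.tfam: $!\n";
close($tped) or die "Error closing panasian.tped: $!\n";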

Again, no guarantees! It's Perl, though, so it should run the same way on most operating systems.
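
To get from the transposed files to an actual PED/MAP pair, PLINK itself can do the recoding. What follows is a rough sketch, not part of the original script: the function names check_tped and tped_to_ped are made up for illustration, and it assumes a plink binary (1.07 or 1.9) on the PATH plus the panasian.tped and panasian.tfam files written by the script above. The first function checks that every TPED row has four leading fields plus two alleles for each sample listed in the TFAM; the second calls PLINK's --tfile and --recode options, which should write a .ped and a .map file with the same prefix.

```shell
#!/bin/sh

# check_tped TPED TFAM
# Succeeds only if every TPED row has 4 leading fields (chromosome, SNP ID,
# genetic distance, position) plus two alleles for each sample in the TFAM.
check_tped() {
    samples=$(wc -l < "$2") || return 1
    expected=$((4 + 2 * samples))
    awk -v n="$expected" '
        NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
        END     { exit bad + 0 }
    ' "$1"
}

# tped_to_ped PREFIX
# Assumes PREFIX.tped and PREFIX.tfam exist and that plink is installed.
# Recodes the transposed fileset into PREFIX.ped and PREFIX.map.
tped_to_ped() {
    plink --tfile "$1" --recode --out "$1"
}
```

A typical run would be: check_tped panasian.tped panasian.tfam && tped_to_ped panasian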

8 Comments.

  1. hats off to you!

  2. you are the man. keep it up.

  3. Hey Zack, do you know of a way to output a list of samples in a particular order when using the --keep flag in PLINK?

  4. How about the .map file?