Behar et al Data

Posted by Zack on January 28, 2011

In their paper "The genome-wide structure of the Jewish people", Behar et al analyzed the genomes of several Jewish groups. For our purposes, the South Asian, Middle Eastern, and European groups they sampled matter more than the Jewish samples (which include two South Asian Jewish groups):

Ethnic group	Count
Saudis	20
Jordanians	20
Georgians	20
Turks	19
Iranians	19
Hungarians	19
Ethiopians	19
Armenians	19
Lezgins	18
Chuvashs	17
Syrians	16
Romanians	16
Uzbeks	15
Spaniards	12
Egyptians	12
Cypriots	12
Moroccans	10
Lithuanians	10
North Kannadi	9
Belorussian	9
Yemenese	8
Lebanese	7
Sakilli	4
Paniya	4
Cochin Jews	4
Bene Israel	4
Samaritians	2
Russian	2
Malayan	2

Of the 466 samples, I excluded 8 because they were either duplicates or genetically too similar to other samples.
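For readers who want to try a similar check at home: one common way to spot duplicates and near-duplicates is to compare every pair of samples by identity-by-state (IBS) and flag pairs that share nearly all of their alleles. The sketch below assumes that approach; it is not necessarily how the exclusions here were made. The genotype coding (0, 1 or 2 copies of an allele, `None` for a missing call), the function names, and the 0.95 threshold are all hypothetical choices for illustration.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state by two samples.

    Genotypes are coded as 0, 1 or 2 copies of an allele; None marks a
    missing call (an assumed coding, chosen for this sketch).
    """
    shared = 0.0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either sample
        shared += (2 - abs(a - b)) / 2
        compared += 1
    return shared / compared if compared else float("nan")


def flag_similar_pairs(genotypes, threshold=0.95):
    """Return (id1, id2, similarity) for every pair at or above threshold."""
    flagged = []
    for (id1, g1), (id2, g2) in combinations(sorted(genotypes.items()), 2):
        sim = ibs_similarity(g1, g2)
        if sim >= threshold:
            flagged.append((id1, id2, sim))
    return flagged


# Toy data: S2 is an exact duplicate of S1, S3 is clearly different.
samples = {
    "S1": [0, 1, 2, 2, 0, 1],
    "S2": [0, 1, 2, 2, 0, 1],
    "S3": [2, 0, 0, 1, 2, 1],
}
pairs = flag_similar_pairs(samples)
```

On real data a sensible threshold depends on the SNP set and the populations involved, so the 0.95 here should be treated as a placeholder. Plink's `--genome` option reports pairwise IBS/IBD estimates and is the more practical tool at genome scale.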

The series matrix files that I downloaded were in a somewhat different format from what Plink expects. To convert them to Plink format, I had to look up the platform file for the Illumina genotyping BeadChip they used. Also, Illumina uses a system of A/B alleles and Top/Bot strands instead of the regular ACGT alleles and forward/reverse strands. This Illumina Technote explains it, and I found a Perl script to convert between the two.
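As a rough illustration of the A/B step: in Illumina manifest files, the SNP column (for example `[T/C]`) is generally understood to list Allele A first and Allele B second, as read on the Illumina design strand. Under that assumption, the translation might be sketched as follows. The function names and the `flip` flag are hypothetical, and this is not the Perl script mentioned above. Deciding when a SNP actually needs flipping to the forward strand requires the strand columns of the platform file, which the sketch leaves out.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_snp_field(snp_field):
    """Split a manifest SNP field such as '[T/C]' into (allele_a, allele_b)."""
    allele_a, allele_b = snp_field.strip("[]").split("/")
    return allele_a, allele_b


def ab_to_nucleotides(ab_call, snp_field, flip=False):
    """Translate an A/B call ('AA', 'AB', 'BB') into a PED-style allele pair.

    Anything that is not a two-letter A/B call (e.g. 'NC' or '--') is
    returned as '0 0', Plink's default code for a missing genotype.
    Only single-nucleotide variants are handled; indel fields such as
    '[I/D]' would fail when flip=True.
    """
    allele_a, allele_b = parse_snp_field(snp_field)
    if flip:
        # Report the alleles on the opposite strand.
        allele_a, allele_b = COMPLEMENT[allele_a], COMPLEMENT[allele_b]
    lookup = {"A": allele_a, "B": allele_b}
    if len(ab_call) != 2 or any(c not in lookup for c in ab_call):
        return "0 0"
    return " ".join(lookup[c] for c in ab_call)


het = ab_to_nucleotides("AB", "[T/C]")
flipped = ab_to_nucleotides("BB", "[T/C]", flip=True)
missing = ab_to_nucleotides("NC", "[T/C]")
```

Here `het` is the heterozygous call on the design strand, `flipped` is the homozygous B call reported on the complementary strand, and `missing` shows how a no-call is passed through.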

Category: Dataset
Tags: behar, bnei menashe, cochin, egypt, iran, jewish, kannadi, lebanon, lezgin, malayan, paniya, sakilli, uzbek


16 Comments.

RK January 28, 2011 at 12:53 pm

I really appreciate this transparency about the datasets you're using; it lets us lowly commenters play along at home. Quick question: When you're pruning for linkage disequilibrium, what R^2 threshold are you using? It would be neat to see your summary statistics or your plink arguments.

This probably says more about how neurotic I am than anything else, but Behar, et al.'s labeling their South Indian sample "North_Kannadi" always annoyed me. It's one of those fake eastern adjectivalizations, like jihadi. It would've been better to use Kannadiga, or even Canarese.
- Zack January 28, 2011 at 3:28 pm
  
  Right now I am using an R^2 of 0.3 for LD pruning. But I plan to try some other values as well to see the effect on admixture analysis.
  
  I am trying to be transparent and likely boring the heck out of most people. But any questions about my code, methods or data are welcome.
  - razib January 28, 2011 at 4:02 pm
    
    well, the minority not bored are prolly going to be useful later if you want ppl to double check, etc.
razib January 28, 2011 at 3:25 pm

It would've been better to use Kannadiga, or even Canarese.

thanks! i had no idea what your kind were called 🙂
Admixture: Reference Population | Harappa Ancestry Project - pingback on January 30, 2011 at 1:56 am
Iranians | Harappa Ancestry Project - pingback on March 24, 2011 at 12:13 pm
Africa in 12 ADMIXTURE chunks | Gene Expression | Discover Magazine - pingback on April 6, 2011 at 5:21 pm
Africa in 12 ADMIXTURE chunks | Biology News by Biologged - pingback on April 7, 2011 at 1:31 am
Behar Paniya | Harappa Ancestry Project - pingback on April 16, 2011 at 8:14 pm
Behar Bene Israel | Harappa Ancestry Project - pingback on April 20, 2011 at 7:28 pm
The evolutionary effect of the sky gods | Gene Expression | Discover Magazine - pingback on April 24, 2011 at 6:02 pm
The evolutionary effect of the sky gods | Biology News by Biologged - pingback on April 24, 2011 at 7:32 pm
ADMIXTURE, African Ancestry Project, and confirmation bias | Gene Expression | Discover Magazine - pingback on May 2, 2011 at 3:29 pm
Sonia December 4, 2011 at 9:39 am

Hi Zack
I'm having hard time converting Behar's dataset to plink formats. Can u please help by detailing how u did it. Thanks!
- Zack December 8, 2011 at 8:36 pm
  
  I described the process earlier.
  
  If you want my hacked together script, send me an email and I'll send it to you.
  - manglalayag July 20, 2013 at 12:13 pm
    
    First and foremost, you have the least cryptic genome website (especially works well with newbies like myself). I really appreciate the effort.
    
    Do you mind sending the same script?


Harappa Ancestry Project

Genetics and South Asia
