Behar Bene Israel

As Razib and I were discussing, the four Bnei Menashe Jewish samples from Behar et al didn't look right since Bnei Menashe are from Mizoram in the northeast of India and thus should be expected to have some East Asian admixture.

When I tried to confirm the admixture/PCA results for Bnei Menashe in the Behar et al paper, I didn't find any mention of the group. Instead, the South Asian Jewish group they mentioned was Bene Israel. According to their admixture and PCA results, Bene Israel looked more like Pakistani populations than their Indian host populations. This is consistent with what my admixture runs show.

So I suspected that the four Bene Israel samples mentioned in the Behar et al paper were accidentally labeled as Bnei Menashe in the dataset. I sent an email to the authors and they have confirmed that this was the case.

I have corrected all my spreadsheets so you should see Bene Israel instead of Bnei Menashe now. If you spot Bnei Menashe anywhere, please let me know.

PS. It has also been confirmed that three Paniya samples were mislabeled when the data was submitted to the GEO database. The authors are working on a fix.

UPDATE: Mait Metspalu tells me that the database has been updated with the fixed version of the Behar et al dataset.

Behar et al Data

In their paper "The genome-wide structure of the Jewish people", Behar et al analyzed genome-wide data from a number of Jewish groups. More important for us than the Jewish samples (which include two South Asian Jewish groups) are the different South Asian, Middle Eastern, and European groups they sampled:

Ethnic group Count
Saudis 20
Jordanians 20
Georgians 20
Turks 19
Iranians 19
Hungarians 19
Ethiopians 19
Armenians 19
Lezgins 18
Chuvashs 17
Syrians 16
Romanians 16
Uzbeks 15
Spaniards 12
Egyptians 12
Cypriots 12
Moroccans 10
Lithuanians 10
North Kannadi 9
Belorussian 9
Yemenese 8
Lebanese 7
Sakilli 4
Paniya 4
Cochin Jews 4
Bene Israel 4
Samaritans 2
Russian 2
Malayan 2

Of the 466 samples, I excluded 8 because they were either duplicates of, or too genetically similar to, other samples.
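How the similarity between samples was measured isn't stated here. As a rough illustration only, one common way to spot duplicates and near-duplicates is to compute pairwise identity-by-state (IBS) and flag pairs above a cutoff. The sketch below assumes genotypes coded as counts of one allele (0, 1, 2, with None for a missing call); the function names and the 0.99 threshold are hypothetical and are not the values used for this dataset.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean identity-by-state between two samples.

    Genotypes are allele counts (0, 1, 2); None marks a missing call.
    Returns None when the two samples share no called sites.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return None
    # Each site scores 1.0 (same genotype), 0.5 (one allele shared) or 0.0.
    return sum(1 - abs(a - b) / 2 for a, b in shared) / len(shared)


def near_duplicates(samples, threshold=0.99):
    """Return (id1, id2, similarity) for every pair at or above the threshold.

    `samples` maps a sample ID to its genotype list. The threshold is an
    illustrative assumption, not a recommended value.
    """
    flagged = []
    for (id1, g1), (id2, g2) in combinations(sorted(samples.items()), 2):
        sim = ibs_similarity(g1, g2)
        if sim is not None and sim >= threshold:
            flagged.append((id1, id2, sim))
    return flagged


# Toy data: s1 and s2 are identical, s3 is clearly different.
toy = {
    "s1": [0, 1, 2, 1],
    "s2": [0, 1, 2, 1],
    "s3": [2, 1, 0, None],
}
flagged_pairs = near_duplicates(toy)
```

With real data, one sample from each flagged pair would then be dropped before running admixture or PCA.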

The series matrix files that I downloaded were not in a format that Plink can read directly. To convert them to Plink format, I had to look up the platform file for the Illumina genotyping BeadChip they used. Illumina also uses a system of A/B alleles and Top/Bot strands instead of the regular ACGT alleles and forward/reverse strands. This Illumina Technote explains it, and I found a Perl script to convert between the two.
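To give an idea of what the allele conversion involves, here is a minimal Python sketch. It is not the Perl script mentioned above. It assumes the platform file has a SNP field such as "[A/G]" that lists allele A first and allele B second on Illumina's design strand, and that genotype calls are coded "AA", "AB", "BB", with "NC" for a no-call. The function name, the "NC" code, and the flip flag are assumptions made for illustration. Deciding which SNPs need flipping to reach the forward strand requires strand information from the platform file and is not handled here.

```python
import re

# Matches a SNP field such as "[A/G]" (assumed format of the platform file).
_SNP_FIELD = re.compile(r"^\[([ACGT])/([ACGT])\]$")
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def ab_to_nucleotides(snp_field, genotype, flip=False):
    """Translate an A/B genotype call into a pair of nucleotide alleles.

    snp_field: e.g. "[A/G]"; allele A is assumed to be listed first.
    genotype:  "AA", "AB", "BB", or "NC" (no-call, an assumed code).
    flip:      complement the alleles, for a SNP known to sit on the
               opposite strand from the one wanted.
    Returns a 2-tuple of alleles, with ("0", "0") for a missing genotype,
    which is the default missing code in Plink ped files.
    """
    match = _SNP_FIELD.match(snp_field)
    if match is None:
        raise ValueError(f"unrecognized SNP field: {snp_field!r}")
    if genotype == "NC":
        return ("0", "0")
    if genotype not in ("AA", "AB", "BB"):
        raise ValueError(f"unrecognized genotype call: {genotype!r}")
    lookup = {"A": match.group(1), "B": match.group(2)}
    alleles = tuple(lookup[code] for code in genotype)
    if flip:
        alleles = tuple(_COMPLEMENT[base] for base in alleles)
    return alleles


# Example: a heterozygous call at an [A/G] SNP.
example = ab_to_nucleotides("[A/G]", "AB")
```

Each converted pair becomes the two allele columns for that SNP in the Plink ped file.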