Monthly Archives: June 2012

ANI-ASI Admixture Dating

Posted by Zack on June 25, 2012 169 comments

Similar to an earlier conference poster, Reich Lab's Priya Moorjani et al have another poster at SMBE. Here's the abstract:

Estimating a date of mixture of ancestral South Asian populations
Linguistic and genetic studies have demonstrated that almost all groups in South Asia today descend from a mixture of two highly divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners and Europeans, and Ancestral South Indians (ASI) not related to any populations outside the Indian subcontinent. ANI and ASI have been estimated to have diverged from a common ancestor as much as 60,000 years ago, but the date of the ANI-ASI mixture is unknown. Here we analyze data from about 60 South Asian groups to estimate that major ANI-ASI mixture occurred 1,200-4,000 years ago. Some mixture may also be older--beyond the time we can query using admixture linkage disequilibrium--since it is universal throughout the subcontinent: present in every group speaking Indo-European or Dravidian languages, in all caste levels, and in primitive tribes. After the ANI-ASI mixture that occurred within the last four thousand years, a cultural shift led to widespread endogamy, decreasing the rate of additional mixture.

I bolded the portion which seems new compared to the previous abstract.

FTDNA FF to PED Conversion

Posted by Zack on June 23, 2012

Someone asked how to convert an FTDNA Family Finder CSV data file to the Plink format. I threw together a very simple Unix script to do that and I am sharing it here:

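#!/bin/bash
# Convert an FTDNA Family Finder CSV file to Plink binary files (bed/bim/fam).
if test -z "$1"
then
        echo "FTDNA raw data filename not supplied as argument." >&2
        exit 1
fi
echo "Family ID: "
read -r fid
echo "Individual ID: "
read -r id
echo "Paternal ID (0 if unknown): "
read -r pid
echo "Maternal ID (0 if unknown): "
read -r mid
echo "Sex (m/f/u): "
read -r sexchr
# Plink sex codes: 1 = male, 2 = female, 0 = unknown
if [[ $sexchr == m* ]]
then
        sex=1
elif [[ $sexchr == f* ]]
then
        sex=2
else
        sex=0
fi
# Phenotype 0 = missing
pheno=0

# One-line family file describing the individual
echo "$fid $id $pid $mid $sex $pheno" > "$id.tfam"

# Fix line endings (in place), drop the header line, then write one SNP per line:
# chromosome, SNP ID, genetic distance (0), position, allele 1, allele 2
dos2unix "$1"
sed '1d' "$1" > "$id.nocomment"
awk -F, '{gsub(/"/,""); print $2,$1,"0",$3,substr($4,1,1),substr($4,2,1)}' "$id.nocomment" > "$id.tped"
rm "$id.nocomment"

# No-calls ("--" in the FTDNA file) end up as "-" alleles, declared here as the missing code
plink --tfile "$id" --out "$id" --make-bed --missing-genotype - --output-missing-genotype 0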

This script creates three files: *.bed, *.bim and *.fam, which are the binary format files for Plink. You can then use Plink to merge multiple files, filter SNPs or individuals and do other processing.
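To see what the awk step in the script does to each row, here is a small sketch that applies the same field mapping to two rows on standard input. The function name `ftdna_to_tped` and the sample rows (SNP IDs, positions and genotypes) are made up for illustration; only the column order (RSID, chromosome, position, result) is taken from the script above.

```shell
# Same field mapping as the script's awk step, reading FTDNA rows from stdin.
# Output columns: chromosome, SNP ID, genetic distance (0), position, allele 1, allele 2.
ftdna_to_tped() {
    awk -F, '{ gsub(/"/, ""); print $2, $1, "0", $3, substr($4, 1, 1), substr($4, 2, 1) }'
}

# Hypothetical rows: one called genotype and one no-call ("--").
printf '%s\n' \
    '"rs0000001","1","100000","AG"' \
    '"rs0000002","X","200000","--"' | ftdna_to_tped
# prints:
# 1 rs0000001 0 100000 A G
# X rs0000002 0 200000 - -
```

The no-call row comes out with "-" for both alleles, which is why the Plink command is told that "-" is the missing genotype code.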

HarappaWorld Ancestral South Indian

Posted by Zack on June 19, 2012 12 comments

Using the same method as I used for reference 3 admixture, I decided to guesstimate the Ancestral South Indian proportions, as given by Reich et al, for my HarappaWorld admixture run.

Basically, I used 92 of the 96 samples Reich et al used to find population averages for the South Indian component. Then, I ran a linear regression between the South Indian component average and Reich et al's estimate of Ancestral South Indian (ASI) ancestry. Since Reich et al actually list Ancestral North Indian percentages in their paper but their model is a two-ancestry ANI+ASI one, I simply calculated the ASI percentages as 100% minus ANI.

The correlation between Reich et al ASI and my HarappaWorld South Indian component for the relevant populations turns out to be 0.99277086.

And the linear regression fit for the data is:

ASI = 2.5218942 + 0.8104836 * S_INDIAN

where both ASI (Reich et al) and S_INDIAN (HarappaWorld) are given in percentages.
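As a quick worked example, the fit can be applied to a single South Indian percentage with a one-line awk call. The function name `asi_from_s_indian` is just an illustrative choice; the two coefficients are the ones given above, and the output is rounded to two decimals.

```shell
# Apply the regression fit ASI = 2.5218942 + 0.8104836 * S_INDIAN (both in percent).
asi_from_s_indian() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", 2.5218942 + 0.8104836 * s }'
}

asi_from_s_indian 50   # prints 43.05
asi_from_s_indian 20   # prints 18.73
```

So a South Indian component of 50% maps to an estimated ASI of about 43%, and 20% maps to just under 19%.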

Of the individuals in HarappaWorld, I kept only those who had a South Indian component of at least 20% for computing the ASI proportions.

The resulting ASI percentages can be seen in a spreadsheet.

Please note that in the Group sheet, the averages are based on the samples which met the 20% South Indian component threshold. Thus, the 20% ASI in the Romanians is the average of the two Romanians who met the threshold out of a total of 16 Romanian samples.
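The effect of the threshold on the group averages can be sketched as follows. This is only an illustration under assumptions: the input layout (one "group S_INDIAN" pair per line), the function name `group_asi_averages` and the sample values are all hypothetical, and the spreadsheet itself may have been built differently.

```shell
# Read "group s_indian_percent" lines from stdin, keep individuals with a South Indian
# component of at least 20%, convert each to ASI with the regression fit, and print
# "group mean_asi count" for each group, sorted by group name.
group_asi_averages() {
    awk -v min=20 '
        $2 + 0 >= min + 0 {
            asi = 2.5218942 + 0.8104836 * $2
            sum[$1] += asi
            n[$1]++
        }
        END {
            for (g in sum) printf "%s %.2f %d\n", g, sum[g] / n[g], n[g]
        }' | sort
}

# Hypothetical values, not taken from the spreadsheet.
printf '%s\n' 'romanian 22.0' 'romanian 5.0' 'romanian 21.0' 'bengali 45.0' 'bengali 50.0' |
    group_asi_averages
# prints:
# bengali 41.02 2
# romanian 19.95 2
```

In this made-up input, the third Romanian falls below the threshold and is left out, so the group figure reflects only two of the three samples. That is the same reason the Romanian average in the Group sheet should not be read as typical for all Romanians.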

The individual results are available in the Individual sheet. These results are a little different from the estimates using reference 3. Thus, I would point out that these should be taken only as a rough estimate.

HarappaOracle Limitations

Posted by Zack on June 12, 2012 10 comments

While HarappaOracle is a great tool, it has its limitations.

First of all, do not think of the mixed mode results as showing which populations you are descended from. Use HarappaOracle to get an idea of which populations are similar to you in their admixture results. This function is especially important since admixture results should be understood in relative terms, as I have been stressing.

For mixed-race people, the Oracle might sometimes provide a correct result, as it does for me. For others, the known ancestral mix might not show up.

The Oracle calculator is also sensitive to your admixture percentages, and small changes in them can sometimes change the mixed mode results radically.

Let's look at three siblings as an example. (My thanks to them for letting me use their results for this post.) Here are their admixture results:

	Sibling 1	Sibling 2	Sibling 3
NE Euro	43.8%	43.1%	43.9%
Mediterranean	27.6%	26.8%	27.0%
Baloch	11.2%	11.1%	12.2%
Caucasian	9.4%	10.8%	8.6%
S Indian	5.3%	6.5%	7.0%
SW Asian	1.5%	1.0%	0.5%
American	0.7%	0.3%	0.7%
NE Asian	0.4%	0.1%	0.1%
Beringian	0.0%	0.3%	0.0%
San	0.0%	0.1%	0.0%

Their admixture results are broadly similar, as expected. Some of you might think that a difference of 1% or less is very significant, but do consider what we know of DNA inheritance and the error margins in ADMIXTURE.

Now let's see their HarappaWorld Oracle results.

Sibling 1		Sibling 2		Sibling 3
romany	3.16	romany	2.85	romany	4.45
hungarian	9.47	hungarian	9.65	utahn-white	10.74
utahn-white	9.88	french	11.05	n-european	10.79
n-european	9.9	slovenian	11.09	hungarian	10.97
french	9.95	n-european	11.38	utahn-white	11.35
utahn-white	10.62	utahn-white	11.54	french	11.54
slovenian	10.95	utahn-white	12.21	british	11.94
british	11.29	british	12.99	slovenian	12.21
orcadian	13.17	orcadian	14.86	orcadian	13.51
ukranian	17.73	romanian	17.14	ukranian	18.24

Again, not unexpected. The top 10 population matches are not too different for the siblings. There are some differences, but nothing extraordinary.
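For a rough sense of scale, here is a sketch of a distance between two admixture vectors. It assumes the Oracle distance is a plain Euclidean distance in percentage points, which this post does not state, and the function name `admix_distance` is made up. The two vectors in the usage line are Sibling 1 and Sibling 3 from the admixture table above, in the same component order.

```shell
# Euclidean distance between two comma-separated admixture vectors (in percent).
# Assumption: this mirrors how Oracle compares an individual to a population average.
admix_distance() {
    awk -v x="$1" -v y="$2" 'BEGIN {
        n = split(x, xv, ",")
        split(y, yv, ",")
        for (i = 1; i <= n; i++) {
            d = xv[i] - yv[i]
            ss += d * d
        }
        printf "%.2f\n", sqrt(ss)
    }'
}

# Sibling 1 vs Sibling 3
admix_distance '43.8,27.6,11.2,9.4,5.3,1.5,0.7,0.4,0.0,0.0' \
               '43.9,27.0,12.2,8.6,7.0,0.5,0.7,0.1,0.0,0.0'
# prints 2.45
```

Under that assumption, the two siblings are about 2.45 apart from each other, which is comparable in size to the gap between each sibling and the closest population match. Small shifts in the percentages can therefore reorder the list.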

Finally let's look at mixed mode Oracle, where we try to find the 10 closest matches (based on admixture results) assuming that these individuals are mixed from two populations.

Sibling 1		Sibling 2		Sibling 3
91.3% romany + 8.7% lithuanian	1.58	93.3% romany + 6.7% lithuanian	1.93	82.8% utahn-white + 17.2% bene-israel	1.99
78.4% romany + 21.6% n-european	1.67	95.4% romany + 4.6% finnish	1.99	83.7% n-european + 16.3% bene-israel	2.37
79.7% romany + 20.3% utahn-white	1.69	91.9% romany + 8.1% belorussian	2.01	83.8% utahn-white + 16.2% bene-israel	2.47
83.3% romany + 16.7% orcadian	1.76	92.4% romany + 7.6% russian	2.02	84.8% n-european + 15.2% cochin-jew	2.52
94.2% romany + 5.8% finnish	1.88	92.0% romany + 8.0% mordovian	2.06	86.0% n-european + 14.0% kerala-christian	2.93
79.8% romany + 20.2% utahn-white	1.99	90.4% romany + 9.6% ukranian	2.14	85.0% utahn-white + 15.0% cochin-jew	2.98
90.2% romany + 9.8% belorussian	2.00	88.7% romany + 11.3% slovenian	2.5	85.9% n-european + 14.1% ap-hyderabad	2.99
82.1% romany + 17.9% british	2.03	89.2% romany + 10.8% n-european	2.52	85.1% n-european + 14.9% up	3.02
91.4% romany + 8.6% russian	2.06	95.7% romany + 4.3% chuvash	2.58	85.5% n-european + 14.5% tn-brahmin	3.13
85.0% n-european + 15.0% bene-israel	2.16	90.7% romany + 9.3% utahn-white	2.58	85.5% n-european + 14.5% brahmin-tamil-nadu	3.15

Sibling 1 and Sibling 2 are again not too different from each other: mostly Romany with some European. Sibling 3, however, gets vastly different results. Why? No, Sibling 3 wasn't adopted! The reason is simple: Sibling 3 has a larger South Indian component than the average Romany in our dataset, so Sibling 3 cannot be represented as a mix of Romany and a European ethnicity without a large error. Instead, a mix of mostly northwest European with a little bit of Indian, especially Indian Jewish, comes closest to Sibling 3's results. This does not make Sibling 3 Jewish or Indian Jewish, however; the Indian Jewish populations are themselves quite mixed with the local Indian populations.
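The reasoning above can be illustrated with a toy two-population fit. This is a sketch under assumptions, not the Oracle code: it assumes Euclidean distance, handles a single pair of populations (the real tool searches over many pairs), and uses hypothetical two-component vectors (European, South Indian). The function name `best_mix` is also made up.

```shell
# For an individual x and two population averages a and b (comma-separated percentages),
# find the weight w in [0, 1] that minimizes the distance between x and w*a + (1-w)*b.
# Prints "percent_of_a distance".
best_mix() {
    awk -v x="$1" -v a="$2" -v b="$3" 'BEGIN {
        n = split(x, xv, ","); split(a, av, ","); split(b, bv, ",")
        for (i = 1; i <= n; i++) {
            d = av[i] - bv[i]
            num += (xv[i] - bv[i]) * d
            den += d * d
        }
        # Least-squares weight, clipped to a valid mixing proportion
        w = (den > 0) ? num / den : 0
        if (w < 0) w = 0
        if (w > 1) w = 1
        for (i = 1; i <= n; i++) {
            r = xv[i] - (w * av[i] + (1 - w) * bv[i])
            ss += r * r
        }
        printf "%.1f %.2f\n", 100 * w, sqrt(ss)
    }'
}

# Hypothetical populations: a = 80% European / 20% South Indian, b = 100% European.
best_mix '90,10' '80,20' '100,0'   # prints 50.0 0.00
best_mix '70,30' '80,20' '100,0'   # prints 100.0 14.14
```

The first individual sits between the two populations and is fitted exactly. The second has more South Indian than either population, so even the best weight leaves a large error, and a different pair of populations would rank higher.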

HarappaWorld HRP0240-HRP0244

Posted by Zack on June 5, 2012 5 comments

From now on, instead of waiting till I have a batch of 10 new participants to compute their Admixture results, I'll run admixture at the start of the month for those who submitted their data during the previous month.

So I have added the HarappaWorld Admixture results for HRP0240-HRP0244 to the individual spreadsheet.

I have also recomputed the weighted averages for Bengalis (now 5 participants, up from 3), Kerala Muslims (now 2, up from 1), and Georgians (now 4, up from 3), while adding a new one for our first North Ossetian participant.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell you anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.

Harappa Ancestry Project

Genetics and South Asia
