Monthly Archives: May 2011 - Page 3

Reference I Admixture Errors

Posted by Zack on May 6, 2011 Comments Off

I have been thinking about error estimation for Admixture results for some time, since I have heard a lot of arguments that even a 0.1% result is significant. I was skeptical of that and have rounded off my admixture run results to the nearest percent.

There was a memory leak issue in the bootstrapping code for admixture which crashed it every time I tried running it. I emailed David Alexander and he fixed it in version 1.12.

So I ran the default 200 bootstrap replicates to measure standard error in our old Reference I K=12 admixture. Spreadsheet with population level results is here and participant results are here.
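For readers unfamiliar with bootstrapping, here is a minimal sketch in R of how a standard error and a bias estimate come out of a set of bootstrap replicates. It is not Admixture's actual code: the data are made-up numbers, the statistic is a simple mean rather than an ancestry proportion, and the variable names are only for illustration. The only thing taken from the run above is the replicate count of 200.

```r
set.seed(1)

# Toy data: 1,000 made-up values standing in for the real genotype data.
x <- rnorm(1000, mean = 0.05, sd = 0.2)

# Point estimate computed on the full data.
estimate <- mean(x)

# Resample the data with replacement B times and recompute the statistic.
B <- 200
boot <- replicate(B, mean(sample(x, replace = TRUE)))

# Bootstrap standard error: spread of the replicate estimates.
se <- sd(boot)

# Bootstrap bias estimate: average replicate minus the original estimate.
bias <- mean(boot) - estimate

c(estimate = estimate, se = se, bias = bias)
```

The standard error tells you how much the estimate moves around under resampling, while the bias tells you whether the replicates are systematically above or below the original estimate.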

Here are some statistics for the standard error estimates:

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
C1 S Asian	0.00%	0.02%	0.33%	0.52%	0.96%	1.93%
C2 Blch/Cauc	0.00%	0.00%	1.02%	0.79%	1.45%	2.63%
C3 Kalash	0.00%	0.01%	0.40%	0.50%	0.99%	3.76%
C4 SE Asian	0.00%	0.09%	0.37%	0.60%	1.27%	1.92%
C5 SW Asian	0.00%	0.00%	0.60%	0.66%	1.28%	2.90%
C6 Euro	0.00%	0.00%	0.35%	0.56%	1.12%	1.82%
C7 Papuan	0.00%	0.07%	0.22%	0.23%	0.36%	1.08%
C8 NE Asian	0.00%	0.07%	0.36%	0.67%	1.36%	2.45%
C9 Siberian	0.00%	0.08%	0.37%	0.51%	0.82%	2.29%
C10 E Bantu	0.00%	0.00%	0.00%	0.35%	0.72%	1.93%
C11 W Afr	0.00%	0.00%	0.00%	0.28%	0.50%	1.51%
C12 E Afr	0.00%	0.00%	0.05%	0.31%	0.60%	1.79%

You can see the mean value of the standard errors per population and realize how many are over 1% (marked in red).

And statistics for bias estimates:

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
C1 S Asian	-1.104%	-0.031%	0.000%	-0.024%	0.075%	1.026%
C2 Blch/Cauc	-0.835%	-0.280%	-0.009%	-0.133%	0.000%	1.049%
C3 Kalash	-1.575%	0.000%	0.020%	0.076%	0.147%	0.615%
C4 SE Asian	-0.629%	-0.021%	0.011%	0.018%	0.087%	0.478%
C5 SW Asian	-0.691%	-0.094%	0.000%	-0.020%	0.035%	0.613%
C6 Euro	-0.572%	-0.086%	0.000%	-0.039%	0.004%	0.468%
C7 Papuan	-0.171%	0.008%	0.059%	0.070%	0.120%	0.312%
C8 NE Asian	-0.739%	0.000%	0.016%	0.034%	0.107%	0.679%
C9 Siberian	-1.044%	0.000%	0.015%	0.035%	0.103%	0.692%
C10 E Bantu	-0.412%	0.000%	0.000%	-0.007%	0.001%	0.370%
C11 W Afr	-0.261%	0.000%	0.000%	0.009%	0.005%	0.304%
C12 E Afr	-0.635%	0.000%	0.000%	-0.017%	0.010%	0.405%

You can also see the average value of the bias in each ancestral component for each population.

Since the bias is smaller than the standard error and centered around zero, a small percentage of an ancestral component is more likely to be real, and less likely to be noise, when it shows up in a large number of samples from the same population group.
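As a rough illustration of why group size matters, suppose (a simplifying assumption, not something the bootstrap run establishes) that the errors of the individuals in a group are independent and share a similar standard error $\sigma$. Then the standard error of the group average over $n$ individuals is

$$
\mathrm{SE}(\bar{q}) = \frac{\sigma}{\sqrt{n}}, \qquad \text{e.g.}\quad \frac{0.5\%}{\sqrt{25}} = 0.1\% .
$$

Here $\sigma = 0.5\%$ is roughly the size of the mean standard errors in the first table, and $n = 25$ is a hypothetical group size. In practice the errors of different individuals in the same run are probably correlated, so the real gain from averaging is likely smaller than this.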

Reference 3F(iltered) Admixture

Posted by Zack on May 5, 2011 19 comments

I removed all American populations and San and Pygmy (i.e., South and Central African) from Reference 3 for a better focus on our target populations.

Here are the admixture results. You can choose the number of ancestral components, K, from the dropdown below.

K=13, 14, 15 (in that order) have the lowest cross-validation error.

There are a bunch of interesting results in there. For example, the split into northern and southern European, and the split of Siberian into Siberian and Russian Far East (or Bering Strait). However, the Onge component as a proxy of the ASI does not appear. Also, we don't get as much breakdown of the South Asian populations as we would like.

Harappa Nearest IBS Neighbors

Posted by Zack on May 4, 2011 4 comments

After a long tease, here is the spreadsheet containing the top 500 nearest neighbors (using IBS similarity percentages) for the Harappa participants from HRP0001 to HRP0089.

I am also providing an R data object with the same data (except that it contains all 3,975 individuals from Reference 3 and Harappa). To use this data:

Download R
Install R on your computer
When you start R, type
```
load('harappa_ibs.RData')
```
to load the data
Type
```
closest("HRP0001")
```
to find the 20 closest IBS neighbors of HRP0001. You can use any of the Harappa IDs here.
You can set the number of IBS neighbors (50, for example) to show using
```
closest("HRP0010",50)
```
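If you are curious what a nearest-neighbor lookup like this does, here is a minimal sketch on a toy IBS similarity matrix. Everything in it is an assumption for illustration: the matrix values are made up, the reference IDs are hypothetical, and `closest_sketch` is a stand-in, not the `closest` function shipped in harappa_ibs.RData, whose internals may be organized differently.

```r
# Toy IBS similarity matrix (made-up values, hypothetical reference IDs).
ids <- c("HRP0001", "HRP0002", "REF0001", "REF0002")
ibs <- matrix(c(1.00, 0.74, 0.71, 0.69,
                0.74, 1.00, 0.72, 0.70,
                0.71, 0.72, 1.00, 0.73,
                0.69, 0.70, 0.73, 1.00),
              nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical stand-in for closest(): rank everyone else by IBS
# similarity to `id` and return the top `n` as a named vector.
closest_sketch <- function(id, n = 20, sim = ibs) {
  stopifnot(id %in% rownames(sim))
  s <- sim[id, colnames(sim) != id]   # drop the self-comparison
  head(sort(s, decreasing = TRUE), n)
}

closest_sketch("HRP0001", 2)
```

The default of 20 neighbors mirrors the behavior described above; passing a second argument changes how many are returned.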

Enjoy!

100!

Posted by Zack on May 3, 2011 13 comments

Yesterday, we got to 100 participants in the Harappa Ancestry Project.

I made the project public on January 17, 2011. So, 100 submissions in 106 days. That's pretty good.

I am surprised at the speed and quantity of submissions. I probably have the largest dataset of South Asians right now.

Keep spreading the word and encouraging everyone to participate.

Accepting FTDNA Family Finder

Posted by Zack on May 2, 2011 Comments Off

In addition to 23andme data, I am now accepting the autosomal data from FTDNA Family Finder too.

This is due to FamilyTreeDNA's recent switch to the Illumina Omni chip, which has a lot more markers in common with the 23andme data.

Since FTDNA is retesting all its current customers on the new chip, even if you tested with them earlier, you should have autosomal data from the new chip which you can download and email to me at harappa@zackvision.com.

I am basically looking for participants who have at least some ancestry from the following countries/regions:

Afghanistan
Bangladesh
Bhutan
Burma
India
Iran
Maldives
Nepal
Pakistan
Sri Lanka
Tibet

But if you have ancestry from West or Central Asia or Caucasus, I am likely to accept your data too.

Details of participation are here.

April Update

Posted by Zack on May 1, 2011 5 comments

I have a total of 97 participants in the project right now who have sent me their raw data. Six of those have relatives participating and thus have to be filtered out for most analyses other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

Let's try to get to a hundred soon.

And yes, I am accepting FTDNA Family Finder (new Illumina chip) now.

Ref3 + Harappa Maps

Posted by Zack on May 1, 2011 Comments Off

More maps from The Jatt Gene using the Reference 3 and Harappa participants K=11 admixture results.

C1 South Asian Isopleth

C2 Onge Isopleth

C1 South Asian Choropleth at state/province level

C2 Onge Choropleth

As usual, Simranjit has more maps on his blog.

Harappa Ancestry Project

Genetics and South Asia
