Monthly Archives: February 2011

Admixture K=12, HRP0001 to HRP0040

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results.

In case you guys are wondering, the new results here are those for HRP0031 to HRP0040.

If you can't see the interactive charts above, JavaScript might be disabled in your browser. Here's a static image for the HRP0031 to HRP0040 admixture run.

PS. This was run using Admixture version 1.04.

Fst for Reference I Admixture K=12

I had posted the Fst divergences between the estimated ancestral populations from the admixture analysis of the Reference I dataset. But a picture is worth a thousand words, and this dendrogram (using complete linkage) shows the Fst numbers fairly clearly.

Remember this is not a phylogeny.

Admixture K=9, HRP0001 to HRP0040

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results.

In case you guys are wondering, the new results here are those for HRP0031 to HRP0040.

PS. This was run using Admixture version 1.04.

Reference I Dendrogram

Handschar created a dendrogram using a hierarchical classifier based on K=12 admixture results and wondered:

When I run a classification based on simple euclidean distances (not a phylogeny), the Armenians and Turks, as they were, prior to the removal of the four North European admixed Behar samples in David's runs, cluster together. The North European component, in Dodecad Armenians, is practically nonexistent. I am not sure how the Harappa project "European" component translates to Dodecad components. If the admixed Armenians are included, it is possible their inclusion is impacting the Armenian population component percentages. Then again, even if included, perhaps your runs are picking up on something not previously detected. The Armenians, in previous classification runs, ordinarily matched one or more of the Caucasian Jewish groups.

While looking into his question, I figured that I would create some dendrograms too. The ones here are based on the K=12 admixture results of the Reference I dataset (spreadsheet). I am using the pairwise Euclidean distance between the admixture results of the population groups to do a complete linkage hierarchical classification. So these dendrograms show which groups are closest in terms of their admixture percentages and do not show shared ancestry. In other words, it is not a phylogeny or a family tree.
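To make the method concrete, here is a minimal Python sketch of complete linkage clustering on Euclidean distances between admixture profiles. It is not the code used for the dendrograms in this post; the population names and three-component percentages are made up for illustration, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two admixture profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles maps a group name to its list of admixture percentages.
    Returns the merges in order as (left_members, right_members, distance).
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is
                # the largest distance between any pair of their members.
                d = max(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Made-up percentages for four hypothetical groups (illustration only).
profiles = {
    "PopA": [80.0, 15.0, 5.0],
    "PopB": [75.0, 20.0, 5.0],
    "PopC": [10.0, 30.0, 60.0],
    "PopD": [5.0, 25.0, 70.0],
}

merges = complete_linkage(profiles)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

The merge heights are what a dendrogram draws on its vertical axis. Two groups that merge early simply have similar admixture percentages, which is why the result should not be read as a family tree.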

First, I used the mean admixture percentages for each group, as given in the spreadsheet.

Reference 1 Mean Admixture Complete Linkage Dendrogram

There are a number of outliers in the dataset. For example, some Arabs and Sindhis with African admixture, some Armenians with a lot more European component than the rest, etc. Therefore, I thought a better approach would be to do the same classification using the median admixture percentages for each population group.
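As a rough illustration of why the median helps, the Python sketch below summarizes one hypothetical group with both statistics. The numbers are invented, and taking the median of each component separately is my assumption about how a group median could be computed; it is not necessarily what was done for the dendrogram here.

```python
from statistics import mean, median


def summarize(samples, stat):
    """Apply stat to each ancestral component (column) across the samples."""
    return [stat(column) for column in zip(*samples)]


# Hypothetical group: four similar samples plus one outlier that carries
# a large share of the third component.
group = [
    [70.0, 28.0, 2.0],
    [72.0, 27.0, 1.0],
    [68.0, 30.0, 2.0],
    [71.0, 28.0, 1.0],
    [40.0, 20.0, 40.0],
]

group_mean = summarize(group, mean)
group_median = summarize(group, median)

print("mean:  ", [round(v, 1) for v in group_mean])
print("median:", group_median)
```

The single outlier pulls the mean of the third component well above what any typical sample shows, while the median stays with the bulk of the group. One caveat: component-wise medians are not guaranteed to add up to 100% in general.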

Reference 1 Median Admixture Complete Linkage Dendrogram

Using the medians for each population, the Armenians do match the Caucasian Jewish groups, so handschar was correct.

UPDATE: Here's another dendrogram in which I take the mean of the ancestral components for each population after removing outliers.

Reference 1 Mean (No Outliers) Admixture Complete Linkage Dendrogram

Again, don't take these dendrograms to heart. All they show is the distance between the admixture results of different populations.

Admixture K=4, HRP0001-HRP0040

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results.

In case you guys are wondering, the new results here are those for HRP0031 to HRP0040.

PS. This was run using Admixture version 1.04.

Improved Admixture Bar Charts

I have improved the Admixture bar charts further. As per your demands, ethnicity information is now available in a table right below the bar plot, in the same order as the bar plot IDs.

Also, you can click on any of the legend color rectangles on the right to sort the bar chart and the table by that ancestral component. Similarly, click on the header row of the table to sort by a column.

I might make some minor tweaks to this one.

Admixture K=12, HRP0021-HRP0030

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results and this batch's results at lower K.

Batch 3 Admixture K=12

If you guys can confirm that the interactive bar chart is working well for you, then this is the last static bar plot.

PS. This was run using Admixture version 1.04.

Google Charts

Here's a chart using the Google Visualization API.

In case you are wondering, the individuals are ordered by the sum of their South Asian, Pakistan/Caucasian and Kalash component percentages.
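For anyone curious, that ordering can be sketched in a few lines of Python. The IDs and percentages below are invented, the column names are taken from the sentence above, and sorting from largest to smallest sum is my assumption; the chart itself reads its data from the spreadsheet.

```python
def order_by_components(rows, keys):
    """Sort rows by the sum of the chosen component percentages, largest first."""
    return sorted(rows, key=lambda row: sum(row[k] for k in keys), reverse=True)


# Hypothetical individuals with made-up percentages (illustration only).
rows = [
    {"id": "ID1", "South Asian": 40.0, "Pakistan/Caucasian": 30.0, "Kalash": 5.0, "Other": 25.0},
    {"id": "ID2", "South Asian": 55.0, "Pakistan/Caucasian": 30.0, "Kalash": 5.0, "Other": 10.0},
    {"id": "ID3", "South Asian": 20.0, "Pakistan/Caucasian": 25.0, "Kalash": 0.0, "Other": 55.0},
]

sort_keys = ["South Asian", "Pakistan/Caucasian", "Kalash"]
ordered = order_by_components(rows, sort_keys)
print([row["id"] for row in ordered])
```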

If it works well for everyone, using Internet Explorer, Firefox, Chrome, or Safari on Windows, Linux, iOS or Mac OS, then I'll start using these interactive bar charts instead of the ones I have been creating in R. These just use the data from the spreadsheet directly.

I am also looking into interactive scatter plots for the PCA plots, but I am not sure if it will handle a lot of data points without running your computer into the ground.

The Google Visualization API also has a geographical map feature that uses Flash. There is also a static map chart, which I am looking into.

Admixture K=12, HRP0011-HRP0020

Here are their ethnic backgrounds and the results spreadsheet. Also relevant are the reference I admixture results and this batch's results at lower K.

Batch 2 Admixture K=12

PS. This was run using Admixture version 1.04.

Admixture Upgrade

I noticed a few days ago that Admixture had an update available:

1.1 (2/8/2011): Parallel processing, supervised analysis. Minor speedups and cleanups.

There were two important new features in version 1.1 that I started salivating over. One was parallel processing so I could utilize all the cores of my machine and thus run Admixture faster. The other was more important though I have yet to experiment with it. It's the ability to assign some ancestral components to specific samples, i.e. assign some individuals in the data specific 100% ancestry as a starting assumption and calculate admixture from that.

Of course, these two features made me forget the cardinal rule: Never upgrade in the middle of an analysis. But I did upgrade and things have changed subtly, making some comparisons between admixture v1.04 and v1.1 difficult.

For example, previously (admixture v1.04), at K=12, admixture was giving me the ancestral components: South Asian, Balochistan/Caucasus, Kalash, Southeast Asian, Southwest Asian, European, Papuan, Northeast Asian, Siberian, East African Bantus, West African, and East African.

With Admixture v1.1, I am getting the ancestral components: South Asian, Balochistan/Caucasus, Kalash, Southeast Asian, European, Mediterranean (maximum among Mozabite and Sardinians), Papuan, Northeast Asian, Southwest Asian, Siberian, West African, and East African.

So now I am running Admixture with different random seeds and trying to compare the old version's results with the new. Of course, since we are talking K=12, just one admixture run takes a whole day.
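Part of the difficulty in comparing runs is that the ancestral components can come out in a different order each time, so the columns have to be paired up before anything else. Here is a minimal Python sketch of one way to do that, under my own assumptions: each run is a table of ancestry fractions with one row per individual and one column per component, and columns are paired greedily by Euclidean distance. The tiny K=3 tables are invented, and match_components is a hypothetical helper, not part of Admixture.

```python
import math


def column(matrix, j):
    """Return column j of a row-major table as a list."""
    return [row[j] for row in matrix]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_components(q_old, q_new):
    """Greedily pair each component of q_old with the closest unused
    component of q_new. Returns a dict {old_index: new_index}."""
    k = len(q_old[0])
    candidates = sorted(
        (distance(column(q_old, i), column(q_new, j)), i, j)
        for i in range(k)
        for j in range(k)
    )
    mapping, used = {}, set()
    for _, i, j in candidates:
        if i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping


# Invented K=3 results for three individuals. The second run has the
# same components in a different order, with slightly different values.
q_old = [
    [0.80, 0.10, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
]
q_new = [
    [0.12, 0.78, 0.10],
    [0.10, 0.22, 0.68],
    [0.68, 0.12, 0.20],
]

mapping = match_components(q_old, q_new)
print(mapping)
```

A pair that ends up matched at a large distance is a hint that the component genuinely changed between runs rather than just moving to a different column. Greedy pairing is only a rough heuristic; an optimal assignment method would be more robust at K=12.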

Anyway, while that's going on, I have more things in process which can go forward, like reporting the results of Batch 4. And working on the Eurasian dataset.
