Tag Archives: dendrogram

Ref3 Admixture Dendrograms

I have posted the reference 3 K=11 admixture results for all populations and datasets. Here are the relevant links:

So let's try a dendrogram of all these populations' average admixture results. Instead of using regular Euclidean distance, I used some weighting based on Fst distances between admixture components, very similar to what Palisto did.
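For illustration only, here is one way an Fst-based weighting could work. The post does not spell out the exact formula, so this is an assumption: it treats the Fst between two admixture components as a squared distance between them, which makes two individuals who differ only in closely related components come out nearer than two who differ in distant ones. The function name and the small Fst matrix are made up.

```python
import math

def fst_weighted_distance(p, q, fst):
    """Distance between admixture vectors p and q, weighted by a symmetric
    matrix of Fst values between components (assumed weighting, see above).

    Treating Fst as a squared distance between components gives
    d^2 = -1/2 * sum_ij (p_i - q_i) * fst[i][j] * (p_j - q_j).
    """
    diff = [a - b for a, b in zip(p, q)]
    k = len(diff)
    quad = sum(diff[i] * fst[i][j] * diff[j] for i in range(k) for j in range(k))
    # Fst is not an exact squared Euclidean distance, so clip small negatives.
    return math.sqrt(max(0.0, -0.5 * quad))

# Made-up Fst values: components 0 and 1 are close, component 2 is distant.
fst = [
    [0.00, 0.01, 0.10],
    [0.01, 0.00, 0.10],
    [0.10, 0.10, 0.00],
]
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
c = [0.0, 0.0, 1.0]

print(fst_weighted_distance(a, b, fst))  # pure component 0 vs pure component 1
print(fst_weighted_distance(a, c, fst))  # pure component 0 vs pure component 2
```

With plain Euclidean distance, all three of these pure profiles would be equally far apart. Under this weighting, the distance between two pure profiles is the square root of the Fst between their components.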

Here's a dendrogram of all datasets using complete linkage.

Since the Pan-Asian dataset had only 5,400 SNPs in common with reference 3, we need to be careful interpreting the tree above. Just to make sure, here's the dendrogram excluding Pan-Asian populations.


Interactive Tree Generation

Does anyone know of software that generates an interactive JavaScript (or similar) tree/dendrogram for the web, i.e. one where branches can be expanded and collapsed and nodes can be searched?

I want to use it to generate dendrograms including all Harappa participants and individual reference samples. So we are looking at more than 4,000 nodes on the tree.


Admixture Ref3 Dendrogram HRP0001-HRP0160

I haven't done any admixture dendrograms in a while, so I thought you guys might be interested.

This uses admixture results using Reference 3. As usual, I used complete linkage for the hierarchical clustering.

Let's look at the dendrogram using the regular Euclidean distance measure between admixture results.
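As a rough sketch of what complete linkage on Euclidean distance does, here is a naive pure-Python version. The sample proportions are made up, and in practice a library routine (for example R's hclust or SciPy's scipy.cluster.hierarchy.linkage) would be used instead. This is not the actual code behind the trees.

```python
import math

def complete_linkage(points):
    """Naive agglomerative clustering with complete linkage.

    Returns the merges in order as (cluster_a, cluster_b, height), where
    each cluster is a sorted tuple of row indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        height, a, b = min(
            (linkage(c1, c2), c1, c2)
            for idx, c1 in enumerate(clusters)
            for c2 in clusters[idx + 1:]
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges

# Made-up admixture proportions for four samples over three components.
samples = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.10, 0.80],
    [0.15, 0.05, 0.80],
]
merges = complete_linkage(samples)
for a, b, height in merges:
    print(a, b, round(height, 4))
```

Each merge height is the largest pairwise distance inside the new cluster, which is what the branch heights on a complete-linkage dendrogram represent.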

I also decided to use a chi-squared distance measure to do the clustering.
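There are several variants of chi-squared distance, and the post does not say which one was used. The sketch below assumes the correspondence-analysis form, where each squared difference is divided by the average proportion of that component across all samples. The rows are made up.

```python
import math

def chi_squared_distance(p, q, col_means):
    """Chi-squared distance between two rows of admixture proportions.

    Each squared difference is divided by the mean proportion of that
    component, so differences in small components count for more.
    """
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(p, q, col_means) if m > 0)
    )

# Made-up admixture proportions (each row sums to 1).
rows = [
    [0.50, 0.45, 0.05],
    [0.55, 0.45, 0.00],  # differs from row 0 in the small third component
    [0.45, 0.50, 0.05],  # differs from row 0 only in the two big components
]
col_means = [sum(col) / len(rows) for col in zip(*rows)]

print(math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))
print(chi_squared_distance(rows[0], rows[1], col_means),
      chi_squared_distance(rows[0], rows[2], col_means))
```

The two Euclidean distances are the same, but the chi-squared distance is larger for the pair that differs in the minor component. That is the main reason the two trees can disagree.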

PS. Any thoughts on the trees based on two different distance measures?


Reference 3 + HAP PCA

I had run PCA on the Reference 3 dataset before. Now I included Harappa participants in it as well.

Here's the dendrogram based on the Euclidean distance between Harappa participants in their PCA results.

Since no one liked the IBS nearest neighbors lists, I thought making a spreadsheet with every participant's closest 100 neighbors in PCA space might be more fruitful.

Note that for the Harappa participants, the median distance to their nearest neighbor is 0.2064. So if your nearest neighbor is more than let's say 0.3 away, then you are not close to anyone.
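A minimal sketch of how such a nearest-neighbor list and the median nearest-neighbor distance can be computed. The IDs and two-dimensional coordinates are made up, and 0.3 is just the rough cutoff mentioned above.

```python
import math
from statistics import median

def nearest_neighbors(coords, k):
    """For each sample ID, return its k closest other samples
    as a list of (distance, id) pairs, nearest first."""
    result = {}
    for sid, point in coords.items():
        dists = sorted(
            (math.dist(point, other), oid)
            for oid, other in coords.items()
            if oid != sid
        )
        result[sid] = dists[:k]
    return result

# Made-up sample IDs and PCA coordinates.
coords = {
    "S1": (0.0, 0.0),
    "S2": (0.1, 0.0),
    "S3": (0.0, 0.2),
    "S4": (3.0, 3.0),
}
nn = nearest_neighbors(coords, k=2)

# Median distance from each sample to its single nearest neighbor.
median_nn = median(pairs[0][0] for pairs in nn.values())

# Samples whose nearest neighbor is farther than the cutoff.
isolated = [sid for sid, pairs in nn.items() if pairs[0][0] > 0.3]

print(median_nn, isolated)
```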


Harappa Ref3 Admixture Dendrograms

Now that we have the admixture results for project participants using Reference 3, let's take a look at a tree based on Euclidean distance of the admixture proportions for each participant.

Compare it to the earlier one with reference 1 admixture results.

And here is a dendrogram combining the average reference population results with the Harappa participants.


Reference 3 K=11 Admixture Dendrogram

Laredo asked:

Is it possible for you to create an unrooted similarity tree of all the populations in your “Reference 3” dataset?

So here's a dendrogram of the average K=11 admixture results for the reference 3 populations.


Harappa Genome Similarity MDS/Dendrogram

I computed the IBS similarity matrix for the Harappa participants HRP0001 to HRP0080 over 500,000 SNPs. This is exactly the same thing as the genome-wide gene comparison at 23andMe.

Then, I converted the similarity matrix to a dissimilarity/distance matrix with the standard formula:

dij = sqrt(2 - 2 * sij)

where sij is the similarity between individuals i and j and dij is the distance/dissimilarity between the two.
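Applied to a whole matrix, the conversion looks roughly like this. The similarity values are made up, and the clipping at zero is an added safeguard against rounding, not part of the formula.

```python
import math

def similarity_to_distance(sim):
    """Convert a similarity matrix (1.0 on the diagonal) to distances
    using d_ij = sqrt(2 - 2 * s_ij)."""
    n = len(sim)
    return [
        # max(0.0, ...) guards against tiny negative values from rounding.
        [math.sqrt(max(0.0, 2.0 - 2.0 * sim[i][j])) for j in range(n)]
        for i in range(n)
    ]

# Made-up similarity matrix for three individuals.
sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
dist = similarity_to_distance(sim)
for row in dist:
    print(row)
```

A similarity of 1 maps to a distance of 0, and lower similarity maps to larger distance.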

Using the dissimilarity matrix, I classified all the participants (excluding close relatives) using hierarchical clustering with complete linkage. You can see the dendrogram below.

Then I used the same dissimilarity matrix to calculate 6-dimensional MDS. You can see the MDS plots below. The numbers on the plots are your Harappa IDs.
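For readers curious how MDS can be computed from nothing but a distance matrix, here is a sketch of classical (Torgerson) MDS in plain Python. The post does not say which MDS variant or software was used, so treat this as an illustration of the general idea. The function name, the power-iteration approach and the test points are assumptions; a real analysis would use a linear algebra library.

```python
import math

def classical_mds(dist, dims, iters=500):
    """Classical MDS via power iteration with deflation.

    dist is a symmetric distance matrix; returns one coordinate list per
    sample. Assumes the distances are (close to) Euclidean, so that the
    leading eigenvalues of the centred matrix are positive.
    """
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    coords = [[] for _ in range(n)]
    for d in range(dims):
        # Fixed, non-constant start vector so results are reproducible.
        v = [math.sin(i + 1.0 + d) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords

# Made-up check: distances between the corners of a 3 x 4 rectangle.
corners = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
dist = [[math.dist(p, q) for q in corners] for p in corners]
mds = classical_mds(dist, dims=2)
for row in mds:
    print([round(x, 3) for x in row])
```

The recovered coordinates are only defined up to rotation and reflection, so it is the distances between points, not the axes themselves, that should match the input.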

MDS Dimensions 1 & 2:

MDS Dimensions 3 & 4:

MDS Dimensions 5 & 6:

As you can see I (HRP0001) and my sister (HRP0035) are far away in the first four dimensions.

I'll let you guys speculate on what each dimension represents.

Now why create an MDS this way instead of directly using Plink's MDS functionality? Well, I needed to check if I could do it using only the similarity matrix because that would be really useful for something else. Tune in to my other blog for more later this week.


Harappa Admixture Dendrogram 1-80

It's time to update the Harappa admixture dendrogram, since we had only 40 participants the last time I created one.

Note that this is not a phylogeny. It just visualizes the closeness of your admixture results to others.

Also note that I am using Euclidean distance between admixture proportions, which has its problems, together with complete linkage hierarchical clustering.


Ref1 South Asians + Harappa PCA Clusters

Using the PCA results of the South Asians in Reference I as well as Harappa participants, I ran a couple of clustering algorithms.

First, I scaled the principal components by the respective eigenvalues.
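The scaling step can be read in more than one way. The sketch below assumes each principal-component column is multiplied by its eigenvalue, so that components explaining more variance carry more weight in the Euclidean distance. Scaling by the square root of the eigenvalue is another common choice. The scores and eigenvalues are made up.

```python
def scale_components(scores, eigenvalues):
    """Multiply each principal-component column by its eigenvalue
    (assumed interpretation of the scaling step)."""
    return [[x * ev for x, ev in zip(row, eigenvalues)] for row in scores]

# Made-up PCA scores (rows = samples, columns = PC1, PC2) and eigenvalues.
scores = [
    [0.1, 0.1],
    [-0.1, 0.1],
]
eigenvalues = [10.0, 2.0]

scaled = scale_components(scores, eigenvalues)
print(scaled)
```

After scaling, a given difference along PC1 contributes more to the distance between two samples than the same difference along PC2.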

Using Euclidean distance for hierarchical clustering with complete linkage, here's the dendrogram for the Harappa Project participants.

You can compare this to the Admixture-based dendrogram:

The most obvious thing is that I (HRP0001) am an outlier by far.

We inferred three major clusters with the admixture results. Those are intact, though changed a little.

I also ran MClust on the PCA data. The optimum number of clusters was 14. The resulting cluster assignments can be seen in a spreadsheet.

For the Harappa Project participants, the numbers give the probability of assignment to a cluster. For example, for HRP0009 there is a 72% probability of belonging to cluster 4. For the reference populations, the numbers give the expected number of samples assigned to a cluster.
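To make the link between the two kinds of numbers concrete: if every sample has a probability of belonging to each cluster, then the expected number of samples from a population in a cluster is just the sum of those probabilities over the population's samples. The sketch below assumes that is how the population figures were obtained. The sample IDs, population names and probabilities are made up.

```python
from collections import defaultdict

def expected_cluster_counts(posteriors, population_of):
    """Sum per-sample cluster probabilities within each population.

    posteriors: {sample_id: {cluster: probability}}
    population_of: {sample_id: population_name}
    """
    counts = defaultdict(lambda: defaultdict(float))
    for sid, probs in posteriors.items():
        for cluster, p in probs.items():
            counts[population_of[sid]][cluster] += p
    return {pop: dict(c) for pop, c in counts.items()}

# Made-up membership probabilities for three samples from one population.
posteriors = {
    "A1": {4: 0.75, 7: 0.25},
    "A2": {4: 0.50, 7: 0.50},
    "A3": {7: 1.00},
}
population_of = {"A1": "PopA", "A2": "PopA", "A3": "PopA"}

counts = expected_cluster_counts(posteriors, population_of)
print(counts)
```

The expected counts for a population add up to its number of samples, even though the individual entries need not be whole numbers.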


Harappa and Reference I Dendrograms

Looking at the Harappa dendrogram and the dendrogram for reference I, I thought I would combine them to see where our project participants fit.

Then I got more curious. I wanted to see a similarity tree of all the samples in reference I (2,654) plus the 40 Harappa participants I have processed so far. That came out to be such a huge tree that it was impossible to save it in a legible form. Finally I compromised by selecting only the South Asian samples from the Reference I dataset and putting them together with the Harappa data. Unfortunately, that doesn't give the Iranian and European-admixed participants any information. I'll have to analyze those separately.

Anyway, here's the South Asian Admixture Dendrogram in PDF format. That means you can search for "HRP" to find all the project members, which is why I like PDF in this case better than an image.

Note that Singapore Indians are such a good stand-in for South Indians.
