I computed the IBS (identity-by-state) similarity matrix for the Harappa participants HRP0001 to HRP0080 over 500,000 SNPs. This is exactly the same thing as the genome-wide gene comparison at 23andMe.

Then, I converted the similarity matrix to a dissimilarity/distance matrix with the standard formula:

d_{ij} = sqrt(2 - 2 * s_{ij})

where s_{ij} is the similarity between individuals *i* and *j* and d_{ij} is the distance/dissimilarity between the two.
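For anyone who wants to try the conversion themselves, here is a minimal sketch in Python with NumPy. The function name and the three-person similarity matrix are made up for illustration; they are not the actual Harappa data or the code used for this post.

```python
import numpy as np

def similarity_to_distance(s):
    """Apply d_ij = sqrt(2 - 2 * s_ij) to every entry of a similarity matrix."""
    s = np.asarray(s, dtype=float)
    # Clip guards against tiny negative values caused by floating-point rounding.
    return np.sqrt(np.clip(2.0 - 2.0 * s, 0.0, None))

# Made-up similarities for three hypothetical individuals.
sim = np.array([
    [1.00, 0.82, 0.68],
    [0.82, 1.00, 0.50],
    [0.68, 0.50, 1.00],
])
dist = similarity_to_distance(sim)
```

With a self-similarity of 1 on the diagonal, the formula gives a self-distance of 0, and more similar pairs end up closer together.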

Using the dissimilarity matrix, I classified all the participants (excluding close relatives) using hierarchical clustering with complete linkage. You can see the dendrogram below.
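To make "complete linkage" concrete, here is a small from-scratch sketch in Python. It is a naive illustration, not the code behind the dendrogram, and the four IDs and their distances are invented. At each step it merges the two clusters whose farthest pair of members is closest.

```python
def complete_linkage(dist, labels):
    """Naive agglomerative clustering; returns the merges in the order they happen."""
    # Each cluster is a tuple of member indices into dist/labels.
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance = largest pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(labels[i] for i in merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Made-up distances for four hypothetical IDs (not real Harappa data).
ids = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.3],
    [1.0, 0.9, 0.3, 0.0],
]
merges = complete_linkage(dist, ids)
```

The merge heights are what a dendrogram draws on its vertical axis. For real data you would normally reach for a library routine instead, such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"`.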

Then I used the same dissimilarity matrix to calculate 6-dimensional MDS. You can see the MDS plots below. The numbers on the plots are your Harappa IDs.
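If you want to see how coordinates can come out of nothing but a distance matrix, here is a sketch of classical (Torgerson) MDS in Python with NumPy. I am assuming the classical variant here; the function name and the toy points are hypothetical and only serve to check that the distances are recovered.

```python
import numpy as np

def classical_mds(dist, n_dims):
    """Classical (Torgerson) MDS from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to get a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy check: four made-up points in the plane.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates are only determined up to rotation and reflection, so it is the pairwise distances, not the axes themselves, that should match the input. For the 6-dimensional case you would pass `n_dims=6`.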

As you can see, I (HRP0001) and my sister (HRP0035) are far away in the first four dimensions.

I'll let you guys speculate on what each dimension represents.

Now, why create an MDS this way instead of directly using PLINK's MDS functionality? Well, I needed to check whether I could do it using only the similarity matrix, because that would be really useful for something else. Tune in to my other blog for more later this week.

