Harappa Genome Similarity MDS/Dendrogram

I computed the IBS (identity-by-state) similarity matrix for the Harappa participants HRP0001 to HRP0080 over 500,000 SNPs. This is exactly the same thing as the genome-wide comparison at 23andMe.

Then, I converted the similarity matrix to a dissimilarity/distance matrix with the standard formula:

dij = sqrt(2 - 2 * sij)

where sij is the similarity between individuals i and j and dij is the distance/dissimilarity between the two.
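As a quick illustration of that conversion, here is a minimal sketch in Python with NumPy. The function name and the toy similarity values are made up for the example; they are not the Harappa data.

```python
import numpy as np

def similarity_to_distance(S):
    """Apply d_ij = sqrt(2 - 2 * s_ij) to every entry of a similarity matrix."""
    S = np.asarray(S, dtype=float)
    # Clip at zero so rounding noise near s = 1 cannot produce sqrt of a negative
    return np.sqrt(np.clip(2.0 - 2.0 * S, 0.0, None))

# Hypothetical 3 x 3 similarity matrix (self-similarity is 1 on the diagonal)
S = np.array([
    [1.00, 0.75, 0.50],
    [0.75, 1.00, 0.68],
    [0.50, 0.68, 1.00],
])
D = similarity_to_distance(S)
print(D)
```

A similarity of 1 maps to a distance of 0, and lower similarities map to larger distances, up to sqrt(2) for a similarity of 0.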

Using the dissimilarity matrix, I clustered all the participants (excluding close relatives) using hierarchical clustering with complete linkage. You can see the dendrogram below.
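For anyone who wants to try this step, here is a rough sketch of complete-linkage clustering on a precomputed distance matrix using SciPy. I am assuming SciPy here purely for illustration; the post does not say which software produced the dendrogram, and the four individuals and their distances are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix for four hypothetical individuals
labels = ["A", "B", "C", "D"]
D = np.array([
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.3],
    [1.0, 0.9, 0.3, 0.0],
])

# linkage() expects the condensed (upper-triangle) form of the matrix
condensed = squareform(D)

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members
Z = linkage(condensed, method="complete")

# Cut the tree into two flat clusters
clusters = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(labels, clusters)))
```

Passing Z to scipy.cluster.hierarchy.dendrogram (with matplotlib installed) draws the tree itself.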

Then I used the same dissimilarity matrix to calculate 6-dimensional MDS. You can see the MDS plots below. The numbers on the plots are your Harappa IDs.
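Here is a minimal sketch of how MDS coordinates can be computed from nothing but a distance matrix. It assumes classical (Torgerson) MDS, which is my guess for illustration; the post does not state which MDS variant was used. The function name and the toy points are hypothetical.

```python
import numpy as np

def classical_mds(D, k=6):
    """Classical MDS: embed n items in k dimensions from an n x n distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    vals = np.clip(eigvals[order], 0.0, None)  # treat negative eigenvalues as zero
    return eigvecs[:, order] * np.sqrt(vals)   # one row of coordinates per item

# Toy check: distances between the corners of a unit square
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

X = classical_mds(D, k=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_hat))
```

One side note: with dij = sqrt(2 - 2 * sij), the double-centered matrix B works out to the double-centered similarity matrix itself, so under this assumption the embedding depends only on the similarities.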

MDS Dimensions 1 & 2:

MDS Dimensions 3 & 4:

MDS Dimensions 5 & 6:

As you can see, my sister (HRP0035) and I (HRP0001) are far away from everyone else in the first four dimensions.

I'll let you guys speculate on what each dimension represents.

Now, why create an MDS this way instead of directly using PLINK's MDS functionality? Well, I needed to check whether I could do it using only the similarity matrix, because that would be really useful for something else. Tune in to my other blog for more later this week.

12 Comments.

  1. Labeled-
    http://i55.tinypic.com/169gsvd.jpg

    Excellent work by the way. Interestingly, I am closer to the Hebbar (Karnataka) Iyengars - HRP17 and HRP79 than I am to HRP48, who is an Iyengar from Tamil Nadu who belongs to the same Vadagalai sub-sect, as well. I've taken after my maternal side, it seems?

    And as ever, you (Zack) and Azad are outliers. Also, who is HRP45? Couldn't find him/her in the spreadsheet.

  2. Any plans to do an IBD run, Zack?

    • Probably not. I have done IBD analyses for the reference data and posted the results. I am not sure if participants would appreciate an IBD run on their data.

      • Zack,

        You could give participants an opt-in option. I (HRP0003) have no problem. This is my top five from DOD:
        V194 "1" "DOD220" "1"
        V1047 "2" "GIH25" "0.746819"
        V1002 "3" "Pathan" "0.746385"
        V362 "4" "DOD395" "0.746322"
        V1070 "5" "GIH25" "0.745879"
        V1013 "6" "Pathan" "0.745703"

        Our match - "0.73583"

  3. Dienekes did one today. It'd be great if you could do so too, since the number of South Asian participants far exceeds Dodecad's.