Harappa Oracle

Based on the Dodecad Oracle, here is the Harappa Oracle, which uses the Reference 3 admixture results.

I am using Dienekes' code with a couple of changes. One of them is using weighted distance based on Fst divergences between ancestral components. Because of that, it is several times slower than the Dodecad Oracle. I plan to offer an option soon to switch between Euclidean distance and Fst-weighted distance.
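To make the difference between the two distances concrete, here is a minimal sketch in R. It is not the code inside the Oracle: the Fst values are made up for three hypothetical components (A, B, C), and the weighting shown (a quadratic form over the Fst matrix) is only one plausible way to build an Fst-weighted distance. The function names are illustrative.

```r
# Toy Fst matrix for three hypothetical components (values are invented).
# A and B are close to each other; C is far from both.
fst <- matrix(c(0.00, 0.02, 0.10,
                0.02, 0.00, 0.10,
                0.10, 0.10, 0.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Plain Euclidean distance between two admixture vectors (in percent).
euclidean_dist <- function(x, y) sqrt(sum((x - y)^2))

# Assumed weighting: sqrt(-1/2 * d' F d), where d is the difference vector.
# This is non-negative when both vectors sum to the same total and the Fst
# matrix behaves like a squared-distance matrix; max() guards against
# small negative values.
fst_weighted_dist <- function(x, y, fst) {
  d <- x - y
  q <- -0.5 * as.numeric(t(d) %*% fst %*% d)
  sqrt(max(q, 0))
}

x  <- c(50, 30, 20)
y1 <- c(40, 40, 20)  # 10 points moved from A to the nearby component B
y2 <- c(40, 30, 30)  # 10 points moved from A to the distant component C

euclidean_dist(x, y1)          # same as euclidean_dist(x, y2)
fst_weighted_dist(x, y1, fst)  # smaller: A and B are close in Fst
fst_weighted_dist(x, y2, fst)  # larger: A and C are far apart in Fst
```

Under Euclidean distance both shifts look identical, while the weighted version treats a shift between closely related components as a smaller difference.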

You need to install R to use it. Then unzip the Oracle zip file and either double-click the extracted file or load it from within R:

load('HarappaOracleR3fst.RData')

In R, you can look at the 385 populations included by typing:

X[,1]

To find your closest populations, you need your Harappa Reference 3 admixture results. Pass them as a comma-separated vector, like this (using my own results):

HarappaOracle(c(44,12,0,24,14,1,2,0,0,1,2))

You will get a result with the first column showing the closest populations and the second column showing their distance to you.

[,1] [,2]
[1,] "balochi" "8.0242"
[2,] "bene-israel" "9.2843"
[3,] "brahui" "9.5158"
[4,] "pathan" "9.7034"
[5,] "makrani" "10.1014"
[6,] "sindhi" "10.9236"
[7,] "Bhatia" "11.8441"
[8,] "Sindhi" "12.1704"
[9,] "Kashmiri" "13.4229"
[10,] "punjabi-arain" "13.9192"
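For readers curious about what a lookup like this involves, here is a rough sketch in R using plain Euclidean distance and a made-up reference table. The function name nearest_populations, the toy populations, and their values are all hypothetical; the real Oracle uses the Fst-weighted distance and its own stored table.

```r
# Toy reference table: one row per population, one column per component.
ref <- rbind(pop_a = c(60, 30, 10),
             pop_b = c(50, 30, 20),
             pop_c = c(10, 20, 70))

# Return the k reference populations closest to the admixture vector x.
nearest_populations <- function(x, ref, k = 10) {
  # Subtract x from every row, then take the row-wise Euclidean norm.
  d <- sqrt(rowSums(sweep(ref, 2, x)^2))
  # Smallest distance first; row names are carried along.
  head(round(sort(d), 4), k)
}

res <- nearest_populations(c(58, 30, 12), ref, k = 2)
print(res)
```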

You can also find out the closest populations to one of the reference populations:

HarappaOracle("punjabi-arain")

By default, the Oracle shows the 10 closest populations. You can change that:

HarappaOracle("punjabi-arain",k=20)

Also, by default, the Oracle excludes the Pan-Asian dataset since the overlap is only 5,400 SNPs. You can include Pan-Asian populations:

HarappaOracle("punjabi-arain",panasian=T)

There is also a mixed mode, in which the individual (or the mean of a reference population) is compared against every pair of populations treated as two ancestral sources mixed in the best-fitting proportions.

HarappaOracle("Haryana Jatt",mixedmode=T)

which has the following output:

[1,] "Haryana Jatt" "0"
[2,] "15.4% lithuanians + 84.6% Punjabi Brahmin" "1.9553"
[3,] "10.6% russian + 89.4% Rajasthani Brahmin" "2.0626"
[4,] "14.7% finnish + 85.3% Punjabi Brahmin" "2.0863"
[5,] "9.2% finnish + 90.8% Rajasthani Brahmin" "2.1142"
[6,] "89.4% Rajasthani Brahmin + 10.6% mordovians" "2.1727"
[7,] "9.6% lithuanians + 90.4% Rajasthani Brahmin" "2.1989"
[8,] "10.1% belorussian + 89.9% Rajasthani Brahmin" "2.2938"
[9,] "16.8% russian + 83.2% Punjabi Brahmin" "2.3015"
[10,] "16.2% belorussian + 83.8% Punjabi Brahmin" "2.3656"
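The idea behind mixed mode can be sketched in a few lines of R. This is an illustration under assumptions, not the Oracle's own code: it uses plain Euclidean distance, a made-up reference table, and hypothetical function names (best_mix, mixed_mode). For each pair of populations it finds the mixing proportion that brings the blend closest to the target, then ranks the pairs by that distance.

```r
# Toy reference table (invented values); rows must be distinct.
ref <- rbind(pop_a = c(60, 30, 10),
             pop_b = c(50, 30, 20),
             pop_c = c(10, 20, 70))

# Best proportion p of a (and 1 - p of b) to approximate x.
# Closed-form least squares on the segment between a and b.
best_mix <- function(x, a, b) {
  ab <- a - b
  p <- sum((x - b) * ab) / sum(ab^2)
  p <- min(max(p, 0), 1)  # keep the proportion between 0% and 100%
  list(p = p, dist = sqrt(sum((x - (p * a + (1 - p) * b))^2)))
}

# Try every pair of reference populations and return the k best fits.
mixed_mode <- function(x, ref, k = 10) {
  pairs <- combn(nrow(ref), 2)
  res <- t(apply(pairs, 2, function(ij) {
    m <- best_mix(x, ref[ij[1], ], ref[ij[2], ])
    c(i = ij[1], j = ij[2], p = m$p, dist = m$dist)
  }))
  res <- res[order(res[, "dist"]), , drop = FALSE]
  out <- data.frame(
    mix = sprintf("%.1f%% %s + %.1f%% %s",
                  100 * res[, "p"], rownames(ref)[res[, "i"]],
                  100 * (1 - res[, "p"]), rownames(ref)[res[, "j"]]),
    dist = round(res[, "dist"], 4)
  )
  head(out, k)
}

# A target built as exactly 75% pop_a + 25% pop_c.
target <- 0.75 * ref["pop_a", ] + 0.25 * ref["pop_c", ]
res <- mixed_mode(target, ref, k = 3)
print(res)
```

Because every pair is tried, the work grows with the square of the number of populations, which is one reason mixed mode takes longer than the single-population lookup.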

You can of course combine any or all of the options, for example HarappaOracle("punjabi-arain",k=20,panasian=T,mixedmode=T).

Think of Harappa Oracle as a tool to help you interpret your admixture results by showing which populations you are closest to. Do not think of it as giving you your real ancestry.

15 Comments.

  1. Zack,
    Could you explain how to interpret the difference between the mixed mode results for HRP0003 and "Bihari Brahmin" below?
    Thanks.

    For HRP0003
    [1,] "Bihari Brahmin" "0.7657"
    [2,] "UP Brahmin" "3.8766"
    [3,] "Punjabi Brahmin" "3.9242"
    [4,] "Punjabi" "4.6289"
    [5,] "Rajasthani Brahmin" "5.2604"
    [6,] "kashmiri-pandit" "5.3077"
    [7,] "Brahmins_from_Uttar_Pradesh" "6.0628"
    [8,] "Kashmiri" "6.372"
    [9,] "Bengali Brahmin" "6.4095"
    [10,] "Bihari Muslim" "6.4485"

    Mixed Mode
    HarappaOracle(c(52,18,2,7,17,0,0,1,2,0,0),mixedmode=T)

    [1,] "99.4% Bihari Brahmin + 0.6% bolivian" "0.5006"
    [2,] "0.6% maya + 99.4% Bihari Brahmin" "0.5017"
    [3,] "0.5% pima + 99.5% Bihari Brahmin" "0.5044"
    [4,] "99.5% Bihari Brahmin + 0.5% totonac" "0.5068"
    [5,] "0.5% karitiana + 99.5% Bihari Brahmin" "0.5073"
    [6,] "0.5% surui + 99.5% Bihari Brahmin" "0.5073"
    [7,] "0.8% ecuadorian + 99.2% Bihari Brahmin" "0.5192"
    [8,] "0.8% mexicans + 99.2% Bihari Brahmin" "0.532"
    [9,] "0.8% colombian + 99.2% Bihari Brahmin" "0.5806"
    [10,] "0.6% east-greenlanders + 99.4% Bihari Brahmin" "0.6203"

    HarappaOracle("Bihari Brahmin",mixedmode=T)
    [,1] [,2]
    [1,] "Bihari Brahmin" "0"
    [2,] "82.3% Rajasthani Brahmin + 17.7% ap-mala" "1.5283"
    [3,] "82.7% Rajasthani Brahmin + 17.3% Tamil Vishwakarma" "1.5463"
    [4,] "11% lithuanians + 89% Gujarati" "1.5744"
    [5,] "17.4% sakilli + 82.6% Rajasthani Brahmin" "1.612"
    [6,] "17.7% kamsali + 82.3% Rajasthani Brahmin" "1.6312"
    [7,] "83.2% Rajasthani Brahmin + 16.8% tn-dalit" "1.6388"
    [8,] "17.7% north-kannadi + 82.3% Rajasthani Brahmin" "1.6591"
    [9,] "81.6% Rajasthani Brahmin + 18.4% Chenchus" "1.6629"
    [10,] "81.4% Rajasthani Brahmin + 18.6% Chamar" "1.6762"

    • Hi Parasar, here is my guess... HRP0003 has more East Asian admixture than "Bihari Brahmin", which is showing up as South American.

      For "Bihari Brahmin", all results except [4,] seem obvious. To explain [4,]: we have seen in the Eurogenes project that "Brahmins" in general seem to have something in common with Lithuanian/Baltic populations. Hence the Gujarati + Lithuanian combo makes sense.

      • What seems strange is that HRP0003 and Bihari Brahmin would have different results, considering that HRP0003 is the sole individual in the Bihari Brahmin group composite.

  2. Thanks, Zack! Check out my oracle scores, too:

    [1,] "7.3% Saudis + 92.7% Kanjars" "1.0774"
    [2,] "6.9% Yemen-Jews + 93.1% Kanjars" "1.0993"
    [3,] "14.1% Balochi + 85.9% Meghawal" "1.1112"
    [4,] "7.4% Samaritians + 92.6% Kanjars" "1.1171"
    [5,] "22% Bene-Israel + 78% Kanjars" "1.1402"
    [6,] "8.3% Iraq-Jews + 91.7% Kanjars" "1.1491"
    [7,] "8.7% Iraqi Arab + 91.3% Kanjars" "1.1493"
    [8,] "7.1% Iranian-Jews + 92.9% Muslim" "1.1526"
    [9,] "8.5% Iranian-Jews + 91.5% Kanjars" "1.1539"
    [10,] "13.5% Brahui + 86.5% Meghawal" "1.1605"

    The appreciable presence of the SW Asian component in me (~10%) in Ref 3 K=11 is being interpreted literally by the Harappa Oracle, yes? Doesn't the program take into account its even higher presence in some other South Asian populations?

    • As you said, it looks like Harappa Oracle is trying to match up your SW Asian component by combining appropriate amounts from different populations. It is also trying to match up the total W. Eurasian components or ANI. Kanjars are about 53% ANI and Saudis are mostly W. Eurasian with some African admixture. 0.53*0.93+0.07 = 0.56 which is not too far from your ANI of ~0.6.

      However, we cannot take the SW Asian and European values from the Ref. 3, K=11 analysis literally as ancestry from SW Asia or from Europe. More than a year ago, Zack found in the Ref. 1 analysis that the major W. Eurasian component was Baloch/Caucasian. Dienekes and Metspalu et al. have independently come to the same conclusion.

  3. As an aside, Zack, will you be releasing a DIY calculator for Ref 3 K=11? Many non-participants seem to be very interested in one.

  4. This is awesome Zack, I played with it for a couple of hours. The concentrations are really interesting.

    Cheers

  5. "I am using Dienekes' code with a couple of changes. One of them is using weighted distance based on Fst divergences between ancestral components."

    I am glad that you are using a distance based on Fst divergences. This is what I have been proposing all along; I just called it "adjusted distance", and I mentioned how to do it here:
    http://www.harappadna.org/2012/02/admixture-ref3-k11-hrp0211-hrp0220/
    >>
    Zack February 24, 2012 at 10:13 pm

    Looks interesting, though I don't have Excel on my home machine, so I haven't been able to test it.

    Also, what's the difference between the normal and adjusted distance columns in the results?

    Palisto February 25, 2012 at 4:38 pm

    I explained it here.
    http://www.forumbiodiversity.com/showpost.php?p=571226&postcount=5

    The adjustment is to address the relation of the different components to each other based on the Fst divergences you provided here:
    http://www.harappadna.org/2011/04/reference-3-admixture-k11/
    <<

  6. The upper class is more Republican « elmsprogressivemedia - pingback on March 26, 2012 at 3:21 pm
  7. Oracle observations for Kerala Nair

    HarappaOracle("Kerala Nair",k=20)
    [,1] [,2]
    [1,] "Kerala Nair" "0"
    [2,] "Brahmins_from_Tamil_Nadu" "1.4726"
    [3,] "Kerala Brahmin" "2.1735"
    [4,] "Iyer Brahmin" "2.3957"
    [5,] "meghawal" "2.9648"
    [6,] "Iyengar Brahmin" "3.0628"
    [7,] "Goan" "3.1174"
    [8,] "Maharashtrian" "3.2327"
    [9,] "Karnataka Brahmin" "3.3282"
    [10,] "Meena" "3.5734"
    [11,] "Kerala Christian" "3.6895"
    [12,] "singapore-indians" "3.8758"
    [13,] "Kshatriya" "4.0627"
    [14,] "Gujarati" "4.2323"
    [15,] "UP" "4.8068"
    [16,] "tn-brahmin" "5.0151"
    [17,] "Lambadi" "5.4237"
    [18,] "gujaratis-b" "5.5692"
    [19,] "Sourastrian" "5.6119"
    [20,] "Muslim" "5.8098"

    > HarappaOracle("Kerala Nair",mixedmode=T)
    [,1] [,2]
    [1,] "Kerala Nair" "0"
    [2,] "48.5% kashmiri-pandit + 51.5% Tamil_Nadu_Scheduled_Caste" "0.2834"
    [3,] "74.1% Kerala Christian + 25.9% UP Brahmin" "0.4427"
    [4,] "71.2% Andhra Pradesh + 28.8% Bhatia" "0.5272"
    [5,] "65.1% Kerala Christian + 34.9% Brahmins_from_Uttar_Pradesh" "0.558"
    [6,] "70.4% Andhra Pradesh + 29.6% Sindhi" "0.5715"
    [7,] "26.6% pathan + 73.4% Andhra Pradesh" "0.587"
    [8,] "31.3% pathan + 68.7% Tamil Vellalar" "0.6394"
    [9,] "41.4% kashmiri-pandit + 58.6% velama" "0.647"
    [10,] "2.2% onge + 97.8% Kerala Brahmin" "0.6953"

    However, my individual mixed mode results are interesting.

    HarappaOracle(c(56,24,0,9,7,2,0,1,0,0,0),mixedmode=T)
    [,1] [,2]
    [1,] "1% nganassans + 99% Kerala Nair" "1.3097"
    [2,] "1% koryaks + 99% Kerala Nair" "1.3814"
    [3,] "1% evenkis + 99% Kerala Nair" "1.3823"
    [4,] "1.1% dolgans + 98.9% Kerala Nair" "1.3963"
    [5,] "1% yakut + 99% Kerala Nair" "1.4097"
    [6,] "1.1% yukaghirs + 98.9% Kerala Nair" "1.4293"
    [7,] "1% kets + 99% Kerala Nair" "1.4367"
    [8,] "0.9% chukchis + 99.1% Kerala Nair" "1.4449"
    [9,] "0.6% papuan + 99.4% Kerala Nair" "1.4519"
    [10,] "1% selkups + 99% Kerala Nair" "1.4565"

  8. my results in mixedmode

    > HarappaOracle(c(20,6,1,31,40,0,0,1,1,0,0),mixedmode=T)
    [,1] [,2]
    [1,] "69.9% tuscans + 30.1% Bihari Brahmin" "1.0627"
    [2,] "71.8% tuscans + 28.2% Bengali Brahmin" "1.4946"
    [3,] "70.3% tuscans + 29.7% UP Brahmin" "1.5379"
    [4,] "33.9% bene-israel + 66.1% bulgarians" "1.5631"
    [5,] "72.6% tuscans + 27.4% ap-brahmin" "1.7342"
    [6,] "71.4% tuscans + 28.6% vaish" "1.7753"
    [7,] "72.9% tuscans + 27.1% Oriya" "1.7949"
    [8,] "71.7% tuscans + 28.3% Brahmins_from_Uttar_Pradesh" "1.8043"
    [9,] "67.8% tuscans + 32.2% Rajasthani Brahmin" "1.8924"
    [10,] "66.6% tuscans + 33.4% punjabi-arain" "1.8939"
    >

  9. My results in mixed mode

    HarappaOracle(c(60,30,3,5,1,0,0,0,0,0,0),mixedmode=T,k=20)
    [,1] [,2]
    [1,] "83.5% Sinhalese + 16.5% Tamil_Nadu_Scheduled_Caste" "0.6774"
    [2,] "89.7% Sinhalese + 10.3% Velamas" "0.7537"
    [3,] "9.1% velama + 90.9% Sinhalese" "0.8073"
    [4,] "92.5% Sinhalese + 7.5% Kurumba" "0.8355"
    [5,] "94.7% Sinhalese + 5.3% Piramalai_Kallars" "0.85"
    [6,] "94% Sinhalese + 6% Tamil Nadar" "0.8531"
    [7,] "5.4% Karnataka + 94.6% Sinhalese" "0.8625"
    [8,] "93.7% Sinhalese + 6.3% Tamil Vellalar" "0.8635"
    [9,] "0.2% yemen-jews + 99.8% Sinhalese" "0.867"
    [10,] "0.2% saudis + 99.8% Sinhalese" "0.8705"
    [11,] "2.2% vysya + 97.8% Sinhalese" "0.8727"
    [12,] "0.2% bedouin + 99.8% Sinhalese" "0.8746"
    [13,] "99.8% Sinhalese + 0.2% qatari" "0.8755"
    [14,] "0.2% samaritians + 99.8% Sinhalese" "0.8755"
    [15,] "0.2% iraq-jews + 99.8% Sinhalese" "0.8769"
    [16,] "0.2% iranian-jews + 99.8% Sinhalese" "0.8773"
    [17,] "98% Sinhalese + 2% Dusadh" "0.8779"
    [18,] "0.1% Iraqi Arab + 99.9% Sinhalese" "0.879"
    [19,] "0.1% palestinian + 99.9% Sinhalese" "0.8794"
    [20,] "0.1% druze + 99.9% Sinhalese" "0.8796"

    Oracle observations for Sinhalese

    > HarappaOracle("Sinhalese",mixedmode=T,k=30)
    [,1] [,2]
    [1,] "Sinhalese" "0"
    [2,] "32% Gond + 68% Velamas" "1.2599"
    [3,] "22.9% Bengali + 77.1% Tamil_Nadu_Scheduled_Caste" "1.3262"
    [4,] "49.8% Bengali + 50.2% Piramalai_Kallars" "1.3283"
    [5,] "40.1% Kerala Christian + 59.9% Chenchus" "1.3329"
    [6,] "41.7% Bengali + 58.3% Tamil_Nadu_Scheduled_Caste" "1.4593"
    [7,] "28.4% Bengali + 71.6% Velamas" "1.5238"
    [8,] "9.4% juang + 90.6% Velamas" "1.5415"
    [9,] "25% satnami + 75% Velamas" "1.5487"
    [10,] "9.6% bonda + 90.4% Velamas" "1.5512"
    [11,] "53.4% Tamil Muslim + 46.6% Velamas" "1.5689"
    [12,] "15.4% Dhurwa + 84.6% Velamas" "1.5762"
    [13,] "17.8% sahariya + 82.2% Velamas" "1.5867"
    [14,] "58.8% Kerala Muslim + 41.2% Piramalai_Kallars" "1.6186"
    [15,] "45.6% Chenchus + 54.4% Lambadi" "1.6208"
    [16,] "14.9% Bhunjia + 85.1% Velamas" "1.6226"
    [17,] "11.4% kharia + 88.6% Velamas" "1.6338"
    [18,] "10.3% gadaba + 89.7% Velamas" "1.662"
    [19,] "11.3% savara + 88.7% Velamas" "1.6758"
    [20,] "2.4% dai + 97.6% Tamil Vellalar" "1.7066"

  10. I just found out there is a "Romany" category, which can be queried like any other reference population.

    Observations
    HarappaOracle("Romany",mixedmode=T)
    [,1] [,2]
    [1,] "Romany" "0"
    [2,] "26.6% Punjabi Jatt + 73.4% spain-basc" "0.8078"
    [3,] "25.8% Kashmiri + 74.2% spain-basc" "1.0171"
    [4,] "26.6% punjabi-arain + 73.4% spain-basc" "1.0517"
    [5,] "26.3% Bhatia + 73.7% spain-basc" "1.0549"
    [6,] "24.5% Punjabi + 75.5% spain-basc" "1.0877"
    [7,] "72% basque + 28% Haryana Jatt" "1.112"
    [8,] "26% Sindhi + 74% spain-basc" "1.1708"
    [9,] "24.5% Punjabi Brahmin + 75.5% spain-basc" "1.1906"
    [10,] "74.6% basque + 25.4% Rajasthani Brahmin" "1.1922"

    -- I got Rajasthani Brahmin and Punjabi Arain as well in my results