Taking Suggestions

What would you want from this project? What sort of analyses would you like me to do?

I know several of you want regional admixture/PCA analyses and those are coming starting next week.

In addition to that, is there something specific you would like to be investigated?

For example, is there some specific supervised admixture you would like me to run? A specific PCA/MDS analysis?

Or do you want me to try to synthesize all the results we have gotten into some sort of coherent theory instead of throwing out the numbers like I have been doing?


  1. I don't know if this is possible, but something more at a city level of granularity would be interesting.

    (Note, I'm aware that if people don't "breed locally" then this is probably nonsensical.)

    • City-level analysis would require careful DNA collection, in my opinion. I am relying on expats here, so it would be hard.

  2. "Or do [you] want me to try to synthesize all the results we have gotten into some sort of coherent theory instead of throwing out the numbers like I have been doing?"

    That'll be great, considering the excellent and insightful discussion it tends to fuel, like on Brown Pundits. Regional analyses would also be great, but I wonder whether a state-based one or a pan-regional one (the best way I can describe it), for example a South Indian or a North-West South Asia-specific analysis with the appropriate reference individuals, would be more fruitful. That way we could compare, for example, Brahmins across several localities, and see how they compare with the non-Brahmins of their own state and of other states. Do you plan on running that reference dataset you acquired a while ago (the Pan-Asian dataset from the paper "Mapping Human Genetic Diversity in Asia"), or do the SNPs tested overlap poorly with those of 23andMe and FTDNA?

    • The Pan-Asian dataset has only about 14,000 SNPs in common with 23andMe and my other sources, so admixture analysis is going to be very noisy. I do want to see what PCA would show.
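Counting the SNPs shared across datasets, as in the 14,000 figure above, amounts to a set intersection on marker IDs. A minimal sketch, assuming each dataset's markers are keyed by rsID (for example, the second column of a PLINK `.bim` file); real merges also have to reconcile strand and allele coding, which this ignores. The SNP lists in the example are made up.

```python
def read_bim_ids(path):
    """Read variant IDs from column 2 of a PLINK .bim file."""
    with open(path) as fh:
        return [line.split()[1] for line in fh if line.strip()]

def shared_snps(*snp_lists):
    """Return the set of SNP IDs present in every dataset given."""
    sets = [set(ids) for ids in snp_lists]
    if not sets:
        return set()
    return set.intersection(*sets)

# Toy marker lists standing in for three data sources.
pan_asian = ["rs1", "rs2", "rs3", "rs4"]
twenty_three = ["rs2", "rs3", "rs5"]
ftdna = ["rs3", "rs2", "rs9"]
common = shared_snps(pan_asian, twenty_three, ftdna)
print(len(common))  # 2
```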

  3. I don't think it is optimal that you spend TOO much time on analysis, just the basics. You prolly have some intuitions from running ADMIXTURE on this dataset that we don't know about. In terms of value added, I would think you're better at knocking down speculation 🙂

  4. Zack, is it possible for you to understake an admixture analysis comparing us to Turkic populations?

    • undertake*

      • Azad, you Turkophile, you!

        On a more serious note, wouldn't a South Asian vs. Central Asian comparison simply result in "wrap-around" affinities, since both populations lie on a West Eurasian/non-West Eurasian cline?

        • Hahaha, studying Turkism is very interesting; I can't help it!!

          I see what you mean, though; Central Asian admixture itself is very mixed, since Central Asia was a cultural melting pot.
          But what if we're compared to Siberians, Mongolians, etc. instead? Zack, what do you think?

  5. I would be interested in testing the hypothesis that some of the tribes in the north-west descend at least in part from the Eastern Scythians (Sakas). These include Pathans, Jatts, Rajputs and others. This might, of course, only be visible at higher K's.

    • How? I.e., what is your reference population for the Sakas? Ossetians? Perhaps Tajiks, who might be similar?

    • Incidentally, I was thinking about just this yesterday, and I was wondering whether the elevated European scores among the few Jatt participants we have in the project might be a by-product of mixing with the Sakas. Anyone who has been on Jatt-related portals and websites will have come across articles in which certain Jatt gots claim descent from Saka invaders. Considering the somewhat elevated European scores of the three Jatts in HAP, perhaps the assertion that some Jatt gots have Scythian progenitors should not immediately be written off as an exercise in claiming foreign origins. Yes, this would definitely be worth investigating.

      In addition, some speculate that Y-DNA G2a3b* and G2a3b1, which within South Asia are most frequent among Tamil Brahmins (13% in Iyengars and 10% in Iyers) and Gujarati Brahmins (10.9%), may well be an artifact of the Indo-Scythian empires, perhaps via a connection to the priests of those empires. Perhaps it would be worth investigating these groups in a similar manner? The author of this page, himself a G2a3b1 Y-DNA carrier, has some speculations (and only speculations, mind you; this shouldn't be the final word on anything) with regard to the Indo-Scythians:

      "Another possibility is that [my] ancestors stayed in the Gujarat area following the decline of the Saka and Indo-Parthian empires, perhaps as priests serving Somnath temple, dedicated to god Shiva. Some priests of the Saka kings (called Magas, a term that denotes Zoroastrian priests) converted to Brahminism after coming to India (there is documented evidence of conversion of 18 priests). These converts became scholars of Sanskrit; the Indo-Scythian rulers were the first in India to introduce Sanskrit as the official language for State-related communication (before second century CE, there is no evidence of use of Sanskrit in India for official State business). When Somnath temple was first destroyed in 725 CE by the Arab governor of Sind, or when the re-built temple was destroyed by Mohammed Ghazni in 1024, many of the priests fled Gujarat, some to Tamilnadu, at the invitation of the Chola kings. There is a historical record of migration of priests from Somnath temple to Tamilnadu during this period. I believe my ancestor is not very likely to have come South via this migration. A key reason is that the proportion of G2a3b1 among Iyers (worshippers of Shiva) and Iyengars (worshippers of Vishnu) is roughly the same (see below in the graph). Had my ancestor come after the raids on Somnath temple, there would really be no reason for him to convert to Vaishnavism from Shaivism after migrating South, when Shaivism was already flourishing in the South. It is more likely that my ancestor came with the earlier migrations of the remnants of the Indo-Scythian empire, and was either a worshipper of Vishnu, or a follower of Buddha. Some of these migrants converted to Shaivism (as evidenced by substantial presence of G2a3b1 among Iyers), but many would have chosen to remain worshippers of Vishnu."

      More on this site - Source

    • The northern Sakas were in Yutian/Khotan, the central Sakas between Mathura and Gandhara, and the southern Sakas in Malwa and Gujarat. First, we can't say who the precursor Sakas were. Assuming that the Khotanese were the precursor Sakas (Moga/Moasa/Maues) who entered India via the Karakoram Pass, you would expect their descendants to show an East Asian signature.

    • Four Jats tested; all are L657+:

      U2321 Amar Sandhu, Jalandhar, Punjab, India L657+
      U2810 Tharn Bajwa, Pakistan, L657+
      N22414 Luddan Singh Ranu, 1800s, Manki, Punjab L657+
      163483 Gurbax Singh Sidhu, born c. 1905 in Dod, Punjab L657+

      No evidence as yet of L657 outside the Indian subcontinent and Arabia (except one Babasan Kazakh line).

  6. Here are my suggestions.

    (1) Run a K=2 ADMIXTURE analysis with the same populations as Reich et al. Note that they did not include the Balochi and Makrani or the Munda and Tibeto-Burman tribes, because these populations did not fall along the "Indian Cline". The component that is highest in the Pathans and Sindhis can be identified as ANI, and the one that is highest in the Mala and Madiga as ASI. It will be interesting to see if it is possible to reproduce the Reich et al. results using ADMIXTURE.

    (2) Run a K=4 ADMIXTURE with all the South Asian populations including Balochis, Makranis and Munda tribals with some African and East Asian populations to take out the effect of African and East Asian admixture. Perhaps this will allow a better separation of ANI, ASI and East Asian.

    • ASI was already inferred here, thanks to the presence of the Onge component, just so you know.

      Outside of Harappa, Dienekes Pontikos' Dodecad ancestry project has attempted to infer ANI-ASI for the South Asian participants here and here. Of course the number of subcontinental participants here far exceeds that of Dodecad (which is but natural).
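The labeling step in suggestion (1), naming a K=2 component ANI or ASI according to the populations where it peaks, can be sketched in Python. A minimal sketch, assuming the ancestry proportions come from an ADMIXTURE `.Q` file (one whitespace-separated row per individual, e.g. from `admixture data.bed 2`) and that a matching list of population labels is available; the file name, labels and numbers below are hypothetical.

```python
import numpy as np

def label_components(q, pops, anchors):
    """Map component names to ADMIXTURE columns by where each one peaks.

    q       -- (n_individuals, K) ancestry proportions, one row per
               individual (e.g. np.loadtxt("data.2.Q"); name hypothetical).
    pops    -- population label for each row of q, in the same order.
    anchors -- component name -> populations in which it should be highest.
    """
    q = np.asarray(q, dtype=float)
    pops = np.asarray(pops)
    labels = {}
    for name, group in anchors.items():
        mask = np.isin(pops, group)
        if not mask.any():
            raise ValueError(f"no individuals found for {group}")
        # Column with the highest mean proportion in the anchor populations.
        labels[name] = int(np.argmax(q[mask].mean(axis=0)))
    if len(set(labels.values())) != len(labels):
        raise ValueError("two named components map to the same column")
    return labels

# Made-up proportions for four individuals at K=2.
q = [[0.80, 0.20], [0.70, 0.30], [0.25, 0.75], [0.30, 0.70]]
pops = ["Pathan", "Sindhi", "Mala", "Madiga"]
labels = label_components(
    q, pops, {"ANI": ["Pathan", "Sindhi"], "ASI": ["Mala", "Madiga"]}
)
print(labels)  # {'ANI': 0, 'ASI': 1}
```

Raising an error when both names land on the same column flags a run where the K=2 split did not follow the ANI/ASI cline.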

  7. What about RHH Mapper, then? Since we are all mixed, that could shed some light on things.

  8. There are a number of things you can do that would make seeing patterns easier:

    1. Categorize your lists and charts by ethnicity (all the Tamils, Punjabis, etc. grouped separately), and then subcategorize by caste.

    2. Show results by proportion of components.

    3. Put the two largest components on either end.

    • You can do part of #1 by sorting by the ethnicity column in the table that goes along with the participants' admixture barchart.

      You can do #2 on the chart by clicking on the legend. In the Google spreadsheet, switch to List View and then you can sort by column.
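The three presentation suggestions (group by ethnicity then caste, sort by a component, put the two largest components at the ends) can be sketched as follows. The field names, component names and values are hypothetical, not taken from the project's spreadsheet.

```python
# Hypothetical participant rows; proportions are made up.
rows = [
    {"id": "A", "ethnicity": "Tamil", "caste": "Brahmin",
     "components": {"South_Asian": 0.5, "Baloch": 0.3, "Onge": 0.2}},
    {"id": "B", "ethnicity": "Punjabi", "caste": "Jatt",
     "components": {"South_Asian": 0.3, "Baloch": 0.6, "Onge": 0.1}},
    {"id": "C", "ethnicity": "Tamil", "caste": "Brahmin",
     "components": {"South_Asian": 0.7, "Baloch": 0.2, "Onge": 0.1}},
    {"id": "D", "ethnicity": "Punjabi", "caste": "Arain",
     "components": {"South_Asian": 0.4, "Baloch": 0.5, "Onge": 0.1}},
]

def arrange(rows, sort_by):
    """Group by ethnicity, then caste, then sort by one component (descending)."""
    return sorted(rows, key=lambda r: (r["ethnicity"], r["caste"],
                                       -r["components"][sort_by]))

def component_order(rows):
    """Largest mean component first, second largest last, the rest between."""
    names = list(rows[0]["components"])
    means = {n: sum(r["components"][n] for r in rows) / len(rows) for n in names}
    ranked = sorted(means, key=means.get, reverse=True)
    if len(ranked) < 2:
        return ranked
    return [ranked[0]] + ranked[2:] + [ranked[1]]

print([r["id"] for r in arrange(rows, "South_Asian")])  # ['D', 'B', 'C', 'A']
print(component_order(rows))  # ['South_Asian', 'Onge', 'Baloch']
```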

  9. Running the new K=11 against the old Reference I and Reference II populations, and consequently inferring ASI, would also be nice, especially for the Xing et al. dataset and other South Asian-specific datasets, but I suppose that would take a lot of time, unless I am mistaken. There is also a certain amount of overlap between the populations across the different references, yes? But I suppose those populations, such as the Tamils, Telugus (A.P.) and Punjabi Arain individuals, will be run at K=11 if there is a regional analysis.

    • All Reference I populations are included in Reference 3. Xing et al is the only extra dataset in Reference II.

      Since Xing et al has a decent overlap with Reich et al, I plan to do some PCA and ADMIXTURE runs of Xing and Reich (with HapMap and whatever else keeps the number of SNPs above 100k included).
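Keeping Xing and Reich and then adding "whatever else keeps the number of SNPs above 100k" can be sketched as a greedy loop over candidate datasets. This is one possible heuristic, not necessarily how the project chooses datasets; the result depends on the order candidates are tried, and the dataset names and toy SNP sets below are stand-ins.

```python
def plan_merge(required, candidates, min_snps=100_000):
    """Greedily add datasets while the shared SNP count stays >= min_snps.

    required   -- dict of name -> set of SNP IDs that must be included.
    candidates -- list of (name, set of SNP IDs), tried in priority order.
    Returns (names included, shared SNP set).
    """
    shared = set.intersection(*required.values())
    included = list(required)
    for name, snps in candidates:
        trial = shared & snps
        if len(trial) >= min_snps:  # accept only if enough SNPs survive
            shared = trial
            included.append(name)
    return included, shared

# Toy example with integer "SNP IDs" and a small threshold.
required = {"Xing": set(range(0, 10)), "Reich": set(range(2, 12))}
candidates = [("HapMap", set(range(0, 8))), ("HGDP", set(range(5, 20)))]
included, shared = plan_merge(required, candidates, min_snps=5)
print(included, len(shared))  # ['Xing', 'Reich', 'HapMap'] 6
```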

  10. Perhaps a replication of this experiment, with emphasis on removing high-ASI populations, to see how the admixture proportions change.


    The latest reference set at K=7 seems an ideal base for this 🙂

    • I am not sure I get the point of the experiment you are proposing.

      • Just wondering about the way admixture works. If you remove high-ASI populations from your set, will the "South Asian component" of the remaining samples increase from before? The reason is that I suspect that if we had (hypothetical) pure ASI samples, then in terms of frequency the South Asian component would equal ASI for each sample.

  11. I would be interested in how the Pan Asian data-set Malaysian Negrito sample relates to the Onge. Can the Malaysian Negrito be substituted for Onge as a proxy for the ASI component of Indian populations? Perhaps the Pan Asian has insufficient populations for this purpose? I don't know the Indian populations very well.


    • The Pan-Asian dataset has only 14,000 SNPs in common with my other datasets, which is why I can't include them in my regular analyses. But I do plan to do some analyses on Pan-Asian with Reich et al or HGDP etc included.

  12. Ref4c is a favorite reference set so far. Fst divergences between components, with an MDS plot and an Fst dendrogram, would be interesting for K ≥ 9.
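Given a square matrix of pairwise Fst values between ADMIXTURE components, both requested views can be sketched with NumPy and SciPy: classical (Torgerson) MDS for the plot, and average-linkage (UPGMA) clustering for the dendrogram. This assumes the Fst values have already been copied into a symmetric matrix with a zero diagonal; the component names and numbers are invented, and treating Fst as a distance is a convenience rather than a strict metric.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]     # keep the largest ones
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def fst_dendrogram(dist, labels):
    """Average-linkage (UPGMA) tree; returns the linkage matrix and leaf order."""
    z = linkage(squareform(dist), method="average")
    leaves = dendrogram(z, labels=list(labels), no_plot=True)["ivl"]
    return z, leaves

# Hypothetical Fst matrix for four components (not real project output).
labels = ["C1", "C2", "C3", "C4"]
fst = np.array([
    [0.00, 0.05, 0.15, 0.10],
    [0.05, 0.00, 0.16, 0.04],
    [0.15, 0.16, 0.00, 0.20],
    [0.10, 0.04, 0.20, 0.00],
])
coords = classical_mds(fst)            # 4 x 2 coordinates for an MDS plot
z, leaves = fst_dendrogram(fst, labels)
```

Here C2 and C4 merge first (Fst 0.04), then C1 joins them, and C3 joins last.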

  13. An ethnicity run would be great; I would like to know which Indian ethnicity I am. So the theory would be great. Thanks.

    • Assigning ethnicity to mixed individuals is a little hard.

      In your case, the one measure we can use is your ASI percentage. Since your Onge component was about 6% and you are half Roma, let's double it to 12.7%. Using our regression estimate, that means about 23% Ancestral South Indian, which is about the same as or lower than that of people from Northwestern India or Pakistan.

      Now does that mean that your Roma ancestors were from Pakistan? Not necessarily. It is possible (even likely) that your Roma ancestors picked up genes on the way from South Asia to Europe and during their stay in Europe, thus reducing the Indianness of their genes.
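The doubling step above is a rescaling by ancestry fraction: if only the Roma half carries the Onge component, the observed value is the Roma-side value times 0.5. A minimal sketch, assuming the non-Roma ancestry contributes essentially none of that component; 6.35% is simply the value implied by doubling to 12.7%. The post does not give the Onge-to-ASI regression, so the sketch stops before that step.

```python
def rescale_to_line(observed_pct, ancestry_fraction):
    """Scale a component observed in a mixed individual up to one ancestral line.

    Assumes the rest of the individual's ancestry contributes ~0% of the
    component, so observed_pct = line_pct * ancestry_fraction.
    """
    if not 0.0 < ancestry_fraction <= 1.0:
        raise ValueError("ancestry_fraction must be in (0, 1]")
    return observed_pct / ancestry_fraction

half = rescale_to_line(6.35, 0.5)    # 12.7, the doubled figure
sixty = rescale_to_line(6.35, 0.6)   # about 10.6 if the Roma share is 60%
```

The second call shows how sensitive the estimate is to the assumed Roma fraction.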

  14. Also, a human tree would be cool, showing where Indian/South Asian ethnic groups cluster within the human tree... and which ethnicities are Caucasian and which are Australoid.

    • A classification tree based on closeness in PCA or admixture results is in the works.

      Do note that it is not a phylogeny. Also, I have no idea how to divide people up into Caucasoid and Australoid.

    • I'm 60% Roma Gypsy, since my dad is a quarter Gypsy too, so more than half. Can you somehow exclude all non-Indian genes and just look at where my South Asian portion is from? I don't know if that is possible. Thanks :)
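A classification tree "based on closeness in PCA or admixture results" can be sketched as hierarchical clustering of per-population admixture profiles. This is a hypothetical illustration, not the project's method: it assumes each population is summarized by its mean component proportions, uses Euclidean distance with average linkage, and uses made-up population names and values. As noted above, the resulting tree groups by similarity and is not a phylogeny.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def closeness_tree(profiles, method="average"):
    """Cluster populations by Euclidean distance between admixture profiles."""
    names = list(profiles)
    x = np.array([profiles[n] for n in names], dtype=float)
    z = linkage(pdist(x, metric="euclidean"), method=method)
    return names, z

# Hypothetical mean admixture proportions (three components per population).
profiles = {
    "Pop_A": [0.60, 0.30, 0.10],
    "Pop_B": [0.58, 0.32, 0.10],
    "Pop_C": [0.20, 0.20, 0.60],
    "Pop_D": [0.22, 0.18, 0.60],
}
names, z = closeness_tree(profiles)
groups = fcluster(z, t=2, criterion="maxclust")  # cut the tree into 2 groups
```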

  15. A run including the Gypsies, of course.

  16. There are two Gypsies that Dienekes detected in Reich et al., I think... among the Romanian samples. Can you somehow use them? Thanks for the answer... then we would be 3 Gypsies.

  17. Behar et al., sorry.