Comments: Signal vs Noise

Recently there has been a lot of noise in the comments here with very little real information. That is a waste of time and effort for everyone.

I would appreciate it if all of you thought carefully about any comments you plan to make. The stronger your belief in a proposition, the more you should hesitate before posting it.

One thing you should keep in mind is that I know more than you do. I do not mean that as a boast but as a fact. Of course, I do not know the esoterica of South Asian caste divisions, religious rituals and cultural practices. It can even be said that my knowledge of Indian history is inferior to my knowledge of European, American and Near Eastern history. However, when participants send me their data, they are generous with their personal information. Because of privacy concerns, I have a lot more information about their ethnic backgrounds than is public. Another factor is that a lot of the analyses I run do not end up here for various reasons, but I still use them in building a complete picture. These two things give me an unfair advantage over you guys.

Generally, I have kept a very light hand on the comment section. This is partly because the project is my hobby, so I do not have time to control the conversation and reduce the noise. I also feel that I have much to learn about history and genetics from all comers.

However, recently I feel that I need to run a much tighter ship so discussions are useful and on-topic and not the hobby horses of a few crazed people. I might have to take a page from my friend Razib Khan's comment policy and be very strict about deleting, warning and banning.


  1. Hi Zack,

    The "Gujaratis HarappaWorld Admixture" comments section was filled with a lot of useless speculation expressed as fact, and a lot of petty disagreement. A few specific commentators who always express their opinions (I won't mention names, they are very obvious, and include individuals who did not comment much in regards to that specific post) have very idiosyncratic axes to grind, and highly unusual obsessions. You are absolutely justified in considering more aggressive moderating.

  2. This is beyond your scope, but, can you summarize where you are after 2.5 years and where you wish to go? At this point, I am not clear what more can be gained by admixture runs, but you are the better judge.

    • the larger N you can get, the better you can gain a sense of variation within groups (or lack thereof). for example, the large N that zack has for tamil brahmins is without parallel in the scientific literature, and confirms their homogeneity. it is from zack's results that i am 100% (ok, 99%) convinced of the singular origin of south indian brahmins (or at least those in dravidian speaking states).

    • A summary of what we can glean from the data is of course warranted. I will try to write that soon.

      As for the admixture results, they are easy to compute and give us a ready comparison of individuals and groups. Larger sample sizes make it easier to be confident in the results as well.

      • Of course, I am not saying you should stop all admixture calculations or not expand your scope. I have found that if you write a brief summary of what has been achieved to date, and a goal statement of what you wish to achieve in the next year, it helps (at least in my research).

        I am trying to slide in some research topics for the next year for you (go ahead and ask who I am to advocate research topics without paying):

        1. Age and origin of Asian admixture in Bengalis
        2. Age and origin of Jat admixture or invasion(!)
        3. Age of admixture of Northwest European into a South Indian/Baloch admixed population.

  3. The problem is that caste and ethnic hatred show up in these discussions because, technically, genetics tells you who you are. Also, I really don't understand why there is a dislike of (native) South Asian genes. Indians for some reason always have a desire to be more West Eurasian, so when I said, looking at the spreadsheet, that Muslims were more West Eurasian than Hindus, people got pissed. I really just don't understand why you would hate your own genes and background. This is also a reality; it is why we have things like fair-skin creams in India or Pakistan: because there is a dislike of who you are.

    Zack, I apologize if I have caused any problems through my comments; I am going to be more careful from now on. You are doing nice work on genetics, please continue doing it.

  4. I also hope we get more information on these components. I am very much interested in learning about components like Baloch, Caucasian, South Indian, Northern Euro etc. I would like to know what the origins of these components are, when they got to South Asia, and by what populations. I wonder if there is a way to trace these things.

  5. Hi Imran,

    The components don't really have an origin. You just can't take them that literally. If you add or subtract populations, use fewer or more SNPs, and play around with the parameters, the components will change in such a manner that the percentages won't be comparable. Also, certain populations that have been subjected to intense genetic drift/inbreeding become their own unique modal component at certain Ks, like the Kalash. Yet at K=16 for HarappaWorld, they are virtually identical to the HGDP Pathans. So a lot is contingent on the choice of K. In a sense, the components are simply how an algorithm dissects the data because you want it to do so. There is nothing “real” about it (except perhaps in the most general and broad of senses).

    In addition to that, you have to remember a few technical issues. For example, ADMIXTURE can't utilize linkage disequilibrium. Then you have interesting stuff happening with ascertainment bias. Also, other methods can give very different results. For example, going by ADMIXTURE, the "West Eurasian" ancestry in East Africans is clearly a contribution from neighboring populations in the Arabian Peninsula. But it seems things are more complex than that (now, I am not saying ADMIXTURE is wrong, I am just saying that this should make one more cautious).

    In short, you should not construe these components as representing actual populations that mixed at some real point (or points) in space and time. Instead, a more profitable and honest way of looking at these components involves recognizing that they are statistical constructs that help us draw out tentative (and I would put a lot of stress on "tentative") inferences about history. As Zack mentioned, this kind of analysis is quick, and it gives you a very good understanding of where individuals stand in comparison to populations.

    And it can be very helpful in differentiating between isolation-by-distance and specific historical processes that can't really happen again. I mean, compare the Pashtuns and Jats. The Pashtuns have a much higher "Caucasian" component than their neighbors to the east. This is explicable by simple isolation-by-distance. As a people who occupy the only robust geographic link between the Iranian and Indo-Aryan cultural spheres, this is expected. But the Jats clearly involve something different. The more eastern of the Jats have more of the "Northeastern European" component than Punjabi Jats, and even Punjabi Jats have very substantial amounts of that component. The Jats are probably the most "Northeast European" of all South Asians.

    If Jats were a high caste group, one could probably invoke "Indo-Aryan" migration. It is because of ADMIXTURE that we know that isolation-by-distance does not apply very well to the sub-continent (although you can see that on PCA plots as well, and FST distances tell the same story), and that high caste groups tend to have more of the "Northeastern European" component than their co-ethnics who are lower on the caste scale (perhaps co-ethnics is not the best term). But the Jats are very far from being high caste. They are Shudras. We know from early Arab and Indian sources that the Jats were very marginal, and suffered a lot of persecution at the hands of higher castes (how this changed, and how Jats became powerful, I don't know). Thus, based on ADMIXTURE, the old British ethnologists look much more credible, because they often hypothesized an exogenous origin for the Jats based on physical appearance and distinctive socio-cultural traits.

    So, I am not claiming ADMIXTURE does not tell us anything of worth or substance. I am just saying one can’t be so literal, and that one has to be much more cautious.

  6. ^ I think it is too early to say anything regarding Haryana Jats, since there are only 5 samples. If we had more, then we could at least say something about them. In physical appearance, Haryana Jats are rather very indid looking, pretty similar to other north Indians from Delhi or UP. At least with Punjabi Sikh Jatts, we can say their appearance is a bit different than other north Indians, perhaps because of their higher Caucasian component.

    As for Jats as a whole, I am not sure. They might have origins outside, but this still has to be proven. I think Jats actually come from the Pakistani side, and there were lots of Jatts in western Punjab, west of the Indus river and even in KPK province, so this may be the ultimate reason why they are showing lower South Asian and higher non-South Asian components.

    As a whole, looking at these samples, one thing is for sure: the north east euro component is found at significant levels all across northern South Asia and Central Asia, basically stretching from Tajikistan to the Bengal region. I really do wonder if this is an Indo-Aryan movement effect, or something else older?

  7. The Haryana Jatt samples (components) almost completely resemble each other, so there is a chance these samples might be from the same family or extended family. That's why it is really important to get more samples from that region before drawing conclusions.
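
The sample-size concern that runs through the comments above (a large N for Tamil Brahmins versus only 5 Haryana Jat samples) can be illustrated with a rough sketch. It uses the standard error of the mean and a normal-approximation interval. The component percentages are hypothetical values invented for illustration, not HarappaWorld results, and the function names are mine. The calculation also assumes the samples are independent, which is exactly what would fail if the individuals came from one extended family.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(values) / math.sqrt(len(values))


def approx_interval(values, z=1.96):
    """Rough 95% interval for the group mean (normal approximation).

    With very small n this is optimistic; a t-based interval would be wider.
    """
    mean = statistics.fmean(values)
    half_width = z * standard_error(values)
    return mean - half_width, mean + half_width


# Hypothetical percentages of one component in a group (made-up numbers).
small_group = [10.0, 11.0, 9.0, 10.5, 9.5]  # n = 5
large_group = small_group * 8               # same spread, n = 40

if __name__ == "__main__":
    for name, group in (("n=5", small_group), ("n=40", large_group)):
        low, high = approx_interval(group)
        print(f"{name}: mean={statistics.fmean(group):.2f}, "
              f"SE={standard_error(group):.3f}, "
              f"interval=({low:.2f}, {high:.2f})")
```

With the same spread of values, the interval around the group mean narrows roughly with the square root of N, which is why five near-identical samples say little about a whole population.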