I have been misusing correlation in computing Ancestral South Indian percentages from PCA/ADMIXTURE and Reich et al population-level averages.
I have tried to make it clear that just looking at the correlation is not enough, that an admixture component is not similar to ASI just because it correlates well with Reich et al's ASI averages for the 18 Indian cline populations. Even when the correlation is higher than 0.99. To illustrate what I mean, let's look at the Ref4C admixture runs.
I calculated the mean for each admixture component from the K=2 to K=12 runs for the 18 Indian cline populations and then computed the correlation between that and the Reich et results. Let's take a look:
|3||C2 East Asian||0.9955347|
|6||C1 South Asian||0.9675099|
|7||C1 South Asian||0.993081|
|8||C1 South Asian||0.9932762|
|9||C1 South Asian||0.9914145|
|10||C1 South Asian||0.9918095|
|11||C1 South Asian||0.9919097|
|12||C1 South Asian||0.9918594|
Where do you see the highest correlation? At K=3 ancestral populations, the East Asian component is very highly correlated with ASI for the Indian cline populations. Does that mean that we could use that to compute ASI? No, not at all. While it is expected that at K=3, ASI would be a little closer to East Asian than to European, East Asian is not a good proxy for ASI at all since we cannot extrapolate to other individuals and populations.