I got the Stanford University data which has data for 660,918 SNPs from 1,043 samples. It is claimed that the forward strand is given but that turned out not to be true and I had to flip strands and make sure I didn't include any ambiguous A/T or C/G strands in my dataset.
I also excluded the Native American samples because we are not interested in them and they are very closely related either due to recent endogamy or ancient bottlenecks. (yeah I had the nerve to write that.)
Of the total of 876 samples, here are the numbers for our populations of interest:
|Total South Asians||190|
These samples have about 541,560 SNPs in common with 23andme v2.