Project Update

Posted by Zack on February 20, 2011

A total of 42 participants have sent me their raw data so far. That count excludes two people who have relatives participating and who therefore have to be filtered out of most analyses. The exceptions are analyses such as individual admixture percentages, where I divide participants into small groups.

The following groups are represented:

Punjab: 7
Iran: 6
Tamil: 5
Andhra Pradesh: 2
Bengal: 2
Bihar: 2
Karnataka: 2
Caribbean Indian: 2
Kashmir: 2
Anglo-Indian: 1
Roma: 1
Goa: 1
Uttar Pradesh: 1
Sri Lankan: 1
Rajasthan: 1
Kerala: 1
Baloch: 1
Unknown: 1

The unknown is Manu Sporny, who has put his genetic data in the public domain; I have drafted him into our project.

In addition, out of curiosity, I have accepted data from the following:

Iraqi Arab: 2
Egyptian/Iraqi Jew: 1
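
As a quick consistency check, the counts in the two lists above can be tallied. This is only a minimal sketch with the numbers copied from this post; the variable names are made up for illustration. The grand total comes to 42, which suggests the 42 participants mentioned at the top include the three extra samples accepted out of curiosity.

```python
# Counts copied from the two lists in this post.
main_groups = {
    "Punjab": 7, "Iran": 6, "Tamil": 5,
    "Andhra Pradesh": 2, "Bengal": 2, "Bihar": 2,
    "Karnataka": 2, "Caribbean Indian": 2, "Kashmir": 2,
    "Anglo-Indian": 1, "Roma": 1, "Goa": 1,
    "Uttar Pradesh": 1, "Sri Lankan": 1, "Rajasthan": 1,
    "Kerala": 1, "Baloch": 1, "Unknown": 1,
}
extra_groups = {"Iraqi Arab": 2, "Egyptian/Iraqi Jew": 1}

main_total = sum(main_groups.values())
extra_total = sum(extra_groups.values())
grand_total = main_total + extra_total

print(main_total, extra_total, grand_total)  # 39 3 42
```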

I know many of you have done a lot to publicize this project and get people to submit their data. But we still need more participants from every ethnicity and geographic region in and around South Asia. So keep it up!

I am working on K=12 admixture runs for the batches we have already done. In addition, I will run the Reference I dataset at even higher values of K (the number of ancestral components) to see where the limit is.
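
For readers who want to try something similar, here is a rough sketch of how a series of runs at increasing K might be queued up. It assumes ADMIXTURE's usual command-line form of `admixture <file.bed> <K>`. The file name `reference1.bed`, the helper `admixture_commands`, and the K range are all hypothetical and are not the actual setup used for this project. The code only builds the commands and does not run them.

```python
from typing import Iterable, List


def admixture_commands(bed_file: str, k_values: Iterable[int]) -> List[List[str]]:
    """Build one `admixture <bed_file> <K>` argument list per value of K."""
    return [["admixture", bed_file, str(k)] for k in k_values]


# Hypothetical input file and K range, chosen only for illustration.
commands = admixture_commands("reference1.bed", range(12, 16))
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could then be passed to something like `subprocess.run` on a machine where ADMIXTURE is installed.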

Also, I am looking into doing chromosome-by-chromosome admixture (and other analyses). I have done some experimental runs, and once I have pored over that data, I'll have something to report.

As we have seen, even with the removal of the San and Pygmy, the Africans take up 3 ancestral components and most South Asians (excepting me of course) do not have any African admixture. So I am working on a reference dataset without any Africans. I have my own take on how to do that which I'll share in the next few days.

In short, my home computer is running admixture, plink, eigensoft, etc. 24x7.

16 Comments.

RK February 21, 2011 at 12:30 am

Sporny's half Sri Lankan, half white American of German and Polish ancestry.
- Zack February 21, 2011 at 11:05 am
  
  Thanks. He does show up as about half South Asian in admixture.
Personal genome in the public domain | Gene Expression | Discover Magazine - pingback on February 21, 2011 at 4:15 am
Personal genome in the public domain | Biology News by Biologged - pingback on February 21, 2011 at 8:32 am
Tanmoy Bhattacharya March 5, 2011 at 12:49 am

Hi Zack,

I am a Bengali Brahmin whose grandparents are all from Barisal district of Bangladesh. I have been trying to send you my 23andme v3 data for a while, but am getting no response. Could you let me know whether you received it, and will process it at some point; or whether both my emails landed up in the bitbucket?

Thanks
Tanmoy
- Zack March 5, 2011 at 8:23 am
  
  I am really sorry. Both of your emails ended up in my spam folder.
  
  I downloaded your data and will include it in batch 6.
Vasishta March 5, 2011 at 7:01 am

We have more Punjabis and Iranians than UPites, Biharis and Marathis!
- Simranjits March 5, 2011 at 9:17 am
  
  Well, technically Iran is a country, so it's not a fair comparison. Currently the makeup is representative of the actual Indian diaspora to some degree.
  - RK March 5, 2011 at 11:07 am
    
    According to the American Community Survey (from the US Census), 26.3% of Indian Americans speak Hindi at home, 14.1% speak Gujarati, 10.1% speak English, and 10.0% speak Punjabi. (For comparison, 3.4% speak Marathi.) A lot of those Hindi speakers are probably Gujarati or Punjabi, but a fair number are probably from the cow belt. The Indo-Canadian population is predominantly Punjabi- and Tamil-speaking. But when you add in the diaspora population from other countries, the proportion of UPers and Biharis probably goes up.
    
    So I think people from the Indo-Gangetic plain are underrepresented -- even in terms of their diaspora populations -- but Gujaratis are really underrepresented among the participants (as opposed to the HapMap reference samples). Punjabis and South Indians (South Indian Brahmins in particular) are probably overrepresented.
    - Zack March 5, 2011 at 12:03 pm
      
      What about Tamil in ACS?
      - RK March 5, 2011 at 2:35 pm
        
        6.7% speak Tamil at home. (Compared to 9.7% who speak Telugu, 6.1% Malayalam, and 1.7% Kannada.)
        
        Note that these figures reflect the proportion of Indian-born respondents who speak the language at home, so it doesn't include Pakistani-born Punjabi speakers, Singaporean Tamils, American-born Telugu speakers, etc.
  - RK March 5, 2011 at 11:13 am
    
    Oh, the British Asian population is also largely Punjabi and Gujarati. But I still think people from U.P. and Bihar are underrepresented.
  - Zack March 5, 2011 at 11:37 am
    
    Iran has a population of 77 million while the two Punjabs together are 105 million.
Parasar March 5, 2011 at 11:37 am

Bengal (West Bengal + Bangladesh) has 270 million people - another underrepresented region/linguistic group.
- Zack March 5, 2011 at 12:06 pm
  
  Bengalis are still underrepresented but now we have more than Razib's family.
  - Parasar March 5, 2011 at 1:37 pm
    
    Did see one of them above - Tanmoy. There is some excellent material on his web-site. I remember communicating with him on his match to the Andronovo ancient DNA's STR.

Harappa Ancestry Project

Genetics and South Asia
