Category Archives: Participation - Page 3

End of March Update

I have a total of 67 participants in the project right now who have sent me their raw data. This does not count those who have relatives participating and therefore have to be filtered out of most analyses, other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

I need to post analyses of Tamils, Bengalis and Punjabis soon.

Another Update

I have a total of 51 participants in the project right now who have sent me their raw data. This does not count three people who have relatives participating and therefore have to be filtered out of most analyses, other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

  • Punjab: 7
  • Iran: 7
  • Tamil: 6
  • Bengal: 5
  • Andhra Pradesh: 2
  • Bihar: 2
  • Karnataka: 2
  • Caribbean Indian: 2
  • Kashmir: 2
  • Uttar Pradesh: 2
  • Sri Lankan: 2
  • Kerala: 2
  • Iraqi Arab: 2
  • Anglo-Indian: 1
  • Roma: 1
  • Goa: 1
  • Rajasthan: 1
  • Baloch: 1
  • Unknown: 1
  • Egyptian/Iraqi Jew: 1
  • Maharashtra: 1

I haven't received data from any new participants for more than a week, which is the longest lull since I started the Harappa Ancestry Project. So go out there and get people to send me their 23andme raw data.

Also, does anyone know if there are a significant number of South Asians who have done FamilyTreeDNA's Family Finder test? Is there a good overlap of SNPs between their test and 23andme's?
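One way to frame the overlap question is as a set intersection over SNP identifiers. Here is a minimal sketch in Python, assuming each test's SNP list has already been reduced to a collection of rsIDs. The function name `snp_overlap` and the sample identifiers are made up for illustration and are not taken from either company's files.

```python
def snp_overlap(ids_a, ids_b):
    """Return (shared count, fraction of A that is shared, fraction of B that is shared)."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    frac_a = len(shared) / len(a) if a else 0.0
    frac_b = len(shared) / len(b) if b else 0.0
    return len(shared), frac_a, frac_b


# Illustrative identifiers only, not real chip contents.
chip_one = ["rs1", "rs2", "rs3", "rs4"]
chip_two = ["rs3", "rs4", "rs5"]
count, frac_one, frac_two = snp_overlap(chip_one, chip_two)
print(count, frac_one, frac_two)
```

Reporting the shared fraction from both sides matters because the two tests may cover very different numbers of SNPs, so a large overlap for one can still be a small share of the other.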

We have enough Punjabis, Iranians, Tamils and Bengalis that they deserve separate analysis posts.

Project Update

I have a total of 42 participants in the project right now who have sent me their raw data. This does not count two people who have relatives participating and therefore have to be filtered out of most analyses, other than individual admixture percentages etc., where I divide participants into small groups.

The following groups are represented:

  • Punjab: 7
  • Iran: 6
  • Tamil: 5
  • Andhra Pradesh: 2
  • Bengal: 2
  • Bihar: 2
  • Karnataka: 2
  • Caribbean Indian: 2
  • Kashmir: 2
  • Anglo-Indian: 1
  • Roma: 1
  • Goa: 1
  • Uttar Pradesh: 1
  • Sri Lankan: 1
  • Rajasthan: 1
  • Kerala: 1
  • Baloch: 1
  • Unknown: 1

The unknown is Manu Sporny who has put his genetic data in the public domain and I have drafted him into our project.

In addition, out of curiosity, I have accepted data from the following:

  • Iraqi Arab: 2
  • Egyptian/Iraqi Jew: 1

I know a bunch of you have done a lot to make this project known and gotten people to submit their data. But we really do need more participants of every ethnicity and geographic region in and around South Asia. So keep on!

I am working on K=12 admixture runs for the batches we have already done. In addition, the reference I dataset will be used for even higher values of K admixture components to see where the limit is.

Also, I am looking into doing chromosome-by-chromosome admixture (and other analyses). I have done some experimental runs and once I have pored over that data, I'll have something to report.

As we have seen, even with the removal of the San and Pygmy, the Africans take up 3 ancestral components and most South Asians (excepting me of course) do not have any African admixture. So I am working on a reference dataset without any Africans. I have my own take on how to do that which I'll share in the next few days.

In short, my home computer is running admixture, plink, eigensoft, etc. 24x7.

Latest on Participants

I have a total of 31 participants in the project right now who have sent me their raw data. The following groups are represented:

  • Punjab: 7
  • Tamil: 4
  • Iran: 4
  • Andhra Pradesh: 2
  • Bengal: 2
  • Bihar: 2
  • Karnataka: 2
  • Caribbean Indian: 2
  • Anglo-Indian: 1
  • Roma: 1
  • Kashmir: 1
  • Goa: 1
  • Uttar Pradesh: 1
  • Sri Lankan: 1

Keep them coming!

I am going to get some admixture analysis on the second batch (HRP0011 to HRP0020) done this week.

Participation Update

I have a total of 23 participants in the project right now who have sent me their raw data. The following groups are represented:

  • Punjab: 7
  • Tamil: 4
  • Iran: 3
  • Bengal: 2
  • Andhra Pradesh: 2
  • Bihar: 1
  • Anglo-Indian: 1
  • Roma: 1
  • Karnataka: 1
  • Kashmir: 1

There are still a lot of ethnicities and regions missing. Uttar Pradesh comes to mind as the biggest one.

Participants So Far

While I am analyzing the data, checking for errors and making sure the results I am getting are valid, here is some information about participants till now.

So far, 11 participants have sent me their raw data. Of these eleven, ten have some South Asian ancestry.

The regions/ethnicities they cover are:

  • Punjab
  • Bengal
  • Bihar
  • Tamil Nadu
  • Telugu
  • Anglo-Indian

Of these, Punjabis are the only ones I have multiple samples of. So I definitely need more samples of the other ethnicities. And there are lots of ethnicities/regions I haven't gotten any participants in.

It would be great for this project if we got a few participants from each state/province of India and Pakistan. So if you know someone who is from our target regions and has tested with 23andme, please spread the word.

If you tested with 23andme during their Christmas sale, I am hearing that results are going to start coming in today.

Introduction

I have become interested in (some would say obsessed with) genetics recently. I wrote about getting my DNA test done and there's a lot more about my own results that I plan to bore you with.

One fun application of genetic testing is inferring ancestry: Which ancestral group are you descended from? Can we estimate the admixture of the different population groups you are descended from?

Most DNA testing companies provide information about ancestry, and genetic genealogy has taken off. With several genome databases (HapMap, HGDP, etc.) and software (like plink, admixture, Structure) publicly available, the days of the genome bloggers are here. And I am trying to be the latest one.

In starting this project, I have been inspired by the Dodecad Ancestry Project by Dienekes Pontikos and the Eurogenes Ancestry Project by David Wesolowski. The catalyst for this project was my friend Razib, whom I bug whenever I need to talk genetics.

What is Harappa Ancestry Project?
It is a project to analyze (autosomal) genetic data of participants of South Asian origin for the purpose of providing detailed ancestry information. So the focus of the project is on South Asians: Indians, Pakistanis, Bangladeshis and Sri Lankans.

The project will collect 23andme raw genetic data from participants to better understand the ancestry relationships of different South Asian ethnicities.

I have named it after Harappa, an archaeological site of the Indus Valley Civilization in Punjab, Pakistan.

Participation
People of South Asian origin, or from neighboring countries, are eligible to participate. The countries of origin I am accepting are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

Right now, I am only accepting raw data samples from people who have tested with 23andme.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.
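To make the close-relative rule concrete, here is a minimal sketch in Python of one way such a filter could work. It is an illustration under stated assumptions, not a description of how the project actually filters: it assumes a pairwise estimate of the fraction of autosomal DNA shared is already available, and it uses 3.125% (the expected average for 2nd cousins) as the cutoff. Real sharing varies around that average, so a practical threshold would need some slack. The function name and the participant labels are hypothetical.

```python
SECOND_COUSIN_SHARE = 0.03125  # expected average autosomal sharing for 2nd cousins (1/32)


def drop_close_relatives(participants, shared_fraction, threshold=SECOND_COUSIN_SHARE):
    """Keep participants in order, skipping anyone too closely related to someone already kept.

    shared_fraction maps frozenset({id_a, id_b}) -> estimated fraction of DNA shared.
    Pairs missing from the mapping are treated as unrelated.
    """
    kept = []
    for pid in participants:
        related = any(
            shared_fraction.get(frozenset((pid, other)), 0.0) >= threshold
            for other in kept
        )
        if not related:
            kept.append(pid)
    return kept


# Hypothetical family: a child shares about half of their DNA with each parent.
sharing = {
    frozenset(("mother", "child")): 0.5,
    frozenset(("father", "child")): 0.5,
}
print(drop_close_relatives(["mother", "father", "child"], sharing))
```

Because the filter is greedy, the order of the input decides who is kept. Listing the parents first keeps both of them (they are unrelated to each other) and drops the child, which matches the suggestion above to send the parents' samples.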

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.
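If you are curious what is inside the raw data file, here is a minimal Python sketch for reading it. It assumes the usual 23andme layout of comment lines starting with `#` followed by tab-separated rows of rsid, chromosome, position and genotype, with `--` marking a no-call. The function name `parse_raw_data` and the sample rows are made up for illustration.

```python
def parse_raw_data(lines):
    """Map rsid -> (chromosome, position, genotype) from 23andme-style text lines."""
    snps = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chromosome, position, genotype = line.split("\t")
        snps[rsid] = (chromosome, int(position), genotype)
    return snps


# Made-up rows in the assumed format, not real genotype data.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tAG",
    "rs0000003\tX\t3000\t--",
]
snps = parse_raw_data(sample)
no_calls = sum(1 for _, _, genotype in snps.values() if genotype == "--")
print(len(snps), no_calls)
```

The function takes any iterable of lines, so an open file handle for the unzipped text file can be passed in directly.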

Data Privacy
The raw genetic data and ancestry information that you send me will not be shared with anyone.

Your data will be used only for ancestry analysis. No analysis of physical or health/medical traits will be performed.

The individual ancestry analysis published on this blog will be done using an ID of the form HRPnnnn known to only you and me.
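As a small illustration of the HRPnnnn form, the Python sketch below formats and validates such IDs. It assumes the four digits are a zero-padded participant number starting at 1, which is a guess based on IDs like HRP0011 and HRP0020. The helper names are hypothetical.

```python
import re

ID_PATTERN = re.compile(r"HRP(\d{4})")


def format_id(number):
    """Turn a participant number (assumed 1..9999) into an ID like HRP0011."""
    if not 1 <= number <= 9999:
        raise ValueError(f"participant number out of range: {number}")
    return f"HRP{number:04d}"


def parse_id(project_id):
    """Return the participant number encoded in an ID, or raise ValueError."""
    match = ID_PATTERN.fullmatch(project_id)
    if match is None:
        raise ValueError(f"not a valid project ID: {project_id!r}")
    return int(match.group(1))


# A batch of ten consecutive IDs, HRP0011 through HRP0020.
batch = [format_id(n) for n in range(11, 21)]
print(batch[0], batch[-1], parse_id("HRP0020"))
```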

What do you get?
All results of ancestry analysis (individual and group) will be posted on this blog under the Harappa Ancestry Project category. This will include admixture analysis as well as clustering into population groups etc.

I suggest you read about Dienekes' analysis on South Asians for an idea about what to expect.

You can access all blog posts related to this project from the Harappa Ancestry Project link on the navigation menu on every page of my website. You can also subscribe to the project feed.