Jan 7
Dipping my toes in MongoDB with Ruby
I read the excellent post by Neil Saunders on using Ruby and MongoDB to archive his posts on FriendFeed, which prompted me to finally write down my own experiences with MongoDB. So here goes...
Let's have a look at the pilot SNP data from the 1000genomes project. The data released in April 2009 contain lists of SNPs from a low-coverage sequencing effort in the CEU (European descent), YRI (African) and JPTCHB (Asian) populations. The SNPs can be downloaded from here; get the files called something.sites.2009_04.gz. The exercise we'll be performing here is to get an idea of how many SNPs are shared between those populations.
The input data
For each SNP, the input data contains the chromosome, the position, the reference allele (based on the reference sequence), the alternative allele and the allele frequency in that population.
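To make that record layout concrete, here is a minimal Ruby sketch of reading one such record. It assumes the five fields are tab-separated and appear in the order listed above; check this against the actual files. The `Snp` struct, the `parse_snp_line` helper and the sample line are made up for illustration and do not come from the 1000genomes files.

```ruby
# Hypothetical record type; the field names follow the column description above.
Snp = Struct.new(:chromosome, :position, :ref_allele, :alt_allele, :allele_frequency)

# Parse a single line into an Snp.
# Assumption: columns are tab-separated and in the order described in the text.
def parse_snp_line(line)
  chromosome, position, ref_allele, alt_allele, frequency = line.chomp.split("\t")
  # Integer() and Float() raise on malformed input instead of silently returning 0.
  Snp.new(chromosome, Integer(position), ref_allele, alt_allele, Float(frequency))
end

# Made-up example line, not real data.
snp = parse_snp_line("1\t12345\tA\tG\t0.25\n")
puts "chr#{snp.chromosome}:#{snp.position} #{snp.ref_allele}>#{snp.alt_allele} (#{snp.allele_frequency})"
```

Keeping the chromosome as a string rather than an integer means non-numeric names such as X would not need special handling.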