I read the excellent post by Neil Saunders on using Ruby and MongoDB to archive his posts on FriendFeed, which prompted me to finally write down my own experiences with MongoDB. So here goes...

Let's have a look at the pilot SNP data from the 1000genomes project. The data released in April 2009 contain lists of SNPs from a low-coverage sequencing effort in the CEU (European descent), YRI (African) and JPTCHB (Asian) populations. SNPs can be downloaded from here; get the files called something.sites.2009_04.gz. The exercise we'll be performing here is to get an idea of how many SNPs these populations have in common.

The input data

The input data contains chromosome, position, reference allele (based on reference sequence), alternative allele and allele frequency in that population.
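To make the exercise concrete before any database comes into play, here is a minimal in-memory sketch in Python of parsing such records and counting SNPs shared between two populations. It assumes each line holds the five fields in the order listed above, separated by whitespace; the actual file layout may differ. The names `Snp`, `parse_sites_line`, `snp_keys` and `read_sites_file` are made up for illustration, and the sample records are invented, not real 1000genomes data.

```python
import gzip
from typing import Iterable, NamedTuple, Set, Tuple


class Snp(NamedTuple):
    chromosome: str
    position: int
    ref_allele: str
    alt_allele: str
    frequency: float


def parse_sites_line(line: str) -> Snp:
    """Parse one record (assumed layout: chrom, pos, ref, alt, freq)."""
    chrom, pos, ref, alt, freq = line.split()[:5]
    return Snp(chrom, int(pos), ref, alt, float(freq))


def snp_keys(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (chromosome, position) keys, skipping blank lines."""
    return {
        (snp.chromosome, snp.position)
        for snp in (parse_sites_line(line) for line in lines if line.strip())
    }


def read_sites_file(path: str) -> Set[Tuple[str, int]]:
    """Read a gzipped sites file and return its SNP keys."""
    with gzip.open(path, "rt") as handle:
        return snp_keys(handle)


# Invented example records for two populations (not real data).
ceu_lines = ["1\t533\tG\tC\t0.05\n", "1\t41342\tT\tA\t0.10\n"]
yri_lines = ["1\t533\tG\tC\t0.20\n", "1\t52066\tT\tC\t0.30\n"]

# SNPs are treated as "in common" when chromosome and position match.
shared = snp_keys(ceu_lines) & snp_keys(yri_lines)
print(len(shared))
```

Keying on chromosome and position alone is a simplification; whether two populations should also agree on the alternative allele to count as sharing a SNP is a choice the sketch leaves open.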