As a colleague of mine said a couple of weeks ago: "if you don't publish it, it didn't happen". Scientific publications are the currency to advance a researcher's career. Looking for a new job? You'd better make sure your publication list is littered with first or second author papers in good (read: high impact factor) journals. Hoping to have your tenure track lead to tenure? Idem. Publish or perish.

Meanwhile, many bioinformaticians spend huge amounts of time developing software to make genetic or genomic research possible; research that just wouldn't happen if it were not for their custom-written tools, scripts and pipelines. Unfortunately, you often need the find function of your web browser or PDF reader to be able to pinpoint the lone bioinformatician in the author list.

A lot of the work I do involves extracting data from VCF files ("Variant Call Format"; see http://bit.ly/apUbi8). It's tab-delimited but not quite: some of the columns contain structured data rather than just a value, and the format of these columns might even be different for every single line.

An example line (with the header):

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1

1 12345 .
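To make the "tab-delimited but not quite" point concrete, here is a minimal sketch in Ruby. The record is invented for illustration and is not the truncated example above; `parse_vcf_line` is a hypothetical helper, not part of any library. It assumes the usual VCF layout: INFO holds semicolon-separated `key=value` pairs or bare flags, and FORMAT names the colon-separated fields of each sample column on that same line.

```ruby
# Hypothetical header and record, invented for illustration only.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1"
LINE   = "1\t12345\t.\tA\tG\t50\tPASS\tDP=14;AF=0.5;DB\tGT:DP\t0/1:14"

# Split one VCF data line into a Hash and unpack the structured columns.
def parse_vcf_line(header, line)
  columns = header.chomp.sub(/\A#/, '').split("\t")
  values  = line.chomp.split("\t")
  record  = columns.zip(values).to_h

  # INFO: semicolon-separated items, either key=value or a bare flag.
  record['INFO'] = record['INFO'].split(';').each_with_object({}) do |item, info|
    key, value = item.split('=', 2)
    info[key] = value.nil? ? true : value
  end

  # FORMAT is read per line, because it may differ from one line to the next.
  format_keys = record.delete('FORMAT').split(':')
  columns.drop(9).each do |sample|
    record[sample] = format_keys.zip(record[sample].split(':')).to_h
  end
  record
end

record = parse_vcf_line(HEADER, LINE)
puts record['INFO']['DP']
puts record['SAMPLE1']['GT']
```

All values are left as strings here; converting them to numbers would depend on the types declared in the file's meta-information lines, which this sketch ignores.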

Just a short note...

Even though my position in Leuven only starts in October, I've already been involved in writing and defending a major grant. We've set up a consortium in Leuven (SymBioSys 2) consisting of 6 PIs "focusing on how individual genomic variation leads to disease through cascading effects across biological networks". This should be a good stepping stone to get my own lab running.

I have been a bit frustrated lately by the fact that for many of my analyses I have to write a Ruby script to mangle my data first, then resort to R to add a statistic to each of the datapoints, go back to Ruby to mangle the result, rinse, repeat, and finally make plots in R. Of course, as a bioinformatician you're used to that, and if you know that this won't be the only time you have to do the analysis, you write wrapper/pipeline scripts to handle it all for you.

I should create an online labbook with code examples of how I do things. I keep going back to an example script to copy/paste the code for handling different threads in Ruby. But I'll put it here for the moment :-)

Suppose I have a file with several millions of lines containing information on SNPs. And suppose I have a database that already contains data for those SNPs. And suppose I want to update the entries in the database with the data from the input file.
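One way to approach this with threads is a small worker pool fed from a bounded queue. The sketch below is not the original script from this post: the "database" is a plain Hash guarded by a Mutex, and `update_snp`, `update_from_lines` and the two-column input layout (SNP name, allele frequency) are all assumptions made for illustration.

```ruby
# Stand-in for the real database (assumption): a Hash guarded by a Mutex.
DATABASE = { 'rs1' => { allele_freq: nil }, 'rs2' => { allele_freq: nil } }
DB_LOCK  = Mutex.new

# Hypothetical update call; a real script would send an UPDATE to the database.
def update_snp(name, freq)
  DB_LOCK.synchronize do
    DATABASE[name][:allele_freq] = freq if DATABASE.key?(name)
  end
end

# Feed lines to a pool of worker threads through a bounded queue.
def update_from_lines(lines, worker_count: 4)
  # SizedQueue blocks the producer when full, so a huge file is never
  # held in memory all at once.
  queue = SizedQueue.new(1000)
  workers = Array.new(worker_count) do
    Thread.new do
      # A nil on the queue tells the worker to stop.
      while (line = queue.pop)
        name, freq = line.chomp.split("\t")
        update_snp(name, Float(freq))
      end
    end
  end
  lines.each { |line| queue << line }
  worker_count.times { queue << nil }
  workers.each(&:join)
end

# Assumed input layout: SNP name, tab, allele frequency.
update_from_lines(["rs1\t0.12\n", "rs2\t0.40\n", "rs3\t0.05\n"])
```

For a real file, `File.foreach(path)` could be passed in place of the array. With the standard Ruby interpreter, threads mostly pay off when the workers spend their time waiting on the database rather than computing.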

I read the excellent post by Neil Saunders on using Ruby and MongoDB to archive his posts on FriendFeed, which prompted me to finally write down my own experiences with MongoDB. So here goes...

Let's have a look at the pilot SNP data from the 1000genomes project. The data released in April 2009 contain lists of SNPs from a low-coverage sequencing effort in the CEU (European descent), YRI (African) and JPTCHB (Asian) populations.
Welcome
Hi there, and welcome to SaaienTist, a blog by me, for me and you. It started out long ago as a personal notebook to help me remember how to do things, but evolved to cover more opinionated posts as well. After a hiatus of 3 to 4 years (basically since I started my current position in Belgium), I am resurrecting it to help me organize my thoughts. It might or might not be useful to you.

Why "Saaien tist"? Because it's pronounced as 'scientist', and means 'boring bloke' in Flemish.