A lot of the work I do involves extracting data from VCF files ("Variant Call Format"; see http://bit.ly/apUbi8). The format is tab-delimited, but not quite: some of the columns contain structured data rather than a single value, and the layout of these columns can differ from one line to the next.

An example line (with the header):

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1

1 12345 . A G 249.00 0 MQ=23.66;DB;DP=89;MQ0=26;LowMQ=0.2921,0.2921,89 GT:DP:GQ 1/1:89:99.00

The INFO field is actually a list of tag/value pairs (except when it's just a tag), and the meaning of the data in the SAMPLE1 column is explained in the FORMAT column. Not only can different INFO tags be present on different lines, but the FORMAT can change line-by-line.
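To make that structure concrete, here is a minimal Python sketch of how the example line above could be unpacked. The function names (`parse_info`, `parse_sample`, `parse_line`) are my own and hypothetical, not part of any VCF library. The sketch splits on any whitespace so that it works on the example as printed here; a real file is tab-delimited, and a proper parser would also convert values to the types declared in the file's meta-information lines. Here everything stays a string.

```python
def parse_info(info):
    """Turn an INFO string into a dict; a bare tag (a flag such as DB) maps to True."""
    result = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_sample(format_field, sample_field):
    """Pair the keys listed in FORMAT with the values in one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def parse_line(header, line):
    """Parse one data line using the column names from the #CHROM header line.

    Assumption: fields are separated by whitespace and contain none themselves.
    """
    columns = header.lstrip("#").split()
    record = dict(zip(columns, line.split()))
    record["INFO"] = parse_info(record["INFO"])
    # Every column after FORMAT is a sample, interpreted via this line's FORMAT.
    for sample in columns[columns.index("FORMAT") + 1:]:
        record[sample] = parse_sample(record["FORMAT"], record[sample])
    return record


header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1"
line = ("1 12345 . A G 249.00 0 "
        "MQ=23.66;DB;DP=89;MQ0=26;LowMQ=0.2921,0.2921,89 GT:DP:GQ 1/1:89:99.00")

record = parse_line(header, line)
print(record["INFO"]["DP"])     # depth from the INFO field
print(record["SAMPLE1"]["GT"])  # genotype of SAMPLE1
```

Because FORMAT is read again for every line, the sample columns are always interpreted with the keys that apply to that particular line.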