I believe...
Ryo Sakai reminded me a couple of weeks ago about Simon Sinek's excellent TED talk "Start With Why - How Great Leaders Inspire Action", which inspired this post... Why do I do what I do?
Data analysis has become more and more automated over the last few decades.
I'll do Angelina Jolie
"I'll do Angelina Jolie". Never thought I'd say that phrase while talking to well-known Belgian cartoonists, and actually be taken seriously.
Backtrack about one year.
Available: Research position Biological Data Visualization and Visual Analytics
We could still use more applicants for this position, so bumping the open position...
Postdoc position available: visualization and genomic structural variation discovery
SymBioSys is a consortium of computational scientists and molecular biologists at the University of Leuven, Belgium focusing on how individual genomic variation leads to disease through cascading effects across biological networks (in specific types of constitutional disorders and cancers).
Quantified Health, and my frustration with genetics
Since the publication of the human genome sequence about a decade ago, the popular press has reported on many occasions about genes allegedly found for things ranging from breast size, intelligence, popularity and homosexuality to fidgeting.
Clojure, visualization, and scripts
Bit of a technical post for my own reference, about visualization and scripting in clojure.
Clojure and visualization
Being interested in clojure, a tweet by Francesco Strozzi (@fstrozzi) caught my attention last week: "A D3 like #dataviz project for #clojure. Codename C2 and looks promising.
Biovis/Visweek recap
Finally time to write something about the biovis/visweek conference I attended about a week ago in Providence (RI)... And I must say: they'll see me again next year. (Hopefully @infosthetics will be able to join me then).
Humanizing Bioinformatics
I was invited last week to give a talk at this year's meeting of the Graduate School Structure and Function of Biological Macromolecules, Bioinformatics and Modeling (SFMBBM). It ended up being a day with great talks, by some bright PhD students and postdocs.
Visualize This (by Nathan Yau) arrived...
Last Friday I received my long-anticipated copy of "Visualize This" by Nathan Yau. On its website it is described as a "practical guide on visualization and how to approach real-world data".
Visualizing the Tour de France
UPDATE: I encountered a blog post by Martin Theus describing a very similar approach for looking at this same data (see here).
Disclaimer 1: This is a (very!) quick hack. No effort was put in it whatsoever regarding aesthetics, interactivity, scaling (e.g. in the barcharts), ...
TenderNoise - visualizing noise levels
A couple of days ago I bumped into this tweet by Benjamin Wiederkehr (@datavis): "Article: TenderNoise http://datavis.ch/q9pIxq" It describes a visualization by Stamen Design and others displaying noise levels at different intersections in San Francisco.
Why did I move into data visualization?
Preamble: It's been very quiet on this blog since I left the Wellcome Trust Sanger Institute in the UK and took my position here at Leuven University in Belgium last October.
VizBi 2011 - looking back
Has been a while (again) since my last post. It seems that the requirements on my time are just a little bit different from during my previous position... But I'd like to share a little bit about the VizBi conference that I attended 2 weeks ago.
Open Research Computation - a new journal from BioMedCentral
As a colleague of mine said a couple of weeks ago: "if you don't publish it, it didn't happen". Scientific publications are the currency to advance a researcher's career.
VCF, tab-delimited files and bioclojure
A lot of the work I do involves extracting data from VCF files ("Variant Call Format"; see http://bit.ly/apUbi8). It's tab-delimited but not quite: some of the columns contain structured data rather than just a value, and the format of these columns might even be different for every single line.
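To make the "tab-delimited but not quite" point concrete, here is a minimal ruby sketch of splitting one VCF data line. It is not the code from the post: the method name `parse_vcf_line` and the example line are made up for illustration, and it assumes the usual column order (eight fixed columns, then an optional FORMAT column and one column per sample).

```ruby
# Parse one VCF data line into a hash (illustrative sketch only).
# The fixed columns are plain tab-delimited values. INFO holds
# semicolon-separated key=value pairs (or bare flags), and each sample
# column holds colon-separated values whose keys are listed in the
# FORMAT column of that same line, so they can differ line by line.
def parse_vcf_line(line)
  chrom, pos, id, ref, alt, qual, filter, info, format_col, *samples =
    line.chomp.split("\t")

  info_hash = info.split(';').each_with_object({}) do |entry, acc|
    key, value = entry.split('=', 2)
    acc[key] = value.nil? ? true : value # bare flags such as "DB" become true
  end

  format_keys = format_col ? format_col.split(':') : []
  sample_hashes = samples.map do |sample|
    Hash[format_keys.zip(sample.split(':'))]
  end

  { :chrom => chrom, :pos => pos.to_i, :id => id, :ref => ref,
    :alt => alt.split(','), :qual => qual, :filter => filter,
    :info => info_hash, :samples => sample_hashes }
end

# Made-up example line, not real data.
line = "20\t14370\trs0000001\tG\tA\t29\tPASS\tNS=3;DP=14;DB\tGT:GQ\t0|0:48\t1|0:48"
record = parse_vcf_line(line)
puts record[:info]['DP']
puts record[:samples][1]['GT']
```

All values are kept as strings (apart from the position) because the VCF header, which declares the type of each INFO and FORMAT key, is not read in this sketch.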
Postdoc position - Genomic variation discovery and visualization
Just a short note...
Even though my position in Leuven only starts in October, I've already been involved in writing and defending a major grant.
Encounter with incanter - about clojure, incanter and bioinformatics
I have been a bit frustrated lately by the fact that for many of my analyses I have to write a ruby script to mangle my data first, then resort to R to add a statistic to each of the datapoints, go back to ruby to mangle the result, repeat, rinse, and finally make plots in R.
Threads in ruby: probably not how to use them
I should create an online labbook with code examples of how I do things. Keep going back to an example script I have to copy/paste the code for handling different threads in ruby.
Tipping my toes in mongodb with ruby
Read the excellent post by Neil Saunders on using ruby and mongodb to archive his posts on FriendFeed, prompting me to finally write down my own experiences with mongodb. So here goes...
Let's have a look at the pilot SNP data from the 1000genomes project.
Trying out mapreduce - on the farm
Received an email this week from Sanger helpdesk that they installed a test hadoop system on the farm with 2 nodes. Thanks guys! First thing to do, obviously, was to repeat the streaming mapreduce exercise I did on my own machine (see my previous post).
Trying out mapreduce
Photo by niv available from Flickr
I have long been interested in trying out mapreduce in my data pipelines. The Wellcome Trust Sanger Institute has several huge compute farms that I normally use, but they don't support mapreduce jobs.
First test release of circular genome browser
Worked a couple of days on pARP, the circular genome browser, and I think it's ready to be tested out by others. Consider this an alpha release: expect a lot of issues. It's easy to create regions with a negative length, for example.
LocusTree - searching genomic loci
"Contigs should not know where they are." That's a phrase uttered by James Bonfield when presenting his work on gap5, the successor to gap4, a much-used assembly software suite.
The good and bad of genome viewers
Back before the human genome was fully sequenced and NCBI, UCSC and Ensembl started working on visualization, it made a lot of sense to go for linear representations and use tracks for annotation. After all: chromosomes are linear.
Who-o-o are you? Who who? Who who?
Image by Danny McL via Flickr
There’s been quite a lot of discussion going on lately about author identification: Raf Aerts’ correspondence piece in Nature (doi:10.1038/453979b), discussions on FriendFeed, ...
To find structural variation, look at read pairs: introducing pARP
Nextgen sequencing is making a huge impact on how research is done in the genomics field. One of the ways to discover structural variants in a genome for example is to create a clone library for an individual, sequence the ends of those clones and then map those ends to the reference genome.
Visualize or summarize?
Image by Kaeru via Flickr
I've recently started using raw visualizations to get an idea of what data looks like rather than writing scripts to summarize. And what I found is that presenting data visually in a raw format might be more useful than condensing everything down into just a few numbers.
Data visualization
Today is "Data management, mining, curation and visualization" day at the Genome Informatics conference in Hinxton. It might be one of the more interesting ones for me, because that's what I do: manage, mine, curate and attempt to visualize. And I must say the last bit is the most difficult.
Using git to sync server with laptop
After investigating git for the bioruby project, I started using it on basically every project I run. And what do I use it for? Two things: keeping track of changes (duh) and syncing between server and laptop.
Bioruby with git: how would that work?
Disclaimer: This blog post is the result of several iterations of writing/discussion/rewriting from Anthony Underwood, Michael Barton, Matt Wood and myself, with additional help from Paul Thornthwaite.
Would you want to contribute to a small open-source project?
Just a quick plug to see if I can find people interested in helping me out in some of my projects.
In the last 2 years, I started four open source projects (well: the last one was today...), each of which scratches my own itch and does what it needs to do for me.
Keeping track of things: using a labbook for bioinformatics
It's been a while since my last post. Left my last job, was unemployed for a month (while still chairing a session at a conference), and just started my new position here at the Sanger Institute.
On every job you're able to pick up some new things that can help you out later.
Where did I get that data from?
Did you ever have data lying around that you couldn't figure out where you got it from?
You downloaded and imported data from an FTP site into your database ages ago and you actually want to use it now.
Testing small scripts
Seasoned programmers know this: testing should be an integral part of developing any script/program/software suite. Part and parcel is the unit test, where you test every little aspect of your program little by little.
Making Bio::Graphics extendable
One of the issues in a library like Bio::Graphics is the plethora of glyph types that users will want. Here's a little showcase of what's provided by the library:
Features on a DNA sequence can be represented as filled boxes, open boxes, boxes with arrows, lines, triangles, ...
What makes code beautiful?
Saw this webcast a couple of weeks ago where Marcel Molina explains the notion of beautiful code. And I really recommend anyone writing code to have a look at it (totally irrespective of the fact he uses a ruby example...).
Named arguments in ruby
One of the main disadvantages of using ruby that I bump into is the absence of named arguments (or keyword parameters). That's no problem for methods taking just two or three arguments, but it does get confusing when you have to be able to pass more than that.
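A common workaround for this is to pass an options hash and merge it with defaults. The sketch below only illustrates that idiom: the method `draw_feature` and its options are hypothetical, not taken from the post. (Ruby 2.0 and later also support real keyword arguments.)

```ruby
# Hypothetical method: the name and its options are made up for illustration.
# Callers name only the options they want to override; the rest fall back
# to the defaults.
def draw_feature(name, opts = {})
  defaults = { :start => 1, :stop => 100, :strand => 1, :glyph => :box }
  opts = defaults.merge(opts)
  "#{name}: #{opts[:start]}..#{opts[:stop]} strand=#{opts[:strand]} glyph=#{opts[:glyph]}"
end

puts draw_feature('gene1', :stop => 250, :glyph => :arrow)
# prints "gene1: 1..250 strand=1 glyph=arrow"
```

The call site now documents itself, at the cost of losing any check on misspelled option names unless you add one.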
The state of bioruby (or: how can bioruby grow?)
A number of people asked me recently about the usability of ruby/bioruby and if it would be worthwhile for them to take the plunge and investigate bioruby more. So I thought writing up here would be a good idea...
Using rake to manage your software project
Do you have some of those projects where you have to be sure that you jump through the same hoops every time you edit some code? Take a look at the bio-graphics code.
Bio::Graphics and rails
As a follow up to my post on Bio::Graphics, I tried integrating this library in a rails application. After all, you'd get your data either from a file (like GFF) or a database. And let me tell you: it took me just 30 minutes or so to get a proof-of-concept running.
Graphics, genomics and ruby
Having known and used the Generic Genome Browser (aka gbrowse, see here) for years now, it occurred to me a while ago that it should be oh so simple to create the same functionality with a much easier setup if we could use ruby instead of perl.
Gbrowse depends on bioperl's Bio::Graphics module.
ActiveRecord - all vs all relationships
Modeling genetics or genomics data presents its own challenges. One of the issues is that the actual definitions of things change over time. A database system can only be based on the scientific knowledge at the time of conception.
A ruby API to the Ensembl database
"Joy to the world, lalaa la laaaa." I can finally announce that I've released the ruby API to the Ensembl core database under the bioruby-annex umbrella. Go here for the release.
ActiveRecord and mysql: show my databases
Working on a ruby API for the Ensembl databases, I bumped into the issue of having to connect to a database without knowing its name.
The ensembl database server hosts databases for each species. Every two months or so, there's a new release which means a new database for every single species.
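One way to cope with a database name that changes every release is to list the databases on the server and pick the matching one. The sketch below shows only the selection step, on a plain array of names. It assumes names of the form `<species>_core_<release>_<assembly>`; the method name `latest_core_database` and the example names are made up for illustration, and this is not necessarily how the API in the post does it.

```ruby
# Assumption: database names look like "<species>_core_<release>_<assembly>".
# Given the names on the server (e.g. the result of SHOW DATABASES),
# return the core database for a species with the highest release number,
# or nil if there is none.
def latest_core_database(database_names, species)
  pattern = /\A#{Regexp.escape(species)}_core_(\d+)_\w+\z/
  candidates = database_names.select { |name| name =~ pattern }
  candidates.max_by { |name| name[pattern, 1].to_i }
end

# Made-up example names, for illustration only.
names = ['homo_sapiens_core_49_36k', 'homo_sapiens_core_50_36l',
         'mus_musculus_core_50_37c', 'homo_sapiens_variation_50_36l']
puts latest_core_database(names, 'homo_sapiens')
# prints "homo_sapiens_core_50_36l"
```

The release number is compared as an integer, so release 100 sorts after release 99 rather than before it.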
How do you process literature?
A quick glance at the side of my desk reveals two stacks of manuscripts to read; each stack about 20cm high. Sounds familiar? There seems to be a major task in front of me to process all that.
First thing to do is to identify what caused those piles in the first place.