Thursday, 20 September 2007

Graphics, genomics and ruby


I've known and used the Generic Genome Browser (aka gbrowse) for years now, and a while ago it occurred to me that it should be oh so simple to create the same functionality, with a much easier setup, if we could use ruby instead of perl.

Gbrowse depends on bioperl's Bio::Graphics module. Although gbrowse has been instrumental in many people's research, it does take a bit of work to install. Apart from bioperl, it depends on Apache to show the results in a browser. Compare that to any Rails application, where you basically just need ruby and a "gem install rails". I've created rails applications in the past that contain exactly the kind of data that would typically be visualized by something like gbrowse. They take no time at all to set up, and you can get away with writing virtually no code. And there's no Apache to install, and no configuration files that you can't access because you're not root.

Such a rails application makes it possible to browse, edit and delete the data. The problem comes with the visualization bit. There's no bioruby graphics library (yet?) that automatically takes the features on a reference sequence and creates a nice picture of where your genes are on that chromosome. And of course, the genes should be clickable so you can link through to NCBI or Ensembl.

I've spent some time in the last year creating such a Bio::Graphics thing for ruby. I wanted it to behave the same as the one from bioperl: there is one panel that has one or more tracks, and each track has features on it. Even though it was quite easy to create a proof-of-concept library, the most difficult part was actually finding the right backend.
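To make that panel/track/feature structure concrete, here is a minimal plain-Ruby sketch of it. This is an illustration only: the class names SketchPanel, SketchTrack and SketchFeature, and the default colour and glyph, are hypothetical and are not the API of the actual library.

```ruby
# Hypothetical classes sketching the bioperl-style layout:
# one panel holds tracks, and each track holds features.
# These are NOT the BioExt::Graphics classes; names and defaults are assumptions.
SketchFeature = Struct.new(:name, :location, :link)

class SketchTrack
  attr_reader :name, :features
  attr_accessor :feature_colour, :feature_glyph

  def initialize(name, colour = [0, 0, 1], glyph = 'generic')
    @name = name
    @feature_colour = colour   # RGB components between 0 and 1
    @feature_glyph = glyph     # how features on this track are drawn
    @features = []
  end

  # Store a feature on this track; the link is optional
  def add_feature(name, location, link = nil)
    feature = SketchFeature.new(name, location, link)
    @features << feature
    feature
  end
end

class SketchPanel
  attr_reader :length, :width, :tracks

  def initialize(length, width)
    @length = length   # length of the reference sequence in bp
    @width = width     # width of the picture
    @tracks = []
  end

  # Create a track, remember it on the panel and hand it back
  def add_track(name, colour = [0, 0, 1], glyph = 'generic')
    track = SketchTrack.new(name, colour, glyph)
    @tracks << track
    track
  end
end

panel = SketchPanel.new(800, 1200)
genes = panel.add_track('genes', [1, 0, 0], 'spliced')
genes.add_feature('gene1', 'join(34..52,109..183)', 'http://news.bbc.co.uk')
panel.tracks.each { |t| puts "#{t.name}: #{t.features.size} feature(s)" }
```

Drawing then becomes a matter of walking panel.tracks and, for each track, drawing its features with that track's glyph and colour.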

What should I use to create the pictures themselves? As I'd worked with SVG before, that seemed the right way to go. I downloaded a library from http://raa.ruby-lang.org/project/ruby-svg/ and got a prototype running quite easily. The problem: I needed an SVG viewer or Firefox to actually view the picture, and zooming in or out screwed up all the text. So after weeks of digging around, I found rcairo, a ruby binding to Cairo. Migrating to this backend was easy peasy, and the pictures look really nice (see the picture at the top). Unfortunately, it's impossible to create clickable glyphs using Cairo itself, but that is easily worked around by creating an HTML file with an image map. That's exactly what gbrowse does as well, isn't it?
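Because Cairo only produces the image, the clickable part has to come from a separate HTML image map whose rectangles line up with the drawn glyphs. The sketch below shows one way to do that, under stated assumptions: the helpers bp_to_px, escape_html and image_map, the Glyph struct and the per-track y-coordinates are all hypothetical, and the zoomed region (for example 1..610 bp) is simply mapped linearly onto the picture width.

```ruby
# Hypothetical sketch: build an HTML image map for glyphs drawn on a PNG.
# Names and the linear scaling are assumptions, not the library's API.
Glyph = Struct.new(:name, :start, :stop, :link, :top, :bottom)

# Map a bp position in the zoomed region onto an x coordinate (0..width)
def bp_to_px(pos, zoom_start, zoom_stop, width)
  ((pos - zoom_start).to_f / (zoom_stop - zoom_start + 1) * width).round
end

# Minimal escaping for text placed inside HTML attribute values
def escape_html(text)
  text.gsub('&', '&amp;').gsub('"', '&quot;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# One <area> per glyph that has a link; glyphs without a link are skipped
def image_map(glyphs, zoom_start, zoom_stop, width, map_name = 'panel_map')
  areas = glyphs.select(&:link).map do |g|
    x1 = bp_to_px(g.start, zoom_start, zoom_stop, width)
    x2 = bp_to_px(g.stop + 1, zoom_start, zoom_stop, width) # right edge of the last base
    coords = [x1, g.top, x2, g.bottom].join(',')
    %(  <area shape="rect" coords="#{coords}" href="#{escape_html(g.link)}" alt="#{escape_html(g.name)}"/>)
  end
  ([%(<map name="#{map_name}">)] + areas + ['</map>']).join("\n")
end

glyphs = [
  Glyph.new('bla1', 250, 375, 'http://www.newsforge.com', 10, 20),
  Glyph.new('piep', 56, 56, nil, 30, 40) # no link, so no <area>
]
html = %(<img src="my_panel.png" usemap="#panel_map"/>\n) + image_map(glyphs, 1, 610, 1200)
puts html
```

The point of computing both the PNG glyphs and the map areas from the same bp_to_px function is that the clickable rectangles cannot drift away from what is drawn.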

The picture at the top has been created using the following simple script:


g = BioExt::Graphics::Panel.new(800, 1200, true, 1, 610)

track1 = g.add_track('generic')
track2 = g.add_track('directed',[0,1,0],'directed_generic')
track3 = g.add_track('triangle',[0.5, 0.5, 0.5],'triangle')
track4 = g.add_track('spliced',[1,0,0],'spliced')
track5 = g.add_track('directed_spliced',[1,0,1],'directed_spliced')

track1.add_feature('bla1','250..375', 'http://www.newsforge.com')
track1.add_feature('bla2','54..124', 'http://www.thearkdb.org')
track1.add_feature('bla3','100..449', 'http://www.google.com')

track2.add_feature('bla4','50..60', 'http://www.google.com')
track2.add_feature('bla5','complement(80..120)', 'http://www.sourceforge.net')

track3.add_feature('piep','56')
track3.add_feature('bla','103', 'http://digg.com')

track4.add_feature('gene1','join(34..52,109..183)','http://news.bbc.co.uk')
track4.add_feature('gene2','complement(join(170..231,264..299,350..360,409..445))')
track4.add_feature('gene3','join(134..152,209..283)')

track5.add_feature('gene1','join(34..52,109..183)', 'http://www.vrtnieuws.net')
track5.add_feature('gene2','complement(join(170..231,264..299,350..360,409..445))','http://www.roslin.ac.uk')
track5.add_feature('gene3','join(134..152,209..283)')

g.draw('my_panel.png')



What happens here?
Line 1: Create a new panel for a sequence of 800 bp, with the picture being 1200 points wide. The true makes every glyph clickable if it has a URL defined, and the last two arguments zoom into the region from 1 to 610 bp.
Lines 3-7: Create the tracks, each with a name and optionally a colour (currently given as RGB values between 0 and 1) and a glyph type.
Lines 9-25: Add features to those tracks, each with a name, a locus in GenBank-style location notation and an optional URL to link out to external websites. Notice how it handles spliced features (join) and features on the reverse strand (complement)?
Line 27: Create the PNG file (and, because the glyphs are clickable in this case, also the HTML file).
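The location strings used above follow the GenBank/EMBL feature-location style, where join() marks a spliced feature and complement() the reverse strand. How the library parses them isn't shown here; as a rough sketch, assuming only the forms that appear in this post, a small parser (parse_location is a hypothetical name) could turn each string into a strand plus a list of ranges:

```ruby
# Hypothetical parser for the simple GenBank-style locations used in the
# scripts: "56", "250..375", "complement(80..120)", "join(a..b,c..d)" and
# "complement(join(...))". Other INSDC syntax (<, >, order(), remote
# entries, complement inside join) is deliberately not handled.
def parse_location(location)
  loc = location.strip
  strand = 1
  if (m = loc.match(/\Acomplement\((.*)\)\z/))
    strand = -1
    loc = m[1]
  end
  if (m = loc.match(/\Ajoin\((.*)\)\z/))
    loc = m[1]
  end
  ranges = loc.split(',').map do |part|
    from, to = part.strip.split('..').map { |n| Integer(n) }
    [from, to || from] # a single position becomes a 1 bp range
  end
  # Sort by start so the exons of a spliced feature can be drawn left to right
  [strand, ranges.sort]
end

p parse_location('complement(join(170..231,264..299,350..360,409..445))')
# => [-1, [[170, 231], [264, 299], [350, 360], [409, 445]]]
p parse_location('56')
# => [1, [[56, 56]]]
```

A spliced glyph would then draw one box per range and connect consecutive boxes, while the strand decides which way a directed glyph points.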

Here's a nicer way to produce the same type of output:

# Initialize a panel for a nucleotide sequence of 1000 bp, drawn 1200 points
# wide, without clickable glyphs, and zoomed into the region 1..600
my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600)

# Create and configure tracks
track_SNP = my_panel.add_track('SNP')
track_gene = my_panel.add_track('gene')
track_transcript = my_panel.add_track('transcript')

track_SNP.feature_colour = [1,0,0]
track_SNP.feature_glyph = 'triangle'
track_gene.feature_glyph = 'directed_spliced'
track_transcript.feature_glyph = 'spliced'
track_transcript.feature_colour = [0,0.5,0]

# Add data to tracks: one feature per line after __END__
DATA.each_line do |line|
  # split on whitespace; link is nil when the optional fifth column is missing
  ref, type, name, location, link = line.split
  case type
  when 'SNP'
    track_SNP.add_feature(name, location, link)
  when 'gene'
    track_gene.add_feature(name, location, link)
  when 'transcript'
    track_transcript.add_feature(name, location, link)
  end
end

# And draw
my_panel.draw('my_panel.png')

__END__
chr1 gene CYP2D6 complement(80..120)
chr1 gene ALDH 100..449
chr1 SNP rs1234 107
chr1 gene bla complement(400..430)
chr1 SNP rs9876 44
chr1 gene some_gene complement(join(170..231,264..299,350..360,409..445))
chr1 transcript transcript1 join(250..300,390..425)
chr1 transcript transcript2 253..330
chr1 transcript transcript3 266..344
chr1 transcript transcript4 complement(join(410..430,239..286,129..151))
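The script above picks a track for each line by testing its feature type one by one. An alternative is to look the track up in a hash keyed by feature type. The sketch below assumes a hypothetical StubTrack standing in for a real track (it only records what add_feature receives), and the 'repeat' row is made up to show how lines without a matching track are skipped:

```ruby
# StubTrack is a hypothetical stand-in for a real track; it only records
# what add_feature receives.
StubTrack = Struct.new(:name, :features) do
  def add_feature(name, location, link = nil)
    features << [name, location, link]
  end
end

# Look tracks up by feature type instead of branching on each type
tracks = {
  'SNP' => StubTrack.new('SNP', []),
  'gene' => StubTrack.new('gene', []),
  'transcript' => StubTrack.new('transcript', [])
}

rows = <<~ROWS
  chr1 gene CYP2D6 complement(80..120)
  chr1 SNP rs1234 107 http://www.google.com
  chr1 repeat rep1 10..20
ROWS

rows.each_line do |line|
  _ref, type, name, location, link = line.split # link is nil if the column is missing
  track = tracks[type]
  next if track.nil? # no track for this type (here: 'repeat'), so skip the line
  track.add_feature(name, location, link)
end

tracks.each_value { |t| puts "#{t.name}: #{t.features.inspect}" }
```

With this layout, supporting a new feature type means adding one entry to the hash rather than another branch.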


If anyone is actually interested in getting the library behind this, just let me know. It should be really easy to incorporate it into a rails application where the data are stored in a database.

I wonder what role, if any, _why's Shoes thing could play...

UPDATE: This library has now been improved a bit and is hosted on rubyforge. You can find a tutorial, the whole API documentation and instructions on how to install and use it at http://bio-graphics.rubyforge.org.

UPDATE TWO: Forget the previous update. I have moved the bio-graphics code to github. See http://github.com/jandot/bio-graphics. That should make it much easier to fork the code and get more input from other developers.

7 comments:

george said...

Thanks a lot! I will give this a spin. This is a big step in the bioruby graphics world.

Jan Aerts said...

Thanks George. Hope you'll find it easy to use.

Francesco.Strozzi said...

This is great! First of all, you have chosen to follow the same organization as the BioPerl Bio::Graphics module, which is a well-known graphics library in the bioinformatics community. Second, the graphs look amazing compared to the BioPerl ones, which are generated with the GD Perl library. A great job! I think I will start from here and use Ruby instead of BioPerl for this kind of thing!

Cheers

hien said...

Thanks, very good news! I started working on something similar with rmagick, but this looks like a real time-saver. One step closer to the mappy/googlemaps of genome viewers...

Rob said...

Fantastic work, this has saved me a bunch of time.
Is it possible to change the colour of individual features?
I've just sketched up this:
http://www.flickr.com/photos/robsyme/3110241376/
but I'm looking to modulate the saturation of the colours based on each hit's evalue.
The color-tools gem makes it pretty easy to get a colour value, but I don't know how to tell bio-graphics to use a new colour for each feature.
Thanks Jan
-r

Jan Aerts said...

Rob,

You should indeed be able to adjust the colour of an individual feature. Just add ":colour => [0,0,1]" in your Bio::Feature.new call and that should do it.

Rob said...

Thanks Jan, worked it out.

track.add_feature(feature,:colour => [colour.r,colour.g,colour.b])

Thanks again
-r
