Worked a couple of days on pARP, the circular genome browser, and I think it's ready to be tested out by others. Consider this an alpha release: expect a lot of issues. It's easy to create regions with a negative length, for example. Also, I haven't focused yet on user-friendliness or general input files. The ways to interact with it are not yet made clear to new users, and the input files still need to have fixed names and be stored in a particular folder.

pARP is designed to be a genome browser for features that are linked to other features on a genome (e.g. readpair mappings). Using a circular display, lines can be drawn connecting these features.

pARP always shows the whole genome. You can zoom into selected regions, but the rest is still shown, albeit squeezed together a bit more. The reason for this is that I want to show the context at all times. Suppose you zoom into two regions A and B that are linked by a large number of readpairs. If the part of the genome that is not A or B is hidden, any readpair that has only one of its reads in A or B will simply not be shown. By showing the whole genome, even squeezed into a few pixels, you can at least see that some reads are linked outside of A and B.
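To make the idea concrete, here is a minimal sketch of how such a "zoom but keep everything" mapping could work. This is not pARP's actual code (pARP is written with ruby-processing; Python is used here purely for illustration), and the function names and the per-region zoom factor are assumptions of mine. Each stretch of the genome gets a share of the circle proportional to its length times its zoom factor, so unzoomed stretches shrink but never disappear.

```python
import math


def build_segments(genome_length, zoomed):
    """Split [0, genome_length) into contiguous (start, stop, factor) segments.

    `zoomed` is a list of (start, stop, factor) tuples, assumed to be sorted
    and non-overlapping. Everything outside those regions gets factor 1.0.
    (Hypothetical helper, not part of pARP.)
    """
    segments = []
    cursor = 0
    for start, stop, factor in zoomed:
        if start >= stop:
            raise ValueError("region must have a positive length")
        if start < cursor or stop > genome_length:
            raise ValueError("regions must be sorted, non-overlapping, in range")
        if start > cursor:
            # Unzoomed stretch between the previous region and this one.
            segments.append((cursor, start, 1.0))
        segments.append((start, stop, float(factor)))
        cursor = stop
    if cursor < genome_length:
        # Unzoomed stretch after the last zoomed region.
        segments.append((cursor, genome_length, 1.0))
    return segments


def position_to_angle(pos, segments):
    """Map a genome position to an angle in radians on the circle."""
    total = sum((stop - start) * factor for start, stop, factor in segments)
    covered = 0.0
    for start, stop, factor in segments:
        if pos < stop:
            # Position falls inside this segment: add the partial weight.
            covered += (pos - start) * factor
            break
        covered += (stop - start) * factor
    return 2 * math.pi * covered / total


# Zoom 10x into A = [100, 200) and B = [600, 700) on a 1000 bp genome.
segments = build_segments(1000, [(100, 200, 10), (600, 700, 10)])
read_in_a = position_to_angle(150, segments)
mate_outside = position_to_angle(900, segments)  # still has a place on the circle
```

With this kind of mapping, a readpair with one read in A and its mate somewhere else still has two valid angles, so the connecting line can always be drawn. Rejecting regions where `start >= stop` is one way to guard against the negative-length regions mentioned above.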



I've put some information on the github wiki page, such as how to interact and what the datafiles should look like.

For a little taste: here's a very brief screencast:



A lot of things still need to happen:
  • Catch a lot of edge cases
  • Incorporate a library for fast loading of features (i.e. LocusTree, which doesn't exist yet)
  • Make interaction more straightforward: use mouse for panning/zooming for example
  • About 1,472 other things that I currently forget
Also: I'm looking for a new name for pARP. pARP stands for "processing abnormal readpairs" (which is what it was meant for originally), but it's actually just a genome browser using a circular representation to show linked features. Suggestions I already got are encircle and SqWheel or Squeal (the last two based on sequence-wheel; Squeal was my own idea, so I like that most at the moment :-) ).

A very, very big thanks goes to Jeremy Ashkenas, the author of ruby-processing. With pARP I have been pushing the boundaries of what that library does, and he has adapted it for my needs as I went. See here for his ruby-processing library. Other thanks go to my colleagues Erin, Klaudia, Jon, Nelo and Chris for their ideas.

pARP can be downloaded or cloned from github. Versions for Mac, Windows and Linux are available there as well.

Ryo Sakai reminded me a couple of weeks ago about Simon Sinek's excellent TED talk "Start With Why - How Great Leaders Inspire Action", which inspired this post... Why do I do what I do?

The way data can be analysed has been automated more and more in the last few decades. Advances in machine learning and statistics make it possible to gain a lot of information from large datasets. But are we starting to rely too much on those algorithms? Different issues seem to pop up more and more. For one thing, research in algorithm design has enabled many more applications, but at the same time makes these so complex that they start to operate as black boxes. Not only for the end-user who provides the data, but even for the algorithm developer.

"I'll do Angelina Jolie". Never thought I'd say that phrase while talking to well-known Belgian cartoonists, and actually be taken seriously.

Backtrack about one year. We're at the table with the crème-de-la-crème of Belgium's cartoon world (Zaza, Erwin Vanmol, LECTRR, Eva Mouton, ...), in a hotel in Knokke near the coast. "We" is a gathering of researchers covering genetics, bioinformatics, ethics, and law. The setting: the Knokke-Heist International Cartoon Festival.

We could still use more applicants for this position, so bumping the open position...

SymBioSys is a consortium of computational scientists and molecular biologists at the University of Leuven, Belgium focusing on how individual genomic variation leads to disease through cascading effects across biological networks (in specific types of constitutional disorders and cancers). We develop innovative computational strategies for next-generation sequencing and biological network analysis, with demonstrated impact on actual biological breakthroughs.

Since the publication of the human genome sequence about a decade ago, the popular press has reported on many occasions about genes allegedly found for things ranging from breast size, intelligence, popularity and homosexuality to fidgeting. The general population is constantly told that the revolution is just around the corner.

Bit of a technical post for my own reference, about visualization and scripting in clojure.

Clojure and visualization

Being interested in clojure, a tweet by Francesco Strozzi (@fstrozzi) caught my attention last week: "A D3 like #dataviz project for #clojure. Codename C2 and looks promising. http://keminglabs.com/c2/. They need contribs so spread the word!" I tried a while ago to do some stuff in D3, but the javascript got in the way so I gave up after a while.

Finally time to write something about the biovis/visweek conference I attended about a week ago in Providence (RI)... And I must say: they'll see me again next year. (Hopefully @infosthetics will be able to join me then). Meanwhile, several blog posts are popping up discussing it (see here and here, for example).

This was the first time that biovis (aka the IEEE Symposium on Biological Data Visualization) was organized.

I was invited last week to give a talk at this year's meeting of the Graduate School Structure and Function of Biological Macromolecules, Bioinformatics and Modeling (SFMBBM). It ended up being a day with great talks, by some bright PhD students and postdocs. There were 2 keynotes (one by Prof Bert Poolman from Groningen (NL) and one by myself), and a panel discussion on what the future holds for people nearing the end of their PhDs.

Last Friday I received my long-anticipated copy of "Visualize This" by Nathan Yau. On its website it is described as a "practical guide on visualization and how to approach real-world data". You can guess what my weekend looked like :-)

Overall, I believe this book is a very good choice for people interested in getting started in data visualization.

UPDATE: I encountered a blog post by Martin Theus describing a very similar approach for looking at this same data (see here).

Disclaimer 1: This is a (very!) quick hack. No effort was put in it whatsoever regarding aesthetics, interactivity, scaling (e.g. in the barcharts), ... Just wanted to get a very broad view of what happened during the Tour de France (= biggest cycling event each year).

Disclaimer 2: I don't know anything about cycling.
Welcome
Hi there, and welcome to SaaienTist, a blog by me, for me and you. It started out long ago as a personal notebook to help me remember how to do things, but evolved to cover more opinionated posts as well. After a hiatus of 3 to 4 years (basically since I started my current position in Belgium), I'm resurrecting it to help me organize my thoughts. It might or might not be useful to you.

Why "Saaien tist"? Because it's pronounced as 'scientist', and means 'boring bloke' in Flemish.