Image by Kaeru via Flickr

I've recently started using raw visualizations to get an idea of what data looks like rather than writing scripts to summarize it. What I found is that presenting data visually in a raw format can be more useful than condensing everything down into just a few numbers. The trouble with summarizing is that you need to know what to expect, and make assumptions, before you can analyze the data. The best tool you have for identifying trends or non-randomness is yourself, not R or a scripting language.

I bought the Visualizing Data book by Ben Fry to help me with this. It explains how to use the Processing language to present data in a meaningful way.

Today is "Data management, mining, curation and visualization" day at the Genome Informatics conference in Hinxton. It might be one of the more interesting ones for me, because that's what I do: manage, mine, curate and attempt to visualize. And I must say the last bit is the most difficult. It's not difficult to upload results into a genome browser, but is it the best way?

I say we have to break free from the track.

After investigating git for the bioruby project, I started using it on basically every project I run. And what do I use it for? Two things: keeping track of changes (duh) and syncing between server and laptop.

I normally try to persuade IT to let me mount my server Documents folder on my laptop when I'm at work, so that ~/Documents actually points to my network drive. That's nice, because I don't have to keep track of several places where my documents are stored.

Disclaimer: This blog post is the result of several iterations of writing/discussion/rewriting from Anthony Underwood, Michael Barton, Matt Wood and myself, with additional help from Paul Thornthwaite.

Disclaimer nr 2: We are not yet git veterans ourselves, so if you see simpler ways of doing what we describe below (or spot any errors), please let us know so we can update this post and put it onto the bioruby wiki as well.

Disclaimer nr 3: This is a proposal. Bioruby has not moved to git yet.

Just a quick plug to see if I can find people interested in helping me out in some of my projects.

In the last 2 years, I've started four open source projects (well: the last one was today...), each of which scratches my own itch and does what it needs to do for me. However, some features will have to be added and bugs fixed to make them more useful to others (you, that is...).

If you are using one of these projects, please think about contributing.

It's been a while since my last post. Left my last job, was unemployed for a month (while still chairing a session at a conference), and just started my new position here at the Sanger Institute.

In every job you're able to pick up some new things that can help you out later. One of the good ones from Roslin was how to keep a lab journal for bioinformatics. In the position before Roslin (at Wageningen University in the Netherlands), I recall having trouble remembering what I did to my data.

Did you ever have data lying around that you couldn't figure out where you got it from?

You downloaded and imported data from an FTP site into your database ages ago and you actually want to use it now. But if different records come from different sources, it can be really challenging to know what data to trust or how to retrieve additional information afterwards. Not keeping track of the source of the data breaks the chain of provenance.

I've seen it happen.
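
One way to keep that chain intact is to store the source alongside every record at import time. Below is a minimal sketch of the idea, not code from any of my projects: the `ProvenancedRecord` struct, its field names, and the example URL and values are all made up for illustration.

```ruby
require "date"

# Hypothetical record layout: every imported record carries where it came
# from and when it was fetched, next to the actual payload.
ProvenancedRecord = Struct.new(:identifier, :value, :source_url, :retrieved_on,
                               keyword_init: true) do
  # A record is traceable only if both the source and the retrieval date are known.
  def traceable?
    !source_url.to_s.empty? && !retrieved_on.nil?
  end
end

# Made-up example data: one record with provenance, one without.
records = [
  ProvenancedRecord.new(identifier: "rec1", value: "ACGT",
                        source_url: "ftp://ftp.example.org/pub/data.txt",
                        retrieved_on: Date.new(2008, 1, 15)),
  ProvenancedRecord.new(identifier: "rec2", value: "TTGA",
                        source_url: nil, retrieved_on: nil)
]

# List the records whose origin can no longer be reconstructed.
untraceable = records.reject(&:traceable?).map(&:identifier)
puts "Records without provenance: #{untraceable.join(', ')}"
```

In a real database the same idea would be two extra columns on the table, filled in by the import script rather than by hand.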

Seasoned programmers know this: testing should be an integral part of developing any script/program/software suite. Part and parcel is the unit test, where you test every little aspect of your program little by little.

For larger projects using a bunch of library files, the setup for testing basically always looks the same: there's your /lib/ directory with your class definitions and your /test/unit/ directory which holds your tests.
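
To make that layout concrete, here is a minimal sketch using Minitest, which ships with current Ruby versions. The `SequenceTools` module and both file names are hypothetical; in a real project the two parts would live in separate files under /lib/ and /test/unit/, and they are collapsed into one file here only so the example runs as is.

```ruby
require "minitest/autorun"

# --- would live in lib/sequence_tools.rb (hypothetical file) ---
module SequenceTools
  # Fraction of G and C bases in a sequence; 0.0 for an empty sequence.
  def self.gc_content(seq)
    return 0.0 if seq.empty?
    seq.upcase.count("GC").to_f / seq.length
  end
end

# --- would live in test/unit/test_sequence_tools.rb (hypothetical file) ---
class TestSequenceTools < Minitest::Test
  def test_gc_content_of_mixed_sequence
    assert_in_delta 0.5, SequenceTools.gc_content("ATGC"), 1e-9
  end

  def test_gc_content_ignores_case
    assert_in_delta 1.0, SequenceTools.gc_content("gcGC"), 1e-9
  end

  def test_gc_content_of_empty_sequence
    assert_equal 0.0, SequenceTools.gc_content("")
  end
end
```

With the files split up, the test file would start with a `require` for the library file, and running it with `ruby -Ilib` from the project root would put /lib/ on the load path.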

One of the issues in a library like Bio::Graphics is the plethora of glyph types that users will want. Here's a little showcase of what's provided by the library:

Features on a DNA sequence can be represented as filled boxes, open boxes, boxes with arrows, lines, triangles, ... In this post, I'll show you (and remind myself) how I came to a version of the Bio::Graphics code that makes adding glyphs straightforward for both myself and the user.
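
To give a flavour of what "straightforward to add" can mean, here is a small sketch of a glyph registry. It is not the actual Bio::Graphics code: the `GlyphSketch` module, its `register` and `draw` methods, and the text-based "drawing" are all assumptions made up for illustration.

```ruby
# Hypothetical sketch: glyphs are looked up by name in a registry, so adding
# a new glyph type means registering one block instead of editing the core.
module GlyphSketch
  @glyphs = {}

  # Register a glyph under a name; the block turns start/stop into a drawing.
  def self.register(name, &renderer)
    raise ArgumentError, "a block is required" unless renderer
    @glyphs[name.to_sym] = renderer
  end

  # Draw a feature with the named glyph; unknown names fail loudly.
  def self.draw(name, start, stop)
    renderer = @glyphs.fetch(name.to_sym) do
      raise ArgumentError, "unknown glyph: #{name}"
    end
    renderer.call(start, stop)
  end
end

# Glyphs shipped with the (imaginary) library, drawn as plain text.
GlyphSketch.register(:box)          { |start, stop| "#" * (stop - start + 1) }
GlyphSketch.register(:line)         { |start, stop| "-" * (stop - start + 1) }
GlyphSketch.register(:directed_box) { |start, stop| "#" * (stop - start) + ">" }

# A user adds a glyph of their own without touching the library code.
GlyphSketch.register(:dots) { |start, stop| "." * (stop - start + 1) }

puts GlyphSketch.draw(:box, 3, 7)
puts GlyphSketch.draw(:directed_box, 3, 7)
puts GlyphSketch.draw(:dots, 3, 7)
```

The design choice is that the drawing code never needs a growing `case` statement over glyph names: library glyphs and user glyphs go through exactly the same registration call.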
Welcome
Hi there, and welcome to SaaienTist, a blog by me, for me and you. It started out long ago as a personal notebook to help me remember how to do things, but evolved to cover more opinionated posts as well. After a hiatus of 3 to 4 years (basically since I started my current position in Belgium), I'm resurrecting it to help me organize my thoughts. It might or might not be useful to you.

Why "Saaien tist"? Because it's pronounced as 'scientist', and means 'boring bloke' in Flemish.