Wednesday, 12 November 2008

Visualize or summarize?

Visualization of my bookmarks (image by Kaeru via Flickr)

I've recently started using raw visualizations to get an idea of what the data looks like, rather than writing scripts to summarize it. What I've found is that presenting data visually in a raw format can be more useful than condensing everything down into just a few numbers. The trouble with analysis is that you need to know what to expect, and make assumptions, before you can analyze the data. The best tool you have for identifying trends or non-randomness is yourself, not R or a scripting language.

I bought the Visualizing Data book by Ben Fry to help me with this. It explains how to use the Processing language to present data in a meaningful way. As far as I understand, Processing is a wrapper around the Java language that makes it much more intuitive to use for simple people like me. The language is so easy that the learning curve was very small for me, even though I didn't know anything about Java other than that it's an island and a coffee. In several of my projects I now start by writing a simple Processing script and then throw all my data at it. No assumptions made. The fact that it's easy to interact with a display by mouse or keyboard makes it even more useful.
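To make the "throw the raw data at a display" idea concrete, here is a minimal sketch. It is written in plain Java rather than Processing so that it runs without the Processing library, and it draws a text strip instead of opening a window. The class name `RawStrip`, its methods, and the two tiny datasets are all made up for illustration; `rescale` is only meant to mirror what Processing's built-in `map()` function does when you convert data values to screen coordinates.

```java
import java.util.Arrays;

// Illustrative sketch only: plain Java, no Processing library required.
class RawStrip {

    // Linear rescaling from one range to another,
    // analogous to Processing's built-in map() function.
    static double rescale(double v, double inMin, double inMax,
                          double outMin, double outMax) {
        return outMin + (v - inMin) * (outMax - outMin) / (inMax - inMin);
    }

    // The kind of single-number summary a script would typically report.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Draw every raw value as a mark on a text "axis" of the given width.
    // Each column shows how many values landed there (capped at 9),
    // and '.' where there are none.
    static String strip(double[] values, double min, double max, int width) {
        int[] counts = new int[width];
        for (double v : values) {
            int col = (int) Math.round(rescale(v, min, max, 0, width - 1));
            col = Math.max(0, Math.min(width - 1, col)); // clamp out-of-range values
            counts[col]++;
        }
        StringBuilder sb = new StringBuilder();
        for (int c : counts) {
            sb.append(c == 0 ? '.' : (char) ('0' + Math.min(c, 9)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two hypothetical datasets with the same mean but a different shape.
        double[] clustered = {4, 5, 5, 5, 6};
        double[] bimodal = {0, 0, 5, 10, 10};

        System.out.println("mean " + mean(clustered) + "  " + strip(clustered, 0, 10, 11));
        System.out.println("mean " + mean(bimodal) + "  " + strip(bimodal, 0, 10, 11));
    }
}
```

Both datasets report a mean of 5.0, yet the strips (`....131....` versus `2....1....2`) look nothing alike. In a real Processing sketch the same rescaling step would feed pixel positions into drawing calls rather than a string.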

The Processing code editor makes creating a Java applet or application a matter of one click, so it's easy to make your displays available to other people.

It's only after having a look at the data that I write analysis scripts or help other people decide how to analyze or summarize it. That analysis can then help to make, for example, a more opinionated display. So it's display, analyze, display, analyze, ... as in a hermeneutic circle.

One issue with this approach is that you have to be able to think of a meaningful display. And I must say that's often (but not always) the more difficult bit. I started following the RSS feeds of some visualization blogs like FlowingData as well as the Processing website itself to get exposed to different types of visualization, which does help.

Update: Follow the discussion on FriendFeed.


  1. mmm interesting!
    But what are the differences between R and this Processing language?

  2. @gioby: As R is a statistical language, you'd typically perform a bunch of analyses and at the end create a histogram or something to display. But the emphasis is on analysis (with P-values and things). Processing, however, is only for visualizing: you'd typically throw your raw data at it and use your own eyes to look for patterns (instead of a statistical test). This is just to get ideas or illustrate, not to (statistically) prove.

  3. Processing has also been ported to JavaScript, which means you can run your scripts right in the browser and get a "canvas" object that shows the result. See Processing.js.