UPDATE: I encountered a blog post by Martin Theus describing a very similar approach for looking at this same data (see here).

Disclaimer 1: This is a (very!) quick hack. No effort whatsoever was put into aesthetics, interactivity, scaling (e.g. in the barcharts), ... I just wanted to get a very broad view of what happened during the Tour de France (= biggest cycling event each year).
Disclaimer 2: I don't know anything about cycling. It was actually my wife who had to point out to me which riders could be interesting to highlight in the visualization. But that also meant that this could become interesting for me to learn something about the Tour.




Data was copied from the Tour de France website (e.g. for the 1st stage). The visualization was created in Processing.

The parallel coordinate plot shows the standings of all riders over all 21 stages. No data was available for stage 2, because that was a team time trial (so disregard that one). On each axis, the rider who came first is at the top and the rider who came last is at the bottom. Below the parallel coordinate plot are little barcharts showing the distribution of arrival times (in "number of seconds later than the winner") for all riders in that stage.
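For my own reference, here is roughly how that layout can be computed. This is a minimal Python sketch, not the original Processing code: the function names, the plot height in pixels and the 60-second bin width are all assumptions made up for the example, and so is the data.

```python
from collections import Counter


def rank_to_y(rank, n_initial_riders, plot_height=400.0):
    """Map a rank (1 = stage winner) to a y pixel; y = 0 is the top of the axis.

    Scaling by the initial number of riders keeps every axis the same
    length, so riders who abandon leave the bottom of later axes empty.
    """
    if n_initial_riders < 2:
        return 0.0
    return (rank - 1) / (n_initial_riders - 1) * plot_height


def gap_histogram(gaps_in_seconds, bin_width=60):
    """Count riders per bin of 'seconds later than the stage winner'."""
    counts = Counter(gap // bin_width for gap in gaps_in_seconds)
    return dict(sorted(counts.items()))


# Hypothetical stage: 5 riders started the Tour, 4 finished this stage.
gaps = [0, 0, 35, 190]
ys = [rank_to_y(rank, 5) for rank in range(1, len(gaps) + 1)]
bars = gap_histogram(gaps)
print(ys)
print(bars)
```

Each stage gets one vertical axis built this way; connecting a rider's y positions across the axes gives that rider's line in the plot, and the binned counts give the heights of the bars underneath.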

The highlighted riders are: Cavendish (red), Evans (orange), Gilbert (yellow), Andy Schleck (light blue) and Frank Schleck (dark blue).

So what was I able to learn from this?

  • Based on the barcharts you can guess which stages were in the mountains, and which weren't. You'd expect the riders to become much more separated in the mountains than on the flat. In the very last stage in Paris, for example, everyone seems to have arrived in one big group, whereas for stages 12-14 the riders were much more spread out. So my guess (and that's confirmed by checking this on the TourDeFrance website :-) is that those were mountain stages.
  • You can see clear groups of riders who behave the same. There is for example a clear group of riders who performed quite badly in stage 19 but much better in stage 20 (and badly in 21 again).
  • As the parallel coordinate plots were scaled according to the initial number of riders, we can clearly see how people left the Tour because the "bottom" of the later stages is empty.
  • We see that Cavendish (red) has a very erratic performance. And it seems to coincide with stages where the arrival times are spread out (= mountain stages?). This could mean that Cavendish is good on the flats, but bad in the mountains. Question to those who know something about cycling: is that true?
  • Philippe Gilbert started well (both on the flats and in the mountains), but became more erratic halfway through the Tour.
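
The first observation (spread-out arrival times hint at a mountain stage) can be turned into a crude check. Again this is only a sketch in Python: using the median gap as the measure of spread, the 300-second threshold and the example numbers are all assumptions of mine, not something taken from the actual data.

```python
from statistics import median


def stage_spread(gaps_in_seconds):
    """Median time gap to the stage winner, as a crude measure of spread."""
    return median(gaps_in_seconds)


def guess_mountain_stages(gaps_by_stage, threshold_seconds=300):
    """Return the stage numbers whose median gap exceeds the threshold."""
    return [
        stage
        for stage, gaps in sorted(gaps_by_stage.items())
        if gaps and stage_spread(gaps) > threshold_seconds
    ]


# Made-up gaps (seconds behind the winner) for two hypothetical stages.
example = {
    12: [0, 120, 600, 900, 1500],  # riders arrive far apart
    21: [0, 0, 0, 0, 15],          # nearly everyone in one big group
}
guessed = guess_mountain_stages(example)
print(guessed)
```

The median is used rather than the maximum so that a single rider arriving very late does not make a flat stage look like a mountain stage.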

Ryo Sakai reminded me a couple of weeks ago about Simon Sinek's excellent TED talk "Start With Why - How Great Leaders Inspire Action", which inspired this post... Why do I do what I do?

The way data can be analysed has been automated more and more in the last few decades. Advances in machine learning and statistics make it possible to gain a lot of information from large datasets. But are we starting to rely too much on those algorithms? Different issues seem to pop up more and more. For one thing, research in algorithm design has enabled many more applications, but at the same time makes these so complex that they start to operate as black boxes, not only for the end-user who provides the data, but even for the algorithm developer.

"I'll do Angelina Jolie". Never thought I'd say that phrase while talking to well-known Belgian cartoonists, and actually be taken seriously.

Backtrack about one year. We're at the table with the crème-de-la-crème of Belgium's cartoon world (Zaza, Erwin Vanmol, LECTRR, Eva Mouton, ...), in a hotel in Knokke near the coast.  "We" is a gathering of researchers covering genetics, bioinformatics, ethics, and law. The setup: the Knokke-Heist International Cartoon Festival.

We could still use more applicants for this position, so bumping the open position...

SymBioSys is a consortium of computational scientists and molecular biologists at the University of Leuven, Belgium focusing on how individual genomic variation leads to disease through cascading effects across biological networks (in specific types of constitutional disorders and cancers). We develop innovative computational strategies for next-generation sequencing and biological network analysis, with demonstrated impact on actual biological breakthroughs.

Since the publication of the human genome sequence about a decade ago, the popular press has reported on many occasions about genes allegedly found for things ranging from breast size, intelligence, popularity and homosexuality to fidgeting. The general population is constantly told that the revolution is just around the corner.

Bit of a technical post for my own reference, about visualization and scripting in Clojure.

Clojure and visualization

Being interested in Clojure, a tweet by Francesco Strozzi (@fstrozzi) caught my attention last week: "A D3 like #dataviz project for #clojure. Codename C2 and looks promising. http://keminglabs.com/c2/. They need contribs so spread the word!" I tried a while ago to do some stuff in D3, but the JavaScript got in the way so I gave up after a while.

Finally time to write something about the biovis/visweek conference I attended about a week ago in Providence (RI)... And I must say: they'll see me again next year. (Hopefully @infosthetics will be able to join me then). Meanwhile, several blog posts are popping up discussing it (see here and here, for example).

This was the first time that biovis (aka the IEEE Symposium on Biological Data Visualization) was organized.

I was invited last week to give a talk at this year's meeting of the Graduate School Structure and Function of Biological Macromolecules, Bioinformatics and Modeling (SFMBBM). It ended up being a day with great talks, by some bright PhD students and postdocs. There were 2 keynotes (one by Prof Bert Poolman from Groningen (NL) and one by myself), and a panel discussion on what the future holds for people nearing the end of their PhDs.

Last Friday I received my long-anticipated copy of "Visualize This" by Nathan Yau. On its website it is described as a "practical guide on visualization and how to approach real-world data". You can guess what my weekend looked like :-)

Overall, I believe this book is a very good choice for people interested in getting started in data visualization.

Welcome
Hi there, and welcome to SaaienTist, a blog by me, for me and you. It started out long ago as a personal notebook to help me remember how to do things, but evolved to cover more opinionated posts as well. After a hiatus of 3 to 4 years (basically since I started my current position in Belgium), I am resurrecting it to help me organize my thoughts. It might or might not be useful to you.

Why "Saaien tist"? Because it's pronounced as 'scientist', and means 'boring bloke' in Flemish.