Ryo Sakai reminded me a couple of weeks ago about Simon Sinek's excellent TED talk "Start With Why - How Great Leaders Inspire Action", which inspired this post. Why do I do what I do?

Data analysis has become more and more automated over the last few decades. Advances in machine learning and statistics make it possible to extract a lot of information from large datasets. But are we starting to rely too much on those algorithms? Several issues keep popping up.

For one thing, research in algorithm design has enabled many more applications, but at the same time it makes these algorithms so complex that they start to operate as black boxes. Not only to the end-user who provides the data, but even to the algorithm developer.

Another issue with pre-defined algorithms is that having them around precludes us from identifying unexpected patterns. If the algorithm or statistical test is not specifically written to find a certain type of pattern, it will not find it.

Third issue: (arbitrary) cutoffs. Many algorithms rely heavily on the user (or even worse: the developer) defining a set of cutoff values. This is true in machine learning as well as statistics. A statistical test returning a p-value of 4.99% is considered "statistically significant", but you'd throw away your data if that p-value were 5.01%. What's the intrinsic thing at 5% that makes you have to choose between "yes, this is good" and "let's throw our hypothesis out the window"?

All in all, much of this comes back to the fragility of using computers (hat tip to Toni for the book by Nassim Taleb): you have to tell them what to do and what to expect. They're not resilient to changes in setting, data, prior knowledge, etc.; at least not as much as we are.
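To make the cutoff issue a bit more concrete, here is a minimal sketch in Python. It is my own illustration rather than anything from a real analysis: it assumes a plain two-sided z-test against a standard normal null, and the names `two_sided_p`, `verdict` and `ALPHA` are made up for the example. The two test statistics differ by only 0.01, yet they end up on opposite sides of the 5% line.

```python
import math

ALPHA = 0.05  # the conventional (and arbitrary) cutoff; an assumption of this sketch


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def verdict(z, alpha=ALPHA):
    """Hard yes/no decision: exactly the kind of cutoff discussed above."""
    return "significant" if two_sided_p(z) < alpha else "not significant"


# Two nearly identical test statistics (hypothetical values).
for z in (1.95, 1.96):
    print(f"z = {z:.2f}  p = {two_sided_p(z):.4f}  ->  {verdict(z)}")
```

The evidence behind both numbers is practically the same; only the label changes. A human looking at the two p-values side by side would probably treat them alike, which is the point of keeping that human in the loop.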

So where does this bring us? It's my firm belief that we need to put the human back in the loop of data analysis. Yes, we need statistics. Yes, we need machine learning. But also: yes, we need a human individual to actually make sense of the data and drive the analysis. To make this possible, I focus on visual design, interaction design, and scalability. Visual design because the representation of data in many cases needs improvement to be able to cope with high-dimensional data; interaction design because it's often by "playing" with the data that the user can gain insights; and scalability because it's not trivial to process big data fast enough that we can get interactivity.

Welcome
Hi there, and welcome to SaaienTist, a blog by me, for me and you. It started out long ago as a personal notebook to help me remember how to do things, but evolved to cover more opinionated posts as well. After a hiatus of 3 to 4 years (basically since I started my current position in Belgium), I am resurrecting it to help me organize my thoughts. It might or might not be useful to you.

Why "Saaien tist"? Because it's pronounced as 'scientist', and means 'boring bloke' in Flemish.