How to trend on Twitter on election night with rapid data analysis, visualization and lots of poorly written R code. A step-by-step guide.
Working election night is about as much fun as your job gets if you are a journalist. Fast pace, full audience attention and, of course, events that will shape the future of the country.
As a data journalist you are working under the following conditions:
- You will have tons of data on your hands.
- Your workflow has to be fast.
- Your results have to be accurate.
- You will be tired.
Enter code. By automating your workflow you can address all these issues. Let the code do the work for you.
We worked election night at Aftonbladet, the leading evening newspaper in Sweden. This is how we planned and executed our analysis with R. And how we ended up trending on Twitter.
A week in advance we started listing questions that we wanted to answer on election night. What are the strongholds of the different parties? Where do they gain and lose votes? Are there socio-economic factors that can explain the results?
We decided to focus our analysis on the municipalities as there are endless amounts of data available on this level. We put together a spreadsheet with about 30 key measurements about all 290 municipalities (get the data on Google Drive here).
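To give an idea of what that spreadsheet looked like in code, here is a minimal sketch in R. The column names and numbers are hypothetical stand-ins, not the real measurements from the Google Drive file:

```r
# Hypothetical mini version of the municipality spreadsheet:
# one row per municipality, one column per key measurement.
municipalities <- data.frame(
  name            = c("Stockholm", "Malmo", "Kiruna"),
  median_income   = c(325000, 245000, 270000),
  share_higher_ed = c(0.41, 0.32, 0.18),
  pop_change_pct  = c(1.8, 1.1, -0.6)
)

# Rank municipalities by a measurement, e.g. to find the "richest" ones.
head(municipalities[order(-municipalities$median_income), ], 2)
```

In practice you would load the real file with `read.csv()`; having everything in one flat table with one row per municipality is what makes the later analysis fast.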
Next step: R.
R is an open source statistical programming language. My previous experience with R was rather limited. I knew the basic concepts, but I hadn't really used it properly. With prior knowledge of Python and web development I found R rather difficult to get my head around.
What I’m trying to say: a) you can do this too, b) my code probably sucks – but at least it works.
At around 11pm more than 90 percent of the votes had been counted and we were able to start digging into the data. One of the visualization formats that we had prepared was a comparison of the voting behavior of the richest/poorest, most/least educated, oldest/youngest municipalities.
With the code well prepared it only took a keystroke to generate ten of these charts. Within an hour of the first results we were able to deliver a broad range of data-driven analysis.
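The preparation boils down to writing one function and looping it over every measurement you care about. This is a sketch of the idea, assuming a `results` data frame with one row per municipality; the column names and the toy numbers are invented for illustration:

```r
# Toy data standing in for the real results table.
results <- data.frame(
  median_income = c(200, 220, 250, 300, 350, 400),
  sd_share      = c(18, 16, 14, 11, 9, 7)
)

# Compare a party's average result in the bottom n vs the top n
# municipalities on a given socio-economic measurement.
compare_extremes <- function(results, measure, party, n = 2) {
  ordered <- results[order(results[[measure]]), ]
  c(bottom_mean = mean(head(ordered, n)[[party]]),
    top_mean    = mean(tail(ordered, n)[[party]]))
}

compare_extremes(results, "median_income", "sd_share")
# With the toy numbers above: bottom_mean 17, top_mean 8

# On election night: loop over all prepared measurements to get
# every comparison chart at once, e.g.
# for (m in measures) barplot(compare_extremes(results, m, "sd_share"))
```

The point is that every chart variant is the same function call with different arguments, so "one keystroke" really can produce all ten.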
One of the prepared outputs was a correlation matrix that helped us look for clues about socio-economic measurements that could help us understand the election results. With a bit of coloring that correlation matrix looked like this in Excel.
Red is for negative correlation, green is for positive. The stronger the color, the stronger the relationship.
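Computing such a matrix in R is a one-liner. A minimal sketch, with invented columns and values standing in for the real dataset:

```r
# Toy data: party growth plus two hypothetical socio-economic measurements.
results <- data.frame(
  sd_growth     = c(4.2, 3.1, 5.0, 1.2, 0.8),
  pop_change    = c(-1.0, -0.4, -1.5, 0.9, 1.3),
  median_income = c(210, 230, 200, 320, 340)
)

# Pairwise Pearson correlations between every numeric measurement.
cor_matrix <- round(cor(results), 2)
cor_matrix

# Export and apply conditional formatting in Excel for the colored view:
# write.csv(cor_matrix, "correlations.csv")
```

Values close to -1 or 1 indicate a strong relationship; values near 0 indicate little or none, which is what the red/green coloring encodes.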
The big winners in the elections were the Sweden Democrats. So the main question was how we could understand their success. These are the factors that correlate most strongly with the growth of the party.
The key findings from this matrix were turned into scatterplots. Like this one, showing how the Sweden Democrats got more votes in municipalities with shrinking populations.
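A scatterplot like that takes only a few lines of base R. The vectors below are invented illustration data, not the actual election results:

```r
# Hypothetical data: population change vs Sweden Democrats vote share.
pop_change <- c(-2.1, -1.4, -0.8, 0.3, 1.1, 2.0)
sd_share   <- c(19, 16, 14, 12, 10, 8)

plot(pop_change, sd_share,
     xlab = "Population change, percent",
     ylab = "Sweden Democrats, percent of votes",
     main = "Shrinking municipalities, stronger SD results")
abline(lm(sd_share ~ pop_change))  # add a linear trend line
```

With one point per municipality and a fitted trend line, the relationship the correlation matrix hinted at becomes immediately visible.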
A summary of the visualizations and conclusions can be found on Storify.
We ended up getting quite a lot of attention for our work. Our visualizations were retweeted not just tens or hundreds of times, but thousands.
The most successful one was this chart about how municipalities with high and low education levels vote differently.
Should all journalists know how to code? Probably not. But this case shows how programming skills can be of great use in the newsroom.
In addition to the arguments outlined at the beginning of this post (efficiency, reproducibility, error spotting), an advantage of code is that it can increase the transparency, and hence legitimacy, of your work (if you publish your code, that is). Just as in science, your audience should be able to recreate your findings.