How to trend on Twitter on election night with rapid data analysis, visualization and lots of poorly written R code. A step-by-step guide.

Working election night is about as fun as your job gets if you are a journalist. Fast pace, full audience attention and of course events that will shape the future of the country.

As a data journalist you are working under the following conditions:

  • You will have tons of data at your hands.
  • Your workflow has to be fast.
  • Your results have to be accurate.
  • You will be tired.

Enter code. By automating your workflow you can address all these issues. Let the code do the work for you.

We worked election night at Aftonbladet, the leading evening newspaper in Sweden. This is how we planned and executed our analysis with R, and how we ended up trending on Twitter.

1. Preparations

A week in advance we started listing questions that we wanted to answer on election night. What are the strongholds of the different parties? Where do they gain and lose votes? Are there socio-economic factors that can explain the results?

We decided to focus our analysis on the municipalities, as a wealth of data is available at this level. We put together a spreadsheet with about 30 key measurements for all 290 municipalities (get the data on Google Drive here).

Next step: R.

R is an open source statistical programming language. My previous experience with R was rather limited. I knew the basic concepts, but I hadn’t really used it properly. Even with prior knowledge of Python and web development, I found R rather difficult to get my head around.

What I’m trying to say: a) you can do this too, b) my code probably sucks – but at least it works.

The code consists of two parts. First, a script that fetches data from Aftonbladet’s own result API. Second, a bunch of small analysis scripts that output lists and charts.
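
As a rough illustration of how the two parts can meet (not the actual Aftonbladet scripts), the sketch below joins a stand-in for the fetched results with a stand-in for the municipality spreadsheet. The API call itself is left out, and every column name and value here is made up for the example.

```r
# Stand-in for the municipality spreadsheet (the real one has about 30
# measurements for 290 municipalities). Names and values are made up.
indicators <- data.frame(
  municipality  = c("A", "B", "C", "D"),
  median_income = c(210, 250, 320, 280),
  stringsAsFactors = FALSE
)

# Stand-in for what a fetch script might return: one row per
# municipality and party, with the vote share in percent.
results_long <- data.frame(
  municipality = rep(c("A", "B", "C", "D"), each = 2),
  party        = rep(c("P1", "P2"), times = 4),
  share        = c(40, 10, 35, 12, 20, 30, 25, 28),
  stringsAsFactors = FALSE
)

# Reshape to one row per municipality with one column per party
# (the new columns are named share.P1 and share.P2).
results_wide <- reshape(results_long, idvar = "municipality",
                        timevar = "party", direction = "wide")

# Join with the indicators so that every analysis script can start
# from a single table.
election <- merge(indicators, results_wide, by = "municipality")
print(election)
```

Keeping the fetch step separate from the analysis means the small scripts can be rerun on the same table every time new results come in.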

The full code is available on GitHub.

2. Execution

At around 11pm more than 90 percent of the votes were counted and we were able to start digging into the data. One of the visualization formats that we had prepared was a comparison of the voting behavior of the richest/poorest, most/least educated, oldest/youngest municipalities.

[Chart: comparison by median income (medianinkomst)]
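
A comparison like this can be sketched in a few lines of base R. This is a minimal example under assumptions, not the published code: the data, the column names and the helper `compare_extremes` are all hypothetical, and it simply averages the vote share in the n lowest and n highest municipalities for one indicator.

```r
# Made-up example data: vote share (percent) per municipality plus
# one indicator. Column names are placeholders.
election <- data.frame(
  municipality  = paste0("M", 1:8),
  median_income = c(200, 210, 220, 230, 300, 310, 320, 330),
  share_p1      = c(40, 38, 36, 34, 22, 20, 18, 16),
  share_p2      = c(10, 12, 14, 16, 28, 30, 32, 34)
)

# Mean vote share in the n municipalities with the lowest and the
# highest value of `indicator` (hypothetical helper).
compare_extremes <- function(data, indicator, parties, n) {
  ordered <- data[order(data[[indicator]]), ]
  lowest  <- head(ordered, n)
  highest <- tail(ordered, n)
  rbind(
    lowest  = colMeans(lowest[, parties, drop = FALSE]),
    highest = colMeans(highest[, parties, drop = FALSE])
  )
}

comparison <- compare_extremes(election, "median_income",
                               c("share_p1", "share_p2"), n = 2)
print(comparison)
```

Note that an unweighted mean treats a small municipality the same as a large one. Weighting by the number of votes cast would be a reasonable alternative.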

With the code well prepared it only took a keystroke to generate ten of these charts. Within an hour of the results we were able to deliver a broad range of data-driven analysis.
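
The "one keystroke, many charts" idea boils down to wrapping the chart in a function and looping over the indicators. The sketch below assumes made-up data and a hypothetical helper `chart_extremes`. It writes PDF files because the `pdf()` device is available in any R installation. `png()` can be swapped in where it is supported.

```r
# Made-up data; column names are placeholders.
election <- data.frame(
  median_income   = c(200, 210, 220, 230, 300, 310, 320, 330),
  share_higher_ed = c(15, 18, 20, 22, 35, 38, 40, 45),
  share_p1        = c(40, 38, 36, 34, 22, 20, 18, 16)
)

# Draw one bar chart comparing the mean vote share of a party in the
# n lowest and n highest municipalities for a given indicator.
chart_extremes <- function(data, indicator, party, n, file) {
  ordered <- data[order(data[[indicator]]), ]
  means <- c(lowest  = mean(head(ordered, n)[[party]]),
             highest = mean(tail(ordered, n)[[party]]))
  pdf(file)
  on.exit(dev.off())  # always close the graphics device
  barplot(means, main = paste(party, "by", indicator),
          ylab = "Mean vote share (%)")
  invisible(means)
}

out_dir    <- tempdir()
indicators <- c("median_income", "share_higher_ed")

# One call per indicator: running this loop is the "keystroke".
files <- vapply(indicators, function(ind) {
  file <- file.path(out_dir, paste0("compare_", ind, ".pdf"))
  chart_extremes(election, ind, "share_p1", n = 2, file = file)
  file
}, character(1))
print(files)
```

Adding another chart is then a matter of adding one more name to `indicators`.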

One of the prepared outputs was a correlation matrix that helped us look for clues about socio-economic measurements that could help us understand the election results. With a bit of coloring that correlation matrix looked like this in Excel.

[Screenshot: the color-coded correlation matrix in Excel]

Red is for negative correlation, green is for positive. The stronger the color, the stronger the relationship.
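
For readers who want to try this themselves, a correlation matrix of indicators against party results can be produced with base R's `cor()`. The sketch below assumes Pearson correlation (the default in `cor()`; the post does not say which measure was used) and uses made-up data and column names.

```r
# Made-up data: two indicators and the change in vote share for two
# parties, one row per municipality.
election <- data.frame(
  median_income     = c(200, 210, 220, 230, 300, 310, 320, 330),
  population_change = c(-3, -2, -1, 0, 1, 2, 3, 4),
  p1_change         = c(8, 7, 6.5, 6, 4, 3, 2.5, 2),
  p2_change         = c(-1, 0, 0.5, 1, 2, 2.5, 3, 4)
)

indicators <- c("median_income", "population_change")
parties    <- c("p1_change", "p2_change")

# Rows are indicators, columns are parties. Each cell is a
# correlation coefficient between -1 and 1.
cor_matrix <- cor(election[, indicators], election[, parties],
                  use = "pairwise.complete.obs")
print(round(cor_matrix, 2))

# Rank the indicators by the strength of their relationship with one party.
ranked <- cor_matrix[order(-abs(cor_matrix[, "p1_change"])), "p1_change"]
print(round(ranked, 2))

# Write the matrix to CSV so it can be opened and colored in Excel.
csv_file <- file.path(tempdir(), "correlations.csv")
write.csv(round(cor_matrix, 2), csv_file)
```

A strong coefficient only says that two measurements move together across municipalities. It does not say that one causes the other.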

The big winners in the elections were the Sweden Democrats. So the main question was how we could understand their success. These are the factors that correlate most strongly with the growth of the party.

[Screenshot: the factors most strongly correlated with the Sweden Democrats’ growth]

The key findings from this matrix were turned into scatterplots. Like this one, showing how the Sweden Democrats got more votes in municipalities with shrinking populations.

[Scatterplot: population change (befolkningsförändring)]
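
A scatterplot of this kind takes only a few lines in base R. The sketch below uses made-up numbers and placeholder column names. The fitted trend line is an addition for illustration, since the post does not say whether the original chart had one.

```r
# Made-up data: population change and vote share per municipality.
election <- data.frame(
  population_change = c(-3, -2, -1, 0, 1, 2, 3, 4),
  share_p1          = c(18, 17, 15, 14, 12, 11, 9, 8)
)

# Fit a straight line so the trend is visible in the chart.
fit <- lm(share_p1 ~ population_change, data = election)

# pdf() is used here because it is available everywhere. png() can be
# used instead where it is supported.
file <- file.path(tempdir(), "scatter_population_change.pdf")
pdf(file)
plot(election$population_change, election$share_p1,
     xlab = "Population change (%)", ylab = "Vote share (%)",
     main = "Vote share vs. population change", pch = 19)
abline(fit, col = "red")
dev.off()

print(coef(fit))
```

A negative slope for `population_change` corresponds to the pattern described above: a higher vote share where the population is shrinking.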

A summary of the visualizations we made and the conclusions we drew can be found on Storify.

3. Reception

We ended up getting quite a lot of attention for our work. Our visualizations were retweeted not only in the tens and hundreds, but even in the thousands.

The most successful one was this chart, showing how municipalities with high and low education levels vote differently.

[Screenshot: the chart comparing municipalities with high and low education levels]

Conclusions

Should all journalists know how to code? Probably not. But this case shows how programming skills can be of great use in the newsroom.

In addition to the arguments outlined at the beginning of this post (efficiency, reproducibility, error spotting), an advantage of code is that it can increase the transparency, and hence the legitimacy, of your work (if you publish your code, that is). Just as in science, your audience should be able to recreate your findings.

3 responses to “Covering election night with R”

  1. Kenneth Gold

    There is real science in voting analysis; we have to show our cards. I work w Florida data which is publicly available at the Voter ID level. I can’t find anyone who has predicted absolute voting margin. I’m about 60-90 days from a forecast which I will share w you. Have you made an out-of-sample prediction based on test voting data? Do you know of anyone who has? Thx.
