Saturday, August 30, 2014

Data journalism (MOOC)

Available on the Canvas platform, this course is titled "Doing journalism with data: first steps, skills and tools" and is part of a European Journalism Centre (EJC) initiative. It aims to provide the concepts, techniques and skills needed to work effectively with data and produce compelling data stories.


The course is divided into five one-week modules, each with about one hour of video. Each video is followed by a short quiz and a "discussion points" section, a short list of questions to answer in the forum, which sometimes require a little research. These questions go deeper into the subject than the quiz, which merely checks knowledge or understanding: we have to think about the topic and find an answer that is not right in front of us. They also spark discussion between participants and expose us to other points of view. There is no prerequisite for this course, so the participants’ backgrounds are diverse even though most of them are journalists, and each person’s views shed new light on the subject.

The course is provided in English, and the videos can be subtitled in English, French, Spanish, Japanese and Korean.

The final exam quiz, with its 22 questions, earns an EJC Certificate of Completion when the mark is 70% or higher. The quizzes offered after each video are only there to check your understanding of the course content.

There were 23,715 students enrolled in this course from May 19 to July 31, 2014, and 1,250 certificates of completion were issued.


The five modules are presented by renowned professionals in the field:
  • Data journalism in the newsroom: what is data journalism, inside a data team, how to get a story, and the business case for data journalism. It is presented by Simon Rogers, Data Editor, Twitter and former editor of the Guardian’s award-winning Datablog.
  • Finding data to support stories: setting up data newswires, advanced searching strategies, introduction to scraping, data laws and sources. It is presented by Paul Bradshaw, Head of the Online Journalism MA at Birmingham City University, and Visiting Professor at City University’s School of Journalism in London.
  • Finding story ideas with data analysis: newsroom math and statistics, sorting and filtering data in a spreadsheet, making new variables with functions, summarizing data with pivot tables. It is presented by Steve Doig, Knight Chair in Journalism at the Walter Cronkite School of Journalism & Mass Communication of Arizona State University, and Pulitzer Prize winner. He explains simple maths such as percentage change and rates, mean, median, mode, outliers, normal distribution, range, quartiles and standard deviation, all the maths needed to make numbers tell a story, then moves on to creating variables with spreadsheet functions and to pivot tables.
  • Dealing with messy data: correcting bad formatting, misspellings, invalid values and duplicates, advanced cleaning techniques. It is presented by Nicolas Kayser-Bril, co-founder and head of the data journalism startup Journalism++.
  • Telling stories with visualization: the main principles of data visualization, choosing the best graphic forms, the art of insight, hands-on with Adobe Illustrator. It is presented by Alberto Cairo, Professor of the Professional Practice at the University of Miami.
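
The newsroom maths listed for Steve Doig's module can be tried outside a spreadsheet too. Below is a minimal sketch in Python, using only the standard library. The figures and the helper names (percentage_change, rate_per, pivot_sum) are invented for illustration and do not come from the course.

```python
from collections import defaultdict
from statistics import mean, median, mode, pstdev, quantiles


def percentage_change(old, new):
    """Change from old to new, as a percentage of the old value."""
    return (new - old) * 100 / old


def rate_per(count, population, per=100_000):
    """Events per `per` people, so places of different sizes can be compared."""
    return count * per / population


def pivot_sum(rows, key, value):
    """Total `value` for each distinct `key`, like a one-dimensional pivot table."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical figures, invented for this example.
salaries = [20, 30, 30, 50, 70, 400]  # 400 is an outlier
incidents = [
    {"district": "North", "count": 12},
    {"district": "South", "count": 7},
    {"district": "North", "count": 5},
]

print(percentage_change(200, 250))   # 25.0
print(rate_per(50, 200_000))         # 25.0 per 100,000 people
# The outlier pulls the mean (100) well above the median (40).
print(mean(salaries), median(salaries), mode(salaries))
print(quantiles(salaries, n=4))      # three cut points: Q1, median, Q3
print(pstdev(salaries))              # population standard deviation
print(pivot_sum(incidents, "district", "count"))
```

Comparing the mean with the median is a quick way to spot whether an outlier is distorting an "average", which is often where the story starts.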

Notes and remarks

  • There is a lot of data floating around us on the web: a large part of it has been released through government open data initiatives such as the one in the USA, some through other means (WikiLeaks, OpenCorporates), and some through freedom of information laws. But this huge amount of data needs to be interpreted and presented in a way that can be understood and serves a purpose. This is where data journalism matters: it extracts meaning from the data and tells stories with it. Data journalism draws on a diverse set of skills, combining journalism, design and coding (some even call it JavaScript journalism, which reduces its scope to its technical side).
  • The Google spreadsheet scraper tool available on Google Drive helps you grab data from a web page without the technical knowledge needed to write an actual scraper yourself (provided the data is presented as a list or a table).
  • Finding the story and visualizing it are two sides of the same coin in data journalism, just as finding the story and telling it through text and pictures are in classic journalism. The only risk with data journalism is putting the visualization before the story, in which case we end up with beautiful interactive things that are of no use.
  • The Texas Tribune data page shows visualizations of publicly available data. This section of the site accounts for 45% to 55% of the website's total traffic. This example is very interesting, as it shows that providing your visitors with visual data targeted at your audience, even though this data is freely available, has real added value for them. They read it, use it and share it, which serves your journalistic goal of informing people (in an engaging way), and your website's goal of acquiring traffic through social networks.
  • Steve Doig’s spreadsheet quizzes are very useful hands-on exercises, as we have to use our own spreadsheet software (like OpenOffice Calc) to answer them, in a learning-by-doing approach.
  • Alberto Cairo, besides being the author of The Functional Art: An introduction to information graphics and visualization, publishes a very informative blog about data visualizations.
  • Visualizations and infographics sites to get some ideas: FlowingData, Information aesthetics, Cool Infographics.
  • Many visualizations are beautiful, and even functional, but not particularly insightful. They need to help users better understand the data. The data must be put in context to tell a story, to reveal something. In the end, they should be enlightening too: they should change the users’ view of the facts they represent.
  • In interactive visualizations, it is not only appropriate to represent your data more than once, but most of the time even necessary. We need to give the users opportunities to answer different questions and provide them with different views on the data.
  • Insight is the discovery of non-trivial, complex, deep, unexpected, or relevant truths about the dataset.
  • An annotation layer helps users enter the visualization by providing insights into the dataset, putting data in context and highlighting relevant data points.
  • Enlightenment is the next step: deeply changing readers’ minds for the better.
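
To give an idea of what a table scraper like the spreadsheet tool mentioned in the notes above does behind the scenes, here is a rough sketch in Python using only the standard library. It is not the Google tool itself: the TableExtractor class, the extract_table helper and the sample HTML are all hypothetical, and a real page would first have to be downloaded and would usually be messier.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is kept only while we are inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment, invented for this example.
SAMPLE = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Freedonia</td><td>1200</td></tr>
  <tr><td>Sylvania</td><td>3400</td></tr>
</table>
"""

rows = extract_table(SAMPLE)
for row in rows:
    print(row)
```

Each row comes back as a list of strings, ready to be pasted into a spreadsheet or cleaned further.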

Suggested reading

The Data Journalism Handbook, by Jonathan Gray, Lucy Chambers and Liliana Bounegru (2012)

Journalisme de données (MOOC) (in French)
Periodismo de datos (MOOC) (in Spanish)
Jornalismo de dados (MOOC) (in Portuguese)
