Working with Data? Here's How to Verify Your Sources and Numbers.

By: Amr Eleraqi | 09/01/2021

Amr Eleraqi is a former ICFJ Knight International Fellow.

The year 2020 wasn’t just dominated by the pandemic. It was also a year of open data. 

Many health-related organizations published daily and real-time updates about the spread of the virus around the world, circulating an unprecedented amount of numbers and figures. The challenge for journalists has been to analyze this information accurately, and communicate their findings to the public effectively.

It’s imperative that journalists first understand the data they’re working with. While there is often a rush to publish in today’s non-stop news cycle, doing so inaccurately does more harm than good. During a crisis like COVID-19, data can help raise critical awareness among the public. But if mishandled, it can place people at greater risk. 

Always analyze numbers with healthy skepticism. As journalists, we should investigate when and where the data we use originated. We should determine who originally collected and published the numbers, as well as who funded the work.

Journalists must also fix illogical or missing values, and clean up mislabeled figures. These errors may occur during the data entry process, whether done manually or automatically.
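A first pass at spotting such errors can be automated before any manual review. The following is a minimal Python sketch, not a complete cleaning routine: it assumes the data has been read into a list of rows (for example from a CSV file) and that each row has a hypothetical `new_cases` column. The function name, column name and sample values are all made up for illustration.

```python
def flag_suspect_rows(rows, column="new_cases"):
    """Return (row index, reason) pairs for values that need a manual look."""
    problems = []
    for i, row in enumerate(rows):
        value = row.get(column)
        # Empty cells and absent columns both count as missing.
        if value is None or str(value).strip() == "":
            problems.append((i, "missing value"))
            continue
        try:
            number = int(str(value).strip())
        except ValueError:
            # Text such as "n/a" or "12.5" is not a valid case count.
            problems.append((i, "not a whole number"))
            continue
        if number < 0:
            # A daily count below zero is illogical.
            problems.append((i, "negative count"))
    return problems


# Invented sample rows, for illustration only.
sample = [
    {"date": "2020-10-01", "new_cases": "120"},
    {"date": "2020-10-02", "new_cases": ""},
    {"date": "2020-10-03", "new_cases": "-5"},
    {"date": "2020-10-04", "new_cases": "n/a"},
]
print(flag_suspect_rows(sample))
```

The function only flags rows. Deciding whether a flagged value is a typo, a correction or a genuine gap still requires going back to the source.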

The Jordanian Ministry of Health, for example, used to manually enter some COVID-19 test results that didn’t automatically get uploaded into the government database. As the number of daily cases increased, results were lost and mistakes were made in matching names to samples, former Jordanian Health Minister Saad Jaber told local media.

Keep in mind, too, that even with reliable software like Microsoft Excel, human error can sneak through. Take, for instance, an incident in the U.K. last year: nearly 16,000 COVID-19 case records were accidentally left out of official figures, which spread inaccurate data and hindered efforts like contact tracing to combat the virus.

To avoid publishing inaccurate data, rely on credible sources and verify the numbers. Here’s a checklist to help:

Transparency

Seek out sources that are transparent about how they compile and document data, including the technology and algorithms they used in the process. The more transparent a data provider is, the easier it is to check its numbers for accuracy.

To this end, make sure you understand how data is being collected by the source you’re referencing. This will enable you to best analyze and verify numbers before you include them in your own reporting.

Methodology

Don’t publish a dataset without attaching the corresponding metadata file, which helps explain how the data was collected. It can also include information about sample size, margin of error and missing values, as well as a glossary of terms and abbreviations. Without these details, you’re like a person who has discovered a treasure chest but doesn’t have the keys to open it.

In Italy, for example, journalists questioned the credibility of official government data around COVID-19 after finding flaws in the numbers presented to the public. These flaws can be attributed to a variety of factors, among them that the government changed its testing policies several times in 2020, and that methodologies to track cases of the virus differed by region. This contributed to inconsistent, deficient data overall. Had a metadata file been made available, these errors could have been more easily identified.

Context

Context is key when analyzing data. For example, consider how you present information about total infections and infection rates. When a government authority presents regional data about the number of people infected with COVID-19, a large city might show the highest value. This doesn’t necessarily mean its infection rates are highest, however — it might simply be the result of being the most populated area. 

The more appropriate way to compare numbers in locations with different populations is to calculate infection rates per 100 people. This will more accurately demonstrate the spread of the virus.
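The arithmetic is simple: divide the number of cases by the population, then multiply by 100. Here is a short Python sketch of that calculation. The region names and figures are invented for illustration, and the function name and `per` parameter are my own choices, not part of any official methodology.

```python
def infection_rate(cases, population, per=100):
    """Cases per `per` residents (per=100 gives a percentage)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Invented figures: (total cases, population).
regions = {
    "Big City": (50_000, 5_000_000),
    "Small Town": (2_000, 40_000),
}

for name, (cases, population) in regions.items():
    print(name, cases, infection_rate(cases, population))
```

In this made-up example the big city has 25 times as many cases as the small town, yet its rate is 1 per 100 people against 5 per 100 in the town. Passing a different `per` value, such as 100,000, gives the rate on another scale without changing the comparison.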

Understand the data

Don’t start working on a database unless you understand what is being presented. To do so, ask yourself the following questions: 

  • What does the data indicate?
  • Do I understand all terms and definitions included in the data?
  • What is not included in the data that could provide context?
  • What are the units of measurement?
  • Can I cross-reference the data with a different source to corroborate the values?
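The last question on the list, cross-referencing, lends itself to a quick automated check. The sketch below is one possible approach in Python, assuming you have the same indicator from two sources keyed by date. The source names, the figures and the 5% tolerance are all hypothetical, so choose a threshold that suits your own data.

```python
def compare_sources(source_a, source_b, tolerance=0.05):
    """List dates where two sources disagree or where one has no value."""
    mismatches = []
    for key in sorted(set(source_a) | set(source_b)):
        a = source_a.get(key)
        b = source_b.get(key)
        if a is None or b is None:
            mismatches.append((key, a, b, "missing in one source"))
            continue
        # Relative difference, measured against the larger of the two values.
        baseline = max(abs(a), abs(b))
        if baseline and abs(a - b) / baseline > tolerance:
            mismatches.append((key, a, b, "values differ"))
    return mismatches


# Invented daily case counts from two hypothetical sources.
ministry = {"2020-10-01": 120, "2020-10-02": 135, "2020-10-03": 150}
aggregator = {"2020-10-01": 120, "2020-10-02": 180, "2020-10-04": 90}

for row in compare_sources(ministry, aggregator):
    print(row)
```

A mismatch is not proof that either source is wrong. The two may use different definitions or cut-off times, which is exactly what the metadata should help you find out.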

Verifying data requires investigation and analysis. Fortunately, journalists don’t need to be data analysis experts to carry this out. Journalistic values, skills and instinct all help when fact-checking numbers, and manual verification can be even more effective than automated verification algorithms. While technology might not always be able to determine the credibility of data, it can provide journalists with useful tools and guidance.

At all turns ask questions, be skeptical, and review and cross-reference your numbers as much as possible. The following diagram shows the steps I follow when dealing with numbers in a database. It might help you build your own verification strategy.
