Researchers around the world are focused on improving our capacity to predict and prevent epidemics. These efforts are almost universally founded on the concept of big data. Given big data’s failure to predict the recent Ebola outbreak, critics are concerned that the approach may be inherently flawed. Is this true?

Let’s analyze the situation:

Rising as a buzzword in the early 2000s, “big data” refers to the availability of large data sets that can be used to make inferences. Relying on big data can lead to a lack of nuance and can depersonalize the people behind the numbers, which makes it a controversial approach in some fields. While profit-seeking corporations often use customer information for logistics or advertising, big data can also be put to humanitarian purposes.

Most notably, organizations across the world have been using big data to further disease-prevention efforts. An example of this can be found in the international response to the 2014 Ebola outbreak in West Africa. Using big data, public health workers were able to track the spread of the disease in real time, a process known as disease mapping.

Disease mapping allows us to identify where a disease is likely to spread next, which gives humanitarian organizations an idea of where to send resources and workers. Treatment can be tricky, though researchers have found promise in drugs already on the market, such as sertraline. The Ebola outbreak has been considered contained since January 2016, though flare-ups may still occur.
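To make the idea concrete, here is a minimal sketch of the core disease-mapping logic, assuming nothing beyond the standard library. The districts, case counts, and growth threshold are all hypothetical; real systems ingest messy reports from clinics, labs, and news feeds at far larger scale.

```python
from collections import defaultdict

# Hypothetical weekly case reports: (district, week, new_cases).
reports = [
    ("District A", 1, 4), ("District A", 2, 9), ("District A", 3, 21),
    ("District B", 1, 2), ("District B", 2, 3), ("District B", 3, 3),
]

# Group case counts by district so each local trend can be examined.
by_district = defaultdict(dict)
for district, week, cases in reports:
    by_district[district][week] = cases

# Flag districts whose week-over-week growth suggests active spread,
# i.e. where resources and workers may be needed next.
GROWTH_THRESHOLD = 1.5  # arbitrary illustrative cutoff

for district, weekly in sorted(by_district.items()):
    weeks = sorted(weekly)
    previous, latest = weekly[weeks[-2]], weekly[weeks[-1]]
    status = "FLAG: likely spread" if latest / previous >= GROWTH_THRESHOLD else "stable"
    print(f"{district}: {previous} -> {latest} cases ({status})")
```

Here District A (4, 9, 21 cases) gets flagged while District B stays stable; the same comparison, run over thousands of locations, is what turns raw case reports into a map of where the disease is heading.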

Ironically, while the practice of using big data can be heralded for helping contain the epidemic, it can also be blamed for failing to foresee the outbreak in the first place. During the panic, the CDC projected a worst-case scenario of 1.4 million cases of the fatal disease. The World Health Organization made a much closer estimate of 20,000 cases, but was still off the mark by a considerable percentage: roughly 28,600 cases were ultimately recorded over two years.
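A quick back-of-the-envelope check, using the figures cited above, shows just how large these forecasting errors were:

```python
# Compare each forecast against the ~28,600 cases ultimately recorded
# (figures as cited in the text above).
actual = 28_600
forecasts = {"CDC worst-case projection": 1_400_000, "WHO estimate": 20_000}

for source, predicted in forecasts.items():
    error = (predicted - actual) / actual * 100
    print(f"{source}: {predicted:,} predicted ({error:+.0f}% off)")
```

The CDC’s worst case overshot by roughly a factor of fifty, while the WHO undershot by about 30 percent.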

Image: Ebola virus (Wikimedia Commons)

How did we get this information so wrong?

Clearly, there is a problem with placing too much faith in big data. While it can be useful in some applications, we need to be aware of its limitations, or, more precisely, our lack of data. Our inferences can only be accurate if they are drawn from a large amount of reliable data.
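That dependence on volume is easy to demonstrate with a toy simulation; the true infection rate and sample sizes below are invented purely for illustration:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible
TRUE_RATE = 0.05  # hypothetical true infection rate in a population

# Estimate the rate from samples of increasing size: small samples
# swing wildly, while large ones converge toward the true value.
for n in (20, 200, 2_000, 20_000):
    infected = sum(random.random() < TRUE_RATE for _ in range(n))
    print(f"n={n:>6}: estimated rate {infected / n:.3f} (true {TRUE_RATE})")
```

With twenty observations the estimate can easily land at double or half the true rate; with twenty thousand it rarely strays far. A forecast built on the sparse, unreliable reporting of the 2014 outbreak faced exactly this problem.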

The key to improving this approach lies in better data-collection techniques. As surveillance technology and health reporting become more advanced, more useful data will be available to researchers. Factors that predispose an area to an infectious outbreak, such as climate, sanitation, and water supply, must also be considered when gauging risk. Accurate predictions allow us to effectively stifle outbreaks before they occur.
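As a sketch of how such predisposing factors might feed into a prediction, consider a toy risk score. The factor names, weights, and input values here are assumptions made for illustration, not a validated epidemiological model:

```python
# Toy outbreak-risk score: a weighted sum of predisposing factors,
# each scaled from 0 (low) to 1 (high). Weights are illustrative.
WEIGHTS = {
    "climate_suitability": 0.3,  # e.g. temperature and humidity favoring a pathogen
    "poor_sanitation": 0.4,
    "unsafe_water_supply": 0.3,
}

def outbreak_risk(factors: dict) -> float:
    """Return a 0-1 risk score from factor values scaled 0-1."""
    return sum(WEIGHTS[name] * value for name, value in factors.items())

region = {"climate_suitability": 0.8, "poor_sanitation": 0.6, "unsafe_water_supply": 0.7}
print(f"Estimated outbreak risk: {outbreak_risk(region):.2f}")  # prints 0.69
```

Real models would estimate those weights from historical outbreak data rather than assume them, which is exactly why better data collection matters.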

In the words of Andrew McAfee, “the world is one big data problem.” Big data could be the solution to epidemics in the future, but collecting the necessary data is a difficult task. While high-profile errors have led pundits to criticize the use of big data in disease prevention, researchers are developing new ways to predict and prevent disease every day. As relevant data becomes more readily available, our ability to prevent epidemics will improve.