In 2008, media across the world announced with fanfare that Google engineers had found a method to predict the spread of the flu early on. The idea appeared sound. Users infected with the flu are likely to use Google’s search engine to diagnose their symptoms and look for remedies, and those queries could instantly reveal where the flu is spreading. To find the right queries, engineers analyzed some 50 million search terms and calculated which of them were associated with the flu. They then tested 450 million different algorithms to find the one that best matched the data and settled on a secret algorithm that used 45 search terms (also kept secret). The algorithm was then used to predict flu-related doctor visits in each region on a daily and weekly basis.

At first, all went splendidly. Google Flu Trends forecast the flu faster than the reports of the Centers for Disease Control and Prevention (CDC). Google even coined a new term: to “nowcast” the spread of flu-related illness in each region of the United States, with a reporting lag of only about one day.

Months later, in the spring of 2009, something unexpected happened: the swine flu broke out. It barreled in out of season, with the first cases in March and a peak in October. Google Flu Trends missed the outbreak; it had learned from previous years that flu infections are high in the winter and low in the summer. Its predictions crumbled.

After this setback, the engineers set about improving the algorithm. There are two possible approaches to doing so. One is to fight complexity with complexity: complex problems need complex solutions, and if a complex algorithm fails, it needs to be made even more complex. The second approach follows the stable-world principle: complex algorithms work best in well-defined, stable situations where large amounts of data are available, whereas human intelligence has evolved to deal with uncertainty, independent of whether big or small data are available. The idea here is that a complex algorithm using big data from the past may not predict the future well under uncertain conditions and should therefore be simplified. Google’s engineers went for more complexity. Instead of paring down the 45 search terms (features), they jacked these up to about 160 (the exact number has not been made public) and continued to bet on big data.


At first, the revised algorithm did a good job of predicting new cases, but not for long. Between August 2011 and September 2013, it overestimated the proportion of flu-related doctor visits in 100 out of 108 weeks (see Figure 1). One major reason was the instability of the flu itself. Influenza viruses are like chameleons, constantly changing, which makes it extremely difficult to predict their spread. The symptoms of swine flu, such as diarrhea, differed from those of past years, and, compared with other strains of the flu, the infection rate was higher among younger people. A second reason was the instability of human behavior. Many users entered flu-related search terms out of sheer curiosity about swine flu, not because they felt sick, but the algorithm could not distinguish between the motivations for searching. The engineers asked “Is our model too simple?” and continued to tinker with the revised algorithm, to no avail. In 2015, Google Flu Trends was quietly shut down.

Figure 1: A simple heuristic using a single data point can predict the flu better than Google’s big data analytics. Shown here is the actual percentage of flu-related doctor visits from March 18, 2007, to August 9, 2015, along with the predictions of the recency heuristic and of Google Flu Trends (including its three updates). Top: Predictions and observed values in absolute terms; the predictions of the recency heuristic and the observed values are virtually identical. Bottom: Prediction errors. The years mark the beginning of each year; that is, 2008 indicates January 1, 2008. For instance, in the summer of 2009, Google Flu Trends underestimated the spread of the flu because of the unexpected outbreak of the swine flu, after which it received its first update. Source: Katsikopoulos et al. (2022)

Some may shrug their shoulders and say, yes, we’ve heard all this before, but that was 2015; today’s algorithms are infinitely bigger and better. But my point is not the success or failure of a particular algorithm developed by Google. The crux is that the stable-world principle applies to all algorithms that use the past to predict an indeterminable future. Before Google’s big data analytics flopped, its claim to fame was taken as proof that the scientific method and theory were on the brink of becoming obsolete: blind and rapid search through terabytes of data would be sufficient to predict epidemics. Similar claims were made by others about unraveling the secrets of the human genome, of cancer, and of diabetes. Forget science; just increase volume, velocity, and variety and measure what correlates with what. Chris Anderson, editor-in-chief of Wired, announced: “Correlation supersedes causation, and science can advance even without coherent models ... It’s time to ask: What can science learn from Google?”

Let me pose a different question. What can Google learn from science?

Under uncertainty, keep it simple and don’t bet on the past

The Google engineers never seem to have considered a simple algorithm in place of their big data analytics. In my research group at the Max Planck Institute for Human Development, we’ve studied simple algorithms (heuristics) that perform well under volatile conditions. One way to derive such rules is to rely on psychological AI: investigating how the human brain deals with situations of disruption and change. Back in 1838, for instance, Thomas Brown formulated the Law of Recency, which states that recent experiences come to mind faster than those in the distant past and are often the sole information that guides human decisions. Contemporary research indicates that people do not automatically rely on what they recently experienced, but do so only in unstable situations where the distant past is not a reliable guide to the future. In this spirit, my colleagues and I developed and tested the following “brain algorithm”:

Recency heuristic for predicting the flu: Predict that this week’s proportion of flu-related doctor visits will equal that of the most recent data, from one week ago.
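To make the rule concrete, here is a minimal sketch in Python, assuming the weekly CDC percentages are available as a simple chronological list (the function name and the numbers are illustrative only, not the code used in our study):

```python
# Minimal sketch of the recency heuristic: the forecast for the coming
# week is simply the most recently observed value.
def recency_forecast(weekly_percentages):
    """Predict next week's proportion of flu-related doctor visits.

    weekly_percentages: weekly percentages of flu-related doctor visits,
    in chronological order (most recent value last).
    """
    return weekly_percentages[-1]

# Illustrative numbers: if 2.3% of last week's doctor visits were
# flu-related, the heuristic predicts 2.3% for this week.
print(recency_forecast([1.1, 1.7, 2.3]))  # -> 2.3
```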

Unlike Google’s secret Flu Trends algorithm, this rule is transparent: its logic can be understood, and it can easily be applied by anyone. It relies on a single data point only, which can be looked up on the website of the Centers for Disease Control and Prevention. And it dispenses with combing through 50 million search terms and the trial-and-error testing of millions of algorithms. But how well does it actually predict the flu?

Three fellow researchers and I tested the recency rule on the same eight years of data on which the Google Flu Trends algorithm was tested, that is, weekly observations between March 2007 and August 2015. During that time, the proportion of flu-related visits among all doctor visits ranged between one and eight percent, with an average of 1.8 percent per week (Figure 1). This means that if every week you were to make the simple but false prediction that there are zero flu-related doctor visits, you would have a mean absolute error of 1.8 percentage points over those years. Google Flu Trends predicted much better than that, with a mean error of 0.38 percentage points (Figure 2). The recency heuristic did even better, with a mean error of only 0.20 percentage points. If we exclude the period of the swine flu, that is, the time before the first update of Google Flu Trends, the results remain essentially the same (0.38 and 0.19, respectively).
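The comparison rests on the mean absolute error: the average absolute difference between predicted and observed percentages. The sketch below shows the calculation with made-up weekly numbers (the actual test used the CDC series from March 2007 to August 2015):

```python
def mean_absolute_error(predictions, actuals):
    """Average absolute difference between predictions and observations,
    in percentage points."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

# Made-up weekly percentages of flu-related doctor visits (illustrative only).
observed = [1.2, 1.5, 2.1, 3.0, 2.4, 1.8, 1.3]

# Recency heuristic: each week is predicted by the previous week's value,
# so scoring starts with the second week.
recency_predictions = observed[:-1]
actuals = observed[1:]

# Naive baseline from the text: always predict zero flu-related visits.
zero_predictions = [0.0] * len(actuals)

print(mean_absolute_error(recency_predictions, actuals))  # recency heuristic
print(mean_absolute_error(zero_predictions, actuals))     # "always zero" baseline
```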

Figure 2: Less can be more. Using a single data point, one can predict the spread of the flu better than Google Flu Trends, a big data algorithm. The mean absolute error (from Figure 1) in predicting the proportion of flu-related doctor visits is 0.38 percentage points for Google Flu Trends, but only 0.20 when using a single data point, that is, recency. Both algorithms were tested on the same weekly data from March 18, 2007, to August 9, 2015 (see text).

“Fast-and-frugal” psychological AI

The case of Google Flu Trends demonstrates that in an unstable world, reducing the amount of data and complexity can lead to more accurate predictions. In some cases, it might be advisable to ignore everything that happened in the past and instead rely on the most recent data point alone. It also shows that psychological AI—here, the recency heuristic—can match complex machine learning algorithms in prediction. In general, my point is that “fast-and-frugal” heuristics that need little data are good candidates for implementing psychological AI.


Nevertheless, it’s difficult for many of us to accept the idea of deliberately leaving out data when we’re trying to make an informed decision. But the flu example is neither a fluke nor an exception. Under uncertainty, simple rules such as recency have also been shown to be highly effective in comparison with complex algorithms, be it in predicting consumer purchases, repeat offenders, heart attacks, sports results, or election outcomes. A group of economists, including Nobel laureate Joseph Stiglitz, showed, for instance, that the recency heuristic can predict consumer demand in evolving economies better than traditional “sophisticated” models. And the great advantage of simple rules is that they are understandable and easy to use.

Excerpted from How to Stay Smart in a Smart World: Why Human Intelligence Still Beats Algorithms by Gerd Gigerenzer. Published by MIT Press. Copyright © 2022 by Gerd Gigerenzer. All rights reserved.