Today, March 14th, is Pi Day. To celebrate, this post is about the film Pi.

Pi is the first film by Darren Aronofsky, who went on to make Requiem for a Dream and Black Swan. I’ll try not to spoil too much, but the starting premise is that the main character, Max, is a mathematician/computer-scientist, who believes he can model the stock market and predict future stock behaviour, if only he finds the right model. I was recently reminded of this central quote from Pi (via Tom Crick), which can be heard in the film’s trailer:

*Restate my assumptions:*

- Mathematics is the language of nature.
- Everything around us can be represented and understood through numbers.
- If you graph these numbers, patterns emerge. Therefore: there are patterns everywhere in nature.

By stating his assumptions, Max is following the scientific process (hurrah!). This allows us to analyse his assumptions and see whether he has made a mistake. Indeed, the conclusion he draws from his third assumption is flawed: if you graph things, patterns do emerge, but they might well be spurious.

## Google Correlate

Google have released a tool that (inadvertently?) demonstrates this wonderfully: Google Correlate. The idea is that you can enter a term and see what other search terms produce a similar trend. That sounds somewhat useful. I decided to use the term “Greenfoot”. Here’s one of the top results I got at the time (Greenfoot is blue, the matching term is red):

That’s quite a decent match, and has a correlation coefficient of 0.9477. As Max suggested, we’ve graphed the numbers, and a pattern has emerged. This red term that matches so well with Greenfoot is… “Google Images”. Not very useful, and not much of a pattern: these two terms correlate well because they originated around the same time, and have grown in search-popularity with a similar pattern ever since. But really, this seems to me to be a spurious result (technically, a “type I” error): we’ve found an effect where really there is none.
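To see how a shared upward trend alone can produce a coefficient this high, here is a minimal Python sketch using made-up data, not the real Google Correlate numbers. It assumes the reported figure is a Pearson correlation coefficient. The two hypothetical series each grow steadily with independently drawn noise, so neither influences the other. Their raw values still correlate strongly, but once the trend is removed by looking at week-to-week changes, the correlation largely disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diff(series):
    """First differences: how much the series changed at each step."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the run is reproducible
weeks = range(100)

# Two hypothetical search-volume series: both grow over time, but their
# random noise is drawn independently, so neither one drives the other.
term_a = [t + rng.gauss(0, 5) for t in weeks]
term_b = [2 * t + rng.gauss(0, 10) for t in weeks]

r_levels = pearson(term_a, term_b)               # dominated by the shared trend
r_changes = pearson(diff(term_a), diff(term_b))  # trend removed, mostly noise

print(f"correlation of raw values:   {r_levels:.3f}")
print(f"correlation of week changes: {r_changes:.3f}")
```

Differencing is only one way to strip out a common trend, but it makes the point: a high coefficient between two series that both grew over the same period says very little on its own.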

This is the problem with Max’s approach. There are patterns everywhere if you look hard enough, but that doesn’t mean that they’re useful. And this is a real problem in science, especially with measurement techniques that generate a large amount of data (on which you can then perform a large variety of analyses). One troublesome area is the neuroscience technique fMRI, where making too many comparisons without correction can make it appear that a dead fish is detecting human emotions. The quality of our understanding of the human brain depends on statistics being applied properly… by human brains. (Recursion!)
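A quick calculation shows why looking hard enough almost guarantees a false positive. The sketch below assumes a significance level of 0.05 (a common convention, not a figure from this post) and assumes the comparisons are independent, which is an idealisation: real fMRI measurements, for example, are correlated with one another. Under those assumptions, the chance of at least one spurious "hit" across m tests is 1 − (1 − α)^m.

```python
def family_wise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests,
    each run at significance level alpha, when no real effect exists."""
    return 1 - (1 - alpha) ** m


# Assumed significance level of 0.05 for every individual comparison.
for m in (1, 10, 100, 1000):
    print(f"{m:>5} comparisons: {family_wise_error_rate(0.05, m):.3f}")
```

With ten comparisons the chance of at least one phantom result is already about 40%, and with a hundred it exceeds 99%.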

And so in Pi, Max demonstrates the dark side of science: an obsession with finding a result that drives him so hard that he loses his impartiality and risks finding phantom results. There are techniques to mitigate this problem, known as alpha-level correction (the Bonferroni correction is the best-known example), and I intend to cover some statistics in future blog posts that will explain these sorts of issues.
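As a taste of what alpha-level correction looks like, here is a minimal sketch of the Bonferroni correction, the simplest such method: with m tests, each p-value is compared against α/m instead of α. The p-values below are invented purely for illustration, and the function name is hypothetical rather than taken from any library.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests stay significant after Bonferroni correction.

    Each p-value is compared against alpha / m (m = number of tests),
    which keeps the chance of any false positive across the whole
    family of tests at or below alpha."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five comparisons (made up for illustration).
p_values = [0.001, 0.012, 0.030, 0.049, 0.200]

print("uncorrected:", [p <= 0.05 for p in p_values])
print("Bonferroni: ", bonferroni_significant(p_values))
```

Four of the five comparisons look significant at 0.05 on their own, but only the first survives the corrected threshold of 0.01. The price is that Bonferroni is conservative: guarding against false positives this strictly makes it easier to miss real effects.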

Comments on “Looking Too Hard for Patterns” (1)

Bill Graham said:

A very good post. Thank you. There is one underlying issue however. The lack of predictability!!! Western science is driven by the goal of predictability. Same for mathematics. However, patterns in Nature are complex dynamic systems that, by their very nature, are not predictable. This has been the challenge of complexity science. The famous three body problem is a simple example. Ecosystems, the stock market, weather, etc. are far more complex examples of things that cannot be predicted. The usefulness of mathematical models on any of these systems is questionable at best. And then there is the question of being able to verify or duplicate the results. I think we are at a very challenging point in the journey of Western science. We are moving from pure reductionism to a combination of reductionism and systems thinking. Yet, we really don’t have any good quantitative tools. And, even deeper, our goal can never again be predictability because it doesn’t exist.