# Spurious Correlations, or, Why Nicolas Cage Must Be Stopped

There is a misguided assumption in a lot of media reporting on research that correlation equals causation. Correlation is a statistical relationship between two variables – for example, amounts of social service funding and crime rates – in which the two tend to change together. It is tempting to assume that this means one variable has some degree of dependence on the other: in other words, that if one variable changes, the other should change as a result.

There is a problem with this assumption, however – or at least it’s a problem for reporters who can’t be bothered to learn basic statistical concepts. A variable that is statistically correlated with another variable may change not because of a change in the other variable, but because of factors that have absolutely nothing to do with that other variable. If you think of the large number of variables that could be related to amounts of social service funding (e.g. what activities the funding is being spent on, how or where the funding allocations are made) and to crime rates (e.g. what kinds of crimes, how much criminal activity is actually reported), you can see how a correlation cannot definitively prove that changes in funding for social services will result in changes in crime rates. And that is why statistics instructors always tell their students: correlation does not imply causation.

I’ve just come across a website, Spurious Correlations, that demonstrates this principle with some great examples. It seems that Nicolas Cage should be banned from making any more movies, because the more he appears in films, the more people drown in swimming pools.

(The correlation number at the bottom of the table indicates the strength of the relationship between the two variables. A positive number means that an increase in one variable relates to an increase in the other variable; a negative number means that an increase in one variable relates to a decrease in the other variable. The closer the correlation number is to +1 or -1, the stronger the relationship between the variables.)
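For readers who want to see where such a number comes from, the sketch below computes a correlation coefficient by hand. It assumes the figure reported is Pearson's r, the most common kind, which this post has not confirmed; the function name `pearson_r` and the sample lists are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two series move together around their means
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series moves on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return co_movement / sqrt(spread_x * spread_y)

# Made-up numbers, chosen only to show the sign and size of the result
rising = [1, 2, 3, 4, 5]
print(pearson_r(rising, [2, 4, 6, 8, 10]))   # rises in lockstep: 1.0
print(pearson_r(rising, [10, 8, 6, 4, 2]))   # falls in lockstep: -1.0
print(pearson_r(rising, [2, 1, 4, 3, 5]))    # mostly rises: 0.8
```

Notice that nothing in the calculation knows or cares what the numbers represent, which is exactly why it can report a strong relationship between things that have nothing to do with each other.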

It also appears that increased mozzarella consumption leads to more doctorates in civil engineering in the United States. Maybe hungry American PhD students eat more cheese?

And the website also allows you to generate your own spurious correlations. I know that it rains a lot in Washington, the US state closest to me. But I didn’t know that a decrease in precipitation in Washington leads to fewer lawyers in the Northern Mariana Islands.

A really great feature of this website is that all its spurious correlations are statistically significant. That is, given the number of data points used in the calculation, correlations this strong are unlikely to have occurred by chance. The fact that these correlations are meaningful by statistical standards – but utterly meaningless in terms of any real effect of the variables on each other – emphasizes even more strongly why it’s important to understand statistical concepts.

And it’s especially important to be able to think critically and analytically about statistics if you’re writing about research based on statistical analyses. If you don’t, you may end up misreporting the research and misleading your readers – which is a problem not only for you and for them, but also for society at large, because it misleads all of us about the real reasons why things work as they do.