Science Corner: Correlation does not equal causation.

It’s statistics 101, and yet it is one of the most common errors people make: in journalism, in science, in life. Let’s break it down. One event can cause another (for example, I ate something that made me ill), or the two can simply be correlated (I ate something and happened to get ill later). If one thing causes another, the two will usually be correlated, but the reverse does not hold. Although a correlation can be a strong indicator of a relationship of some kind, just because two things occur together does not mean that one causes the other.

There is a lot of academic literature supporting the idea that the human brain is inclined to seek patterns, sometimes even in random information, a phenomenon called apophenia. This tendency can lead to logical fallacies such as cum hoc ergo propter hoc. This is Latin for “with this, therefore because of this” and is the idea that because two things tend to happen together, one must cause the other. The other logical fallacy most commonly associated with correlation vs causation is post hoc ergo propter hoc, or for the non-Latin speakers, “after this, therefore because of this”: the idea that if A happens before B, then A caused B to happen.

Take, for example, the much publicised study published in 1999 (1) that, in essence, said:

“Young children who sleep with the light on are more likely to be short-sighted in later life. Therefore, sleeping with the light on causes short-sightedness.”

Seems fairly cut and dried, yes? Well no, actually: a later study (2) found that infants who sleep with the light on are no more likely to become short-sighted. It did, however, find a strong link between short-sightedness in parents and short-sightedness in their children. Additionally, short-sighted parents were more likely to leave the light on in their children’s rooms so that they could see what was going on. Parental short-sightedness is therefore the likely common cause of both the lights being left on and the children’s short-sightedness.
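The common-cause explanation can be illustrated with a small simulation. This is only a sketch with made-up numbers: the probabilities below and the names `simulate_children` and `myopia_rate` are assumptions for illustration and are not taken from either study. In this toy model the night light has no effect at all, yet children who sleep with the light on still appear more likely to be short-sighted until the groups are split by parental short-sightedness.

```python
import random

def simulate_children(n=100_000, seed=1):
    """Toy population in which parental myopia drives both variables.

    All probabilities are invented for illustration only.
    """
    rng = random.Random(seed)
    children = []
    for _ in range(n):
        parent_myopic = rng.random() < 0.3
        # Short-sighted parents leave the light on more often.
        light_on = rng.random() < (0.7 if parent_myopic else 0.3)
        # The child's risk depends only on the parents, never on the light.
        child_myopic = rng.random() < (0.4 if parent_myopic else 0.1)
        children.append((parent_myopic, light_on, child_myopic))
    return children

def myopia_rate(children, light_on, parent_myopic=None):
    """Share of short-sighted children in a group, optionally within one parental group."""
    group = [c for p, l, c in children
             if l == light_on and (parent_myopic is None or p == parent_myopic)]
    return sum(group) / len(group)

children = simulate_children()

# Naive comparison: light on vs light off, ignoring the parents.
print("all children:", myopia_rate(children, True), myopia_rate(children, False))

# Same comparison within each parental group: the gap shrinks towards zero.
for parent_myopic in (True, False):
    print("parent myopic =", parent_myopic,
          myopia_rate(children, True, parent_myopic),
          myopia_rate(children, False, parent_myopic))
```

Comparing like with like in this way (short-sighted parents with short-sighted parents, and so on) is one simple way to check whether a suspected common cause explains a correlation.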

There are many tests to establish varying degrees of correlation (if Pearson’s correlation coefficient rings a bell, you are a fellow SPSS monkey and have my most sincere sympathies; if it doesn’t, be happy), but clear causal relationships can be hard to establish beyond reasonable scientific doubt. For any two correlated events, there are a variety of ways they can be linked:

  • A causes B (direct causation);
  • B causes A (reverse causation);
  • A causes B and B causes A (cyclic causation);
  • A and B are consequences of a common cause, but don’t cause each other;
  • A causes C which causes B (indirect causation);
  • There is no connection between A and B; the correlation is a coincidence.
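
For reference, Pearson’s correlation coefficient for paired measurements is the covariance of the two variables divided by the product of their standard deviations. A minimal sketch in plain Python follows; the function name `pearson_r` and the sample numbers are assumptions made for illustration. Note that the result is the same whichever variable comes first, so the coefficient on its own cannot tell “A causes B” from “B causes A”, or from any of the other possibilities in the list above.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example data for two measured quantities, A and B.
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]

print(pearson_r(a, b))
print(pearson_r(b, a))  # same value: correlation has no direction
```

The sketch does not handle the case where one of the sequences is constant, in which the coefficient is undefined.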

So how do we establish which event causes the effect? The most effective way is a controlled study. This is when two groups of people who are comparable in almost every way are given a single different experience (or as those of us in the biz call it, a variable). For example, think of the children in the studies above. You take a random sample of young children across every socioeconomic group, every race and every background, and split them at random into two almost identical groups. The only difference between the two groups is that one will sleep with the lights on and the other with the lights off. If the outcomes of the groups differ, then we can say that the variable is the cause of the difference.
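
The value of splitting the groups at random can be seen in a toy simulation. This is a sketch under invented assumptions: the probabilities, the `true_effect` parameter and the function name `run_trial` are all hypothetical. Because a coin flip, not the parents, decides who sleeps with the light on, parental short-sightedness is spread evenly across both groups, and any difference that remains reflects the effect of the light itself.

```python
import random

def run_trial(n=100_000, true_effect=0.0, seed=2):
    """Simulate a randomised controlled study of night lights and myopia.

    true_effect is the extra myopia risk caused by the light (hypothetical).
    Returns the myopia rate in the lights-on and lights-off groups.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # light_on -> [myopic, total]
    for _ in range(n):
        parent_myopic = rng.random() < 0.3
        # Random allocation: the parents no longer choose the lighting.
        light_on = rng.random() < 0.5
        risk = (0.4 if parent_myopic else 0.1) + (true_effect if light_on else 0.0)
        child_myopic = rng.random() < risk
        counts[light_on][0] += child_myopic
        counts[light_on][1] += 1
    return tuple(myopic / total for myopic, total in (counts[True], counts[False]))

# No real effect: both groups end up with nearly the same rate.
print(run_trial(true_effect=0.0))

# A real effect of 5 percentage points shows up as a gap of about that size.
print(run_trial(true_effect=0.05))
```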

Sometimes it’s simply not ethical to perform controlled studies. You can’t exactly force-feed a group of people cigarettes for years while denying them to another group in your quest to prove that cigarettes cause lung cancer. But we know cigarettes cause lung cancer, so how did we prove it? Epidemiological (or observational) studies follow large groups of people, their behaviours and their outcomes over time. Although it can be hard to pull a direct cause-and-effect relationship from the data, it is sometimes possible. When scientists were investigating whether smoking causes lung cancer, many other possible causes were considered (examples including excessive drinking and lack of sleep), but none of them explained the pattern of lung cancer cases as well as smoking did, leading scientists to conclude that smoking does indeed cause lung cancer.

The distinction between correlation and causation is often lost in mainstream media stories, with some outlets using words that clearly imply causality without actually claiming it. Saying there is a “correlation” between two things doesn’t tell you much on its own, just that the two tend to appear in tandem. However, saying that there is a “direct correlation” implies that there is a causal relationship, although you haven’t actually clarified what is causing what. This ambiguous language is how people can bend correlations to fit their own narratives.

Reporting a correlation between phenomena that clearly aren’t directly linked as though one drives the other is just as bad. A news story that emerged a few years ago said that house prices in Washington D.C. correlate with children’s reading ability, with higher house prices linked to higher reading proficiency (3). Although there is a correlation, few people would be willing to say there is a direct link between the reading proficiency of someone’s child and the price of their house. It’s far more likely that more expensive houses are in higher-income areas that have access to better schools, which improves a child’s reading ability. Meanwhile, an article titled “30 years of research found a positive correlation between family involvement and a student’s academic success” (4) sounds a lot more plausible as a causal link between the two phenomena, partly because it is backed up by additional literature, but also because it conforms to our own biases about what could be affecting a child’s academic achievements.

We’re simple beings – we like patterns and explanations for everything, but we must be careful not to be misled by lacklustre reporting and clever use of language. Unless the proof of causality is clear and overwhelming, never assume more than a correlation exists. After all, correlation does not equal causality.

References

  1. Quinn, Graham E.; Shin, Chai H.; Maguire, Maureen G.; Stone, Richard A. (May 1999). “Myopia and ambient lighting at night”. Nature 399 (6732): 113–114. doi:10.1038/20094. PMID 10335839.
  2. Zadnik, Karla; Jones, Lisa A.; Irvin, Brett C.; Kleinstein, Robert N.; Manny, Ruth E.; Shin, Julie A.; Mutti, Donald O. (2000). “Vision: Myopia and ambient night-time lighting”. Nature 404 (6774): 143–144. doi:10.1038/35004661. PMID 10724157.
  3. https://www.washingtonpost.com/blogs/all-opinions-are-local/wp/2015/07/22/the-correlation-between-test-scores-and-home-prices/?utm_term=.ff288a5b8c41
  4. http://www.wtsp.com/news/local/florida-wants-dads-to-take-kids-to-school/235269848