Thursday, January 14, 2016

Simpson's Paradox

What if the same set of data could show completely opposite trends depending on how the data are grouped? The goal of this blog is to document my reactions to recent articles that address bias and irreproducibility in science, but as I was reading on the topic, I came across a video that I thought was worth sharing. One of the articles that caught my attention was Jeremy Berg's letter on ASBMB. The letter outlines what the scientific community should do to enhance the reliability of scientific research. According to him, the first thing we should do (and perhaps the most important) is to acknowledge and take ownership of the problem. The second is that each researcher has a responsibility to make their own work “as reliable as possible within the limits imposed by resources and other constraints.”
The second point is what caught my attention the most. The letter goes on to explain that some published work is the result of one successful experiment out of ten, and that reviewers have to address “clear flaws and inadequate information” to improve reliability. However, bias and irreproducibility are not clear-cut, either-or failings. Dan Ariely talked a lot about cheating and the moral code behind cheating, but in the end he briefly mentioned intuition. He points out that many of our intuitions are wrong and that it should be our responsibility to test them.
This is where I think Simpson’s Paradox comes into play. In brief, Simpson’s paradox occurs when a trend appears in different groups of data, but when the groups are combined, the trend disappears or even reverses. The Ted-Ed video by Mark Lidell mentions that one study in the UK (which I looked for but could not find) appeared to show that smokers had a higher survival rate than non-smokers over a 20-year period. Yet when the participants were divided into age groups, the non-smokers were on average much older and thus more likely to die during the trial. Within each age group, non-smokers were actually living longer, but the pooled data showed the opposite trend. This raises a lot of questions, such as: which types of bias should we be aware of when analyzing statistics? What motivations could be at play when individuals or companies present statistics? And are there any ethical responsibilities that we face when working with statistics?
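To see how the reversal can happen, here is a small Python sketch with made-up numbers (not the actual UK study data): non-smokers survive at a higher rate within every age group, yet because most non-smokers sit in the older, higher-mortality group, pooling the groups makes smokers look better overall.

```python
# Hypothetical (survivors, enrolled) counts per age group and smoking status.
# The numbers are invented purely to exhibit Simpson's paradox.
groups = {
    "younger": {"smoker": (720, 800), "non_smoker": (190, 200)},
    "older":   {"smoker": (60, 200),  "non_smoker": (320, 800)},
}

# Within every age group, non-smokers survive at a higher rate...
for age, cohort in groups.items():
    s = cohort["smoker"][0] / cohort["smoker"][1]
    n = cohort["non_smoker"][0] / cohort["non_smoker"][1]
    print(f"{age}: smokers {s:.0%}, non-smokers {n:.0%}")
    assert n > s

# ...yet pooling the age groups reverses the trend, because most
# non-smokers fall in the older (higher-mortality) group.
pooled = {}
for status in ("smoker", "non_smoker"):
    survived = sum(groups[a][status][0] for a in groups)
    enrolled = sum(groups[a][status][1] for a in groups)
    pooled[status] = survived / enrolled
print(f"pooled: smokers {pooled['smoker']:.0%}, "
      f"non-smokers {pooled['non_smoker']:.0%}")
```

The reversal is driven entirely by how the 1,000 people in each smoking category are distributed across the age groups, which is exactly the confounding the video describes.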


  1. Good stuff. The hardest part of doing science is seeing our biases. Always has been, always will be.

  2. Good stuff, Luis, thanks for sharing! I find it fascinating that this paradox crops up frequently in daily life. It proves difficult to control for confounding factors that are causally relevant to the situation at hand. A real-life example of Simpson's paradox is one that millions of Americans have fallen prey to. Since 2000, the median US wage has risen about 1%; however, during the same period, the median wages for high school dropouts, high school graduates with no college education, people with some college education, and people with Bachelor's or higher degrees have all decreased. How can both be true? It depends on one's viewpoint. An economist might say the headline overall median wage has increased, but an average individual American will confidently say wages have declined. Ultimately, the paradox is prevalent throughout our daily encounters, which further exposes the inherent challenges in statistics.
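The arithmetic behind the wage example can be sketched the same way. The wages and headcounts below are invented (not real labor statistics) and simplify each education group to a single wage, but they show how every group's wage can fall while the overall median rises, simply because the workforce shifted toward the higher-paid group.

```python
import statistics

# Hypothetical (wage, headcount) per education group; made-up numbers.
year_2000 = {"dropouts": (20, 60), "graduates": (40, 40)}
year_2015 = {"dropouts": (19, 20), "graduates": (39, 80)}

def overall_median(groups):
    # Model every worker in a group as earning the group's wage.
    wages = [w for w, n in groups.values() for _ in range(n)]
    return statistics.median(wages)

# Each group's wage fell between the two years...
for g in year_2000:
    assert year_2015[g][0] < year_2000[g][0]

# ...but the overall median rose, because far more workers
# now sit in the higher-paid group.
print(overall_median(year_2000), overall_median(year_2015))
```

Both the economist and the individual worker are reading the same data; they are just grouping it differently.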