What if the same set of data could show completely opposite trends depending on how it is grouped? The goal of this blog is to document my reactions to recent articles that address bias and irreproducibility in science, but as I was reading on the topic, I came across this video that I thought was worth sharing. One of the articles that caught my attention was Jeremy Berg's letter on ASBMB. The letter outlines what the scientific community should do to enhance the reliability of scientific research. According to him, the first (and perhaps most important) thing we should do is acknowledge and take ownership of the problem. The second is that each researcher has a responsibility to make their own work “as reliable as possible within the limits imposed by resources and other constraints.”
The second point is what caught my attention the most. The letter goes on to explain that some published work is the result of one successful experiment out of ten, and that reviewers have to address “clear flaws and inadequate information” to improve reliability. However, bias and reproducibility are not black-and-white issues in and of themselves. Dan Ariely talks at length about cheating and the moral code behind it, but at the end he briefly turns to intuition. He points out that many of our intuitions are wrong and that it should be our responsibility to test them.
This is where I think Simpson’s paradox comes into play. In brief, Simpson’s paradox occurs when a trend appears in several groups of data but disappears or even reverses when the groups are combined. The TED-Ed video by Mark Liddell mentions a study in the UK (which I looked for but could not find) that appeared to show that smokers had a higher survival rate than non-smokers over a 20-year period. Yet when the participants were divided into age groups, it turned out that the non-smokers were, on average, much older and thus more likely to die during the study. Once age was taken into account, the non-smokers were actually the ones living longer; it was the pooled data that showed the opposite trend. This raises a lot of questions: Which types of bias should we be aware of when analyzing statistics? What motivations could be at play when individuals or companies present statistics? And what ethical responsibilities do we face when working with statistics?
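To make the reversal concrete, here is a small sketch in Python. The counts are entirely made up for illustration; they are not the numbers from the UK study. They are simply chosen so that, as in the video's example, non-smokers are concentrated in the older age group. The helper names (`group_rates`, `pooled_rates`) are hypothetical.

```python
# Hypothetical (survived, total) counts per age group -- NOT real study data.
# Non-smokers are deliberately concentrated in the older group.
data = {
    "younger": {"smokers": (90, 100), "non_smokers": (47, 50)},
    "older": {"smokers": (4, 20), "non_smokers": (30, 100)},
}


def rate(survived, total):
    """Fraction of participants who survived."""
    return survived / total


def group_rates(data):
    """Survival rate for smokers and non-smokers within each age group."""
    return {
        age: {grp: rate(*counts) for grp, counts in groups.items()}
        for age, groups in data.items()
    }


def pooled_rates(data):
    """Survival rate for each group after ignoring age (summing across age groups)."""
    totals = {}
    for groups in data.values():
        for grp, (survived, total) in groups.items():
            s, t = totals.get(grp, (0, 0))
            totals[grp] = (s + survived, t + total)
    return {grp: rate(s, t) for grp, (s, t) in totals.items()}


if __name__ == "__main__":
    for age, rates in group_rates(data).items():
        print(age, {g: round(r, 2) for g, r in rates.items()})
    print("pooled", {g: round(r, 2) for g, r in pooled_rates(data).items()})
```

With these invented numbers, non-smokers do better in both age groups (0.94 vs 0.90 for younger, 0.30 vs 0.20 for older), yet once the groups are pooled, smokers appear to do better (0.78 vs 0.51), because most of the non-smokers sit in the older, higher-risk group.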