Monday, April 11, 2016

Counter-intuitive apophenia and common statistical problems

Introducing Statistics and Confidence Intervals

An interesting idea that I wanted to share is the concept of “apophenia”, the tendency of humans to find patterns and meaning where there are none. This idea got me thinking, so I went back and flipped through our textbook, and noticed several instances in the first section where the author discusses how humans tend toward illogical decisions when probability is involved. For example, Motulsky says, “Our brains have simply not evolved to deal sensibly with probability, and most people make the illogical choice.” When it comes to scientific research, the most obvious illogical choices always seem to tie to the statistics reported to support a publication’s claim. The two most common culprits are the P-value and the confidence interval (or, more often, the lack thereof), two basic concepts that were introduced early in the textbook.
But why are these two statistics problematic?
The more I read about P-values, confidence intervals, and the statistical pitfalls of scientific research, the more I felt I was reading the same words and phrases. Everyone rallies behind the same banner, with a battle-cry for reproducible research and more stringent statistical reporting! But it was rare to find a proposal for an actual mechanism to correct the problems.
One article by Erika Check Hayden in Nature News explored an interesting proposal from statistician Valen Johnson. Johnson developed a method to directly compare “the P value in the frequentist paradigm, and the Bayes factor in the Bayesian paradigm”, and then re-examined selected published data to see how the two measures compared. He found that the common standard of p ≤ 0.05 corresponds to a weak Bayes factor, and that of the published data reviewed, “as many as 17–25% of such findings are probably false”. To ameliorate this problem, the article reports, Johnson wants researchers “to use more stringent P values of 0.005 or less to support their findings, and thinks that the use of the 0.05 standard might account for most of the problem of non-reproducibility in science”, even more than other issues “such as biases and scientific misconduct”.
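To get a feel for why the threshold matters so much, here is a rough back-of-the-envelope sketch in Python. To be clear, this is not Johnson's Bayes factor method; it is the simpler, standard false discovery rate arithmetic. The inputs are assumptions I made up for illustration: that only 10% of tested hypotheses are truly real effects (`prior_true`), and that studies have 80% power at either threshold (which, for 0.005, would in practice require larger samples).

```python
def false_discovery_rate(alpha, power, prior_true):
    """Expected share of 'significant' results that are false positives.

    alpha      -- significance threshold (chance a null effect passes)
    power      -- chance a real effect passes the threshold
    prior_true -- assumed fraction of tested hypotheses that are real
    """
    false_positives = alpha * (1.0 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Hypothetical inputs: 10% of hypotheses are true, 80% power.
    for alpha in (0.05, 0.005):
        fdr = false_discovery_rate(alpha, power=0.8, prior_true=0.1)
        print(f"alpha = {alpha}: expected false discovery rate = {fdr:.1%}")
```

Under these made-up assumptions, about 36% of the results that clear p ≤ 0.05 would be false positives, compared with about 5% at p ≤ 0.005. The exact numbers depend entirely on the assumed prior and power, so they should not be read as an estimate for any real body of literature.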
Another article I found from Nature News, published in Medicine by Jean-Baptiste du Prel et al., discusses when p-values and confidence intervals are most important individually, and when it is important that they be reported together. One major point is that clinical research reports tend to over-stress the p-value, and du Prel even states, “the investigator should be more interested in the size of the difference in therapeutic effect between two treatment groups […] rather than whether the result is statistically significant or not”. The take-home message from this article was that p-values and confidence intervals are intertwined statistics that should be reported together to bolster credibility.
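As a small illustration of reporting the two together, here is a minimal Python sketch that returns the size of the difference between two groups, its confidence interval, and the p-value in one go. The function name `difference_summary` and the measurements are hypothetical, and the calculation uses a normal (z) approximation for simplicity; with samples this small, a t-based interval would be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def difference_summary(treated, control, confidence=0.95):
    """Return (difference in means, confidence interval, two-sided p-value).

    Uses a normal approximation with an unpooled standard error.
    """
    diff = mean(treated) - mean(control)
    # Standard error of the difference between two independent means.
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = NormalDist()
    critical = z.inv_cdf(0.5 + confidence / 2)
    interval = (diff - critical * se, diff + critical * se)
    p_value = 2 * (1 - z.cdf(abs(diff / se)))
    return diff, interval, p_value


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    treated = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
    control = [4.6, 4.9, 4.4, 4.7, 4.5, 4.8]
    diff, (low, high), p = difference_summary(treated, control)
    print(f"difference = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f}), p = {p:.4f}")
```

The interval shows how large or small the effect could plausibly be, which the p-value alone cannot. The two are also consistent by construction here: the 95% interval excludes zero exactly when p falls below 0.05.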
While these and other articles I found in my research were very interesting, I can’t help but wonder how these more stringent statistical standards would be achieved, and what sort of impact raising the bar would have on new scientists who are just starting out. While I, personally, am excited at the idea of knowing that the results in the papers I read genuinely are statistically significant, I can’t help but wander back to the problem that got me researching to begin with.

How much of the statistics published in scientific literature is apophenia that stems from lack of basic statistical knowledge?
In a “publish or perish” scientific arena, I can certainly understand how so many statistical mistakes slip through the cracks, but that doesn’t explain why this statistical apophenia exists at all. Perhaps instead of raising the bar for statistical stringency in scientific publishing, we should raise the standards for statistics education for scientists. Then the problems of abusing the p-value, misinterpreting significance, and making illogical statistical decisions might correct themselves.


  1. It's an interesting idea to use a more stringent P value, but I'm not sure it would solve the problem. One of the major problems is deciding whether something is biologically relevant, and changing the criteria for statistical significance may not address this issue.

  2. I'm not particularly convinced that using a more stringent P-value would be helpful. Scientists could still get around this by hacking their data and statistical tests to reach a p-value less than 0.005 or increase sample size. A p-value threshold less than 0.005 might unduly burden some animal research as well, since a large number of samples would likely be required to reach statistical significance even with a large effect size. Setting a p-value too low might obscure scientifically relevant effects, just because some experiments are inherently more variable than others.

    I think the analytical focus should be much more directed to biological significance rather than statistical significance. It might not be possible to do this in a systematic or unbiased way, but I'm not convinced normal statistics reduces bias anyway.