Monday, April 11, 2016

It's not you, it's P?

The challenge of interpreting statistical versus clinical significance has come to the forefront of clinical and basic research alike. Deeply embedded in this conversation is the reliability of p-values and statistical significance as an exhaustive measure of a result's validity. Interestingly, a recent statement released by the American Statistical Association strongly suggests a growing need to steer science into a "post p<.05 era". The ASA's executive director compellingly argues that "the p-value was never intended to be a substitute for scientific reasoning" and that "Well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold." While this reasoning has a sound basis, I personally find it a tad hypocritical to make such a large, sweeping generalization about the applicability of p-values when one of the reasons p-values are viewed with scrutiny is their widespread usage without serious consideration for the specific scientific application. It's naive to think that the inclusion of p-values hasn't had a net-positive effect on the scientific community (especially in the infancy of modern research) by forcing one to evaluate the significance of one's data, however arbitrary that line may have become in some fields today or may inherently be in others. Given this separate consideration, I think the ASA would do better to reach out to major scientific organizations and field-specific research leaders and to work with these parties to critically assess how to move forward into this post p<.05 era, with regard for creating sound statistical parameters on a field-by-field basis.
Of course, this is a demanding request, but if statistical tests indeed require the complete context of their application, then shouldn't each scientific field work together to comprehensively establish not just some arbitrary set of standards (e.g., .05) but a set of standards that accommodates the needs of that field? Perhaps national societies and organizations would do well to utilize their national conferences as a vehicle for instigating this paradigm shift. As the ASA statement importantly highlights, the p-value was never meant to replace sound scientific reasoning; it would seem equally important at this critical juncture to rely on the scientific reasoning of statisticians and field leaders to set a new precedent for evaluating data.


  1. I agree with your call for a field-by-field evaluation of statistical parameters, but I'm always a little apprehensive when approaching broad, sweeping generalizations about statistics. You're absolutely correct in your assertion that, in the early developmental stages of scientific publication, the p-value helped shape and define the statistical interpretation of scientific data in a positive manner. However, the problems with the p-value are current issues, and while we should acknowledge the beneficial impact it once had, a critical analysis of how to correct the hold p-values have over scientific analysis is still needed.
    An article in Nature News published last year by Chris Woolston, "Psychology journal bans P values", really highlights the potential problems and benefits this field-by-field review could produce. In summary, the journal Basic and Applied Social Psychology (BASP) announced that it would no longer publish any papers with p-values because the statistics were "too often used to support lower-quality research". The responses were highly varied: some said it was a novel approach to reshaping publication relevance in the field, while others said that eliminating the p-value (and other forms of null hypothesis significance testing) was "throwing the baby out with the p-value".
    In my blog post, I discussed another Nature News article about a proposal to tighten the threshold for statistical significance from 0.05 to 0.005 in order to promote more reproducible research.
    Overall, I think the p-value is definitely here to stay. You can't discredit NHST and its relevance to scientific interpretation, but suffice it to say that I'm now much more wary of any p-value I come across.

  2. I think the ASA's recommendations stem partly from the issue of p-hacking, which is often motivated by the pressure to succeed. These days, it seems that people often don't have much confidence in their data and like to rely on the p-value to buoy them; it is simply easier to replace scientific reasoning with a significant p-value. I think we sometimes lose sight of why we are doing what we are doing because of the pressure to publish, and this makes it all the easier to just solve the problem with a p-value. Personally, I am also in favor of finding alternatives to p-value statistics, like confidence intervals.
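
To make the confidence-interval point concrete, here is a minimal sketch (my own illustration with simulated data, not something from the post or the ASA statement) contrasting what a p-value and a 95% confidence interval each report for the same hypothetical two-group comparison:

```python
# Illustrative only: simulated "control" vs. "treatment" samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=2.0, size=50)
treatment = rng.normal(loc=11.0, scale=2.0, size=50)

# NHST view: a single number, typically compared against 0.05.
t_stat, p_value = stats.ttest_ind(treatment, control)

# Confidence-interval view: reports both the estimated effect size
# and the uncertainty around it, not just a pass/fail verdict.
diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment)
             + control.var(ddof=1) / len(control))
dof = len(treatment) + len(control) - 2
margin = stats.t.ppf(0.975, dof) * se
ci = (diff - margin, diff + margin)

print(f"p-value: {p_value:.4f}")
print(f"difference in means: {diff:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Whether p crosses 0.05 (or 0.005) answers only one narrow question, while the interval shows how large the difference plausibly is; that is the sense in which confidence intervals "contain much more than the value of a single number".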