As scientists we like numbers, and simple yet clear tests that give us insight into a biological process. This is also true when it comes to interpreting the statistical significance of our data (the numbers we produced from our simple yet clear tests). For this reason, many scientific papers rely on the p-value as a “clear” way to say, “look, our data is interesting! Now let’s move on.” Harvey Motulsky writes, “P values and conclusions about statistical significance can be useful, but there is more to statistics than P values and asterisks.” Instead, he suggests a focus on effect size, because that will tell you whether a statistically significant result is scientifically or clinically impactful. Additionally, sample size can have a huge impact on the P value, something many readers of scientific papers do not take into account. It is important to remember that a larger sample size can make a result appear more significant even when the mean and standard deviation are the same as in a data set with a smaller sample size and a larger P value.
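To make that last point concrete, here is a minimal sketch (the group means and SDs are invented for illustration) comparing two groups whose means and standard deviations are held fixed while only the sample size grows. The effect size (Cohen’s d) never changes, but the p-value, approximated here with the normal distribution (a reasonable shortcut for the larger samples), shrinks steadily:

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Effect size: standardized mean difference (equal-n pooled SD)."""
    pooled_sd = math.sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / pooled_sd

def t_statistic(mean1, sd1, mean2, sd2, n):
    """Two-sample t statistic for two groups of equal size n."""
    se = math.sqrt(sd1**2 / n + sd2**2 / n)
    return (mean1 - mean2) / se

def approx_two_sided_p(t):
    """Two-sided p-value via the normal approximation (good for large n)."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

# Same means (5.2 vs 5.0) and same SDs (1.0); only n differs.
for n in (10, 100, 1000):
    t = t_statistic(5.2, 1.0, 5.0, 1.0, n)
    d = cohens_d(5.2, 1.0, 5.0, 1.0)
    print(f"n = {n:4d}: d = {d:.2f}, t = {t:5.2f}, p ≈ {approx_two_sided_p(t):.5f}")
```

With n = 10 the difference is nowhere near “significant” (p ≈ 0.65), yet with n = 1000 the very same means and SDs give p < 0.0001, while the effect size sits at d = 0.2 the entire time.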

Recently, the arguments against P values have started to spread through the scientific community, as the journal *Basic and Applied Social Psychology* banned the use of P values in its manuscripts. This is a great start toward addressing the flaws of a simple P-value assessment of significance, but it does not get at the deeper issue. It is great that they have banned the use of P values; what we scientists need, however, is a deeper understanding of why.

As a comment published in *Nature* argues, the P value is just one part of the data pipeline that we are messing up. In reality, decisions about experimental design, randomization, sample sizes, and types of statistical tests have a huge impact on the results of an experiment. This is why it is so important to think entirely through your experiment, and the way you will analyze it, before you start. I do not believe that simply banning the use of P values will fix all of our challenges in statistics, but it does make scientists aware that there needs to be a shift in how we approach significance: perhaps a change in our thinking from “is there a difference?” to “how big is the difference, and will it be impactful?”
I agree with you. The BASP article from Nature News was really interesting, mostly because of the varied responses people had to it. Some graduate students were left despairing, but others saw it as an exciting challenge. I also think that we as a scientific community need to focus not just on whether there is a difference or an impact, but also on what the scientific significance is.

There’s nothing worse for me than reading a paper that starts strong and is decently convincing, until a small detail catches your eye and the whole sweater unravels in an instant. You start to break down the design and the analysis of the experiments, then you question the statistical analysis, and then the paper is just pushed to the edge of your desk in disgrace.

But I agree with a lot of what you say, and I’ve mentioned similar thoughts in my post and in other comments. I think completely eliminating the P value from statistical analysis is a step too far. Removing it may also be too difficult a change for some scientists, not to mention that banning it outright completely ignores what it actually tells us. The shift we make with regard to the p-value needs to be more mental than physical. If we actually understood what we were reporting, designed experiments better, and analyzed our findings more comprehensively, we would probably stop using the p-value as a crutch.

I feel like the following statement released from the American Statistical Association may be interesting to you: http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108 (Sorry, I can't edit it to show as a link)

It addresses head-on the problem of P < 0.05 being taught as a universal law: specifically, that "we teach it because it's what we do, and we do it because it's what we teach".

It should also be noted that this is the first time the ASA has come out specifically on matters of statistical practice, SO IT'S A PRETTY BIG DEAL. And while it may have seemed easy for a group of statistics nerds who all basically share the same opinion, that the P value is dangerous and egregiously misused, it still took them from October 2015 to February 2016 to draft this statement.

The statement is long, but it's worth a skim. At its most basic, it outlines the use and misuse of the P value, with basic language and definitive statements like this:

"Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself."

It definitely agrees with what you have to say about the importance of designing your experiment and your analysis before starting.


I think your comments about the need for a more thorough understanding of statistical analysis and experimental design are spot on, Jacob, and it seems like we all agree that the p-value is overused but not entirely worthless. I’d like to throw out one other possible justification for the p-value that comes to mind for me.

As scientists, we have a greater sense than the average person of just how intricate the systems are that operate in this world, and we recognize that even subtle changes can ultimately have big consequences. At least, we can’t rule out that possibility. Plus, many of us are interested in understanding everything, even relatively trivial details, especially when we have to work so hard experimentally to get those answers. We also all know that we’re riding an upward trend in the amount of data required to make a paper, the number of students in research, the number of papers required to graduate, the number of publications expected for postdocs, junior faculty, senior faculty, etc. Given all this, it can be argued that there’s reason to study and try to publish just about anything, no matter how inconsequential the results may actually be.

While the p-value is far from perfect, it at least gives a very clear cutoff that even the newest science students can understand. If your data don’t at least meet that conventionally accepted threshold, it forces you to go back and reevaluate your hypothesis, your experimental design, etc., make changes if possible, and most of us won’t try to publish that result if we have a choice. So essentially, the p-value may in a small way help to focus the energies of scientists on more substantial effects rather than allowing us to mire ourselves in the sea of slight modifications present in everything.
