## Monday, April 4, 2016

### The cause of feelings of hopelessness and failure in graduate school: P-Values and Statistical Significance

Scientists, especially graduate students, have become too focused on, and driven by, statistically significant results. We play statistical significance up to be all-important in science; most of our experiments and projects focus on finding some difference. If our results aren’t “statistically significant,” we feel like failures and assume that something went wrong. Maybe I am generalizing too much from my own experience in graduate school, but bear with me. Graduate school is notoriously viewed (well, at least by me) as “soul-sucking.” I believe that many of these feelings of hopelessness and failure originate the moment you press “Analyze” in Prism and see “ns.” Imagine how different graduate school would be if that feeling of failure were eliminated… How would things be if we no longer viewed every negative result as a dead end or a reflection of our abilities as scientists? What if seeing “ns” brought us joy instead of distress? I feel like our success in graduate school is defined by statistical significance; without p < 0.05, our hard work means nothing. When was the last time any of us went to a thesis defense that focused on nonsignificant results? Why has a statistically significant result become necessary to earn a doctorate? Would our education be at a disadvantage if we were not required to present statistically significant data?

In a way, statistical significance helps remove bias by letting us quantify and compare results when looking for a difference. Statistics, and calculating a P value, are what make western blots informative and unbiased. Without a P value, people would likely disagree about what “looks” like a difference between two groups. Science needs statistical significance. However, statistical significance has also created bias in the way we approach problems. The need for statistical significance prevents us from exploring concepts and hypotheses that may turn out to be of no significance. It may also lead a researcher (without proper statistical training) to keep increasing the n of an experiment until a P value below 0.05 is all but inevitable. It has become unacceptable to simply report no significance; we force our P value to mean something, even if it’s just “trending” toward significance. Statistical significance and P values thus both eliminate bias and create it.
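
To make the sample-size point concrete, here is a minimal Python sketch (standard library only) of how the P value for a fixed, tiny difference shrinks as n grows. It assumes an idealized two-sample z-test with equal, known standard deviations, and it pretends the observed mean difference comes out exactly equal to the true difference. The function names and the 0.05-standard-deviation effect size are hypothetical choices for illustration, not taken from any real analysis.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P value for a standard normal test statistic z.

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), where Phi is the
    standard normal cumulative distribution function.
    """
    return math.erfc(abs(z) / math.sqrt(2))


def p_for_fixed_difference(effect_size: float, n_per_group: int) -> float:
    """P value of an idealized two-sample z-test.

    Assumptions (for illustration only): both groups share a known standard
    deviation, and the observed mean difference equals `effect_size`
    standard deviations exactly, i.e. sampling noise is ignored.
    """
    # Standard error of a difference in means, in standard-deviation units
    standard_error = math.sqrt(2 / n_per_group)
    z = effect_size / standard_error
    return two_sided_p_from_z(z)


if __name__ == "__main__":
    tiny_effect = 0.05  # hypothetical: a difference of 1/20 of a standard deviation
    for n in (10, 100, 1000, 5000, 10000):
        p = p_for_fixed_difference(tiny_effect, n)
        print(f"n per group = {n:>5}: P = {p:.4f}")
```

Under these assumptions the difference itself never changes, yet it crosses P < 0.05 somewhere above 3,000 samples per group. The P value is tracking n as much as it is tracking the size of the effect, which is why it cannot stand in for scientific importance.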

I feel that people don’t actually think about what “statistically significant” means; all a P value can tell us is the probability of seeing a result at least as extreme as ours if the null hypothesis were true. It cannot tell us how likely it is that the alternative hypothesis is true. Thus, we need to stop defining the importance of our work by the P value. Motulsky points out that colloquial language may contribute to this problem of focusing on statistical significance and P values: we associate the word “significance” with importance, which is incorrect when interpreting statistics. To interpret statistics, one must understand the theory and the definitions of the terms used. Then, and only then, can we understand that a statistical test does not tell us the importance of our experimental results; it only allows us to reject, or fail to reject, the null hypothesis. We can no longer define our work and goals by “statistical significance”; instead, we should be seeking scientific importance.
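
The definition of a P value can be made concrete with a permutation test. The minimal Python sketch below (standard library only) assumes the null hypothesis that group labels don’t matter, shuffles the labels many times, and reports the fraction of shuffles whose mean difference is at least as large as the one actually observed. The function name, the default number of shuffles, and the example measurements are hypothetical; a t-test asks the same kind of question under a parametric model instead of by shuffling.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P value for a difference in means by shuffling labels.

    Null hypothesis (assumed): group labels are irrelevant, so every
    reshuffling of the pooled data is equally likely. The P value is the
    fraction of reshufflings whose |mean difference| is at least as large
    as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1

    # Adding 1 to numerator and denominator counts the observed labelling
    # itself, so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only
    control = [1.02, 0.95, 1.10, 0.98, 1.05, 0.99]
    treated = [1.12, 1.08, 1.20, 1.01, 1.15, 1.09]
    print(f"permutation P = {permutation_p_value(control, treated):.4f}")
```

The number answers only one question: if the labels really didn’t matter, how often would shuffling alone produce a gap this large? It says nothing about how likely the alternative hypothesis is, or whether the difference matters biologically.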

1. I agree that a shift toward focusing on scientific, rather than statistical, significance is needed. The question, however, is how to achieve that shift across the field. Certainly, the emergence of "negative data" journals is a move in that direction, but those journals still lack the prestige of traditional journals. In fact, I think separating "negative data" into separate journals is an unfortunate dismissal of the importance of that data. It seems to me that we need traditional journals (and those on our thesis committees!) to accept the importance of negative data.

2. Has statistics always been part of the scientific method? Einstein produced the theory of relativity at the beginning of the last century, but it took 100 years for some of its predictions to be directly confirmed. Maybe in the past, before fancy calculators or software, the field didn't require the checkmark of statistical significance. Perhaps common sense, or a broad understanding of the particular field, was sometimes enough. Or maybe scientific knowledge has been growing exponentially as we gain new tools and education, and the only way to keep up with it is to apply rules, such as passing a test of statistical significance. Maybe instead of shifting back toward scientific significance, we should create better applications of, and rules for, statistical significance, so that we all analyze data in a fair and consistent way.

2. I'm glad you brought up the example of Einstein. It really shows how scientific thinking has progressed. Maybe you are right that we need to embrace statistical significance and think of better applications for it. Instead of complaining about and discouraging the use of statistical significance, we should move forward and find better ways to use the p value.

3. I actually think of statistics as empowering. If the null has been given a fair test through a proper experimental design, and the outcome is "negative", it's a lot easier to face the music. In fact, it's gratifying. It forces you to move on to another problem.

There's a lot less angst when the process is followed than in the typical scenario: a never-ending run of preliminary experiments, each slightly different from the last. Individually or combined, they are never adequately powered to give the null a fair test. The doubt always nags.

4. Amen, sistah! Nearly every time I present data from one of my ongoing experiments to faculty, they ask if I have performed statistics on it yet. When I explain that I am waiting until the end, when I have all my data, I am often met with confusion, suggestions to do t-tests on random groups (just to see), or hostility. It is one thing to want to know the results (that's why we do the tests), but it's another to care about them only when they're below a certain p value.

I've also recently been to several student talks where the speakers presented "trends" that had been tested and found nonsignificant as if they were real differences between groups. Our culture is so obsessed with the idea that things must be < 0.05 that we pretend we have it even when we don't.

5. I do think that science as a field is too focused on statistical significance, but I also agree with Julia that we need some measure of validity or uniformity in our results. You can't just give a talk and say you've cured cancer because there is a slight difference between two columns in your figure. It's another thing to say that as you increase the dose of a drug you see a general increasing trend in the response variable, even if it isn't found "significant" by a one-way ANOVA or the appropriate statistical test.

One thing we also forget is that negative data, or nonsignificant p values, are still valuable results. It's not as "sexy" as getting that p < 0.05, but even a nonsignificant finding is a result, and it tells you more about the phenomenon at hand.

6. Some of those early groundbreaking discoveries were made by scientists with no statistical or related training. Especially for findings that are qualitative rather than quantitative, it is sometimes impossible to do the "significance" check, but we can't deny the importance of these findings. I agree that nowadays we focus too much on "statistically significant" data, especially when it comes to publishing in good journals. I once showed my thesis committee patient data that showed a trend toward upregulation of my target gene in human patients, but with a p-value greater than 0.05. I thought this was informative even though it was not statistically significant, but a few members commented that it was OK to show the data but not to publish it. Yet I would expect others to ask: is this relevant to patients if you only show us the mouse model data? That contributes, to some extent, to my "feelings of hopelessness or failure."

My way of handling it is to use my own judgement. I decide whether to show the "ns" data, and whether to do statistics on some parts of the research project, based on experimental experience and relevant reading, with the statistics in mind. And I think that is also part of our graduate training: building our own unbiased way of doing research.