Monday, April 11, 2016

Is statistical testing worth it?

In preparation for this assignment, I read an article titled The controversy of significance testing: misconceptions and alternatives. As the title suggests, the article went into some detail about the controversy surrounding significance testing. The central criticism is that the P value is often misinterpreted, and that other factors, such as confidence intervals and effect sizes, are ignored. This point was interesting and reminded me of an idea from the textbook: just because a result is statistically significant does not mean it is important. While I still believe proper experimental design and statistical analysis are important, I could identify with this critique about the misinterpretation of P values and what they really mean. It is frustrating to think that, after all the time spent planning and executing an experiment that yields a statistically significant P value, the result could still be irrelevant. The book gave the example of a drug that produced a statistically significant decrease in symptoms. However, the decrease was only 7%, which was not enough to matter in the broader scheme of things. Because my research concerns the effects of a certain compound on cancer cell growth, a statistically significant result that would not actually benefit patients isn't ideal.
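To see how easily a tiny effect can reach statistical significance, here is a small Python sketch (the numbers are invented, not taken from the book's drug example): with a large enough sample, even a ~2-point shift on a 100-point symptom scale produces a vanishingly small P value.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic with a normal-approximation two-sided P value
    (reasonable here because the samples are large)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    t = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail of a standard normal
    return ma - mb, p

random.seed(0)
# Invented trial: control symptom scores around 100, drug shifts them down ~2%.
control = [random.gauss(100, 15) for _ in range(20000)]
treated = [random.gauss(98, 15) for _ in range(20000)]

diff, p = welch_t(control, treated)
print(f"mean difference = {diff:.2f} points, P = {p:.1e}")
```

With 20,000 subjects per arm, the roughly 2-point difference comes out overwhelmingly "significant", yet a 2% improvement may be clinically meaningless — which is exactly the gap between statistical and practical significance.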

            The first point in favor of statistical testing is that the article was written in 1999, and the reliability of statistically significant outcomes has since improved. Furthermore, to properly run an unbiased experiment, the design must be planned first, which improves the integrity of the science conducted. I think the thought that must go into properly performing research leads to better execution of science, and if more people prepared correctly, it might help with science's reproducibility problems. Overall, I believe the benefits of statistical testing outweigh the drawbacks. When done correctly, the outcomes of research are more reliable.


  1. I think P values, despite frequently being misused, can still be a useful statistical parameter. However, I don't think P values alone are very informative. When evaluating a scientific result, we should ask ourselves the following:
    1) How likely is it that this result would be obtained by chance if the null is true?
    2) How big is the effect? I'm not sure I care if my favorite food increases my chance of getting cancer by 0.000000000000000000000000000001%. It's still valuable scientific knowledge to have, because it may lead us to important biological mechanisms, but how we communicate that result should be different from how we'd communicate a 10% increase.
    3) Is the study adequately powered? Is an outlier driving a major result? I don't really care if it's statistically significant if it's underpowered and removing a single outlier data point kills your significance. Design a better study.
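    As a sketch of point 3, the following Python example runs an exact one-sided permutation test on two invented sets of measurements. With one extreme value included, the group difference looks significant; drop that single point and it no longer is. (The data and the choice of a permutation test are mine, purely for illustration.)

    ```python
    from itertools import combinations

    def perm_pvalue(a, b):
        """Exact one-sided permutation test for mean(a) - mean(b):
        enumerate every relabeling of the pooled data and count how
        often the relabeled difference meets or exceeds the observed one."""
        pooled = a + b
        obs = sum(a) / len(a) - sum(b) / len(b)
        hits = total = 0
        for idx in combinations(range(len(pooled)), len(a)):
            chosen = set(idx)
            ga = [pooled[i] for i in chosen]
            gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
            diff = sum(ga) / len(ga) - sum(gb) / len(gb)
            hits += diff >= obs - 1e-9  # tolerance for float round-off
            total += 1
        return hits / total

    # Invented readings; 9.0 is a single extreme value in the treated group.
    treated = [5.9, 5.7, 6.0, 5.8, 5.7, 9.0]
    control = [5.6, 5.9, 5.7, 5.8, 5.5, 5.4]

    print(f"P with the outlier:    {perm_pvalue(treated, control):.3f}")
    print(f"P without the outlier: {perm_pvalue(treated[:-1], control):.3f}")
    ```

    The first P value lands just under 0.05 and the second well above it — a borderline result that hinges on one data point, which is exactly the situation where "design a better study" applies.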

    More directly to your point, I think statistical testing is good because even if it can be abused, it gives us a rigorous way of looking at data. Humans are far too good at seeing patterns in noise: without statistical testing of some kind, we will become more biased, not less.

  2. I totally agree that statistically significant results can be scientifically irrelevant. And I would say that sometimes a scientifically important result might not reach statistical significance.

    At least I was in this kind of situation once. In a bacterial growth inhibition assay of extract treatments, we can summarize the data in two ways: the percentage inhibition of bacterial growth at a certain concentration, or the extract concentration that reaches 50% inhibition. We can use ANOVA or t tests on the percentage inhibition data to show statistically significant differences, and for the concentration, lower is better. But the effectiveness of extracts is hard to evaluate if you are testing multiple strains: you might get a low concentration for one strain but not much percentage inhibition against another.
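    To illustrate the two summaries, here is a small Python sketch with invented dose-response numbers (units assumed): the percentage inhibition at each tested concentration is the raw data, and the 50%-inhibition concentration can be estimated by log-linear interpolation between the two bracketing doses.

    ```python
    import math

    # Invented dose-response data for one extract against one strain:
    conc = [1, 5, 10, 50, 100]             # tested concentrations (say, ug/mL)
    inhib = [5.0, 22.0, 41.0, 78.0, 95.0]  # % growth inhibition at each dose

    def ic50(conc, inhib, target=50.0):
        """Estimate the concentration giving `target` % inhibition by
        log-linear interpolation between the two bracketing doses."""
        for (c1, i1), (c2, i2) in zip(zip(conc, inhib), zip(conc[1:], inhib[1:])):
            if i1 <= target <= i2:
                frac = (target - i1) / (i2 - i1)
                return math.exp(math.log(c1) + frac * (math.log(c2) - math.log(c1)))
        raise ValueError("target inhibition not bracketed by the data")

    print(f"estimated 50%-inhibition concentration: {ic50(conc, inhib):.1f} ug/mL")
    ```

    The statistical test only speaks to the percentage-inhibition comparison; whether the interpolated concentration is low enough to matter across strains is the scientific judgment the test can't make for you.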

    I'm not sure if I explained my trouble well, but I just want to second your point that we still have to make a scientific judgment after the statistical test.