Wednesday, April 6, 2016

Jelly beans and statistical significance

Statistics is present in the everyday life of most people: weather predictions, political campaigns, medical studies, quality testing, insurance and the stock market. Most people use statistics unconsciously by noticing patterns in daily circumstances and drawing conclusions based on those patterns. On a larger scale, researchers use statistics to represent their data in a meaningful way.
But what does “significant” mean?
If you opened a dictionary you would find the definitions “important” or “meaningful”, but saying that research results are significant doesn’t mean that they are important. Rather, a statistically significant result means that the difference seen between two groups is unlikely to be due to chance alone. In other words, if the null hypothesis were true, a difference at least this large would arise by chance less often than a chosen threshold, which is conventionally set at 5%.
The origin of the 5% threshold is still unclear, but the most reliable source is the discussion Fisher published in 1926 on the theoretical basis of experimental design [1].
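To make the 5% threshold concrete, here is a minimal simulation sketch. It is not taken from Fisher's paper; the function names, group size, number of experiments and the use of a simple z-test with a known standard deviation are all assumptions chosen for illustration. Both groups are drawn from the same distribution, so the null hypothesis is true by construction, and any "significant" result is a false alarm.

```python
import math
import random


def two_sided_p_value(group_a, group_b, sd=1.0):
    """Two-sided z-test p-value for equal-sized groups with a known common SD."""
    n = len(group_a)
    mean_diff = sum(group_a) / n - sum(group_b) / n
    z = mean_diff / (sd * math.sqrt(2.0 / n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_experiments=2000, n_per_group=30, alpha=0.05, seed=0):
    """Share of experiments with p < alpha when there is no real difference."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same N(0, 1) population (hypothetical data).
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sided_p_value(a, b) < alpha:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"Share of 'significant' results with no real effect: {rate:.3f}")
```

The printed share should land close to 0.05: with a 5% threshold, roughly one experiment in twenty looks significant even when nothing is going on.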
The real question is, what does this p-value tell us in terms of significance in research?
When conducting studies, researchers should keep in mind three main points:
1. The dichotomization of p-values into “significant” and “non-significant” leads to a loss of important information. Two results might both be significant, but that doesn’t imply that they are the same.
2. Statistical significance is not directly linked to clinical significance. Because statistical tests are influenced by sample size, a significant study does not always mean that the outcome is clinically meaningful. A large study might be statistically significant without being clinically relevant, while a small study can show an important outcome without reaching statistical significance.
3. Although it is tempting to rely only on p-values, researchers should not give them too much weight. The most important question should remain the quality of the study: design, sample type, patients and bias.
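The second point can be sketched with a small calculation. The numbers below are invented for illustration (a hypothetical weight-loss comparison with a common standard deviation of 5 kg), and the helper name and the choice of a two-sample z-test with a known standard deviation are assumptions, not a method from the article.

```python
import math


def p_value_from_summary(mean_diff, sd, n_per_group):
    """Two-sided z-test p-value from summary statistics (equal groups, known SD)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Large study, tiny effect: 0.2 kg difference with 10,000 people per group.
p_large_study = p_value_from_summary(mean_diff=0.2, sd=5.0, n_per_group=10_000)

# Small study, large effect: 4 kg difference with 10 people per group.
p_small_study = p_value_from_summary(mean_diff=4.0, sd=5.0, n_per_group=10)

print(f"Large study, 0.2 kg difference: p = {p_large_study:.4f}")
print(f"Small study, 4 kg difference:   p = {p_small_study:.4f}")
```

Under these assumptions the 0.2 kg difference falls below the 5% threshold while the 4 kg difference does not, even though only the second would matter to a patient. The p-value alone says nothing about the size of the effect.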

Nowadays, we are overwhelmed by advertising for weight loss pills, miraculous anti-wrinkle creams and every other kind of aesthetic treatment, all stating that you will get significant results based on data collected in clinical trials. What they conveniently forget to mention is what they truly mean by “significant”.

1. Fisher RA. The arrangement of field experiments. J. Ministry Agric. 1926; 33:503-513.


  1. The ability of scientists to create significance out of nothing significant is a universal problem within the field! A habit has formed among scientists of equating significant results with significant research. The pressure to find something significant, all the while neglecting the quality of the research design, sample type and size, is therefore leading to a "significant" era of science that is not significant at all. We must remember that the p-value is defined by the probability of error, and that more experiments, more stringent experimental design, and larger sample sizes are necessary to answer the question of what is truly significant! Great article, Camilla!

  2. I agree with this. It was amazing that, when doing the bad stats assignment, it was almost too easy to find stats errors in every paper that I encountered (disclaimer: I do not know if there is statistically significant proof that every paper harbors stats errors). It is almost scary to me that published results are out there for the public without good science and good stats. Like you were saying, setting an arbitrary p-value threshold is an exercise we should stop practicing. But how do we fix this problem? Should we drop the term "statistically significant"? It does scare me that some of these papers lead to clinical trials that fail because of bad stats in the pre-clinical data. Great insight and great article!