Introducing Statistics and Confidence Intervals
Statistics is, to me, man's way of recognizing that we are imperfect and doing our best to control for it. We try to reduce bias at every level of experimentation, from study design to statistical analysis, but because this is a man-made technique for reducing man's impact on the work that we do as scientists, it is only as effective as we are. It is the same as with a computer: a computer is only as powerful and smart as the person running it. As such, we need to make ourselves as unbiased and as well-educated as possible in order to trust the conclusions that we draw. It is easy (and only human) to overlook the many variables and situations that have nothing to do with the experimental treatment we wish to test, yet can still make our data look a certain way (and many times, we think it is the treatment we are successfully testing!).
The problem with statistics is that many times, we think we know more than we do. We are overconfident in our hypotheses and in our conclusions, and we yell on top of the data (with asterisks) instead of letting the data speak for itself. It is not enough to execute a well-designed experiment. It must also be interpreted correctly in order to make inferences about the world around us, which is the ultimate goal of experimentation. For example, the p value is touted as the "end-all-be-all" of scientific (statistical) significance. If p<0.05, then we conclude that our treatment is working and we should get a Nature paper. However, in many cases, these small p values still raise the question, WHO CARES? If something is statistically significant, it does not mean that it is clinically relevant. Additionally, the scientific community receives (or should receive) a lot of flak for the weight it gives to p values, when in fact what we should be reporting most of the time is a confidence interval. The confidence interval is intimately related to the p value, but it gives far more information and is a more accurate and informative description of the data. People do not understand p values, and many times they do not stop to think closely enough about confidence intervals either. Below are two graphs I have selected from a biostatistics lecture by Patrick Breheny illustrating how your choice of confidence level changes the interval, and how intuitively simple the difference is, if one takes the time to think about it…
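That intimate relationship between the p value and the confidence interval can be made concrete. As a rough sketch (using made-up numbers and a simple z-test, not data from the lecture): a two-sided test gives p < 0.05 exactly when the 95% confidence interval excludes the null value, so the interval tells you everything the p value does, plus where the effect plausibly lies.

```python
from statistics import NormalDist

# Hypothetical numbers: an estimated effect, its standard error,
# and a null value of 0 (no effect).
mean, se, null = 0.42, 0.20, 0.0

# Two-sided p value from a z-test
z = (mean - null) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval around the estimate
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
lo, hi = mean - z_crit * se, mean + z_crit * se

print(f"p = {p:.4f}; 95% CI = ({lo:.2f}, {hi:.2f})")
# The two criteria agree: p < 0.05 exactly when the CI excludes the null
print((p < 0.05) == (not (lo <= null <= hi)))
```

In this case p is about 0.036 and the interval (0.03, 0.81) excludes zero, so both criteria flag "significance"; but only the interval shows that the effect could plausibly be anywhere from tiny to large, which is exactly the "who cares?" information a bare p value hides.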
Now, one of these graphs shows a 95% confidence interval, and the other shows an 80% confidence interval. If you think about just the values, you would (wrongly) assume that an 80% confidence interval is "worse" than a 95% confidence interval because 80 is less than 95. However, a confidence interval is constructed so that, if you repeated your experiment many times, X% of the intervals you computed would contain the true population mean. So, in order for the procedure to capture the true population value more often, the interval must be widened. Therefore, a 95% confidence interval is actually wider than an 80% confidence interval, but you can be more confident that it contains the true population mean. Understanding this somewhat simple but very important concept is essential to generating and interpreting scientific data. This course has illustrated this concept and the importance of statistics very well, and I will make sure to keep it in the back of my mind throughout my career.
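The width difference is easy to see numerically. Here is a minimal sketch with a made-up sample (not the data from the lecture graphs), computing z-based 80% and 95% intervals around the same sample mean; a t-based interval would be slightly wider for a sample this small, but the comparison comes out the same way.

```python
import math
from statistics import NormalDist

# Hypothetical sample of ten measurements
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.1, 4.9]
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)  # standard error of the mean

def ci(confidence):
    """z-based confidence interval for the mean at the given level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se

lo80, hi80 = ci(0.80)
lo95, hi95 = ci(0.95)
print(f"80% CI: ({lo80:.3f}, {hi80:.3f}), width {hi80 - lo80:.3f}")
print(f"95% CI: ({lo95:.3f}, {hi95:.3f}), width {hi95 - lo95:.3f}")
```

Both intervals are centered on the same sample mean; demanding 95% coverage instead of 80% simply stretches the interval, which is the trade-off the two graphs illustrate.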