Saturday, April 9, 2016

I have confidence in confidence (intervals) alone!

Plain and simple, confidence intervals allow us to express how precisely our data estimate a value. They're suitable for almost any type of biological measurement because you need only three basic things: the sample mean, the standard deviation, and the sample size. Once you determine the degree of confidence you find acceptable for the experiment, you can use those three values to calculate a confidence interval (CI). 95% confidence is standard, though some fields, such as astrophysics or nuclear weapons work, call for a much higher confidence level. If you choose a conventional CI of 95%, then you are stating a range of values that you are 95% confident contains the true population value. Strictly speaking, the 95% describes the method: if you repeated the experiment many times, about 95% of the intervals calculated this way would contain the true value.

The example of weather brought up in the Garfield comic is an easy one to think about in this context. If you want to state a high confidence level, then you must either have many measurements, have a small standard deviation in your measurements, or be willing to accept a very large range of values as your interval. Sure, you can say confidently that the temperature in Atlanta, GA is between -40°F and 200°F at any given time, but an interval that wide is not meteorologically useful. Say, instead, that you measure the temperature once an hour for 24 hours, so your sample size equals 24. Perhaps your sample mean is 72°F with a standard deviation of 7°F. Even though your calculated sample mean is exactly 72°F, you didn’t measure the temperature continuously all day long, so there’s a very real chance that your mean is an inaccurate representation of the actual fluctuations that occurred, particularly if the temperature quickly rose and then dropped again or vice versa. So you can calculate that you are 95% confident the true temperature mean for the past 24 hours is between 69.2°F and 74.8°F.
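The arithmetic behind that interval can be sketched in a few lines of Python. This is a minimal sketch, assuming the normal approximation (a critical value of about 1.96 for 95%), which is what reproduces the 69.2°F to 74.8°F figures; the function name `mean_confidence_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation CI for a mean, from summary statistics.

    Assumption: the z critical value is used, not Student's t.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Margin of error = critical value * standard error of the mean.
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Values from the temperature example: mean 72°F, SD 7°F, 24 readings.
low, high = mean_confidence_interval(72.0, 7.0, 24)
print(f"95% CI: {low:.1f}°F to {high:.1f}°F")  # 95% CI: 69.2°F to 74.8°F

# A higher confidence level widens the interval for the same data.
for level in (0.90, 0.95, 0.99):
    lo, hi = mean_confidence_interval(72.0, 7.0, 24, confidence=level)
    print(f"{level:.0%}: width {hi - lo:.2f}°F")
```

With only 24 readings, many analysts would use a t critical value (about 2.07 for 23 degrees of freedom) instead, which gives a slightly wider interval of roughly 69.0°F to 75.0°F.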

You must make several assumptions about your data in order to calculate a confidence interval correctly. Namely, you assume that your sample is a random representation of the population, that your data are independent and unbiased, and that your data were accurately obtained. In the scenario I outlined, a faulty thermometer that reads 5 degrees warmer than the actual temperature would ruin the accuracy of your CI. The reality is that biology isn’t always as easy to interpret as the temperature scenario, and problems with the data aren’t always as easy to spot. What if your project is so novel that you don’t know that saying the value could be between -40 and 200 is ludicrous? Maybe the value should be between 60 and 80, but the time frame in which you’re capturing your data is so large that unknown biology is skewing your results and causing a large standard deviation?
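The faulty-thermometer point can be illustrated with a short Python sketch. The hourly readings below are hypothetical values invented for the illustration, and `ci_from_readings` is an illustrative helper that again assumes the normal approximation. The point it shows: a constant 5-degree bias shifts the whole interval by 5 degrees but leaves its width untouched, so nothing in the CI itself warns you about the problem.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci_from_readings(readings, confidence=0.95):
    """Normal-approximation CI for the mean of raw readings (a sketch)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(readings) / sqrt(len(readings))
    center = mean(readings)
    return center - margin, center + margin


# Hypothetical readings in °F, made up for illustration only.
true_readings = [65, 66, 68, 70, 73, 76, 78, 79, 77, 74, 71, 68]
# The same day as seen by a thermometer that reads 5 degrees too warm.
faulty_readings = [r + 5 for r in true_readings]

true_ci = ci_from_readings(true_readings)
faulty_ci = ci_from_readings(faulty_readings)

print(f"accurate thermometer: {true_ci[0]:.1f} to {true_ci[1]:.1f}")
print(f"faulty thermometer:   {faulty_ci[0]:.1f} to {faulty_ci[1]:.1f}")
# Same width, different location: the bias is invisible in the spread.
print(f"widths: {true_ci[1] - true_ci[0]:.2f} vs {faulty_ci[1] - faulty_ci[0]:.2f}")
```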

95% confident sounds like such a sure thing, but it’s important to think critically about what these values are telling you. Is it actually biologically relevant?


  1. You give a great example to understand confidence intervals. I think you make a great point that it is important to understand what the numbers mean in order to determine if the confidence interval is giving biologically relevant information.

  2. I think the example in the comic is really helpful for understanding what confidence intervals actually mean. For some reason I always have trouble remembering that the higher the percentage in a confidence interval, the wider the interval has to be. The ridiculously wide confidence interval in the comic does a good job of illustrating this. The percentage for a confidence interval between -40 and 200 degrees is probably very close to 100%, which is why this interval has to be so wide.

  3. My PI always says, "You should be able to look at the data and see the difference." Sometimes I think that I should put confidence intervals on all of my data and end my statistical analyses there. Then when I show my PI the data, she can truly "just look at the data" and tell if there is a "statistical" difference or not.