## Saturday, April 9, 2016

### I have confidence in confidence (intervals) alone!

Plain and simple, confidence intervals allow us to express how precise our estimates are. They're suitable for almost any type of biological measurement because you need only three basic things: the sample mean, the standard deviation, and the sample size. Once you decide what level of confidence is acceptable for the experiment, you can use those three values to calculate a confidence interval (CI). A 95% confidence level is standard, though some fields, such as astrophysics or nuclear weapons work, demand a much higher confidence level, which for the same data produces a wider interval. If you choose the conventional 95% level, you are reporting a range produced by a procedure that, if the experiment were repeated many times, would capture the true population value in 95% of those repetitions.
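For reference, the usual interval for a mean combines those three values with a critical value for the chosen confidence level. This is the standard t-based form, which assumes the data are roughly normal or the sample is reasonably large:

```latex
\bar{x} \pm t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}
```

Here $\bar{x}$ is the sample mean, $s$ the standard deviation, $n$ the sample size, and $\alpha = 0.05$ for 95% confidence. As $n$ grows, the $t$ critical value approaches the normal value $z_{0.975} \approx 1.96$, and a higher confidence level (smaller $\alpha$) means a larger critical value and therefore a wider interval.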

The weather example brought up in the Garfield comic is an easy one to think about in this context. If you want a high confidence level, then you must either have many measurements, have a small standard deviation in your measurements, or be willing to accept a very wide range of values as your interval. Sure, you can say with near-total confidence that the temperature in Atlanta, GA is between -40°F and 200°F at any given time, because temperatures outside that range are not meteorologically plausible there, but an interval that wide tells you almost nothing. Say, instead, that you measure the temperature once an hour for 24 hours, so your sample size is 24. Perhaps your sample mean is 72°F with a standard deviation of 7°F. Even though your calculated sample mean is exactly 72°F, you didn't measure the temperature continuously all day long, so there's a very real chance that your mean misrepresents the actual fluctuations that occurred, particularly if the temperature quickly rose and then dropped again or vice versa. So you can calculate that you are 95% confident the true mean temperature for the past 24 hours lies between roughly 69.0°F and 75.0°F. (That uses the t distribution with 23 degrees of freedom, which suits a sample this small; the normal approximation with z = 1.96 gives a slightly narrower 69.2°F to 74.8°F.)
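The arithmetic in this example can be checked with a short sketch using only the Python standard library. The t critical value 2.0687 for 23 degrees of freedom is taken from a standard t table rather than computed, and the 99% line is an extra comparison (not part of the original example) showing that a higher confidence level widens the interval for the same data.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sd, n, critical_value):
    """Return (lower, upper) for mean ± critical_value * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = critical_value * standard_error
    return mean - margin, mean + margin


# Values from the hourly temperature example (°F)
mean_temp = 72.0
sd_temp = 7.0
n_hours = 24

# Normal (z) critical values for two-sided 95% and 99% confidence
z_95 = NormalDist().inv_cdf(0.975)   # about 1.96
z_99 = NormalDist().inv_cdf(0.995)   # about 2.58
# t critical value for 95% confidence with n - 1 = 23 degrees of freedom,
# copied from a standard t table (assumed, rounded to 4 decimals)
t_95_df23 = 2.0687

z_low, z_high = confidence_interval(mean_temp, sd_temp, n_hours, z_95)
t_low, t_high = confidence_interval(mean_temp, sd_temp, n_hours, t_95_df23)
z99_low, z99_high = confidence_interval(mean_temp, sd_temp, n_hours, z_99)

print(f"z-based 95% CI: {z_low:.1f} to {z_high:.1f} °F")
print(f"t-based 95% CI: {t_low:.1f} to {t_high:.1f} °F")
print(f"z-based 99% CI: {z99_low:.1f} to {z99_high:.1f} °F")
```

The t-based interval is the more defensible one for 24 readings; the z-based version reproduces the 69.2°F to 74.8°F figures and is what you get if you treat the sample as large.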

You must make many assumptions about your data in order to calculate a confidence interval correctly. Namely, you assume that your sample is a random, representative sample of the population, that your data points are independent and unbiased, and that your data were accurately obtained. In the scenario I outlined, a faulty thermometer that reads 5 degrees warmer than the actual temperature would shift your whole interval upward and ruin the accuracy of your CI. The reality is that biology isn't always as easy to interpret as the temperature scenario, and problems with the data aren't always easy to spot. What if your project is so novel that you don't know that saying the value could be between -40 and 200 is ludicrous? Maybe the value should be between 60 and 80, but the time frame over which you're capturing your data is so long that unknown biology is skewing your results and inflating your standard deviation?
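The accuracy assumption can be made concrete with a small simulation. This is a hypothetical illustration, not real weather data: it assumes hourly readings are independent draws from a normal distribution with a true mean of 72°F and a standard deviation of 7°F (real hourly temperatures are correlated, so this is an idealization), builds a t-based 95% CI for each simulated day, and counts how often the interval contains the true mean, once with an accurate thermometer and once with one that reads 5°F warm.

```python
import math
import random

# Hypothetical "true" conditions for the simulation (not real data)
TRUE_MEAN = 72.0     # true 24-hour mean temperature, °F
TRUE_SD = 7.0        # spread of hourly readings, °F
N_READINGS = 24      # one reading per hour
T_95_DF23 = 2.0687   # 95% t critical value for 23 degrees of freedom (t table)


def ci_from_readings(readings):
    """Return (lower, upper) of a t-based 95% CI for the mean.

    Assumes len(readings) == 24 so that T_95_DF23 is the right critical value.
    """
    n = len(readings)
    m = sum(readings) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in readings) / (n - 1))
    margin = T_95_DF23 * sd / math.sqrt(n)
    return m - margin, m + margin


def coverage(bias, trials=5000, seed=1):
    """Fraction of simulated days whose 95% CI contains TRUE_MEAN.

    Each reading is drawn independently from a normal distribution around
    TRUE_MEAN and then shifted by `bias` to mimic a miscalibrated thermometer.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        readings = [rng.gauss(TRUE_MEAN, TRUE_SD) + bias for _ in range(N_READINGS)]
        low, high = ci_from_readings(readings)
        if low <= TRUE_MEAN <= high:
            hits += 1
    return hits / trials


print(f"Accurate thermometer: {coverage(0.0):.1%} of intervals contain the true mean")
print(f"Reads 5°F too warm:   {coverage(5.0):.1%} of intervals contain the true mean")
```

With an accurate instrument, the interval should capture the true mean on roughly 95% of simulated days, which is what "95% confidence" actually promises. The 5°F offset drags that rate far below 95%, even though each individual interval looks just as tight.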

Being 95% confident sounds like such a sure thing, but it's important to think critically about what these values are telling you. Is the result actually biologically relevant?