Monday, April 11, 2016

Testing your outlier; turtles all the way down

If you have been doing bench work for any length of time, you have had an experiment with seemingly beautiful data that easily passes the bloody-obvious test but still is not significant. You begin to dig through the individual data points and you find it: that one mouse/well/prep that is wildly off from the others. That little *&\$@er. Being a good scientist, you don’t want to throw away data. Your lab notebook says that you did everything correctly that day, and your controls look good. It is not out of the question that the value actually happened and was recorded correctly; biological systems are messy and will spit up on you from time to time.
But you really don’t have the time/money to repeat it, so you begin the squicky task of seeing if you can justify excluding that value. These are the tests before the test, and they could probably stand to be done before every analysis, to make sure the data conform to your assumptions, rather than as a post-hoc measure when something goes wrong. You start with a simple Q test (Dixon’s Q), the easiest way to justify an outlier’s removal: divide the gap between the suspect value and its nearest neighbor by the full range of the data, then compare the result to the critical value in a table of Q values. But here you have another set of choices to make, depending on your sample size and how sure you want to be of the value’s outlier status. Do you accept a 90% confidence level on outlier identification? Or are you more stringent, going for 99%? Perhaps somewhere in between? Perhaps you just really need this post-doc to be over and consider bumping the confidence level below 90%.
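For the curious, here is a minimal sketch of that gap-over-range arithmetic in Python. The function names (`dixon_q`, `is_outlier`) and the well readings are made up for illustration, and the critical values are the two-sided ones as commonly tabulated for n = 3 to 10; treat them as an assumption and check them against whatever table you actually cite.

```python
# Two-sided critical Q values as commonly tabulated for n = 3..10.
# Assumption: verify these against the reference table you cite.
Q_CRIT = {
    0.90: {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560,
           7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412},
    0.95: {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
           7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466},
    0.99: {3: 0.994, 4: 0.926, 5: 0.821, 6: 0.740,
           7: 0.680, 8: 0.634, 9: 0.598, 10: 0.568},
}


def dixon_q(values):
    """Return (Q, suspect) for the most extreme value in a small sample."""
    x = sorted(values)
    if len(x) < 3:
        raise ValueError("need at least 3 values")
    data_range = x[-1] - x[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = x[1] - x[0]      # gap between the lowest value and its neighbor
    gap_high = x[-1] - x[-2]   # gap between the highest value and its neighbor
    if gap_high >= gap_low:
        return gap_high / data_range, x[-1]
    return gap_low / data_range, x[0]


def is_outlier(values, confidence=0.90):
    """Return (flagged, Q, suspect) at the chosen confidence level."""
    n = len(values)
    if confidence not in Q_CRIT or n not in Q_CRIT[confidence]:
        raise ValueError("no critical value tabulated for this n / confidence")
    q, suspect = dixon_q(values)
    return q > Q_CRIT[confidence][n], q, suspect


# Hypothetical readings from six wells, one of them suspiciously high.
wells = [10.0, 10.2, 9.9, 10.1, 10.3, 11.0]
for level in (0.90, 0.95, 0.99):
    flagged, q, suspect = is_outlier(wells, level)
    print(f"{level:.0%}: Q = {q:.3f}, suspect = {suspect}, flagged = {flagged}")
```

With these invented numbers Q comes out to about 0.64, so the 11.0 well is flagged at 90% and 95% but not at 99%. Same data, same point, different answer depending on how stringent you decided to be.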
Confused, you go looking for more options and find a plethora of other outlier tests: Peirce’s criterion, Chauvenet’s criterion. And you panic, realizing that many outlier tests carry their own assumptions about the normality and variance of your data. What if you have a system where the variance is expected to go up as the dose does? Worse, how would you even know that your data are actually normal? Well, there are many tests for the latter, each with its own assumptions and methods. You can do it graphically with a Q-Q plot, which may make it easier to explain to your advisor, or you can do it by either frequentist or Bayesian methods, but almost inevitably you will find that there are assumptions underlying each of those, and again you can search for a test to show that your data do or do not fit them. One errant point has consumed your work day, spent learning the nuances of each statistical test only to determine whether you could throw it away, never mind testing your actual question. You sit staring at the fractal decision flowchart in front of you, little lines trailing off into nothingness. All due to that little *&\$@er.
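If you want to see what the Q-Q plot option looks like in practice, here is a rough standard-library sketch. It pairs each sorted observation with the normal quantile it would sit on if the data were normal, then reports how straight that line is. The helper names (`qq_points`, `qq_correlation`), the plotting positions (i - 0.5)/n, and the well readings are my own assumptions for illustration; this is an eyeball aid, not a formal normality test.

```python
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted observation with its theoretical standard-normal quantile."""
    x = sorted(values)
    n = len(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, x))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; closer to 1 means a straighter line."""
    pts = qq_points(values)
    mean_t = mean(t for t, _ in pts)
    mean_x = mean(x for _, x in pts)
    cov = sum((t - mean_t) * (x - mean_x) for t, x in pts)
    var_t = sum((t - mean_t) ** 2 for t, _ in pts)
    var_x = sum((x - mean_x) ** 2 for _, x in pts)
    return cov / (var_t * var_x) ** 0.5


# Hypothetical readings: the same wells with and without the suspect value.
with_suspect = [10.0, 10.2, 9.9, 10.1, 10.3, 11.0]
without_suspect = [10.0, 10.2, 9.9, 10.1, 10.3]

print(f"with suspect:    r = {qq_correlation(with_suspect):.3f}")
print(f"without suspect: r = {qq_correlation(without_suspect):.3f}")
```

Plot the pairs from `qq_points` with whatever plotting tool you like and look for points peeling away from the line at the ends. The correlation drops noticeably when the suspect well is included, which tells you the point is odd but, true to form, not whether you are allowed to throw it away.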