If you have been doing bench work for any length of time,
you have had an experiment with seemingly beautiful data that easily passes the
bloody obvious test but still is not significant. You begin to dig through the individual data
points and you find it: that one mouse/well/prep that is wildly off from the
others. That little *&$@er. Being a good scientist, you don’t want to throw
away data. Your lab notebook says that you did everything correctly that day,
and your controls look good. It is not
out of the question that the value actually happened and was recorded correctly;
biological systems are messy and will spit up on you from time to time.

But you
really don’t have the time/money to repeat it, so you begin the squicky task of
seeing if you can justify excluding that value. These are the tests before the
test, and they could probably stand to be done before all analyses, to make sure the data
conform to your assumptions, rather than as a post-hoc measure when something
goes wrong. So you begin with a simple Q test, the easiest way to justify an outlier’s
removal. You divide the gap between the suspect value and its nearest neighbor by the
range of the data and compare the result against the table of critical
Q values. But here you have another set of choices to make, depending on your
sample size and how sure you want to be of the value’s outlier status. Do you
accept a 90% confidence level for outlier identification? Or are you more
stringent, going for 99%? Perhaps
somewhere in between? Perhaps you just really need this post-doc to be over and
consider dropping the confidence level below 90%.
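
The gap-over-range arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration, not a recommendation: the function name `q_test` and the example readings are made up for this sketch, and the critical values are the commonly tabulated ones for n = 3 to 10, which you should check against whatever table you actually cite.

```python
# Commonly tabulated critical values for Dixon's Q test, n = 3..10.
# Illustrative only: verify against the reference table you cite.
Q_CRIT = {
    0.90: [0.941, 0.765, 0.642, 0.560, 0.507, 0.468, 0.437, 0.412],
    0.95: [0.970, 0.829, 0.710, 0.625, 0.568, 0.526, 0.493, 0.466],
    0.99: [0.994, 0.926, 0.821, 0.740, 0.680, 0.634, 0.598, 0.568],
}


def q_test(values, confidence=0.95):
    """Return (suspect, Q, flagged) for the most extreme value.

    Q = gap / range, where gap is the distance from the suspect value
    to its nearest neighbor. Hypothetical helper for this post.
    """
    n = len(values)
    if not 3 <= n <= 10:
        raise ValueError("this sketch only has critical values for n = 3..10")
    data = sorted(values)
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")

    gap_low = data[1] - data[0]      # gap if the smallest value is the suspect
    gap_high = data[-1] - data[-2]   # gap if the largest value is the suspect
    if gap_high >= gap_low:
        suspect, gap = data[-1], gap_high
    else:
        suspect, gap = data[0], gap_low

    q = gap / data_range
    return suspect, q, q > Q_CRIT[confidence][n - 3]


# Made-up readings with one low value.
readings = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]
for level in (0.90, 0.95, 0.99):
    suspect, q, flagged = q_test(readings, level)
    print(f"{level:.0%}: suspect={suspect}, Q={q:.3f}, flagged={flagged}")
```

With these readings Q comes out to about 0.455, which clears the 90% cutoff for ten points but not the 95% one, so the same value is or is not an "outlier" depending entirely on the level you picked.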

Confused, you go to find more options and
find a plethora of other outlier tests: Peirce’s criterion, Chauvenet’s, and more. Then
you panic, realizing that many outlier tests have their own assumptions about
the normality and variance of your data. What if you have a system where the
variance is expected to go up as the dose does? Worse, how would you even know
that your data is actually normal? Well,
there are many tests for the latter, each with its own assumptions and
methods. You can do it graphically with a Q-Q plot, which may make it easier to
explain to your advisor, or you can do it by either frequentist or Bayesian
methods, but almost inevitably you will find that there are assumptions
underlying each of those, and again you can search for a test to show whether your
data does or does not fit them. One errant point has consumed your work day
learning the nuances of each statistical test, only to determine whether you could throw it away, never mind testing your actual question. You sit staring at the fractal
decision flowchart in front of you, little lines trailing off into nothingness.
All due to that little *&$@er.
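
For comparison, here is a rough sketch of Chauvenet's criterion, one of the alternatives named above. The usual statement of the rule is to flag a point when the expected number of values at least that far from the mean, in a sample of size n, is below 0.5. Note that it leans directly on the normality assumption that started the panic. The function name `chauvenet_flags` and the readings are invented for this example.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_flags(values):
    """Return one boolean per value: True if Chauvenet's criterion flags it.

    Assumes the data are roughly normal; hypothetical helper for this post.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    center = mean(values)
    spread = stdev(values)  # sample standard deviation
    if spread == 0:
        raise ValueError("all values are identical")

    standard_normal = NormalDist()
    flags = []
    for x in values:
        z = abs(x - center) / spread
        # Two-sided probability of a deviation at least this large.
        p = 2 * (1 - standard_normal.cdf(z))
        # Flag when fewer than half a point is expected this far out.
        flags.append(n * p < 0.5)
    return flags


# Made-up readings with one low value.
readings = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]
flagged = [x for x, bad in zip(readings, chauvenet_flags(readings)) if bad]
print(flagged)
```

Because the suspect point itself inflates the standard deviation used to judge it, this rule and the Q test can disagree on the same data, which is exactly the fork in the flowchart described above.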

This post succinctly sums up what I think is the biggest struggle for young and inexperienced statisticians: which test is appropriate? Perhaps this struggle extends even to veteran statisticians, and therein lies a larger problem: how do we know what to do? There is usually no statistician to whom you can explain the entire experiment, and even if there were, would they know what to do? To get an answer about what kind of test to use, we are left to fumble around with different analyses and, of course, our trusty Google search to figure out the best way to deal with our data. There is a tension between choosing the "right" test (which you aren't even sure is right) and choosing the test that would let you throw away this data point, see your desired statistical significance, and rest your head on your pillow that night, confident that you did things correctly according to the test you chose. It's hard to decide on appropriate tests without intimate knowledge of both the experimental system AND the statistical analyses, especially with new or previously unpublished experiments. As such, we try to learn as much as we can and do the best that we can with the knowledge that we have, but is this enough?

So it is outlier tests all the way down! Once you are in a situation where you are torturing your data, it will eventually tell you whatever you want, or so the quote goes. And that is bias. I mean, is it not bias to search through all those tests for the specific one that will give you the result you are looking for? It’s a very extreme manifestation of bias, indeed. We really need to make sure we design our experiments thoroughly before we begin running them in order to avoid these downward spirals. What Dr. Murphy mentioned in class is golden advice: define your criteria for excluding outliers prior to conducting your experiment. This is not a matter of choosing the right test (or “drug,” as you are attempting to treat “ill” data); rather, it is a matter of having studied the field (i.e. statistics) well before you use it in your research. As for that little *&$@er, this is a case where an ounce of prevention is better than a pound of cure. This is especially true when the cure (a post hoc outlier test) has such a pathetic therapeutic index. Did you ever think of the implications of biasing your data? What if you’re supposed to get that outlier every time? Maybe only 1% of the time the drug is metabolized to an extent where it is capable of yielding that stark physiological effect. Maybe that one mouse that metabolizes your compound extraordinarily fast is part of a real genetic group of ultra-rapid metabolizers. And if you just convinced yourself to omit that little *&$@er rather than verify whether it is real, then perhaps you missed out on a high-impact publication in a high-profile journal. In summary, if you lack the resources to repeat an experiment, make sure you don’t lack understanding of statistics!

I think these are all great points about a fear that all scientists have: the fear of having an outlier you don't know what to do with, the fear of making the wrong decision about what to do, not to mention all the other fears we have as graduate students (will we ever graduate? I hope I don't mess up this analysis, etc.). Unfortunately, there is no good answer for what to do with this frustrating outlier. As mentioned above, there are different tests, but there are assumptions that go along with them, and you had better know the impact they will have. Dr. Murphy's advice about defining your criteria for excluding outliers prior to the experiment is really the best we can do. Then we just have to make sure we stick with the strict rules we set. Maybe write them down and be specific? Maybe share them with a labmate or friend and see if they would exclude the outlier? There are many different options I could think of, but what should be clear is that there is no written rule. There is good advice, but unfortunately that doesn't always mean we are making the right decision. In some situations it might just be best to show ALL the data, explain the results as best you can, and let the scientific community decide for themselves what significance really means.
