
Wednesday, May 4, 2016

Welcome to Trump Tower....... Umm I mean the White House?




This is a counterpoint to the recent blog post discussing the chance of Bernie Sanders winning the White House. In this article, a statistician predicts that Donald Trump will win the election in November. Similar to the post below, this person has never been wrong. Well, it finally happened, everyone. All of the other potential candidates dropped out and now the Trumpster (hybrid of Trump and dumpster) himself is the nominee for the Republican party. All of the odds were stacked against him, including not really being qualified for the position in the first place.

Photo: The Associated Press
I can't remember a time when I have hoped for a statistic to be wrong. This prediction was made a few months ago and gives Trump a 100% chance of winning. If I had read this prediction before today, I would have said there was no way this was possible. But maybe it isn't real. Maybe this is simply a year when the statistics got lucky and picked the outlier. There are a lot of things that I doubt the model can take into consideration, such as the other horrible candidates fighting for the Republican nomination and the general population's IQ. So, with that being said, is there a possibility that Bernie could win it all as an independent? Crazier things have happened. As a matter of fact, that very thing happened today.

Tuesday, April 12, 2016

Considering Outliers Across Tests of the Same Parameter (i.e. Mouse Lab Problems)


One of the recurring themes of this course is that we should never throw away or disregard data. (That is, so long as it was obtained under sound circumstances; if your definition of RT-PCR data is making up numbers for Ct values, those should probably go out the window.) We fought hard to get that data, and it tells an important story. While there is always a degree of expected variance in the data, sometimes it goes above and beyond, and we have perceived statistical outliers.

https://www.pinterest.com/pin/346917977519814108/

While my project focuses on the use of human and mouse cell culture, my lab as a whole utilizes mice as a model system to study voltage-gated sodium channels in various forms of epilepsy. We typically put transgenic mice through a pipeline of sorts, starting with behavioral paradigms and culminating in seizure paradigms. By using seizure paradigms like 6 Hz (i.e. electroshocking mice to induce a seizure) or flurothyl (chemically inducing a seizure), we can easily determine whether a mouse is seizure resistant or susceptible. The tricky bit is at the beginning, when we run behavioral paradigms to see whether the mice are hyperactive, anxious, or have poor learning and memory compared to a control animal (maybe one that does not carry a sodium channel mutation). Mice are inherently variable in their behavior; some are more active than others, some are more lethargic. This inherent variability makes it really difficult to detect genuine differences in behavior, because within a single population one mouse may be bouncing off the walls while another just sits in a corner of the box the entire time. We often put mice through multiple paradigms testing the same trait (let's say anxiety, in this case) to make sure any findings are consistent, and then run the data through a Grubbs' test in GraphPad to pinpoint any outliers (a rough sketch of what that test does is below).
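For anyone curious what GraphPad is actually doing when it flags a mouse, here is a rough Python sketch of a two-sided Grubbs' test; the anxiety scores (think percent time spent in the open arms) and the alpha level are invented for illustration, not real data from our lab.

import numpy as np
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test: flag the single most extreme value."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean, sd = x.mean(), x.std(ddof=1)
    # Test statistic: largest absolute deviation from the mean, in SD units
    deviations = np.abs(x - mean)
    idx = int(np.argmax(deviations))
    G = deviations[idx] / sd
    # Critical value built from the t-distribution with n - 2 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / (2 * n), df=n - 2)
    G_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return G > G_crit, idx

# Hypothetical scores for one anxiety paradigm; mouse 5 looks suspicious
paradigm_scores = [12.1, 14.3, 11.8, 13.5, 12.9, 31.0, 13.2, 12.4]
is_outlier, which_mouse = grubbs_test(paradigm_scores)
print(f"Mouse {which_mouse} flagged as an outlier: {is_outlier}")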

That being said, what would be the best thing to do if we find an outlier mouse for one anxiety paradigm, but not for the others? Let's say, in each of the following paradigms, Mouse A is:

Paradigm 1: Not an outlier
Paradigm 2: Outlier
Paradigm 3: Not an outlier
Paradigm 4: Not an outlier

There are two options that I can see. The first is to denote Mouse A as an outlier in Paradigm 2 alone, since that's what the Grubbs' test told us. We would not remove it from our published data sets or from statistical analyses, but we would label it as an outlier on graphs. The second is to mark Mouse A as an outlier in each of the four paradigms, since they all measure anxiety and it was anxious in one of them. Is that the right thing to do? Is it even right to keep the mouse in these data sets at all once Grubbs' test has labeled it an outlier (well, yes, but it's still debated)? One could argue that since these paradigms test anxiety in different settings, the results from one are self-contained, meaning that Mouse A may simply not respond well to a particular test. But the other camp also has a valid argument: it's fishy to call Mouse A "abnormal" in Paradigm 2 and still consider it part of the sample population for the other paradigms.

Perhaps it's best to just follow the literature and precedent from other mouse papers, but just because something is precedent does not mean it is necessarily correct.

Monday, April 11, 2016

Testing your outlier: turtles all the way down




      If you have been doing bench work for any length of time, you have had an experiment with seemingly beautiful data that easily passes the bloody obvious test but still is not significant. You begin to dig through the individual data points and you find it: that one mouse/well/prep that is wildly off from the others. That little *&$@er. Being a good scientist, you don't want to throw away data, your lab notebook says that you did everything correctly that day, and your controls look good. It is not beyond the pale that the value actually happened and was recorded correctly; biological systems are messy and will spit up on you from time to time.
      But you really don't have time/money to repeat it, so you begin the squicky task of seeing if you can justify excluding that value. These are the tests before the test, and they could probably stand to be done before all analyses, to make sure the data conform to your assumptions, rather than as a post-hoc measure when something goes wrong. You begin with a simple Q test, the easiest way to justify an outlier's removal: divide the gap (between the suspect value and its nearest neighbor) by the range and look up that value in the table of Q values. But here you have another set of choices to make, depending on your sample size and how sure you want to be of the value's outlier status. Do you accept a 90% confidence level for outlier identification? Or are you more stringent, going for 99%? Perhaps somewhere in between? Perhaps you just really need this post-doc to be over and consider dropping the threshold below 90%.
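If you want to see just how little math the Q test involves, here is a small sketch in Python; the critical values are the commonly tabulated ones for n = 3 to 10 at 90%, 95%, and 99% confidence, and the replicate readings are made up.

# Dixon's Q test for the single most extreme value in a small sample.
Q_CRIT = {
    0.90: [0.941, 0.765, 0.642, 0.560, 0.507, 0.468, 0.437, 0.412],
    0.95: [0.970, 0.829, 0.710, 0.625, 0.568, 0.526, 0.493, 0.466],
    0.99: [0.994, 0.926, 0.821, 0.740, 0.680, 0.634, 0.598, 0.568],
}  # index 0 corresponds to n = 3

def dixon_q(values, confidence=0.95):
    """Return (Q, critical Q, exclude?) for the most extreme value."""
    x = sorted(values)
    n = len(x)
    if not 3 <= n <= 10:
        raise ValueError("This table only covers n = 3 to 10")
    # Gap between the suspect point and its nearest neighbor, at whichever end is worse
    gap = max(x[1] - x[0], x[-1] - x[-2])
    q = gap / (x[-1] - x[0])          # gap divided by the range
    q_crit = Q_CRIT[confidence][n - 3]
    return q, q_crit, q > q_crit

# Hypothetical replicate readings with one suspiciously low value
wells = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182]
q, q_crit, exclude = dixon_q(wells, confidence=0.95)
print(f"Q = {q:.3f}, critical Q = {q_crit:.3f}, justified in excluding? {exclude}")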
     Confused, you go looking for more options and find a plethora of other outlier tests (Peirce's criterion, Chauvenet's criterion), and you panic, realizing that many outlier tests carry their own assumptions about the normality and variance of your data. What if you have a system where the variance is expected to go up as the dose does? Worse, how would you even know that your data are actually normal? Well, there are many tests for the latter, each with its own assumptions and methods. You can do it graphically with a Q-Q plot, which may make it easier to explain to your advisor, or you can do it by either frequentist or Bayesian methods, but almost inevitably you will find that there are assumptions underlying each of those, and again you can search for a test to prove your data do or do not fit them. One errant point has consumed your work day with learning the nuances of each statistical test, only to determine whether you could throw it away, never mind testing your actual question. You sit staring at the fractal decision flowchart in front of you, little lines trailing off into nothingness. All due to that little *&$@er.
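And, for the record, the basic normality checks are not much more work, which somehow makes it worse. Here is a small sketch of the graphical (Q-Q plot) and frequentist (Shapiro-Wilk) checks mentioned above, run on simulated numbers rather than real measurements.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=20)  # stand-in for your measurements

# Graphical check: points should fall roughly along the reference line
stats.probplot(data, dist="norm", plot=plt)
plt.title("Q-Q plot against a normal distribution")
plt.savefig("qq_plot.png")

# Frequentist check: a small p-value suggests the data are not normal
W, p = stats.shapiro(data)
print(f"Shapiro-Wilk W = {W:.3f}, p = {p:.3f}")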