Tuesday, April 5, 2016

Statistical Tests: Don't forget to use your brain

Statistics do not make you smart. Common sense is not to be scoffed at. Canonical designs are often outdated and bad. Statistical tests are not sufficient to determine if your data is acceptable.

Our world changes, our science changes, and we get smarter. Why do we allow ourselves to be crippled by the ignorance of the past? I am a structural biologist who has worked with X-ray crystallography for several years. Crystallography relies heavily on a set of equations and statistical parameters that determine whether or not our data is “good.” We learn these rules as absolutes when we get started - but every year I learn again and again why these rules are idiotic.

The first rule - where do we throw out data? Anything with a signal:noise below two, of course! Except... our data collection method involves shooting X-rays at an ordered crystal lattice of our protein and observing how those X-rays scatter off the electron density clouds around the protein. We use the repeated nature of the crystal lattice in combination with the scattering pattern to work backwards and build up the electron density. The more reflections we measure, the more we know. Some reflections are weak; some are rare and not often repeated. Throwing out everything with a signal:noise below two is still throwing out signal. Why would we throw out our signal?
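The arithmetic behind this complaint is easy to sketch. Below is a toy simulation (all numbers are invented for illustration, not real crystallographic data) of many weak reflections whose individual observations would each fail a signal:noise > 2 cutoff, yet whose merged average still recovers the signal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: 2000 weak reflections, each with true intensity
# 1.0 and measurement noise sigma = 1.0, so every individual observation
# has a signal:noise of about 1 -- well below the conventional cutoff of 2.
n_refl, n_obs = 2000, 16
true_intensity, sigma = 1.0, 1.0
obs = true_intensity + rng.normal(0.0, sigma, size=(n_refl, n_obs))

# Error of a single observation vs. error of the merged (averaged) value.
# Averaging n observations shrinks the error by a factor of sqrt(n).
single_rmse = np.sqrt(np.mean((obs[:, 0] - true_intensity) ** 2))
merged = obs.mean(axis=1)
merged_rmse = np.sqrt(np.mean((merged - true_intensity) ** 2))

print(f"single-observation RMS error: {single_rmse:.2f}")  # ~1.0
print(f"merged RMS error:             {merged_rmse:.2f}")  # ~0.25
```

Each observation alone would be discarded by the cutoff, but merging 16 of them yields an estimate roughly four times more precise - discarding them throws that signal away.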

We have more rules for when to throw out data. There is a merging statistic that measures the spread among repeated measurements of the same reflection. As you add more measurements, that statistic grows - even though averaging more measurements makes the merged data better. As our methods improve and we can collect more data, our statistics actually get worse. We get punished for having a stronger crystal that can handle more exposure. We get punished for having a better detector that picks up more signal.
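A toy version of this effect (the intensities, noise level, and the simple merging statistic below are illustrative assumptions, not real data reduction) shows the agreement statistic getting worse with multiplicity even as the merged intensities get more accurate:

```python
import numpy as np

rng = np.random.default_rng(1)

def r_merge(obs):
    """Merging statistic: sum_i |I_i - <I>| / sum_i I_i over all reflections."""
    mean_i = obs.mean(axis=1, keepdims=True)
    return np.abs(obs - mean_i).sum() / obs.sum()

# Hypothetical data set: 2000 reflections with true intensity 10.0 and
# per-observation noise sigma = 2.0, merged at increasing multiplicity.
n_refl, true_i, sigma = 2000, 10.0, 2.0
results = {}
for n_obs in (2, 4, 8, 16):
    obs = true_i + rng.normal(0.0, sigma, size=(n_refl, n_obs))
    merged_err = np.sqrt(np.mean((obs.mean(axis=1) - true_i) ** 2))
    results[n_obs] = (r_merge(obs), merged_err)
    print(f"multiplicity {n_obs:2d}: R_merge = {results[n_obs][0]:.3f}, "
          f"merged-intensity error = {merged_err:.3f}")
```

The statistic climbs as multiplicity rises, while the error of the merged intensities falls - exactly the "punished for collecting more data" pattern described above.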

On top of all of this, we have a series of checks built into the modelling steps as we build: Are we biasing our system? Does the model still fit the original data? Except this method of checking is itself biased.
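One way a cross-check can bias itself is easy to show with a generic toy example (deliberately not crystallographic - 200 pure-noise "models" scored against a held-out set): once you use the held-out score to choose among models, that score no longer honestly measures quality.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: the target and every candidate "model" are pure noise,
# so no model has any real merit. Sizes are arbitrary choices.
n_models, n_held_out, n_fresh = 200, 50, 50
target_held = rng.normal(size=n_held_out)
target_fresh = rng.normal(size=n_fresh)

# Pick the model that best matches the held-out set.
best_score, best_model = -np.inf, None
for _ in range(n_models):
    model = rng.normal(size=n_held_out + n_fresh)
    score = np.corrcoef(model[:n_held_out], target_held)[0, 1]
    if score > best_score:
        best_score, best_model = score, model

# The winner looks good on the held-out set it was selected with,
# but shows no comparable agreement with genuinely fresh data.
fresh_score = np.corrcoef(best_model[n_held_out:], target_fresh)[0, 1]
print(f"score on reused held-out set: {best_score:.2f}")
print(f"score on fresh data:          {fresh_score:.2f}")
```

The "check" reports a healthy-looking score purely because it was consulted during selection - the same trap that awaits any validation statistic we steer our model-building toward.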

Why do we still use these tests? It is because they are written in all the books, they are hammered into us constantly, and we simply do not use our brains. These tests are presented as our sacred way of doing things - but sacred ways tend to be outdated, inappropriate, and written for a different time.

I urge using your brain over blindly trusting the statistics. I bet they are done wrong the majority of the time, and there is no reason to follow them without thinking. Papers use multiple t-tests to compare several groups rather than an ANOVA. We can run a variety of outlier tests on data that looks “wrong” until one of them finally declares an outlier - but were we using the right test? Statistical tests are a tool, not a rule. They are not the science, and they do not determine everything.
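The multiple-t-tests problem is easy to demonstrate by simulation. In the sketch below (group sizes and simulation counts are arbitrary choices), three groups are drawn from the same distribution, so every "significant" result is a false positive - and running three pairwise t-tests flags one far more often than a single ANOVA does:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Three groups from the SAME distribution: any detected difference
# is a false positive. Compare three pairwise t-tests vs. one ANOVA.
n_sims, n_per_group, alpha = 2000, 20, 0.05
ttest_hits = anova_hits = 0
for _ in range(n_sims):
    a, b, c = (rng.normal(size=n_per_group) for _ in range(3))
    pairwise_p = [stats.ttest_ind(x, y).pvalue
                  for x, y in ((a, b), (a, c), (b, c))]
    if min(pairwise_p) < alpha:   # "significant" if ANY pair differs
        ttest_hits += 1
    if stats.f_oneway(a, b, c).pvalue < alpha:
        anova_hits += 1

print(f"any pairwise t-test 'significant': {ttest_hits / n_sims:.3f}")  # ~0.12
print(f"one-way ANOVA 'significant':       {anova_hits / n_sims:.3f}")  # ~0.05
```

The pairwise approach roughly doubles the nominal 5% false-positive rate, which is exactly why comparing several groups calls for an ANOVA (or a multiple-comparison correction), not a pile of t-tests.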


  1. Very touching post!! It highlights a very important problem that is seen more and more often in research (which is sad).
    I think it is useful to have a guideline to follow, because it gives a certain consistency over time and something to start with. On the other hand, I agree with you that it's important for researchers to realize that following what is given is not always the best method. But you know, sometimes people are lazy, and using your brain to question something that has been globally accepted requires a lot of energy and commitment. You could argue that this is what research means, right? And I would agree with you, and I would say that it's kind of scary how people do not use their own judgment when conducting some analyses.
    I think it's a slow process that will require better training in statistics starting from an early stage, but I am pretty confident that the more statistical knowledge researchers acquire, the more they will start to question the given guidelines.

  2. We've certainly learned this semester not to trust other people's statistics!
    I've always found crystal structures and modeling fascinating because I've probably heard a hundred times "your structure is only as good as your model". But how do we come up with this model? Well, by predicting it from the sequence and homologous structures, of course! It feels very circular, so I agree with Camilla that it's very tempting to hold on to the security blanket of the familiar rules. Follow these "rules" for analyzing crystal data and you certainly won't get strange looks from your boss or other established people in the field. Use a new and innovative thought process and you're left defending the rigor of your new approach. That certainly doesn't mean you shouldn't develop improved techniques more fitting with the advances in the field, but coming up with new analysis methods presents its own set of challenges. We need new tools and new rules for moving forward, or we run the risk of falling short of our potential advances.
    From our ethics courses I certainly know I can't trust that everyone's good judgement will lead them to more rigorous and robust statistical analysis, but hopefully a few bright statisticians will pioneer the techniques and enable the growing technical fields to adopt them.

  3. Regan, I totally agree with you. We often lose our common sense when confronted with data that is unfamiliar to us. I think this is the reason that Student's t-test is used so frequently and so inappropriately. I agree the only solution is better and earlier statistical education, so that we feel comfortable with many different types of data and many different types of analyses. Excellent post and point well taken.

  4. I completely agree with you! It is sad that many times we follow protocols that were set in place years ago, before new technology came out. However, more than it being because we prefer not to use our brains, I believe many times we do this because we are afraid other people will not use theirs when being critical of our work, and will not trust our results if we use a different method than the one implemented, which everyone knows and trusts. As with everything else in life, many people are afraid of change because “if the current ways work so well, why change them?” But I believe new methods should always be welcome because, as we discover new things, science changes and our ways of studying and analyzing it have to change with it. With the passing of the years, more and more methods become obsolete, so we need people to speak out when they believe there is a glitch in the system, and the rest of us need to have an open mind and consider their propositions because, as I stated above, many times the lack of new analyses comes from the distrust people have toward new points of view.

  5. There's a dichotomy I think we've been dealing with: we are constantly told and shown that statistical errors are rampant and that many papers misuse their statistics - after all, it only took me looking through 2 papers already on my desk to find my bad stats target! At the same time, somehow science has produced countless advances in medicine and technology.
    I think you're right here, and I've felt the same for a while. It's important to know how to responsibly use statistics so that we don't fall victim to thinking they tell us more than they actually do. Surely many of the seminal advances in science were based on data analysis that wasn't done to the T (or the Mann-Whitney U either!) of immaculate stats. Nonetheless, we should be thankful that those individuals had intuition, smarts, intellect. That's what does the science, after all. I always get into arguments with one of my old psychopharm professors who is really into stats, and psychology stats specifically. In the end you can justify things mathematically, but if they don't have strong and understandable correlates in the real world, what use are they? That's where I think statisticians and scientists come into play.