Wednesday, April 6, 2016

The Double-Edged Blade of Occam's Razor

Occam's Razor is the principle that, among competing explanations, the simplest one is the most likely to be correct. Unless you are particularly susceptible to magical thinking, you likely apply this principle often in your everyday life, and it probably serves you quite well.

Can't find your keys? Odds are a unicorn didn't eat them; you just forgot where you put them. 

End up at the Clermont Lounge instead of the Clairmont Inn? You'll probably want to check your GPS instead of looking for a wormhole. (Or just enjoy the Clermont Lounge... But I digress.)

Occam's Razor is also critical for developing models that explain biological data. Statisticians often warn that overfitting a model (see the blog post by Ashley Cross) is a common temptation and problem for scientists. Yes, it may be possible to construct an equation that perfectly explains each and every one of your data points. But this strategy not only disregards the inherent variability within biology, it also makes it much more difficult to apply the model to other systems. 
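
To make the overfitting point concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration: the linear "true" process, the noise level, the random seed, and the helper names are hypothetical, not taken from any real experiment. It compares a straight-line fit with a polynomial that passes exactly through every training point (Lagrange interpolation), then scores both on fresh points drawn from the same process.

```python
import random

def simulate(n, rng):
    """Hypothetical data: a linear trend plus Gaussian noise (stand-in for biological variability)."""
    xs = [i / (n - 1) for i in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree n-1: it hits every training point exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(2016)
train_x, train_y = simulate(10, rng)
# Fresh samples from the same process, midway between the training x values.
test_x = [(i + 0.5) / 9 for i in range(9)]
test_y = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in test_x]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)
print("line:        train MSE %.3f, test MSE %.3f"
      % (mse(line, train_x, train_y), mse(line, test_x, test_y)))
print("interpolant: train MSE %.3f, test MSE %.3f"
      % (mse(wiggly, train_x, train_y), mse(wiggly, test_x, test_y)))
```

Under these assumptions the interpolant's training error is exactly zero while the line's is not. On the held-out points the interpolant typically does worse, because it has fit the noise rather than the trend.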

However, the appeal of a simple model should also be taken with a grain of salt. Simplicity is easy, and easy explanations are comforting. It is much more reassuring to believe that you mistyped an address into your GPS than to believe you fell into a wormhole. But imagine the experience you would miss if you actually had fallen into a wormhole. 

Models are derived from your sample, which a) by chance may not represent the population to which you hope to extrapolate the results, or b) could be affected by an enormous number of factors that you have no control over or don't know exist. Good hypotheses are based on educated predictions and previous knowledge, but absolutely nothing is completely understood. Biology is complex, and we often take that complexity for granted. The people over at LessWrong give several examples of this. Most relevant to this discussion is the example of Thor, the angry god to whom ancient people attributed lightning strikes:

"The human brain is the most complex artifact in the known universe.  If anger seems simple, it's because we don't see all the neural circuitry that's implementing the emotion...  The complexity of anger, and indeed the complexity of intelligence, was glossed over by the humans who hypothesized Thor the thunder-agent."

Though most of us probably don't pray to Thor during every rainstorm, we may still be as likely to oversimplify as to overcomplicate. Models are incredibly useful and important. They save time and energy, but they are descriptions, not explanations. If you have read Part G of Intuitive Biostatistics, it is easy to see that choosing how to construct or compare models can be a fraught process if not given enough thought. It is always important to understand the problem you are trying to address, but as scientists we must also recognize the limits of our own understanding. 
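
As one illustration of comparing models rather than just constructing them, here is a minimal Python sketch using the corrected Akaike information criterion (AICc) for least-squares fits. AICc is one common choice, picked here as an assumption; it is not presented as the book's specific procedure. The data and helper names are hypothetical, and `k` counts the fitted parameters plus one for the residual variance, a convention that varies between texts.

```python
import math

def aicc(ss, n, k):
    """Corrected Akaike information criterion for a least-squares fit.

    ss: residual sum of squares, n: number of data points,
    k: number of fitted parameters + 1 (for the residual variance).
    """
    aic = n * math.log(ss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical data: a linear trend with alternating +/-0.5 "noise".
xs = list(range(10))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Model 1: a horizontal line at the mean (1 parameter, so k = 2).
ss_mean = sum((y - my) ** 2 for y in ys)

# Model 2: a least-squares straight line (2 parameters, so k = 3).
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
ss_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

score_mean = aicc(ss_mean, n, 2)
score_line = aicc(ss_line, n, 3)
# Lower AICc is preferred; only the difference between models is meaningful.
print("AICc (mean): %.2f, AICc (line): %.2f" % (score_mean, score_line))
```

The extra parameter in the line model is penalized, but here it buys such a large drop in residual error that the line still wins. With data that had no real trend, the penalty would instead favor the simpler model.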


  1. Not to be a Thor-ite, but I would hesitate to say that oversimplification happens to the same extent as overcomplication in a statistical sense. I see oversimplification as the hand-waving generalities that people (even scientists) use when they are not referring to actual data but just want to make a point. An immediate example that comes to mind is the increase in life expectancy we have seen over the past century, from roughly 50 to 80 years.
    The vast majority of press coverage focuses on old people getting older: basically, how there are more octogenarians today than there were in 1900. However, a huge factor in population-wide longevity is the decrease in infant mortality.
    Conceptually, we think about old people, but every model of life expectancy I have seen has factored in infant mortality appropriately (at least to my eyes). My point is that people do often oversimplify when speaking colloquially, but I do not see statistical analyses being steered similarly.

  2. When I read your blog post, it made me think of an old saying I have heard from some of my mentors who work in infectious disease diagnostics: "When you hear hoofbeats, think of horses, not zebras." This saying mirrors the message of Occam's razor, but it is a bit more colorful.

    As you point out, the simplest explanation is typically the first one to go with, especially in science. In practice, this means it should also be the first idea to rule in or out with experimental evidence. I had this principle reinforced recently by a mentor of mine when I presented data that could be explained by two mechanisms. One of the mechanisms I proposed was slightly outlandish, whereas the other was more reasonable, although both were possible. Of course, I favored the outlandish model because, if it were true, the result would be a high-impact paper. My mentor quickly reminded me how risky and difficult it would be to demonstrate my outlandish model versus the simple one, and she advised me to start by refuting or validating the simplest explanation. This made a lot of sense in retrospect, and I am glad that I learned this lesson early and didn't go down a "rabbit hole" that could have delayed my PhD. I think that as scientists we should always consider the simplest explanation first, but if we rule it out, we should be open to the "zebras" that do appear in biological systems from time to time.

  3. Great post, Erica! Hypothesis-driven modeling should definitely be the standard, but even then it is hard to know when you have the data necessary to tell the difference between a mistake and a wormhole. When Hodgkin and Huxley were modeling the dynamics of the action potential, their models predicted the correct number of activation and inactivation gates on sodium and potassium channels. This was a huge finding, and it was possible because they had collected a wealth of information on the action potential: which ions were involved, which channels were activated at different points in time, and so on. Another set of researchers could have just slapped a polynomial onto the voltage deflections, and neuroscientists would have been stuck talking about the rising or falling values of the "A" and "B" coefficients in a 'predictive' model that they didn't understand.
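
The infant-mortality point in the first comment is ultimately arithmetic: life expectancy at birth averages over everyone born, so deaths at age zero pull it down sharply. Here is a minimal Python sketch with made-up cohorts. The counts and ages are purely hypothetical, not real demographic data, and real life tables use age-specific death rates rather than a single adult age at death.

```python
def mean_age_at_death(cohort):
    """Average age at death for a cohort given as {age_at_death: number_of_people}."""
    total_people = sum(cohort.values())
    total_years = sum(age * count for age, count in cohort.items())
    return total_years / total_people

# Hypothetical cohorts of 100 births: everyone who survives infancy dies at 70.
high_infant_mortality = {0: 20, 70: 80}
low_infant_mortality = {0: 2, 70: 98}

print(mean_age_at_death(high_infant_mortality))  # 56.0
print(mean_age_at_death(low_infant_mortality))   # 68.6
```

Adult lifespan is identical in both cohorts, yet the average rises by 12.6 years purely because fewer infants die.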