Tuesday, April 12, 2016

Fumbles: Regression to the Mean

Many professional sports have experienced a statistical renaissance over the past 30 years.  The advanced analytics movement has been made famous by movies like Moneyball and the popularity of fantasy sports.  It even gets a fair amount of academic attention at events like the MIT Sloan Sports Analytics Conference.

However, the secret underlying this statistical revolution is that most of the analytics aren’t all that advanced.  Most sports have cultures of tradition (and superstition) that have led to some poor statistical reasoning becoming ingrained within each game.  The analytics movement is a true introduction to statistics for a group that has long suffered from false assumptions.

A good example comes from football.  The best defenses in the NFL tend to cause a lot of turnovers (stealing the ball away from the other team’s offense by fumble or interception).  Focusing on fumbles, defenses that recover more loose footballs tend to do better overall (fewer yards allowed, fewer points allowed, etc.) each season.  It can even be the difference between winning and losing a couple of games, which is a big deal given football’s short season.

In the past, teams and fans have treated a defense’s ability to recover fumbles as a skill, and teams that caused lots of fumbles would be predicted to have a great season.  But footballs are oddly shaped.  When one hits the ground during a game, the way it will bounce is unpredictable.  Analyses of fumble recovery rates throughout history show that the probability that the defense will recover a fumble is about 50%.  So historically, a fumbled football can be treated like a coin toss.  Over the course of a 16-game season, this creates plenty of opportunities for teams to get lucky with the way the ball is rolling, recover more fumbles and have a great season.  And the teams that are unlucky get to watch the Super Bowl from home like the rest of us.
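To put a rough number on the coin-toss idea, here is a minimal sketch in Python.  The figure of 25 fumbles per season is an assumption chosen purely for illustration (it is not taken from NFL data), and prob_at_least is a hypothetical helper name.  It asks how often a team would recover at least 60% of its fumbles if every recovery really were a fair coin flip.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent trials (binomial)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption for illustration only: a defense sees 25 fumbles in a season.
n_fumbles = 25

# Recovering 15 or more of 25 is a recovery rate of 60% or better.
lucky_share = prob_at_least(15, n_fumbles)
print(f"Chance of recovering 60%+ of {n_fumbles} fumbles by luck alone: {lucky_share:.1%}")
```

Under these assumptions, roughly one team in five would post a recovery rate of 60% or better without any real skill at recovering fumbles.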

Nowadays, knowledge of this phenomenon in football can actually be used to predict which teams will do WORSE in a given year.  This is an example of regression to the mean: the more extreme a variable is upon its first measurement, the more likely it is to be closer to the average the second time it is measured.  The NFL season is only 16 games long, which gives a pretty small window for collecting sample data.  Within this window, teams are likely to benefit from fumble recovery numbers that land well above the NFL average, simply because of luck.  The next season those numbers are likely to regress back to the mean, since fumble recoveries are essentially random over time.  So teams that benefitted from an extra win or two due to fumble luck in 2015 might not experience the same bump in performance in 2016.
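A small simulation makes the regression effect easier to see.  The sketch below is in Python and rests on stated assumptions rather than real data: 32 teams, 25 fumbles per team per season, and every recovery a pure 50/50 coin flip.  The function names and the "top 5" cutoff are hypothetical choices made for the example.  It picks the luckiest teams from one season and checks how those same teams do the following season.

```python
import random


def season_rate(n_fumbles, rng, p=0.5):
    """Recovery rate for one team-season when each fumble is a coin flip."""
    recovered = sum(rng.random() < p for _ in range(n_fumbles))
    return recovered / n_fumbles


def simulate(n_leagues=2000, n_teams=32, n_fumbles=25, top_k=5, seed=0):
    """Average recovery rate of each year's luckiest teams, then and a year later."""
    rng = random.Random(seed)
    year1_total = 0.0
    year2_total = 0.0
    for _ in range(n_leagues):
        # Two independent seasons for the same set of teams; no team has any skill.
        year1 = [season_rate(n_fumbles, rng) for _ in range(n_teams)]
        year2 = [season_rate(n_fumbles, rng) for _ in range(n_teams)]
        # The top_k teams by year-one recovery rate are the "lucky" ones.
        luckiest = sorted(range(n_teams), key=lambda t: year1[t], reverse=True)[:top_k]
        year1_total += sum(year1[t] for t in luckiest) / top_k
        year2_total += sum(year2[t] for t in luckiest) / top_k
    return year1_total / n_leagues, year2_total / n_leagues


top_year1, top_year2 = simulate()
print(f"Luckiest teams, year one: {top_year1:.1%}")
print(f"Same teams, year two:     {top_year2:.1%}")
```

Because the simulated teams have no recovery skill at all, the year-one leaders look well above average, but the same teams land back near 50% the next year.  Real teams may differ in how many fumbles they force, which this sketch deliberately ignores.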


  1. I think there are (at least) two things going on here that contribute to fumble recovery numbers: (a) recovery percentage and (b) total number of fumbles. Recovery percentage, as you point out, can logically be thought of as a random variable. A team's defense (or offense) may get lucky one season and have a recovery percentage well above 50. Historically, as you point out, we expect the recovery percentage to regress to about 50. So having a recovery percentage well above that, which contributes to a winning season, can be attributed largely to luck. However, total number of fumbles is something that teams have some control over. So if a team has an offense that fumbles infrequently and a defense that causes fumbles very often, that team’s take away/give away ratio (based on fumbles) will be desirable and may contribute to winning. Thus, an important component of the take away/give away ratio is not solely attributable to luck. So I am not sure I agree with the statement, “Within this window (16 game-long season) teams are likely to benefit from fumble recovery numbers that land well above the NFL average, simply because of luck.” No doubt luck plays a role, but coaches would be foolish to depend on offensive players, technique, or plays that produce lots of fumbles. Similarly, defensive coaches would be foolish not to stress and teach fumble-producing technique.

  2. I agree with Vincent. There is certainly the random aspect of fumble recovery that one would expect to regress to the mean, but the number of fumbles created by a defense or experienced by an offense is not random. The latter speaks to the style of play and individual skills of players. Still, if you were to log a defense's fumbles created over many years and come up with an average, then after a year significantly better than that average those numbers would clearly be expected to regress to the mean. However, given that this average can be said to be based on individual skill level and not random chance, I think this approach assumes that the player roster and the physical condition of each player are static and unchanging, something we all know is not true of the NFL. As players and coaches change, the intrinsic capability of the team to create fumbles also changes, altering the average and variance of this metric in nonrandom ways. So I'm not sure that aggregate year-to-year means would make a great predictive tool in this context.