Many professional sports have experienced a statistical renaissance over the past 30 years. The advanced analytics movement has been made famous by movies like Moneyball and the popularity of fantasy sports. It even gets a fair amount of academic attention at events like the MIT Sloan Sports Analytics Conference.
However, the secret underlying this statistical revolution is that most of the analytics aren’t all that advanced. Most sports have cultures of tradition (and superstition) that have led to some poor statistical reasoning becoming ingrained within each game. The analytics movement is a true introduction to statistics for a group that has long suffered from false assumptions.
A good example comes from football. The best defenses in the NFL tend to cause a lot of turnovers (taking the ball away from the other team’s offense by fumble or interception). Focusing on fumbles, defenses that recover more loose footballs tend to do better overall (fewer yards allowed, fewer points allowed, etc.) each season. Those recoveries can even be the difference between winning and losing a couple of games, which is a big deal given football’s short season.
In the past, teams and fans have treated a defense’s ability to recover fumbles as a skill, and teams that recovered lots of fumbles would be predicted to have a great season. But footballs are oddly shaped. When one hits the ground during a game, the way it will bounce is unpredictable. Analyses of fumble recovery rates throughout history show that the probability that the defense will recover a fumble is about 50%. So historically, a fumbled football can be treated like a coin toss. Over the course of a 16-game season, this creates plenty of opportunities for teams to get lucky with the way the ball is rolling, recover more fumbles and have a great season. And the teams that are unlucky get to watch the Super Bowl from home like the rest of us.
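To get a feel for how much room a coin toss leaves for luck, here is a minimal Python sketch. It assumes each fumble is an independent 50/50 event and that a defense sees 20 fumbles in a season; that 20 is a made-up round number for illustration, not an NFL statistic, and the helper name prob_at_least is my own.

```python
from math import comb

def prob_at_least(k_min: int, n: int, p: float = 0.5) -> float:
    """Probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

FUMBLES_PER_SEASON = 20  # hypothetical round number, not a real NFL figure
TEAMS = 32               # number of NFL teams

# Chance that pure luck hands a defense 14 or more of 20 fumbles (70%+).
p_lucky = prob_at_least(14, FUMBLES_PER_SEASON)
expected_lucky_teams = TEAMS * p_lucky

print(f"P(recover >= 14 of 20) = {p_lucky:.3f}")                    # 0.058
print(f"Expected teams that lucky: {expected_lucky_teams:.1f}")     # 1.8
```

Under those assumptions, you would expect roughly two teams a year to recover 70% or more of the available fumbles without being any better at it than anyone else, and by symmetry roughly two more to land at 30% or less.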
Nowadays, knowledge of this phenomenon in football can actually be used to predict which teams will do WORSE in a given year. This is an example of regression to the mean: the more extreme a variable is upon its first measurement, the more likely it is to be closer to the average the second time it is measured. The NFL season is only 16 games long, which gives a pretty small window for collecting sample data. Within this window, some teams are likely to benefit from fumble recovery numbers that land well above the NFL average, simply because of luck. The next season those numbers are likely to regress back to the mean, since fumble recoveries are essentially random over time. So teams that benefitted from an extra win or two due to fumble luck in 2015 might not experience the same bump in performance in 2016.
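Regression to the mean is easy to see in a quick simulation. The sketch below is only an illustration under stated assumptions: every fumble is a fair coin toss, each defense sees 20 fumbles per season (a hypothetical round number), and I simulate 10,000 imaginary teams instead of 32 so the averages are not swamped by noise. It picks out the teams that got lucky in year one and checks how the same teams do in year two.

```python
import random

rng = random.Random(2015)   # fixed seed so reruns give the same numbers
FUMBLES_PER_SEASON = 20     # hypothetical round number, not a real NFL figure
N_TEAMS = 10_000            # far more than 32 teams, to smooth out noise
LUCKY_CUTOFF = 13           # 13 of 20 recovered = 65% or better

def recoveries_in_season() -> int:
    """Count fumbles recovered when each one is a fair coin toss."""
    return sum(rng.random() < 0.5 for _ in range(FUMBLES_PER_SEASON))

# Each team plays two independent seasons: (year one, year two).
seasons = [(recoveries_in_season(), recoveries_in_season()) for _ in range(N_TEAMS)]

# Keep only the teams that looked great at recovering fumbles in year one.
lucky = [(y1, y2) for y1, y2 in seasons if y1 >= LUCKY_CUTOFF]

avg_rate_y1 = sum(y1 for y1, _ in lucky) / (len(lucky) * FUMBLES_PER_SEASON)
avg_rate_y2 = sum(y2 for _, y2 in lucky) / (len(lucky) * FUMBLES_PER_SEASON)

print(f"Lucky teams, year one recovery rate: {avg_rate_y1:.2f}")
print(f"Same teams, year two recovery rate:  {avg_rate_y2:.2f}")
```

By construction the year-one rate for the lucky group is at least 65%, while the year-two rate for the same teams should come out close to 50%. Nothing about those teams changed between seasons; the luck simply did not carry over.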