Wednesday, April 20, 2016

The Evolution of Statistics in Baseball

With the arrival of spring in Atlanta each year, two things make their inevitable return: oppressive weather (heat, humidity and pollen) and baseball. With the return of baseball each year comes rampant speculation about who will win the World Series (this is finally the year for the Cubs) and the beginning of seven months of continual statistical baseball analysis.

Records of baseball statistics go all the way back to the 1800s, when Henry Chadwick pioneered the statistical analysis of baseball, inventing the box score, batting average and earned run average (ERA) in the process. These statistics remained largely out of reach for the average baseball fan until 1951, when Hy Turkin published the first edition of the Encyclopedia of Baseball. This publication drew massive interest in the statistical analysis of America's pastime and laid the foundation for what is now over 100 years of compiled baseball statistics (which can easily be perused by visiting an online baseball statistics database).

Nowadays, baseball has evolved from a simple pastime cherished by young and old alike into an economic powerhouse that drives everything from advertisements, ticket sales and jersey sales to whopping player contracts exceeding $250 million in value. Additionally, the value of each of the top 10 most valuable clubs, as rated by Forbes, now tops $1 billion. With the economic growth of baseball, its statistical analysis has evolved tremendously. Millions of dollars and thousands of man-hours are now invested each year in player evaluation in the hope of building the perfect team to make a run at the World Series. This evolution has led to the creation of completely new schools of thought regarding baseball statistics. One of the most prominent and widely used approaches to statistical analysis in baseball is sabermetrics, pioneered by Bill James in the 1980s.

Sabermetrics was created with the goal of using the vast amounts of statistical data in baseball to determine why teams win and lose (a goal that could potentially be worth billions of dollars if achieved). In an effort to describe more accurately why teams win and lose, sabermetrics has led to the creation of a slew of new statistical categories, a few of which I will describe in the following paragraphs.

The first sabermetric statistic I will describe is base runs. Base runs is a statistical estimate of how many runs a team should have scored based on its offensive component stats (hits, home runs, walks, etc.). The basic formula is listed below:

Base Runs = (A × B) / (B + C) + D

In this equation, A = hits + walks - home runs, B = (1.4 × total bases - 0.6 × hits - 3 × home runs + 0.1 × walks) × 1.02, C = at bats - hits, and D = home runs. This equation has demonstrated tremendous success in predicting scoring in MLB. In recent years, it has demonstrated the lowest error of any formula in predicting runs scored.
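As a rough illustration, the calculation can be sketched in a few lines of Python. This assumes the commonly cited basic form of base runs, (A × B) / (B + C) + D, together with the A, B, C and D definitions given above. The function name and the season totals in the example are hypothetical and are not taken from any real team.

```python
def base_runs(hits, walks, home_runs, total_bases, at_bats):
    """Estimate runs scored with the basic base runs formula.

    Assumes the form (A * B) / (B + C) + D, with the A, B, C, D
    definitions from the post.
    """
    a = hits + walks - home_runs  # baserunners, excluding home runs
    b = (1.4 * total_bases - 0.6 * hits - 3 * home_runs + 0.1 * walks) * 1.02  # advancement
    c = at_bats - hits            # outs
    d = home_runs                 # runs that score automatically
    if b + c == 0:
        raise ValueError("B + C is zero; the formula is undefined for these inputs")
    return a * b / (b + c) + d


# Hypothetical season totals, made up purely for illustration.
estimate = base_runs(hits=1400, walks=500, home_runs=180,
                     total_bases=2300, at_bats=5500)
print(round(estimate))
```

The B / (B + C) term works like a scoring rate: it is the estimated share of baserunners who come around to score, and home runs are added on top because they always score.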

In addition to base runs, sabermetrics includes a pitching statistic called peripheral ERA. This statistic takes the familiar ERA and adjusts it to factor in park-adjusted hits, walks, strikeouts and home runs allowed. Just as with base runs, this statistic combines some of the original statistics in baseball to create a more accurate model of what occurs within the game. This, in turn, increases its predictive power.

The final sabermetric I will discuss is walks plus hits per inning pitched (WHIP). This statistic is one of the sabermetrics that has gained mainstream popularity. It attempts to judge a pitching performance by dividing walks and hits allowed by total innings pitched. This sabermetric has also demonstrated tremendous success in predicting the success of pitchers by characterizing their performance by the number of runners let on base rather than by runs allowed.
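The WHIP calculation described above, walks and hits allowed divided by innings pitched, is simple enough to sketch in Python. The function name and the pitcher totals below are hypothetical. The sketch also assumes that innings are passed as a true fraction (for example 6 1/3 innings as 6 + 1/3), not in the box-score shorthand "6.1".

```python
def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched.

    innings_pitched is assumed to be a real-valued number of innings,
    e.g. 6 + 1/3, not the box-score shorthand 6.1.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


# Hypothetical pitcher season, made up for illustration.
print(round(whip(walks=50, hits=180, innings_pitched=200), 2))
```

A lower value means fewer runners allowed per inning, which is the sense in which the statistic judges a pitcher by baserunners and not by runs.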

Overall, sabermetrics is beginning to fundamentally change the way players are scouted and analyzed. More and more managers and team executives are utilizing sabermetrics with a high level of success. Theo Epstein is one of its vocal proponents. He became the youngest general manager in MLB history with the Boston Red Sox and used this strategy to lead them to two World Series titles, including their first in 86 years. Additionally, he has used sabermetrics to take the Cubs from basement dwellers of the NL Central to one of the top teams in baseball.

Over the past 100+ years, the way that statistics is viewed in baseball has changed drastically. It has gone from pure curiosity and leisurely analysis to a multibillion-dollar business. Although baseball, like other sports, is a game of pure chance, individuals within MLB are beginning to recognize that advanced statistics provide information that can potentially lead their teams to tremendous success (and a lot of money). It will be interesting to see what the future of statistics in baseball holds and what sorts of advanced statistics are developed in the coming years.


  2. Cam, great post on a universally loved pastime. It's still slightly gut-wrenching to read about Theo outside of the Boston Red Sox, though I was able to look past that. Anyway, the one and only thing that comes to mind reading this blog post is the Oakland Athletics and the inception of "Moneyball" or, as you stated, what is better known today as sabermetrics. The use of complex statistical analyses of players' histories and tendencies led to the creation of what was heralded as the best all-around baseball team ever to take the field. Many other sports franchises, even outside the game of baseball, have adopted these methods as well. It's truly mind-blowing how ubiquitous statistics have become in everyday life.

  3. Great post. I first learned about sabermetrics when it was popularized by Michael Lewis in his book "Moneyball." I found it pretty interesting that sabermetrics was pretty much ignored for at least 20 years. With the ability to track statistics on just about everything these days, from number of steps to an individual's h-index, I wonder how much we will rely on statistical analysis of all of these things in the future, and how long it will take for those analyses to gain traction as useful or predictive metrics.

  4. Very interesting post! One question I have is how this equation was derived. Why are there factors attached to the value "B"? How were these factors created? I don't know if we can find the answers quickly, but the answer would provide a cool way for me to rig fantasy football, March Madness brackets or bets. How about we work on some algorithm or stats to predict the success of players or teams that are more accurate than the ones already known? I am glad that you have reminded me how quantitative and logical sports can be.

  5. Billy Beane's sabermetrics approach is a fascinating reminder of the power of sound statistical interpretation in a real-life application. I became familiar with this field as it was popularized through the rise of the Oakland Athletics, but one question that has stayed with me is how to keep this statistical venture from turning into a slippery slope of evaluative techniques. The rise and efficacy of sabermetrics stands in contrast to the long-standing refusal by European soccer clubs to utilize exhaustive quantitative measures when evaluating a player. Perhaps this disparity can be attributed to the sports themselves and the propensity of each to be analyzed to such a degree, but it could also be that Billy Beane found the "sweet spot" for his statistical inferences in baseball. Either way, it's a great reminder of how what we want to know and understand may be sitting right in front of us; we just need the right "lens" to be able to really ascertain its meaning.