With the arrival of spring in Atlanta each year, two things make their
inevitable return: oppressive conditions (heat, humidity and pollen) and
baseball. With the return of baseball each year comes rampant speculation
about who will win the World Series (this is finally the year for the Cubs)
and the beginning of seven months of continual statistical analysis.

Records of baseball statistics go all the way back to the 1800s, when Henry
Chadwick pioneered statistical analysis of baseball, inventing box scores,
batting average and earned run average (ERA) in the process. These statistics
remained inaccessible to the average baseball fan until 1951, when Hy Turkin
published the first edition of the *Encyclopedia of Baseball*. This publication
drew massive interest in the statistical analysis of America’s pastime and laid
the foundation for what is now over 100 years of compiled baseball statistics
(which can easily be perused by visiting an online baseball statistics database).

Nowadays, baseball has evolved from a simple pastime cherished by young and old
alike into an economic powerhouse that drives everything from advertisements,
ticket sales and jersey sales to whopping player contracts exceeding
$250 million in value. Additionally, each of the 10 most valuable clubs, as
rated by Forbes, is now worth more than $1 billion. As the business of baseball
has grown, its statistical analysis has evolved tremendously. Millions of
dollars and thousands of man-hours are now invested each year in player
evaluation in the hope of building the perfect team to make a run at the
World Series. This evolution has led to the creation of completely new schools
of thought regarding baseball statistics. One of the most prominent and widely
used is sabermetrics, pioneered by Bill James in the 1980s.

Sabermetrics was created with the goal of using the vast amounts of statistical
data in baseball to determine why teams win and lose (a goal that could
potentially be worth billions of dollars if achieved). In pursuit of that goal,
sabermetrics has led to the creation of a slew of new statistical categories,
a few of which I will describe in the following paragraphs.

The first sabermetric statistic I will describe is base runs. Base runs is a
statistical estimate of how many runs a team should have scored based on its
offensive component stats (hits, home runs, walks, etc.). The basic formula is
listed below:

Base runs = (A × B) / (B + C) + D

In this equation, A = hits + walks - home runs, B = (1.4 × total bases
- 0.6 × hits - 3 × home runs + 0.1 × walks) × 1.02, C = at bats - hits, and
D = home runs. This equation has demonstrated tremendous success in predicting
scoring in the MLB. In recent years, it has demonstrated the lowest error of
any formula in predicting runs scored.
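
As a rough illustration, the A, B, C and D terms defined above can be combined in a few lines of Python. This is only a sketch: the function name `base_runs`, its parameter names and the sample season totals are all made up for this example, and it assumes the standard base runs form of (A × B) / (B + C) + D.

```python
def base_runs(hits, walks, home_runs, total_bases, at_bats):
    """Estimate runs scored from offensive totals using the base runs terms."""
    a = hits + walks - home_runs  # A: baserunners, excluding home runs
    b = (1.4 * total_bases - 0.6 * hits - 3 * home_runs + 0.1 * walks) * 1.02  # B: advancement
    c = at_bats - hits            # C: outs made at the plate
    d = home_runs                 # D: runs that always score
    if b + c == 0:
        # No advancement and no outs: only the home runs count (guard against 0/0)
        return float(d)
    return a * b / (b + c) + d


# Hypothetical team season totals, invented purely for illustration
estimate = base_runs(hits=1400, walks=500, home_runs=180,
                     total_bases=2300, at_bats=5500)
print(f"Estimated runs scored: {estimate:.1f}")
```

Comparing an estimate like this against a team's actual runs scored is one way to see whether an offense over- or under-performed its component stats.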

In addition to base runs, sabermetrics includes a pitching statistic called
peripheral ERA. This statistic takes the familiar ERA and modifies it to factor
in park-adjusted hits, walks, strikeouts and home runs allowed. Just as with
base runs, it combines some of baseball's original statistics to create a more
accurate model of what occurs within the game, which in turn increases its
predictive power.

The final sabermetric I will discuss is walks plus hits per inning pitched
(WHIP). This statistic is one of the sabermetrics that has gained mainstream
popularity. It judges a pitching performance by dividing the sum of walks and
hits allowed by total innings pitched. It has also demonstrated tremendous
success in predicting the success of pitchers by characterizing their
performance by the number of runners allowed on base rather than by runs allowed.
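
The WHIP calculation is simple enough to sketch directly. The Python below is a minimal example; the names `whip` and `innings_from_outs` and the sample pitching line are invented for illustration. It assumes innings are passed as a true fraction (19 outs is 6 1/3 innings), not the box-score shorthand "6.1".

```python
def innings_from_outs(outs):
    """Convert outs recorded into innings pitched (3 outs per inning)."""
    return outs / 3


def whip(walks, hits, innings_pitched):
    """Walks plus hits allowed, divided by innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


# Hypothetical pitcher season, invented purely for illustration:
# 45 walks and 160 hits allowed over 600 outs (200 innings)
season_whip = whip(walks=45, hits=160, innings_pitched=innings_from_outs(600))
print(f"WHIP: {season_whip:.3f}")
```

A lower value means fewer runners allowed per inning, which is why the statistic is read as a measure of how well a pitcher keeps hitters off base.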

Overall, sabermetrics is beginning to fundamentally change the way players are
scouted and analyzed. More and more managers and team executives are using
sabermetrics with a high level of success. Theo Epstein is one of its vocal
proponents. After becoming the youngest general manager in MLB history with the
Boston Red Sox, he led them to two World Series titles, including their first
in 86 years. Additionally, he has used sabermetrics to take the Cubs from
basement dwellers of the NL Central to one of the top teams in baseball.

Over the past 100+ years, the way that statistics are viewed in baseball has
changed drastically. Statistical analysis has gone from pure curiosity and
leisurely pursuit to a multi-billion dollar business. Although baseball, like
other sports, involves a large element of chance, individuals within the MLB
are beginning to recognize that advanced statistics provide information that
can potentially lead their teams to tremendous success (and a lot of money).
It will be interesting to see what the future of statistics in baseball holds
and what sorts of advanced statistics are developed in the coming years.

Cam, great post on a universally loved pastime. It's still slightly gut-wrenching to read about Theo outside of the Boston Red Sox, though I was able to look past that. Anyway, the one and only thing that comes to mind reading this blog post is the Oakland Athletics and the inception of "Moneyball" or, as you stated, better known today as sabermetrics. The use of complex statistical analyses of players' histories and tendencies led to the creation of what was heralded as the best all-around baseball team ever to take the field. Many other sports franchises, even outside the game of baseball, have adopted these methods as well. It's truly mind-blowing how ubiquitous statistics have become in everyday life.

Great post. I first learned about sabermetrics when it was popularized by Michael Lewis in his book "Moneyball." I found it pretty interesting that sabermetrics was pretty much ignored for at least 20 years. With the ability to track statistics on just about everything these days, from number of steps to an individual's h-index, I wonder how much we will rely on statistical analysis of all of these things in the future, and how long it will take for those analyses to gain traction as useful or predictive metrics.

Very interesting post! One question I have is how this equation was derived. Why are there factors attached to the value "B"? How were these factors created? I don't know if we can find the answers quickly, but the answer would provide a cool way for me to rig fantasy football, March Madness brackets or bets. How about we work on some algorithm or stats to predict the success of players or teams that are more accurate than the ones already known? I am glad that you have reminded me how quantitative and logical sports can be.

Billy Beane's sabermetrics approach is a fascinating reminder of the power of sound statistical interpretation in a real-life application. I became familiar with this field as it was popularized through the rise of the Oakland Athletics, but one thing that has resonated with me is the question of how to keep this statistical venture from turning into a slippery slope of evaluative techniques. The rise and efficacy of sabermetrics comes in contrast to the long-standing refusal of European soccer clubs to utilize exhaustive quantitative measures when evaluating a player. Perhaps this disparity can be attributed to the sports themselves and the propensity of each to be analyzed to such a degree, but it could also be that Billy Beane found the "sweet spot" for his statistical inferences in baseball. Either way, it's a great reminder of how what we want to know and understand may be sitting right in front of us; we just need the right "lens" to be able to really ascertain its meaning.
