Moneyball and the Rise of Advanced Statistics in Sports

Podcast Transcript

Ever since organized sports began, people have been collecting statistics. 

These statistics were originally collected to let people know what happened during a game they might have missed. 

However, over time, these statistics became more and more sophisticated, and they eventually began dictating how the games themselves were to be played by uncovering truths that were overlooked.

Learn more about Moneyball and the rise of statistics in sports on this episode of Everything Everywhere Daily.

For those of you not interested in sports, let me assure you that while this might seem like a sports episode on the surface, it is really an episode about mathematics and business. 

And for those of you who live in countries where they don’t play baseball, again, the origins of this story do start with baseball, but they most assuredly do not end there. 

The story begins in 1858 when the game of baseball was still figuring out what the rules of the sport were. One of the fathers of the sport, Henry Chadwick, developed a succinct way to summarize what happened in a baseball game that could be published in newspapers. His system was known as a box score. 

The box score came from the system of recording cricket scores and consisted of two parts. The line score was a two-line summary of the runs scored in each inning and the final tally of runs, hits, and errors.

Below that was a list of what each individual did during the game. For batters, it initially recorded the number of runs, hits, put-outs, assists, and errors each player made. 

Over time, the box score changed a little, but not a lot. You could look at Chadwick’s original 1858 box score and figure out what was being reported. The statistics shifted slightly to cover at-bats, runs, hits, runs batted in, home runs, bases on balls, and strikeouts.

Pitchers get their own stats, which cover what happened while they were on the mound: hits, runs, earned runs, bases on balls, strikeouts, and home runs.

By the time professional baseball started, a very particular style of play had developed. The game was based on having a very high batting average and stealing bases to try to get runs. Everyone in baseball just assumed that this was the best way to play the game.

Fast forward to 1971, when sixteen baseball historians met in Cooperstown, New York, to found the Society for American Baseball Research, known as SABR.

SABR was initially designed to document and record baseball history, and the creation of the organization came soon after the release of The Baseball Encyclopedia in 1969, which provided the career statistics for every professional baseball player who ever played. 

This was not the first encyclopedia of baseball, but it was unique in that the authors literally built the database of statistics up by checking every box score of every game ever played.

It was one of the first books to be written with the aid of a computer that organized all of the data.

Major League Baseball established the Special Baseball Records Committee, whose job it was to go back and clean up old statistics, fix how statistics were interpreted under old rules, and find any gaps in the historical record.

Over time, historical baseball statistics got better, as did contemporary baseball statistics. 

The person who is credited with using baseball statistics for advanced analysis was Bill James. In the 1970s, while working as a night security guard at the Stokely-Van Camp’s pork and beans cannery in Kansas, he began writing articles asking statistical questions which had never really been asked before, like which pitchers and catchers were the best at preventing stolen bases, and how the average age of a team affected performance.

In 1977, he published the first Bill James Baseball Abstract. He dubbed the study of baseball statistics ‘sabermetrics,’ named after the Society for American Baseball Research.

Sabermetrics began to question many of the long-held assumptions about how the game of baseball should be played.

One of the first traditional statistics to be challenged was the batting average. A player’s batting average is determined by dividing the number of hits by the number of at-bats. However, a walk is not considered an at-bat.

What James and others realized was that a walk was as good as a hit insofar as the player didn’t get out. Also, what the player did when they didn’t get out mattered a great deal. Hitting a home run was more valuable than getting a single.
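The arithmetic behind that argument can be sketched in a few lines of Python. This is a simplified illustration, not the formulas James himself published. It compares batting average with a simplified on-base rate that counts walks (the official on-base percentage also includes hit-by-pitch and sacrifice flies), and with slugging percentage, which weights a home run above a single. Both stat lines are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A hypothetical season stat line; the numbers are made up."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs


def batting_average(line: BattingLine) -> float:
    # Hits divided by at-bats; walks appear nowhere in this number.
    return line.hits / line.at_bats


def simple_on_base_rate(line: BattingLine) -> float:
    # Simplified assumption: only hits and walks count as reaching base.
    return (line.hits + line.walks) / (line.at_bats + line.walks)


def slugging(line: BattingLine) -> float:
    # Total bases per at-bat: a home run counts four times as much as a single.
    total_bases = (line.singles + 2 * line.doubles
                   + 3 * line.triples + 4 * line.home_runs)
    return total_bases / line.at_bats


contact_hitter = BattingLine(at_bats=500, singles=130, doubles=15,
                             triples=3, home_runs=2, walks=20)
patient_slugger = BattingLine(at_bats=500, singles=70, doubles=25,
                              triples=0, home_runs=30, walks=100)

for name, line in [("contact hitter", contact_hitter),
                   ("patient slugger", patient_slugger)]:
    print(f"{name}: AVG {batting_average(line):.3f}, "
          f"on-base {simple_on_base_rate(line):.3f}, "
          f"SLG {slugging(line):.3f}")
```

The second hitter has the lower batting average (.250 against .300) but reaches base more often and collects more bases per at-bat, which is exactly the kind of player a batting-average-only view would undervalue.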

Stolen bases were downplayed unless a player was good enough to successfully steal a base 80% of the time. Pitching wins were also deprecated as they didn’t reflect a pitching performance so much as a team performance. 

New statistics were created which attempted to capture the complete value of a player in a single number. 

As statistics became better and computers became better, it was possible to do even more sophisticated analyses. Still, for years, sabermetrics remained in the realm of enthusiasts.

Baseball traditionalists had no need for whatever fancy algorithms a bunch of eggheads were producing. Baseball had been around for 100 years and was doing quite fine, thank you very much.

However, it was only a matter of time before sabermetrics found its way into professional baseball. 

While there were individual cases of players and managers using advanced statistics in the dugout, the person credited with using sabermetrics at an organizational level was Billy Beane.

Beane was a former player who was appointed the general manager of the Oakland Athletics in 1997.  Oakland was considered a small market team. They didn’t have the budget of teams like the New York Yankees or Los Angeles Dodgers. 

Beane needed to figure out a way to assemble a team that was competitive on a budget. His predecessor as general manager, Sandy Alderson, had already begun using sabermetrics to find undervalued players.

Beane continued this strategy, which resulted in the Athletics making the playoffs every year from 2000 to 2003. Most famously, the Athletics won 20 games in a row in 2002, which set an American League record.

This was done despite having one of the lowest payrolls in baseball. This was documented in the 2003 book, “Moneyball: The Art of Winning an Unfair Game”, and was later turned into a movie starring Brad Pitt as Billy Beane.

The cat was now out of the bag, and teams began hiring their own statisticians. 

In 2003, the Boston Red Sox hired none other than Bill James. One of the former readers of his baseball abstract was now the owner of the club.

The next year, in 2004, the Red Sox won their first championship in 86 years.

With every team deploying statisticians, undervalued players were no longer undervalued. 

Statistics began to explode when everything started to be tracked. Companies set up systems in every major league ballpark to track the movement of every hit and pitch. 

The speed of every pitch thrown was measured, as was the velocity off the bat and even the angle of the ball after it was hit.

Managers eventually, begrudgingly, began adopting the recommendations of the statisticians. Pitchers were changed more frequently, and defenses would shift players to increase the odds of getting individual batters out.

Players in the dugout had access to videos of every pitch they or opposing players ever faced. Batting averages went down, home runs went up, strikeouts went up, and games got longer. 

At the top of the show, I mentioned how this wasn’t just a story about baseball, and that’s true. The revolution in sports statistics started with baseball, but that is because baseball was a game uniquely suited to collecting statistics. Everything that happens in a game can be broken down into discrete elements which can be tracked and recorded. 

Other sports are more fluid. Basketball, football, soccer, rugby, and hockey involve constant movement, not discrete actions like baseball. 

Nonetheless, advanced statistics are now working their way into those sports as well. 

Basketball was one of the first. One of the first things analysts realized was that game-level statistics were close to meaningless. They often reflected how long someone played, not how well they played.

Instead, they focused on what players were able to do per minute. Someone who scored 20 points in a game might not have played as well as someone who scored 10 points in a game, if the person with 20 points played three times as many minutes. 
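That comparison is simple division, and a short Python sketch makes it concrete. The minute totals below are assumptions chosen so that one player is on the floor three times as long as the other. Scaling to 36 minutes is a common convention for comparing players, used here only for readability.

```python
def points_per_minute(points: int, minutes: float) -> float:
    """Scoring rate, so that playing time does not inflate the comparison."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return points / minutes


# Hypothetical players: one plays three times as many minutes as the other.
starter_rate = points_per_minute(points=20, minutes=36)
reserve_rate = points_per_minute(points=10, minutes=12)

# Express both rates per 36 minutes to make them easier to read.
print(f"starter: {starter_rate * 36:.1f} points per 36 minutes")
print(f"reserve: {reserve_rate * 36:.1f} points per 36 minutes")
```

The player with 10 points scores at the higher rate, even though the box score shows half as many points.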

They also looked at the plus/minus for each player. Plus/minus was actually developed in ice hockey, and simply looks at the score differential when a player enters the game and when they sit down. If they enter the game at multiple points, then those point differences are just added up.
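Here is a minimal sketch of that calculation in Python, assuming each stretch of playing time is recorded as the score when the player checks in and the score when they check out. The `Stint` structure and the scores are hypothetical, made up for illustration.

```python
from typing import NamedTuple


class Stint(NamedTuple):
    """One stretch on the floor: the score at check-in and at check-out."""
    team_in: int
    opp_in: int
    team_out: int
    opp_out: int


def plus_minus(stints: list[Stint]) -> int:
    """Add up the change in score margin across every stint."""
    total = 0
    for s in stints:
        margin_in = s.team_in - s.opp_in
        margin_out = s.team_out - s.opp_out
        total += margin_out - margin_in
    return total


# A hypothetical player who enters the game three times.
stints = [
    Stint(team_in=0, opp_in=0, team_out=24, opp_out=20),    # margin 0 -> +4
    Stint(team_in=40, opp_in=45, team_out=62, opp_out=58),  # margin -5 -> +4
    Stint(team_in=80, opp_in=82, team_out=95, opp_out=99),  # margin -2 -> -4
]

print(plus_minus(stints))  # 4 + 9 - 2 = 11
```

Notice that the number says nothing about what the player actually did, only what happened to the score while they were in the game.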

Plus/minus is actually very simple. Current efforts in basketball statistics are much more sophisticated. They are following the ball and every pass that is made during an entire game. From that, they can develop maps that treat the basketball game like a network, and the individual players like nodes on that network.
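To give a rough sense of what a passing network looks like as data, here is a small Python sketch. It is not the method any team or researcher actually uses. It simply counts passes between positions as weighted edges and adds up how often each player is involved. The pass log is invented.

```python
from collections import Counter

# Hypothetical pass log: (passer, receiver), labeled by position.
passes = [
    ("PG", "SG"), ("SG", "C"), ("C", "PG"), ("PG", "SG"),
    ("SG", "PF"), ("PF", "C"), ("C", "PG"), ("PG", "SF"),
]

# Each distinct passer -> receiver pair is an edge; its count is the weight.
edge_weights = Counter(passes)

# A simple measure of involvement: passes made plus passes received.
passes_made = Counter(passer for passer, _ in passes)
passes_received = Counter(receiver for _, receiver in passes)
involvement = passes_made + passes_received

print(edge_weights.most_common(2))
print(involvement.most_common())
```

In this made-up log the point guard is the busiest node, involved in five of the eight passes. Real analyses go well beyond counting, but the underlying structure is the same: players as nodes, passes as edges.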

This network analysis showed that Phil Jackson’s triangle offense, with which he won 11 NBA championships as a head coach, really does work... not that 11 championships didn’t already do that.

American football doesn’t lend itself as well to advanced statistics simply because most players never touch the ball. However, the data has made a few coaches at lower levels of the sport make radical changes. 

Kevin Kelley, the head coach of Pulaski Academy in Arkansas, won 9 championships in 18 years by never punting. Never. Not only did his team never punt, but they also used an onside kick every time.

The data showed him that football wasn’t a game of yards, it was a game of possessions.

Even Hall of Fame coach Don Shula predicted that one day there would be a coach in the NFL brave enough to never punt.

Association football, aka soccer, is also starting to see statistics take hold, although it hasn’t caught on globally yet. 

In 2015, Ali Curtis was hired as the director of football operations for the New York Red Bulls in Major League Soccer. He had a very analytic approach to running the team. He immediately fired the team manager and cut several underperforming players. His moves angered fans until the team won the Supporters’ Shield that year, awarded to the team with the best regular season record.

When Leicester City won their improbable English Premier League championship in 2016 against much better-funded teams, they did so largely by finding undervalued players.

Some of the best European clubs have hired sabermetricians, I guess you could call them soccermetricians, to try to give their teams an edge.

In 2020, Billy Beane himself took a 5% ownership stake in the Dutch football club AZ Alkmaar. Beane’s association with the club came from Robert Eenhoorn, a Dutch former major league baseball player, who was general director of the club.

Like the Oakland A’s, AZ Alkmaar had a smaller budget than the other clubs in their league. They began using a data-driven approach to signing players, and saw a string of successes competing near the top of the Eredivisie, the top Dutch football league.

Given the amount of money involved in leagues like the English Premier League, clubs almost can’t afford not to adopt advanced statistics. The difference from baseball statistics is that each team tends to develop its own proprietary models based on its own data, so the information isn’t public.

There is one sport that I haven’t mentioned yet. The one sport which, like baseball, can be broken down to a series of discrete actions: cricket.

Some people in the world of cricket have started to use advanced statistics to try to debunk myths about the game and to improve decisions made on and off the field.

Cricket statistician Charles Davis has tracked down and recorded the results of every Test match played since 1877. He published a book in 2000 called “The Best of the Best,” where he took a Bill James approach to analyzing cricket.

With the rise in popularity of Twenty20 cricket and the Indian Premier League, more money is being invested in the game, and it is likely that ball-tracking systems like those used in baseball stadiums will be installed in the near future to provide more granular data.

As with baseball, as more money comes into the game of cricket, more effort will be spent by teams trying to gain an edge over their opponents. 

While advanced statistics can make roster and coaching decisions more efficient, they can also make the game too efficient.

Many people have claimed that the use of statistics has gone so far in baseball that it has made the game boring. Everything has become so optimized that it is no longer exciting. 

In 2023, Major League Baseball changed its rules to speed up the game and restrict defensive shifts. There may be other rule changes ahead as well, including limiting the number of pitchers that can be used in a game.

The use of statistics to improve decision-making in sports will only grow over time. As datasets become larger and more robust, artificial intelligence programs may be developed to analyze the data and find trends that humans can’t see.

This has opened a new avenue for those who want to be involved in competitive sports. Instead of excelling on the field, pitch, or court, the most valuable member of a team might be the person who excels in mathematics. 

The Executive Producer of Everything Everywhere Daily is Charles Daniel.

The associate producers are Thor Thomsen and Peter Bennett.

Today’s review comes from listener nhand1022, from Apple Podcasts in the United States. He writes, 

My favorite podcast, hands down

I started listening to this podcast during the pandemic, and it was a godsend. It engaged my addlepated brain at a time when the isolation of lockdown was really getting to me. For that, I will be forever grateful. I’ve found every episode to be interesting and engaging, even when I thought I wouldn’t care about the subject matter. My favorite episode has always been the Halifax explosion, but now it’s the domestication of cats. Learning more about how they came to be our companions was truly fascinating. It made me wonder, if human agriculture caused an explosion in the rodent population, is it possible that this therefore caused an explosion in the population of cats? So humans inadvertently created our own companions? Hmm. Also, I have a few furry little predators at home who appreciated the positive publicity! Thanks Gary! I wish you all the best.

Thanks, nhand! We have no data on the wild or domestic cat population from thousands of years ago, but it is very reasonable to assume that the rise of human agriculture resulted in an increase in the population of cats. More food means more mice, which means more cats.

Where I live, it is easy to see how farms have increased the population of whitetail deer, which feed on corn. 

Remember, if you leave a review or send me a boostagram, you too can have it read on the show.