10 November 2014

Yes, Of Course Home-Field Advantage Exists in the Postseason

I like baseball, and I like probability, and so I especially like thinking about probability in baseball. As you might imagine, this takes me to the mathier corners of the part of the Internet devoted to baseball, and the baseballier corners of the part of the Internet devoted to math, and the most unapologetically nerdy corners of the Internet in general.

One Internet place where I spend a lot of my time is The Hardball Times, which I would say is probably second only to Fangraphs for being good at answering the kind of baseball questions you might (if you are a baseball-inclined person) vaguely wonder about from time to time but tend to assume are intractable. The Hardball Times is very interesting and I enjoy it quite heartily.

Except last Friday. Last Friday was not a good day for The Hardball Times, because last Friday The Hardball Times published this article, which uses very bad math to draw a very wrong conclusion. (The article's author is named Jeffrey Gross, and, according to his blurb on the site, he is an attorney who enjoys both baseball and beer. As an attorney (for real now!) who also enjoys both baseball and beer--the former much more than the latter--I appreciate and applaud Mr. Gross's good taste and wish him no ill will. I mean only to correct his erroneous conclusion, without casting any aspersions on the man himself. You're a good egg, Jeff, or at least I assume so based on your profession and interests.)

In short, Mr. Gross concludes that, given the 2-3-2 structure of the League Championship Series and World Series--that is, if Team A is the team with home-field advantage, the first two games are played at Team A's park, the following three at the park of Team A's opponent (whom we shall call, creatively, Team B), and the final two back at Team A's park--home-field advantage does not exist in the postseason, because the only series length at which the team with home-field advantage has actually played more home games is the maximum, seven. "[I]f the series lasts five games," argues perfectly nice and smart guy (I'm guessing) Jeff Gross, "the value of home-field advantage inures to the team without 'home-field advantage' in the series" because that team has had three home games to the "home-field advantage" team's two. Modeling a best-of-seven series as a sequence of independent events between two teams of equal strength in which the home team has a 54% chance of winning each individual game, Jeffy G. (as I am sure he would allow me to call him if we were buds) presents a binomial distribution probability table (good! excellent! hooray!) identifying the probability of each team winning a series of four, five, six, or seven games.

The second sentence following the binomial distribution table is:

"Also, please keep in mind this distribution assumes the probability of any series length is equal."

Precisely 2.7 seconds after reading that sentence, I literally* had a massive aneurysm that put me in a deep coma from which I have not yet emerged--I am currently composing this post using cutting-edge brain wave translation technology.

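See, you can't assume that. Once you've fixed the probability of each individual game, the probability of each series length is already determined by those numbers; assuming on top of that that every length is equally likely is mathematically incompatible, at the most basic level, with the model itself. Here's why.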

Suppose Mr. Gross's model is suitable--i.e. that a best-of-seven series in the Major League Baseball postseason can be thought of as a series of discrete, independent events between two evenly-matched teams in which the home team has a 54% chance of winning each game.** What is the likelihood of a series ending in a four-game sweep? Well, that's relatively easy to figure out: using our nomenclature from above, either Team A wins Games One, Two, Three, and Four, or Team B wins Games One, Two, Three, and Four--those are the only two possibilities for a sweep. The probability that Team A wins each game is 54% for Games One and Two (because Team A is the home team for those games) and 46% for Games Three and Four; these probabilities are reversed for Team B. So:

Probability that Team A wins in a sweep = 0.54 * 0.54 * 0.46 * 0.46 = 0.0617 = 6.17%
Probability that Team B wins in a sweep = 0.46 * 0.46 * 0.54 * 0.54 = 0.0617 = 6.17%

Each team has the same probability of winning if the series is a sweep, but that's not what's interesting. What's interesting is that we can tell from this extremely basic calculation that the overall probability of a sweep is 6.17% + 6.17% = 12.34%. If the probability of each series length were equal, we would expect the probability of a sweep to be 25%, because there are four possible series lengths (four, five, six, or seven games). Assuming that a sweep is just as likely as a five-, six-, or seven-game series means assuming that a sweep is more than twice as likely as it actually is.
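For anyone who would rather check the sweep arithmetic than take my word for it, here is a minimal Python sketch. The only input is the 54% home-win figure from the model above; the variable names are mine and purely illustrative.

```python
# Per-game model from the post: the home team wins with probability 0.54.
P_HOME = 0.54
P_AWAY = 1 - P_HOME

# Games One and Two are at Team A's park, Games Three and Four at Team B's.
p_a_sweep = P_HOME * P_HOME * P_AWAY * P_AWAY
p_b_sweep = P_AWAY * P_AWAY * P_HOME * P_HOME
p_sweep = p_a_sweep + p_b_sweep

print(f"Team A sweep: {p_a_sweep:.2%}")   # 6.17%
print(f"Team B sweep: {p_b_sweep:.2%}")   # 6.17%
print(f"Any sweep:    {p_sweep:.2%}")     # 12.34%

# How badly the "all lengths equally likely" assumption overstates a sweep.
print(f"Overstated by a factor of {0.25 / p_sweep:.2f}")   # 2.03
```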

So now what? We can extend this model (which is a frequently used one called a binomial probability model) to see the probability of each team winning a series of each possible length:***


| | 4 games | 5 games | 6 games | 7 games | TOTAL |
|---|---|---|---|---|---|
| Probability of Team A winning in: | 6.17% | 11.50% | 16.64% | 16.94% | 51.25% |
| Probability of Team B winning in: | 6.17% | 13.50% | 14.64% | 14.43% | 48.75% |
| TOTAL | 12.34% | 25.00% | 31.29% | 31.37% | 100.00% |


You'll notice that this is the exact same table that Jeff Gross put in his article, which means that he actually did all of the math correctly! The only change I made is the last, crucial, head-slappingly simple step of adding all of the probabilities up, which allows us to see clearly that a) each series length is not equally likely (six- and seven-game series are more likely than five-game series, which are way more likely than four-game series) and b) the likelihood of Team A (the team with home-field advantage, remember!) winning a series is 2.5 percentage points greater than the likelihood of Team B winning. Yes, Team B is more likely to win a five-game series--which is more than offset by the fact that Team A is more likely to win a six-game and a seven-game series, which are each more likely than a five-game series.
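If you want to reproduce the table yourself, one way is to walk through every distinct way the series can end and add up the probabilities, which is the approach described in the third footnote. The sketch below is my own illustration, not Mr. Gross's code or the spreadsheet I actually used; the names `A_IS_HOME` and `outcomes` are made up for the example, and it assumes the same independent-games, 54% model.

```python
P_HOME = 0.54  # home team's chance of winning any single game (model assumption)

# 2-3-2 format: True where Team A (the home-field-advantage team) is at home.
A_IS_HOME = [True, True, False, False, False, True, True]

def outcomes(wins_a=0, wins_b=0, prob=1.0):
    """Yield (winner, length, probability) for each distinct way the series can end."""
    if wins_a == 4 or wins_b == 4:
        yield ("A" if wins_a == 4 else "B", wins_a + wins_b, prob)
        return
    game = wins_a + wins_b  # zero-based index of the next game to be played
    p_a = P_HOME if A_IS_HOME[game] else 1 - P_HOME
    yield from outcomes(wins_a + 1, wins_b, prob * p_a)        # Team A wins it
    yield from outcomes(wins_a, wins_b + 1, prob * (1 - p_a))  # Team B wins it

all_outcomes = list(outcomes())

# Collapse the individual outcomes into (winner, series length) cells.
table = {}
for winner, length, prob in all_outcomes:
    table[(winner, length)] = table.get((winner, length), 0.0) + prob

print(len(all_outcomes), "distinct outcomes")
for winner in "AB":
    row = [table[(winner, n)] for n in (4, 5, 6, 7)]
    cells = " ".join(f"{p:7.2%}" for p in row)
    print(f"Team {winner}: {cells}  total {sum(row):7.2%}")
```

Swapping `P_HOME` for 0.5 makes every Team A cell match the corresponding Team B cell, which is a handy sanity check that the only thing driving the gap is the home-field edge.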

Jeff Gross's own math says home-field advantage in a best-of-seven series exists. Is it as big an advantage as the home-field advantage in a single game? No, but then why would we expect it to be? If we (hypothetically) pitted two teams of equal strength against each other in a best-of-seven series four hundred times (and are right that the home team wins a game about 54% of the time), we would expect the team with home-field advantage to win about 205 of those series. It's not huge, but it's something, which is more than nothing.
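The same conclusion falls out if you skip the algebra and just play the series a lot of times. Here is a rough Monte Carlo sketch under the same assumptions (independent games, evenly matched teams, 54% home-win probability); the seed, the trial count, and the function name `team_a_wins_series` are arbitrary choices of mine.

```python
import random

P_HOME = 0.54  # home team's chance of winning any single game (model assumption)

# 2-3-2 format: True where Team A (the home-field-advantage team) is at home.
A_IS_HOME = [True, True, False, False, False, True, True]

def team_a_wins_series(rng):
    """Simulate one best-of-seven series; return True if Team A wins it."""
    wins_a = wins_b = 0
    for a_is_home in A_IS_HOME:
        p_a = P_HOME if a_is_home else 1 - P_HOME
        if rng.random() < p_a:
            wins_a += 1
        else:
            wins_b += 1
        if wins_a == 4 or wins_b == 4:
            break  # series is over as soon as someone gets four wins
    return wins_a == 4

rng = random.Random(2014)  # fixed seed so reruns give the same numbers
trials = 200_000
share = sum(team_a_wins_series(rng) for _ in range(trials)) / trials

print(f"Team A won {share:.1%} of {trials:,} simulated series")
print(f"Scaled to 400 series: about {400 * share:.0f}")
```

With this many trials the simulated share should land within a fraction of a percentage point of the 51.25% in the table, though the exact digits depend on the seed.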

Sorry, Jeff. Better luck next time.

* By which I of course mean "figuratively."

** In reality, of course, this isn't true--the MLB postseason is constructed to pit stronger teams against weaker ones, and a huge number of factors, from bullpen rest to weather, can move the results one way or another and make each game at least weakly dependent on the others. However, the stronger team is almost always the one with home-field advantage (except in the World Series, in which, for reasons that pass understanding but are almost certainly asinine, home-field advantage is conferred upon the team from the league that won the All-Star Game), so our results will actually understate how frequently the home-field advantage team wins the series, and over a large enough sample size, the other factors tend to even out. This model will do as well as any other one of such (relative) simplicity.

*** How I did this is boring and tedious, but in short: there are 70 possible outcomes for a best-of-seven series (specifically, each team has one way to win in a sweep, four ways to win in five games, ten ways to win in six games, and twenty ways to win in seven games). I calculated the probability of each outcome individually and then simply added them up.

