Italian translation at settesei.it
Last night, the heavily-favored Janko Tipsarevic won his first round match against Guillaume Rufin despite dropping the first two sets. Had Rufin taken the first two sets against Janko in Cincinnati, Monte Carlo, or just about anywhere else on the ATP tour, he would’ve scored his first top-ten scalp.
Other seeds have similar stories. Milos Raonic, Marin Cilic, Gilles Simon, and Alexandr Dolgopolov all would be headed home had their matches been judged on the first three sets. Only two seeds had the opposite experience: Juan Monaco and Tommy Haas were each up two sets to love before losing their next three.
Simply (if tongue-twistingly) put, the five-set format favors favorites.
In all grand slam first rounds since 1991, seeds have come back from 0-2 or 1-2 down against unseeded players 125 times, while seeds have squandered 2-0 or 2-1 advantages only 71 times. Just looking at those 32 matches per slam, that’s almost one upset averted per tournament. The US Open draw would look awfully different right now if Tipsarevic, Raonic, Cilic, Simon, and Dolgopolov were among the first-round losers, even if Haas and Monaco replaced them in the second round.
Set theory
These numbers shouldn’t surprise us, since longer formats should do a better job of revealing the better player. There are reasons why the baseball World Series is best-of-7 instead of a single game and the final sets of singles matches aren’t super-tiebreaks. The difference between best-of-3 and best-of-5 isn’t quite so simple–fitness and mental strength play a part–but from a purely mathematical perspective, there should be fewer upsets in best-of-5s than best-of-3s.
Take Raonic for example. My numbers (which don’t differentiate between 3-set and 5-set matches–shame on me) gave him approximately a 70% chance of beating Santiago Giraldo. If 70% is his probability of winning a three-set match and sets are independent (more on that in a minute), that number implies a 63.7% chance of winning any given set. A 63.7% chance of winning a set translates into a 74.4% shot at winning a best-of-five.
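The Raonic arithmetic can be reproduced with a short sketch. It assumes, as the paragraph above does, that sets are independent and that the per-set probability is constant. The function names and the bisection solver are illustrative choices, not the code behind the original forecast.

```python
from math import comb


def match_win_prob(p_set, best_of):
    """Chance of winning a best-of-N match, given a constant,
    independent probability p_set of winning each set."""
    need = best_of // 2 + 1          # sets required to win the match
    q = 1.0 - p_set
    # Win by taking the deciding set after dropping k sets along the way.
    return sum(comb(need - 1 + k, k) * p_set**need * q**k for k in range(need))


def set_prob_from_match(p_match, best_of, tol=1e-9):
    """Invert match_win_prob by bisection (it is increasing in p_set)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if match_win_prob(mid, best_of) < p_match:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 70% to win a best-of-three, as in the Raonic example.
p_set = set_prob_from_match(0.70, best_of=3)
p_bo5 = match_win_prob(p_set, best_of=5)
print(f"per-set: {p_set:.3f}, best-of-five: {p_bo5:.3f}")
```

Run as written, this should reproduce the 63.7% per-set and 74.4% best-of-five figures quoted above.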
A four- or five-point increase doesn’t radically change the complexion of the tournament, but it does make a difference. My original numbers suggested that we could expect 20 or 21 first-round upsets. If we adjust my odds in the manner I described for Raonic, the likely number of upsets falls to 18.
The most important implication here is the effect it has on the chances that top players reach the final rounds. Earlier this week a commenter took me to task for my unintuitively low probabilities that Federer and Djokovic would reach the semifinals. Obviously, if you give an overwhelming favorite a boost in every round, as the five-set format does, the cumulative effect is substantial. For the top seeds, it can halve their probability of losing against a much lower-ranked opponent.
For Federer, adjusting the odds to reflect the theoretical advantage of the best-of-five format raises his chances of reaching the semis from 52.5% to over 65%. Djokovic’s numbers are almost identical.
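To see how a per-round boost compounds, here is a rough sketch. It makes a simplifying assumption that is not in the original forecast: all five rounds before the semifinals are treated as equally difficult, with the per-round best-of-three probability chosen so that the product is 52.5%. The real draw has a different opponent in every round, so this only illustrates the size of the effect.

```python
from math import comb


def match_win_prob(p_set, best_of):
    """Match win probability with a constant, independent per-set probability."""
    need = best_of // 2 + 1
    return sum(
        comb(need - 1 + k, k) * p_set**need * (1 - p_set) ** k for k in range(need)
    )


def to_best_of_five(p_bo3):
    """Convert a best-of-three win probability to the best-of-five equivalent."""
    lo, hi = 0.0, 1.0
    for _ in range(60):              # bisection on the implied per-set probability
        mid = (lo + hi) / 2
        if match_win_prob(mid, 3) < p_bo3:
            lo = mid
        else:
            hi = mid
    return match_win_prob((lo + hi) / 2, 5)


ROUNDS = 5                           # wins needed to reach the semifinals
# Hypothetical: identical difficulty each round, product equal to 52.5%.
per_round_bo3 = 0.525 ** (1 / ROUNDS)
per_round_bo5 = to_best_of_five(per_round_bo3)
reach_semis_bo5 = per_round_bo5**ROUNDS

print(f"per round: {per_round_bo3:.3f} -> {per_round_bo5:.3f}")
print(f"reach semis: 0.525 -> {reach_semis_bo5:.3f}")
```

Under this equal-rounds assumption the adjusted figure comes out a little under 70%, the same ballpark as the "over 65%" above. The gap between the two is expected, since the actual per-round odds are not uniform.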
Dependent outcomes
Everything I’ve said so far seems intuitively sound, with one caveat. Earlier I mentioned the assumption that sets are independent. That is, a player has the same chance of winning a particular set no matter what the outcome of the previous sets–there is no “hangover effect” based on what has come before.
Tennis players, even professionals, aren’t robots, so the assumption probably isn’t completely valid. Sometimes frustration with one’s own performance, the environment, or line calls can carry over into the next set and give one’s opponent an advantage. Perhaps more importantly, the result of one set sometimes reveals that pre-match expectations were wrong in the first place. Had David Nalbandian played this week instead of withdrawing, no number of sets would reveal that he was the better player–his health would prevent him from playing at his usual level.
Another related caveat is that beyond a certain match length, the outcome is no longer dependent on the same skills. When Michael Russell played Yuichi Sugita in the Wimbledon qualifying round, the two men looked equal for four sets. In the fifth, Russell’s fitness gave him an advantage that didn’t exist in the first couple of hours. In this case, an estimate of Russell’s probability of winning a set against Sugita may be independent of previous outcomes, but it is not the same for every set.
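The Russell example can be put into numbers with a small sketch that lets the per-set probability vary by set while still treating sets as independent. The probabilities below (even for four sets, 60% in the fifth) are invented for illustration and are not estimates for Russell or Sugita; the function name is likewise hypothetical.

```python
def match_win_prob_by_set(set_probs):
    """Match win probability when set i has its own win probability.

    set_probs has one entry per possible set (3 or 5 entries). Sets are
    still independent; only the probability changes from set to set.
    """
    need = len(set_probs) // 2 + 1
    states = {(0, 0): 1.0}           # (sets won, sets lost) -> probability
    win = 0.0
    for p in set_probs:
        nxt = {}
        for (w, l), pr in states.items():
            if w + 1 == need:        # winning this set ends the match
                win += pr * p
            else:
                nxt[(w + 1, l)] = nxt.get((w + 1, l), 0.0) + pr * p
            if l + 1 < need:         # losing this set keeps the match alive
                nxt[(w, l + 1)] = nxt.get((w, l + 1), 0.0) + pr * (1 - p)
        states = nxt
    return win


# Hypothetical: dead even for four sets, a fitness edge worth 60% in the fifth.
even_then_edge = match_win_prob_by_set([0.5, 0.5, 0.5, 0.5, 0.6])
print(f"{even_then_edge:.4f}")
```

With these made-up inputs the fitter player's match odds rise from 50% to 53.75%, because an even match reaches a fifth set 37.5% of the time and the extra 10 points apply only there.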
These allowances aside, there is little doubt that favorites are more likely to win best-of-five matches than best-of-threes. Whether you want to watch the entire thing … that’s another story.
Really nice analysis. May I request a post on the related topic of winning the 5th (or 3rd) set in a match? 1) What are good predictors of the last set: is it higher rank? 2) Are there any correlations with any of the first 4 sets? (For example, Djoker lost only 1-2 finals after winning the 1st set; not sure how many went to a last set.) 3) Can we take fatigue into account by, say, whether the player played a 4/5-setter in the previous match? (We can ignore this for the 1st match of the tournament.) 4) Any effect of the surface? 5) Big matches (like finals) vs. a seeded player having an off-day against a rookie in the early rounds? 6) Which players tend to deviate most from what is expected given their first-serve win pcts?
I’d be happy to do some of these on my own if pointed to the right data.