It’s one thing to predict a winner–it’s another thing to quantify how likely a player is to become that winner.
In most tennis tournaments, it’s not hard to pick a favorite. For most of the last year, it was Novak Djokovic, no matter the surface or who he might face. Before that, it was Federer on hard courts, Nadal on clay courts. While everyone likes to identify a dark horse, there’s rarely much debate at the top.
Given that agreement, though, what odds would you have placed on Novak Djokovic winning Wimbledon? Or the French? Or an in-form Federer winning the tour finals over an injured Djokovic and a tired Nadal? Usually, my numbers spit out something between 20 and 30 percent–in theory, even the best player in the tournament has a better than two-thirds chance of going home a loser.
Intuitively, this is difficult to believe. Djokovic seemed so dominant for much of the year that his slam victories felt like foregone conclusions. Anyone who watched Novak on a good day found it impossible to imagine anyone outplaying him. When Carl Bialik wrote a column asking whether Djokovic could keep up his dominance for the entire season, most responses were some variation of “What are you, stupid? Numbers are irrelevant when someone is so good.”
But all good things must come to an end, and a combination of injuries and good opponents proved that even Djokovic is human.
That said, Djokovic’s dominance–and Nadal’s before him, and Federer’s before him–raises questions about forecasting tennis matches. The questions are complicated, but rest easy: today’s attempt at an answer will be simple.
Do the rules apply to the very best?
My ranking and forecasting system starts by assigning a number to every player, not unlike ATP ranking points. To keep things simple, let’s use ranking points. If we want to predict the outcome of, say, Mardy Fish against Feliciano Lopez, we take their point totals (2965 and 1755) and divide one player’s total by the sum of the two: 2965/(2965+1755) = 62.8%. (It’s a little more complicated than that, but not much.) Setting aside concerns like home court advantage and surface, that sounds about right to me.
Do the same with Djokovic and Lopez, and you get 88.6%. Work the numbers with Djokovic and world #100 Michael Berrer, and you get 96.0%. That’s pretty dominant, suggesting that Berrer would win only 1 in 25 matchups, but wait a minute–we’re saying Berrer’s going to beat Djokovic, ever?
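For anyone who wants to play along, here is a minimal sketch of that simplified calculation. The function name is mine, and this is only the points-share shortcut described above, not the full formula (which, as noted, is a little more complicated).

```python
def win_probability(points_a: float, points_b: float) -> float:
    """Player A's share of the two players' combined points,
    read as A's chance of winning the match (simplified version)."""
    return points_a / (points_a + points_b)


# Point totals quoted above for Fish and Lopez.
fish, lopez = 2965, 1755
fish_over_lopez = win_probability(fish, lopez)
print(f"Fish over Lopez: {fish_over_lopez:.1%}")  # Fish over Lopez: 62.8%
```

Two players with equal points come out at exactly 50%, and the favorite's number climbs toward 100% as the gap in points grows, without ever reaching it.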
And therein lies the problem. The formulas I use to generate points and generate predictions are reasonably accurate, tested against years of ATP results. And in the aggregate, individual match percentages pass the smell test. But at the extremes, the numbers seem questionable.
And it is at the extremes where the exact percentages matter the most. Consider my pre-tournament predictions for Wimbledon this year. While Nadal was the top seed, I picked Djokovic as the favorite, giving him a 21.6% chance of winning. But look at those first few rounds: I gave him only an 87% chance of getting past Jeremy Chardy (Jeremy Chardy!) in the first round, then only an 88% chance of beating Kevin Anderson or Ilya Marchenko, then only an 85% chance of winning against (probably) Marcos Baghdatis.
Only the last of those three numbers is plausible. And when combined, they meant that I gave Djokovic less than a 65% chance of reaching the round of 16. With all due respect to myself, that was almost as ridiculous then as it sounds now.
It’s those early-round numbers that produce such small chances that the favorite will win the tournament. Even if we give a player a 90% chance of winning each of his matches, he’ll still win the seven consecutive matches required for a grand slam only 48% of the time. Lower it to 80%, and we’re down to 21% for the tournament. Since the odds of winning a semifinal match against the likes of Murray, Federer, or Nadal are probably much lower, it seems that early-round odds should be much more favorable.
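The arithmetic here is just multiplication of per-match probabilities, assuming each match is independent of the others. A rough sketch, with function names of my own choosing:

```python
import math


def title_probability(per_match: float, rounds: int = 7) -> float:
    """Chance of winning `rounds` matches in a row when every match
    has the same win probability (assumes independent matches)."""
    return per_match ** rounds


def survival_probability(match_probs) -> float:
    """Chance of winning a sequence of matches with differing odds."""
    return math.prod(match_probs)


print(f"{title_probability(0.90):.0%}")  # 48%
print(f"{title_probability(0.80):.0%}")  # 21%

# Rounded first-three-round Wimbledon figures quoted earlier.
early_rounds = [0.87, 0.88, 0.85]
print(f"{survival_probability(early_rounds):.1%}")
```

The last line uses the rounded percentages from the text, so it lands right around 65%; the unrounded inputs may differ slightly.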
To summarize, one of two things is going on here. Either (1) my numbers underestimate the likelihood that the pre-tournament favorite wins a grand slam; or (2) our intuition overestimates the likelihood that the favorite takes home the trophy.
Forecasting for dummies
One way to pick between the two is to look at the recent past. Are pre-tournament favorites winning more or less than expected?
For now, let’s set aside the question of the likelihood that Djokovic beats Chardy or Marchenko, and look only at winning the tournament. We’re going to make two major assumptions here: (1) it’s possible to identify the pre-tournament favorite years later, and (2) favorites are generally created equal–Djokovic towers over his competitors to the same degree that Courier, or Lendl, or Sampras, or Federer towered over his. As usual, both of these assumptions probably aren’t true, but they aren’t so hideously wrong that they’ll stop us from reaching some worthwhile conclusions.
There are three easy ways of picking the pre-tournament favorite for a grand slam: using (a) the winner of the last slam, (b) the defending champion, or (c) the top seed–almost always the world #1. The top seed is probably best, while the defending champion might identify a player who is particularly good on the surface, and the winner of the last slam might pick out someone who is riding a hot streak.
The last 21 years (back to 1991, inclusive) give us 84 slams to work with. Our sample is a bit smaller than that, because occasionally the winner of the last slam or the defending champion did not play, and on three occasions, the top seed pulled out before the tournament began. Here is how the favorites did:
- Of the 75 players who had won the previous slam, 18 (24%) won the tournament.
- Of the 76 defending champions, 26 (34%) won the tournament.
- Of the 81 top seeds, 29 (36%) won the tournament. If we exclude the French (where the top seed is often #1 on the basis of hard court performance), we get a more dramatic result here–26 of 60 (43.3%) won the tournament.
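The percentages in the list above are simple win rates, and they are easy to recompute from the counts. A short sketch (the labels and variable names are mine; the counts are the ones just listed):

```python
# (titles won, tournaments entered as "favorite") for each definition.
favorites = {
    "won previous slam": (18, 75),
    "defending champion": (26, 76),
    "top seed": (29, 81),
    "top seed, excluding French": (26, 60),
}

rates = {label: wins / entries for label, (wins, entries) in favorites.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")
```

Subtracting the last row from the one above it also shows what the exclusion removes: 21 French Opens in which the top seed won only 3 times.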
All of these measures are higher than the 21.6% shot I gave Djokovic at Wimbledon. And most are higher than the 27-28% chances I gave him at the French and US Open. The 43.3% likelihood that the top seed wins a slam away from clay (thank you, Pete and Roger!) suggests that a more sophisticated measure of identifying the favorite might allow us to predict slam champions with, say, 40% accuracy.
40% is considerably higher than my models are spitting out right now, but I suspect it is much lower than many fans imagine for their favorite. It suggests that, at the extremes, my predictions aren’t quite one-sided enough. It might take Michael Berrer more than 25 chances before he finally catches Djokovic on a bad day.
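To get a feel for what 40% would demand, we can run the seven-match arithmetic in reverse. This is only a back-of-the-envelope sketch: it assumes the same win probability in every round and independent matches, neither of which holds in a real draw, and the function name is mine.

```python
def per_match_needed(title_prob: float, rounds: int = 7) -> float:
    """Per-match win probability that yields `title_prob` over `rounds`
    consecutive matches, if every match had identical odds."""
    return title_prob ** (1 / rounds)


needed = per_match_needed(0.40)
print(f"{needed:.1%}")  # 87.7%
```

Since the late rounds against the other top players will fall well short of that average, the early rounds would have to sit well above it.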