This week, the semifinals are swapped: If Federer is to continue his streak, he’ll need to go through Djokovic just to reach the final. The most dangerous lower-ranked players seem to be well distributed throughout the draw: Del Potro would face Djokovic in the quarters; Roddick could meet Federer in the third round; Raonic could face Murray in the third round; and Isner is lurking in Nadal’s quarter.
If Fernando Gonzalez wants to make a run in his final tournament, he’ll have to go through Tomas Berdych in the second round, not exactly the easiest ask, even at the South American slam.
Here are the complete projections for the draw as it stands now, without qualifiers placed:
Player                          R64     R32     R16       W
(1)Novak Djokovic            100.0%   82.4%   71.6%   22.5%
Leonardo Mayer                17.5%    0.9%    0.2%    0.0%
Marcos Baghdatis              82.5%   16.8%   10.6%    0.4%
Qualifier1                    49.9%   16.7%    2.0%    0.0%
Qualifier2                    50.1%   16.5%    1.9%    0.0%
(27)Viktor Troicki           100.0%   66.8%   13.7%    0.3%

Player                          R64     R32     R16       W
(17)Richard Gasquet          100.0%   67.7%   42.0%    0.9%
Cedrik-Marcel Stebe           55.5%   19.2%    9.0%    0.1%
Flavio Cipolla                44.5%   13.1%    5.5%    0.0%
Albert Ramos-Vinolas          42.0%   10.7%    2.7%    0.0%
Qualifier3                    58.0%   18.6%    5.9%    0.0%
(15)Feliciano Lopez          100.0%   70.6%   34.9%    0.4%

Player                          R64     R32     R16       W
(11)Juan Martin Del Potro    100.0%   79.0%   57.1%    6.0%
Lukasz Kubot                  45.8%    9.0%    3.7%    0.0%
Ivo Karlovic                  54.2%   12.0%    5.2%    0.0%
Igor Kunitsyn                 42.4%    9.1%    1.6%    0.0%
(WC)Jesse Levine              57.6%   16.0%    3.3%    0.0%
(23)Marin Cilic              100.0%   74.9%   29.1%    0.7%

Player                          R64     R32     R16       W
(30)Julien Benneteau         100.0%   61.7%   24.6%    0.2%
Benjamin Becker               47.4%   17.5%    5.1%    0.0%
Olivier Rochus                52.6%   20.8%    6.1%    0.0%
Sergiy Stakhovsky             34.2%   10.4%    4.9%    0.0%
Bernard Tomic                 65.8%   30.0%   19.0%    0.5%
(5)David Ferrer              100.0%   59.6%   40.4%    1.4%

Player                          R64     R32     R16       W
(3)Roger Federer             100.0%   84.8%   66.5%   12.4%
(WC)Ryan Harrison             77.5%   13.9%    6.5%    0.1%
Potito Starace                22.5%    1.4%    0.3%    0.0%
Alex Bogomolov                54.9%   18.9%    3.7%    0.0%
Gilles Muller                 45.1%   13.6%    2.3%    0.0%
(31)Andy Roddick             100.0%   67.5%   20.6%    0.6%

Player                          R64     R32     R16       W
(21)Juan Monaco              100.0%   61.3%   23.9%    0.3%
Yen-Hsun Lu                   43.1%   15.1%    3.9%    0.0%
Jarkko Nieminen               56.9%   23.6%    7.5%    0.0%
Qualifier4                    31.8%    6.2%    2.3%    0.0%
Ernests Gulbis                68.2%   23.7%   13.1%    0.2%
(14)Gael Monfils             100.0%   70.1%   49.3%    2.6%

Player                          R64     R32     R16       W
(12)Nicolas Almagro          100.0%   64.3%   36.5%    0.7%
Qualifier5                    38.3%   11.1%    4.1%    0.0%
Donald Young                  61.7%   24.6%   11.6%    0.1%
Qualifier6                    66.9%   20.1%    6.7%    0.0%
Carlos Berlocq                33.1%    5.8%    1.2%    0.0%
(20)Fernando Verdasco        100.0%   74.1%   40.0%    0.7%

Player                          R64     R32     R16       W
(28)Kevin Anderson           100.0%   54.9%   24.6%    0.4%
Sam Querrey                   59.2%   28.9%   12.5%    0.2%
Matthew Ebden                 40.8%   16.3%    5.8%    0.0%
Qualifier7                    37.2%    8.2%    2.7%    0.0%
Jeremy Chardy                 62.8%   20.6%    9.3%    0.1%
(8)Mardy Fish                100.0%   71.2%   45.1%    2.0%

Player                          R64     R32     R16       W
(7)Tomas Berdych             100.0%   83.6%   65.0%    5.0%
Nicolas Mahut                 85.8%   15.9%    7.3%    0.0%
(WC)Fernando Gonzalez         14.2%    0.5%    0.1%    0.0%
Grigor Dimitrov               47.7%   35.6%   11.0%    0.1%
Mikhail Kukushkin             52.3%   39.8%   13.2%    0.1%
(29)Juan Ignacio Chela       100.0%   24.5%    3.4%    0.0%

Player                          R64     R32     R16       W
(18)Alexandr Dolgopolov      100.0%   69.7%   35.0%    0.8%
Qualifier8                    43.4%   11.8%    3.3%    0.0%
(WC)Denis Kudla               56.6%   18.5%    6.2%    0.0%
David Nalbandian              66.0%   34.0%   19.7%    0.6%
Steve Darcis                  34.0%   11.9%    4.9%    0.0%
(9)Janko Tipsarevic          100.0%   54.1%   30.8%    0.8%

Player                          R64     R32     R16       W
(13)Gilles Simon             100.0%   66.6%   41.5%    0.9%
Qualifier9                    36.5%    9.4%    3.5%    0.0%
Andreas Seppi                 63.5%   24.0%   12.2%    0.1%
Robin Haase                   58.0%   24.0%    9.3%    0.0%
(WC)Marinko Matosevic         42.0%   14.0%    4.4%    0.0%
(22)Jurgen Melzer            100.0%   62.0%   29.1%    0.3%

Player                          R64     R32     R16       W
(26)Milos Raonic             100.0%   69.3%   23.2%    0.8%
Dudi Sela                     59.0%   20.0%    4.3%    0.0%
Qualifier10                   41.0%   10.7%    1.7%    0.0%
Alejandro Falla               42.5%    6.0%    2.2%    0.0%
Denis Istomin                 57.5%   10.7%    4.6%    0.1%
(4)Andy Murray               100.0%   83.3%   64.0%   12.0%

Player                          R64     R32     R16       W
(6)Jo-Wilfried Tsonga        100.0%   80.8%   57.6%    4.5%
Qualifier11                   43.8%    7.4%    2.5%    0.0%
Xavier Malisse                56.2%   11.8%    4.8%    0.0%
Thomaz Bellucci               70.8%   28.9%    9.0%    0.1%
Frederico Gil                 29.2%    6.5%    1.1%    0.0%
(32)Philipp Kohlschreiber    100.0%   64.5%   24.9%    0.4%

Player                          R64     R32     R16       W
(19)Florian Mayer            100.0%   60.2%   27.7%    0.6%
Philipp Petzschner            45.4%   17.0%    5.9%    0.0%
Ivan Dodig                    54.6%   22.7%    8.7%    0.1%
Nikolay Davydenko             59.5%   18.7%    8.4%    0.1%
James Blake                   40.5%    9.7%    3.5%    0.0%
(10)John Isner               100.0%   71.6%   45.8%    2.2%

Player                          R64     R32     R16       W
(16)Kei Nishikori            100.0%   70.2%   44.6%    1.7%
Ryan Sweeting                 42.7%   11.3%    4.4%    0.0%
Lukas Lacko                   57.3%   18.5%    8.4%    0.1%
Michael Llodra                63.8%   27.7%   11.2%    0.1%
Lukas Rosol                   36.2%   11.0%    3.1%    0.0%
(24)Marcel Granollers        100.0%   61.3%   28.3%    0.4%

Player                          R64     R32     R16       W
(25)Radek Stepanek           100.0%   64.8%   13.3%    0.1%
Qualifier12                   75.1%   30.6%    4.8%    0.0%
Tommy Haas                    24.9%    4.6%    0.3%    0.0%
Pablo Andujar                 28.6%    2.3%    0.9%    0.0%
Santiago Giraldo              71.4%   12.7%    7.4%    0.1%
(2)Rafael Nadal              100.0%   84.9%   73.3%   14.0%
Do you also take into account the player’s current form and the Head to head of players?
More recent matches are weighted more heavily, yes. There’s a small H2H component, but my research has indicated that H2H doesn’t add very much accuracy to the projections.
Looking at your predictions, I’d like to ask you something. Your algorithm takes into account the previous results of many matches, recent ones as well as older ones. So my question is: don’t you think that giving a bigger weight to the recent matches could lead to better predictions?
I mean, when you look at the way some top players are playing right now, for example, Mardy Fish, I really don’t think the probability he has of winning the tournament is that big (~2%) …
I think Nalbandian would have a better chance of reaching the R32 than Tipsarevic, but your projections say otherwise (Tipsarevic has a 54.2% probability of reaching the R32 against 34% for Nalbandian)…
Maybe I’m wrong, but don’t you think this could be because your algorithm doesn’t give a bigger weight to the latest results?
Sorry, when my message was posted the other wasn’t there…
My algorithm does weight more recent matches more heavily. I tested a variety of weights to find the ones that most accurately predicted outcomes in the past.
However, humans tend to weight more recent events way TOO heavily. We see one bad match from Fish (or Murray, or whomever) and think that he must be out of form … my system recognizes that players have blips like that, and while they tell us something, they don’t tell us that much, compared to their record over a span of a year or two.
OK, I’ll put up a completely ignorant question – does this mean you are in effect calculating variance for each player? And if so is it in effect a “moving variance”? E.g. if I were a poker player trying to calculate my expectation for a given game (let’s say $5/$10 limit hold’em), typically I would need to decide what period to choose. With poker, variance is extremely high, so the longer the period the better, thus I might include literally all my results going X many years back, even if I think I’ve become a better player in the last couple of years. With tennis players, what seems to work best?
Probably though I’ve got this all wrong . . .
I think I got your point. However, I think that recent events should weigh more heavily when trying to predict the result of a match between closely ranked players.
I mean, looking at the way Federer and Nadal are playing, I wouldn’t be surprised to see Federer defeat Nadal again, if they both get to the final…
“I mean, looking at the way Federer and Nadal are playing, I wouldn’t be surprised to see Federer defeat Nadal again, if they both get to the final…”
The problem with this is it’s weighting a single result as if the outcome (Federer winning at Indian Wells) were expressive of a trend. A single result is not a trend, it’s a single result, even if it’s recent. Subjectively we may have thoughts such as “Ah, Fed beat Nadal, he’s hot and Nadal is not.” But that doesn’t translate into any sort of algorithm that could ever hold up.
Along the same lines, I was dipping recently into a book for laypeople by the cognitive psychologist and researcher Daniel Kahneman, and he mentioned that experts often go wrong in decision making by under-weighting statistical evidence and over-weighting their own ability to subjectively interpret individual events as having been caused by complex factors they feel they have insight into. Another way of saying this is, complex problems are often not best solved by complex methods – they can be better solved (in the long run especially) by simple methods.
If we actually try to predict a tennis match by drawing out the many complex factors that go into it, we may not actually make the problem easier to solve – we may instead make it harder. One reason is, we have no method for systematically weighting the individual factors. Take the Indian Wells final: Nadal specifically said in his presser afterwards that the unexpected cold temperatures had caused the ball to bounce lower, favoring the Federer backhand over Nadal’s topspin strokes. What weight can we possibly assign this factor? We don’t really know. Common sense says Miami probably won’t experience chilling weather the way Indio, California did – but then, how much of Federer’s win at Indian Wells do we assign to the cold-weather factor, versus possibly something else at work? And then how could you ever duplicate this level of analysis in an algorithm that is intended to handle all players?
@wholesight — I’m not calculating variance, though some of what you say applies. I’m calculating the best approximation of the true current talent level of the player, factoring in everything we can about the environment.
AFAICT, the last two years are most relevant, and yesterday’s matches are roughly twice as relevant as matches 365 days ago, which are in turn roughly twice as relevant as matches from 365 days before that.
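One way to read that "twice as relevant" rule of thumb is as exponential decay with a half-life of about one year. The sketch below only illustrates that reading; it is not the actual projection code. The function names `recency_weight` and `weighted_win_rate`, the 365-day default, and the use of a plain win rate as the quantity being weighted are all assumptions made for the example.

```python
def recency_weight(days_ago, half_life_days=365.0):
    """Relative weight of a match played `days_ago` days ago.

    Assumes exponential decay: the weight halves every `half_life_days`.
    The 365-day half-life is an assumption based on the rule of thumb above.
    """
    return 0.5 ** (days_ago / half_life_days)


def weighted_win_rate(matches, half_life_days=365.0):
    """Recency-weighted win rate for one player.

    `matches` is an iterable of (days_ago, won) pairs, where `won` is a bool.
    Returns None if there are no matches to weight.
    """
    total_weight = 0.0
    win_weight = 0.0
    for days_ago, won in matches:
        w = recency_weight(days_ago, half_life_days)
        total_weight += w
        if won:
            win_weight += w
    if total_weight == 0.0:
        return None
    return win_weight / total_weight


if __name__ == "__main__":
    # Hypothetical record: a loss yesterday, wins one and two years ago.
    record = [(1, False), (365, True), (730, True)]
    print(round(weighted_win_rate(record), 3))
```

Under this scheme a single recent loss pulls the estimate down, but older wins still carry a substantial share of the weight, which is the point made above about not overreacting to one bad match.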
@Daniel: As I said, I *do* weight more recent matches more heavily. And what you “think” should be done (or what I think should be done, or what anyone thinks) isn’t relevant here: the weighting formula I use is the best one I’ve found at predicting future results, between closely matched players or not-so-closely matched players.
@wholesight (2): those are good points about having too much information. From a practical perspective, I’m not going to bother including temperature in my database. I don’t have the data to know who had to play back-to-back days, or who played on center court, or who was battling the flu at Indian Wells, and so on. Maybe a skilled and experienced bettor (or bookie) could shade lines appropriately in some of those cases (I would certainly give Gonzo better odds this week than my algorithm does), but in general, much more info is too much info.
I have the same question that still has not been answered. (see your recent hardcourt rankings article).
Do you plan on regularly publishing a rankings list (e.g. every month or so), or are the rankings lists you have published previously centred around the slams? A list on a regular basis would be a wonderful, useful tool for people who like a little bet 🙂
I publish them when I have a chance and/or when they have something interesting to mention.