Here are my pre-tournament odds for the 2012 US Open. For some background reading, follow the links for more on my player rating system, current rankings, and more on how I simulate tournaments.
I’ve made one tweak to the algorithm (for men only) since last posting odds. As many of you have noticed, I seem to underestimate the chances that the very best players will progress through the draw. Some analysis of past results confirmed that the underestimate is real, so for now, there’s a bit of a band-aid in the system, boosting the odds of the current top ten in a way that reflects how they’ve outperformed my projections in the past.
Still, Federer and Djokovic both have well under 30% chances of winning the Open, and fall just short of 50% between them. My rankings give Djokovic a very slight edge despite Federer’s big season, and the tournament draw, which places Murray in Federer’s half, firmly tilts the scales in the Serb’s favor.
Player  R64  R32  R16  W
1 Roger Federer  90.6%  84.0%  74.0%  23.2%
Donald Young  9.4%  5.4%  2.5%  0.0%
Maxime Authom  32.9%  2.3%  0.7%  0.0%
Bjorn Phau  67.1%  8.3%  3.7%  0.0%
Albert Ramos  50.1%  15.1%  1.7%  0.0%
Robby Ginepri  49.9%  14.8%  1.7%  0.0%
Rui Machado  15.1%  5.5%  0.4%  0.0%
25 Fernando Verdasco  84.9%  64.6%  15.4%  0.3%

Player  R64  R32  R16  W
23 Mardy Fish  77.1%  50.6%  33.9%  1.3%
Go Soeda  22.9%  8.8%  3.3%  0.0%
Nikolay Davydenko  88.6%  39.4%  21.4%  0.2%
Guido Pella  11.4%  1.2%  0.1%  0.0%
Ivo Karlovic  67.5%  34.2%  14.7%  0.1%
Jimmy Wang  32.5%  10.9%  3.0%  0.0%
Michael Russell  35.7%  16.2%  5.4%  0.0%
16 Gilles Simon  64.3%  38.6%  18.1%  0.3%

Player  R64  R32  R16  W
11 Nicolas Almagro  52.9%  33.6%  20.2%  0.3%
Radek Stepanek  47.1%  28.5%  16.5%  0.2%
Nicolas Mahut  48.7%  18.2%  8.6%  0.0%
Philipp Petzschner  51.3%  19.6%  9.5%  0.0%
Blaz Kavcic  45.9%  15.3%  4.8%  0.0%
Flavio Cipolla  54.1%  19.8%  6.9%  0.0%
Jack Sock  19.8%  7.7%  1.9%  0.0%
22 Florian Mayer  80.2%  57.2%  31.6%  0.5%

Player  R64  R32  R16  W
27 Sam Querrey  64.9%  51.7%  27.6%  0.7%
Yen-Hsun Lu  35.1%  23.9%  9.3%  0.1%
Ruben Ramirez Hidalgo  31.4%  4.8%  0.8%  0.0%
Somdev Devvarman  68.6%  19.6%  5.5%  0.0%
Denis Istomin  62.4%  23.8%  11.8%  0.1%
Jurgen Zopp  37.6%  10.2%  3.8%  0.0%
David Goffin  28.7%  14.8%  6.9%  0.0%
6 Tomas Berdych  71.3%  51.3%  34.3%  1.7%

Player  R64  R32  R16  W
3 Andy Murray  87.6%  76.3%  63.9%  13.7%
Alex Bogomolov Jr.  12.4%  6.3%  2.7%  0.0%
Hiroki Moriya  22.9%  1.8%  0.4%  0.0%
Ivan Dodig  77.1%  15.7%  7.8%  0.1%
Thomaz Bellucci  65.9%  29.0%  6.6%  0.1%
Pablo Andujar  34.1%  9.9%  1.4%  0.0%
Robin Haase  31.9%  15.6%  3.0%  0.0%
30 Feliciano Lopez  68.1%  45.5%  14.1%  0.3%

Player  R64  R32  R16  W
24 Marcel Granollers  63.8%  37.7%  19.2%  0.2%
Denis Kudla  36.2%  16.4%  6.3%  0.0%
Lukas Lacko  46.7%  20.6%  8.4%  0.0%
James Blake  53.3%  25.2%  10.8%  0.1%
Paul-Henri Mathieu  45.6%  14.3%  5.9%  0.0%
Igor Andreev  54.4%  19.2%  8.7%  0.0%
Santiago Giraldo  30.9%  16.5%  7.7%  0.0%
15 Milos Raonic  69.1%  50.0%  33.0%  1.0%

Player  R64  R32  R16  W
12 Marin Cilic  70.6%  56.4%  31.1%  0.9%
Marinko Matosevic  29.4%  18.6%  6.5%  0.0%
Daniel Brands  70.6%  20.5%  6.0%  0.0%
Adrian Ungur  29.4%  4.5%  0.7%  0.0%
Tim Smyczek  53.1%  15.1%  5.8%  0.0%
Bobby Reynolds  46.9%  12.1%  4.3%  0.0%
Guido Andreozzi  5.7%  0.9%  0.1%  0.0%
17 Kei Nishikori  94.3%  71.9%  45.6%  1.7%

Player  R64  R32  R16  W
32 Jeremy Chardy  84.1%  55.5%  23.6%  0.3%
Filippo Volandri  15.9%  4.3%  0.7%  0.0%
Tatsuma Ito  44.6%  16.6%  4.5%  0.0%
Matthew Ebden  55.4%  23.6%  7.3%  0.0%
Martin Klizan  42.3%  8.7%  3.2%  0.0%
Alejandro Falla  57.7%  14.7%  6.4%  0.0%
Karol Beck  16.7%  8.2%  3.2%  0.0%
5 Jo-Wilfried Tsonga  83.3%  68.5%  51.2%  3.9%

Player  R64  R32  R16  W
8 Janko Tipsarevic  81.6%  69.4%  49.7%  1.9%
Guillaume Rufin  18.4%  10.4%  3.8%  0.0%
Brian Baker  40.9%  7.1%  1.8%  0.0%
Jan Hajek  59.1%  13.1%  4.5%  0.0%
Grega Zemlja  55.9%  22.5%  8.1%  0.0%
Ricardo Mello  44.1%  15.5%  4.7%  0.0%
Cedrik-Marcel Stebe  39.2%  21.6%  8.2%  0.0%
29 Viktor Troicki  60.8%  40.4%  19.2%  0.2%

Player  R64  R32  R16  W
19 Philipp Kohlschreiber  54.1%  32.9%  16.2%  0.3%
Michael Llodra  45.9%  26.1%  11.9%  0.2%
Grigor Dimitrov  54.9%  23.7%  9.8%  0.1%
Benoit Paire  45.1%  17.4%  6.4%  0.0%
Mikhail Kukushkin  46.2%  14.5%  6.0%  0.0%
Jarkko Nieminen  53.8%  18.3%  8.2%  0.1%
Xavier Malisse  33.7%  19.2%  9.6%  0.1%
9 John Isner  66.3%  48.0%  31.9%  1.6%

Player  R64  R32  R16  W
13 Richard Gasquet  82.1%  51.9%  27.6%  0.9%
Albert Montanes  17.9%  5.3%  1.3%  0.0%
Jurgen Melzer  82.7%  39.6%  18.1%  0.3%
Bradley Klahn  17.3%  3.1%  0.5%  0.0%
Steve Johnson  35.5%  5.3%  1.1%  0.0%
Rajeev Ram  64.5%  15.4%  4.7%  0.0%
Ernests Gulbis  27.6%  18.4%  7.6%  0.0%
21 Tommy Haas  72.4%  60.9%  39.1%  2.5%

Player  R64  R32  R16  W
28 Mikhail Youzhny  68.2%  49.4%  22.9%  0.6%
Gilles Muller  31.8%  17.4%  5.2%  0.0%
Tobias Kamke  48.9%  15.9%  4.2%  0.0%
Lleyton Hewitt  51.1%  17.2%  4.6%  0.0%
Igor Sijsling  69.4%  17.1%  7.3%  0.0%
Daniel Gimeno-Traver  30.6%  4.0%  1.0%  0.0%
Kevin Anderson  27.6%  18.3%  9.8%  0.1%
4 David Ferrer  72.4%  60.6%  44.9%  3.9%

Player  R64  R32  R16  W
7 Juan Martin Del Potro  70.1%  55.3%  45.2%  4.6%
David Nalbandian  29.9%  18.4%  12.2%  0.3%
Benjamin Becker  48.9%  12.7%  7.0%  0.0%
Ryan Harrison  51.1%  13.6%  7.7%  0.1%
Lukasz Kubot  71.1%  38.8%  11.8%  0.1%
Leonardo Mayer  28.9%  10.0%  1.5%  0.0%
Tommy Robredo  31.0%  11.8%  2.1%  0.0%
26 Andreas Seppi  69.0%  39.5%  12.5%  0.1%

Player  R64  R32  R16  W
20 Andy Roddick  89.4%  57.3%  36.9%  1.1%
Rhyne Williams  10.6%  2.0%  0.4%  0.0%
Carlos Berlocq  23.0%  5.2%  1.5%  0.0%
Bernard Tomic  77.0%  35.5%  19.7%  0.3%
Edouard Roger-Vasselin  44.4%  14.4%  4.3%  0.0%
Fabio Fognini  55.6%  21.1%  7.3%  0.0%
Guillermo Garcia-Lopez  38.8%  22.5%  8.9%  0.0%
10 Juan Monaco  61.2%  41.9%  21.0%  0.4%

Player  R64  R32  R16  W
14 Alexandr Dolgopolov  61.8%  36.8%  19.6%  0.3%
Jesse Levine  38.2%  18.1%  7.7%  0.0%
Marcos Baghdatis  67.8%  34.5%  17.2%  0.2%
Matthias Bachinger  32.2%  10.6%  3.5%  0.0%
Steve Darcis  59.5%  23.6%  10.8%  0.1%
Malek Jaziri  40.5%  12.6%  4.6%  0.0%
Sergiy Stakhovsky  28.8%  14.1%  5.8%  0.0%
18 Stanislas Wawrinka  71.2%  49.8%  30.9%  0.8%

Player  R64  R32  R16  W
31 Julien Benneteau  64.7%  43.7%  9.6%  0.3%
Olivier Rochus  35.3%  18.7%  2.8%  0.0%
Dennis Novikov  34.1%  9.6%  1.0%  0.0%
Jerzy Janowicz  65.9%  28.1%  4.4%  0.0%
Rogerio Dutra Silva  39.5%  2.5%  0.6%  0.0%
Teymuraz Gabashvili  60.5%  5.4%  1.9%  0.0%
Paolo Lorenzi  6.4%  3.6%  1.2%  0.0%
2 Novak Djokovic  93.6%  88.6%  78.5%  26.5%
Hi Jeff,
I enjoy seeing these simulations, but I still think the Big 4 (or Big 3, in the case of this year’s USO) odds are understated.
For example, although the SF odds aren’t given, I think it’s reasonable to infer from the table that both Federer and Djokovic, using your system, have about a 50% chance of making the SF stage (the actual value isn’t important for what follows).
This means that (assuming probabilities are independent, which is a reasonable first assumption) there’s about a 1/4 chance that both will make the SF, and a 1/4 chance that neither will. But, as we know, both players have made the SFs in New York for the last 5 years. Furthermore, going back over the last 20 GSs (i.e. to USO 2007), Federer has made the SF stage 17 times out of 20 (RG 2010, Wimbledon 2010/2011 excepted), and Djokovic has made it to the SFs and beyond 14 times out of 20, including each of the last 10 GS tournaments.
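To make the arithmetic in this comment concrete, here is a minimal sketch in Python. The 50% semifinal figures are the commenter's rough inference from the table, not model output, and the last calculation (a ten-slam streak at a constant 50%) is a hypothetical extension that assumes the same probability applies at every Slam, which nobody in this thread has claimed.

```python
# Assumed inputs: the rough 50% semifinal estimate inferred in the comment above.
p_federer_sf = 0.5
p_djokovic_sf = 0.5

# Under independence, joint probabilities are simple products.
p_both = p_federer_sf * p_djokovic_sf
p_neither = (1 - p_federer_sf) * (1 - p_djokovic_sf)

# Observed semifinal rates over the last 20 Slams, as quoted in the comment.
federer_rate = 17 / 20
djokovic_rate = 14 / 20

# Hypothetical: chance of 10 straight semifinals if 50% held at every Slam.
p_ten_straight = p_djokovic_sf ** 10

print(f"both reach SF:    {p_both:.2f}")
print(f"neither reaches:  {p_neither:.2f}")
print(f"observed rates:   Federer {federer_rate:.2f}, Djokovic {djokovic_rate:.2f}")
print(f"10 straight at 50%: about 1 in {round(1 / p_ten_straight)}")
```

The gap between the assumed 0.50 and the observed 0.85 and 0.70 is the whole of the commenter's complaint; the streak figure only illustrates how quickly a 50% estimate becomes hard to square with a long run of semifinals.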
I would also be able to make a lot of money in the long run, I think, taking the opposite side on a “Donald Young beats Roger Federer 1 time in 11 over 5 sets at USO 2012.” bet.
The dominance of the Big 4 leads to some squirrelly inputs to any kind of simulation model (I’ve had a hack at this myself in the past). Even with the band-aid, though, a 50/50 chance for Federer and Djokovic is still too low, I reckon.
If you look at the odds of these specific players, yes, they would appear to be understated. But as they say, past performance doesn’t guarantee future results. If you take previous top fours, or previous players who have exhibited similar levels of dominance, this is what the numbers spit out. Maybe Federer and/or Djokovic have some magic semi-reaching talents that go beyond their generally exhibited skill, but that isn’t something I would bet on.
I was surprised by the odds for Fed/Young, too. But Donald has some quality results within the time frame my system is looking at, even if common sense suggests he isn’t about to post another one.
I agree past performance doesn’t guarantee future results. But possibly it should be used to calibrate forecasts?
For example, this year your model gave Rafael Nadal about a 30% chance of winning Roland Garros (30.4%). As we know, Nadal had won 6 of the 7 times he’d entered the tournament before 2012. For argument’s sake, let’s assume that this was, in fact, an unlikely outcome. Using a basic binomial model, let’s ask what single-tournament win probability Nadal would need for 6 or more titles in 7 straight RGs to happen on at least 5% of occasions (for simplicity, again assume the probability doesn’t change year on year, and each tournament is an independent outcome).
It turns out that 6 or more wins at RG only happens on 0.4% of occasions with a single tournament win probability of 30.4% – ie, pretty unlikely. We’d expect 4 wins or fewer about 97% of the time. So this isn’t a compelling argument for saying Nadal’s probability of winning is about 30%.
Suppose we crank the probability up to 50%: now Nadal wins exactly 6 tournaments of 7 about 5.5% of the time, and all 7 of 7 about 0.8% of the time. Not very likely, but above our threshold.
What probability of winning the tournament does Nadal need to have to make at least 6 of 7 wins a 50/50 proposition? About 77%.
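The binomial figures above can be checked with a short Python sketch. It assumes, as the comment does, a constant win probability and independent tournaments; the function names `prob_at_least` and `solve_win_probability` are made up for this illustration and have nothing to do with the simulation code behind the table.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k titles in n entries."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def solve_win_probability(k, n, target, tol=1e-9):
    """Bisection for the p at which P(X >= k) equals target.

    Works because P(X >= k) increases monotonically with p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(k, n, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Model's 30.4% estimate: 6+ titles in 7 comes out around 0.4%.
print(f"{prob_at_least(6, 7, 0.304):.4f}")
# 4 titles or fewer at 30.4%: roughly 97%.
print(f"{1 - prob_at_least(5, 7, 0.304):.2f}")
# At 50%: 6+ titles in 7 is 8/128, a little over 6%.
print(f"{prob_at_least(6, 7, 0.5):.4f}")
# Win probability that makes 6+ of 7 a coin flip: about 0.77.
print(f"{solve_win_probability(6, 7, 0.5):.3f}")
```

Bisection is used here only because it needs nothing beyond the standard library; any root finder would do the same job.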
There are lots of results over the last few years that seem to defy ordinary modeling logic – for example, Federer’s QF streak, and the 29/30 Ws for the Big 3 since RG 2005. Adjusting models like this is part art, part science. Still, I submit that models need to be calibrated so our recent experience doesn’t come across as an extreme outlier.
But what if our recent experience *is* an extreme outlier?
The question isn’t whether past performance should be used to calibrate forecasts, it’s whether it should be used on a player-specific basis to calibrate forecasts.
At what point do you start recognizing one of these outliers and building that into the model for that player or small group of players? Two French wins for Nadal? Four? 10 QFs for Fed? 20? At what point do you stop? When Fed is 30? 33? When Nadal has missed x recent tournaments because of injury?
You’re 100% correct that the current big 3 have broken the model, and if anyone was betting against them based on my model, they would’ve lost a lot of money. If I were personally putting money on the line this week, I’d put it against my odds for Fed and Djok to reach the semis. But I’m more interested in broader patterns, and I’ll take long-term trends over short-term ones every time.
Before we go deeper into this question, here’s a hack I had at some related questions five years or so ago:
http://tennisworld.typepad.com/tennisworld/2008/01/the-years-that.html
http://blogs.tennis.com/tennisworld/2008/01/the-years-tha-1.html
I hope you think it’s interesting.
Back then, I had Nadal as 65% vs Fed in a clay match. By 2009, I’d dialled that up to about 80%.
Basically, I believe that models are useful if they provide insights. There’s a balance between the general and specific. Unusual outcomes do happen (eg Rosol d Nadal), but you have to be prepared to adjust a model if it is repeatedly predicting things that don’t calibrate with experience, unless you can demonstrate that there are overwhelming reasons for believing experience is the outlier. You also (black swan alert!) have to be careful about models that just seem to reinforce conventional wisdom. I come from England, where we’d had falling national house prices in the 1990s. When I learned, in 2007, that rating agencies’ models didn’t allow US national house prices to fall, I went “ruh roh.”
What’s the saying? “Once is happenstance, twice is coincidence, the third time is enemy action.” If Federer AND Nadal AND Djokovic are consistently achieving results that a model fails to forecast, time to adjust the model.
What did you tweak to ‘adjust’ the odds?
I slightly increased the player rating of the top 10.