Slow Conditions Might Just Flip the Outcome of Federer-Nadal XL

Italian translation at settesei.it

Roger Federer likes his courts fast. Rafael Nadal likes them slow. With eight Wimbledon titles to his name, Federer is the superior grass court player, but the conditions at the All England Club have been unusually slow this year, closer to those of a medium-speed hard court.

On Friday, Federer and Nadal will face off for the 40th time, their first encounter at Wimbledon since the Spaniard triumphed in their historic 2008 title-match battle. Rafa leads the head-to-head 24-15, including a straight-set victory at his favorite slam, Roland Garros, several weeks ago. But before that, Roger had won five in a row–all on hard courts–the last three without dropping a set.

Because of the contrast in styles and surface preferences, the speed of the conditions–a catch-all term for surface, balls, weather, and so on–is particularly important. Nadal is 14-2 against his rival on clay, with Federer holding a 13-10 edge on hard and grass. Another way of splitting up the results is by my surface speed metric, Simple Speed Rating (SSR). Twenty-two of the matches have been on a court that is slower than tour average, with the other 17 at or above tour average speed:

Matches     Avg SSR  RN - RF  Unret%  <= 3 shots  Avg Rally  
SSR < 0.92     0.74     17-5   21.2%       49.5%        4.7  
SSR >= 1.0     1.14     7-10   27.0%       56.9%        4.3

At faster events–all of which are on hard or grass–fewer serves come back, more points end by the third shot, and the overall rally length is shorter. Fed has the edge, with 10 wins in 17 tries, while on slower surfaces–all of the clay matches, plus a handful of more stately hard courts–Rafa cleans up.

Rafa broke Elo

According to my surface-weighted Elo ratings, Federer is the big semi-final favorite. He leads Nadal by 300 points in the grass-only Elo ratings, which gives him a 75% chance of advancing to the final. The betting market strongly disagrees, believing that Rafa is the favorite, with a 57% chance of winning.

The collective wisdom of the punters is onto something. Elo has systematically underwhelmed when it comes to forecasting the 39 previous Fedal matches. Federer has more often been the higher-rated player, and if Roger and Rafa behaved like the algorithm expected them to, the Swiss would be narrowly leading the head-to-head, 21-18. We might reasonably conclude that, going into Friday’s semi-final, Elo is once again underestimating the King of Clay.

How big of a Fedal-specific adjustment is necessary? I fit a logit model to the previous 39 matches, using only the surface-weighted Elo forecast. The model makes a rough adjustment to account for Elo’s limitations, and reduces Roger’s chances of winning the semi-final from 74.8% all the way down to 48.5%.

Now, about those conditions

The updated 48.5% forecast takes the surface into account–that’s part of my Elo algorithm. But it doesn’t distinguish between slow grass and fast grass.

To fix that, I added SSR, my surface speed metric, to the logit model. The model’s prediction accuracy improved from 64% to 72%, its Brier score dropped slightly (a lower Brier score indicates better forecasts), and the revised model gives us a way of making surface-speed-specific forecasts for this matchup. Here are the forecasts for Federer at several surface speed ratings, from tour average (1.0) to the fastest ratings seen on the circuit:

SSR  p(Fed Wins)  
1.0        49.3%  
1.1        51.4%  
1.2        53.4%  
1.3        55.5%  
1.4        57.5%  
1.5        59.5% 
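
The forecasts in this table rise almost linearly in log-odds, which is what a logit model with a single SSR term would produce for a fixed Elo forecast. As a rough check, the sketch below backs out an implied slope from the two endpoint rows and regenerates the rows in between. The fitted coefficients are not given in this piece, so `slope` and `intercept` are reverse-engineered assumptions, with the Elo term folded into the intercept.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Endpoint rows from the table: (SSR, p(Fed wins)).
LOW = (1.0, 0.493)
HIGH = (1.5, 0.595)

# Assumption: log-odds are linear in SSR, with the Elo forecast held fixed.
slope = (logit(HIGH[1]) - logit(LOW[1])) / (HIGH[0] - LOW[0])
intercept = logit(LOW[1]) - slope * LOW[0]

def p_fed_wins(ssr):
    """Implied probability that Federer wins at a given surface speed."""
    return inv_logit(intercept + slope * ssr)

for ssr in (1.0, 1.1, 1.2, 1.3, 1.4, 1.5):
    print(f"{ssr:.1f}  {100 * p_fed_wins(ssr):.1f}%")
```

Under that assumption the implied slope is roughly 0.83 log-odds per unit of SSR, and the interpolated rows agree with the table to within rounding.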

In the fifteen years since Rafa and Roger began their rivalry, the Wimbledon surface has averaged around 1.20, 20% quicker than tour average. In 2006, when they first met at SW19, it was 1.24, and in 2008, it was 1.15. Three times in the last decade it has topped 1.30, 30% faster than the average ATP surface. This year, it has dropped almost all the way to average, at 1.00, when both men’s and women’s results are taken into account.

As the table shows, such a dramatic difference in conditions has the potential to influence the outcome. On a faster surface, which we’ve seen as recently as 2014, Federer has the edge. At this year’s apparent level, the model narrowly favors Nadal. Rafa has said that the surface itself is unchanged, but that the balls have been heavier due to humidity. He should hope for another muggy day on Friday–the end result could depend on it.

Forecasting Andy Murray, Doubles Specialist

We are three weeks into the mostly-triumphant doubles comeback of Andy Murray. In his first week back, he raced to the Queen’s Club title with Feliciano Lopez. A week later, he paired with Marcelo Melo and lost in the first round. At Wimbledon, he is partnering Pierre-Hugues Herbert, with whom he has already defeated the only-at-a-slam duo of Marius Copil and Ugo Humbert.

Today in the second round, Herbert/Murray face a sterner test: the sixth-seeded team of Nikola Mektic and Franco Skugor. The betting markets heavily favored Herbert/Murray going into the contest, but we have to assume that punters (including an unusually high number of casual ones) are probably overrating the familiar name on his home turf.

D-Lo to the rescue

Let’s see what D-Lo (Elo for doubles!) says about today’s match. D-Lo treats each team as a 50/50 mix of the two players, and adjusts each player’s rating after every match, depending on the quality of the opponent. It also very slightly regresses both partners to the team average after each match, because it’s impossible to know how much each player contributed to the result.
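
To make the mechanics concrete, here is a minimal sketch of one update step of this kind. It is not the actual D-Lo code: the K-factor `k`, the regression weight `shrink`, the function names, and the individual player ratings in the example are all hypothetical, chosen only to illustrate the three ideas above (team rating as a 50/50 mix, an Elo-style update for each player, and a slight pull toward the team average).

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that side A beats side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def team_rating(players):
    """A team is treated as a 50/50 mix of its two players."""
    return sum(players) / len(players)

def regress(players, shrink):
    """Pull each partner slightly toward the team average."""
    mean = team_rating(players)
    return [r + shrink * (mean - r) for r in players]

def doubles_update(team_a, team_b, a_won, k=20.0, shrink=0.02):
    """One rating update; k and shrink are illustrative assumptions."""
    exp_a = expected_score(team_rating(team_a), team_rating(team_b))
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    # Both partners get the same adjustment, since individual
    # contributions to the result cannot be separated.
    new_a = [r + delta for r in team_a]
    new_b = [r - delta for r in team_b]
    return regress(new_a, shrink), regress(new_b, shrink)

# Hypothetical individual ratings for two teams.
winners, losers = doubles_update([1850.0, 1696.0], [1770.0, 1756.0], a_won=True)
print([round(r, 1) for r in winners], [round(r, 1) for r in losers])
```

The regression step leaves each team's average unchanged and only narrows the gap between the two partners.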

Herbert is D-Lo’s top doubles player in the world on hard and clay courts, though he falls to 6th in the 50/50 blend of overall and grass-specific ratings used for forecasting. Murray, thanks to his run at Queen’s, is up to 54th in the blend, though that’s really more like 40th among players in the draw, since several injured and recently-retired players are clinging to high D-Lo ratings.

Mektic and Skugor are credible specialists, as indicated by their ATP ranking. They are 24th and 26th in the D-Lo, respectively. Combined, the two teams’ ratings are quite close: 1773 for Herbert/Murray to 1763 for Mektic/Skugor. In a best-of-three match, a difference of 10 points translates to a 51.4% edge for the favorites. In best-of-five, the better team is always more likely to come out on top, though with such a small margin it barely matters. Here, the best-of-five number is 51.6%.
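
The 51.4% figure is what the standard Elo logistic formula gives for a 10-point gap. A minimal sketch, assuming D-Lo uses the usual base-10, 400-point Elo scale (the function name is illustrative):

```python
def elo_win_prob(rating_a, rating_b):
    """Probability that side A beats side B on the standard Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Team ratings quoted above: Herbert/Murray vs Mektic/Skugor.
p = elo_win_prob(1773, 1763)
print(f"{100 * p:.1f}%")  # prints 51.4%
```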

Versus the pack

How does a team rating of 1773 compare to the rest of the remaining field? Entering Saturday’s play, 22 men’s doubles pairs were still in the draw. As I write this, Lopez and Pablo Carreno Busta are the only additional team to have been eliminated, reducing the field to 21.

Here are the combined D-Lo ratings of these teams. The rank shown for each player is based on the 50/50 blend of overall and grass rating used for forecasting.

Team D-Lo  Rank  Player             Rank  Player             
1873       2     Mike Bryan         3     Bob Bryan          
1858       4     Lukasz Kubot       7     Marcelo Melo       
1836       9     Raven Klaasen      10    Michael Venus      
1817       8     John Peers         17    Henri Kontinen     
1802       12    Nicolas Mahut      22    E Roger-Vasselin   
1788       18    J S Cabal          19    Robert Farah       
1773       6     P H Herbert        54    Andy Murray        
1764       15    Oliver Marach      36    Jurgen Melzer      
1763       24    Nikola Mektic      26    Franco Skugor      
1757       20    Rajeev Ram         33    Joe Salisbury      
1747       23    Horia Tecau        41    Jean Julien Rojer  
1709       42    Maximo Gonzalez    46    Horacio Zeballos   
1695       29    Ivan Dodig         88    Filip Polasek      
1681       58    Marcus Daniell     62    Wesley Koolhof     
1677       50    Frederik Nielsen   77    Robin Haase        
1644       81    Marcelo Demoliner  90    Divij Sharan       
1637       84    A Ul Haq Qureshi   99    Santiago Gonzalez  
1596       106   Philipp Oswald     123   Roman Jebavy       
1575       101   Mischa Zverev      184   Nicholas Monroe    
1533             Jaume Munar        216   Cameron Norrie     
1517       177   Marcelo Arevalo    214   M Reyes Varela

Herbert/Murray rank 7th among the surviving pairs. The combined rating of 1773 makes them competitive against anyone. The 100-point difference separating them and the Bryans gives them a 33% chance of pulling off a best-of-five upset, while the 29-point gap between them and Nicolas Mahut/Edouard Roger-Vasselin translates to a 45/55 proposition.

Fortunately for the French-British pair, they won’t have to play a higher-rated team for some time. If they win today, they’ll face the winner of Dodig/Polasek vs Zverev/Monroe. The first of those teams is rated 80 points lower than Herbert/Murray (64% odds for the favorites), and the second is 200 points lower (81% for the faves). The three teams that could advance to become the quarter-final opponent for Herbert/Murray are all rated lower than Dodig/Polasek.
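
One way to see how a rating gap turns into best-of-five odds is to treat the Elo probability as a best-of-three forecast, solve for the per-set win probability that would produce it, and then replay that set probability over five sets. The sketch below does that. It assumes sets are independent and identically likely, which is a simplification and not necessarily the conversion used for the numbers in this piece, so it may not reproduce every quoted figure exactly.

```python
def best_of_three(s):
    """Match win probability given per-set win probability s."""
    return s * s * (3.0 - 2.0 * s)

def best_of_five(s):
    """Win in 3, 4, or 5 sets, given per-set win probability s."""
    q = 1.0 - s
    return s ** 3 * (1.0 + 3.0 * q + 6.0 * q * q)

def set_prob_from_best_of_three(p_match, tol=1e-12):
    """Invert best_of_three by bisection (it is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if best_of_three(mid) < p_match:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def elo_win_prob(diff):
    """Standard Elo win probability for a rating advantage of diff."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def best_of_five_from_gap(diff):
    """Best-of-five win probability for the higher-rated team."""
    return best_of_five(set_prob_from_best_of_three(elo_win_prob(diff)))

for gap in (80, 200):
    print(gap, f"{100 * best_of_five_from_gap(gap):.0f}%")
```

With gaps of 80 and 200 points, this lands on roughly 64% and 81% for the favorites, in line with the figures above.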

The draw certainly favored Sir Andrew. Yes, the 1859-rated Pavic/Soares duo crashed out in their section, but even before that, three of the best teams–Bryan/Bryan, Kubot/Melo, and Mahut/Roger-Vasselin–were stuck together in another quarter. While no men’s doubles match is a sure thing, the path is clear for Herbert/Murray to reach the final four.

Beyond Wimbledon

Does Murray have what it takes to become a full-time doubles specialist? Taking his Queen’s Club title into account, his overall D-Lo is already up to 36th best on tour, just ahead of Skugor, and several places better than Roland Garros co-champ Kevin Krawietz. Jurgen Melzer, another excellent singles player making a go of it on the doubles circuit, is ranked 20 places lower, with a D-Lo 40 points less than Murray’s.

The short answer, then, is yes. It must be noted, though, that he isn’t the best choice among the big four to have a successful post-singles career as part of a team. That honor goes overwhelmingly to Rafael Nadal. Nadal’s career peak D-Lo is 100 points higher than Murray’s, and even his grass-court rating–based, admittedly, on some old results–is 70 points higher. Aside from the injured doubles wizard Jack Sock, Nadal is the best active player absent from the Wimbledon draw.

So, Murray/Nadal, Wimbledon 2021 champions? Sounds good to me–as long as Herbert relinquishes his new partner and finally commits to focusing on singles.

A History of Wide-Open French Open Women’s Draws

For the last few years, we’ve been hearing a lot about “depth” in women’s tennis. No player has emerged as a dominant force since Serena Williams began her maternity leave after the 2017 Australian Open. On yesterday’s podcast, I argued that this year’s French Open felt particularly wide-open, especially after seeing a Rome final contested between Karolina Pliskova and Johanna Konta, two women who aren’t known for their clay-court prowess.

When the tape stopped rolling, I generated a forecast for the tournament, using surface-specific Elo ratings for a field made up of the top 128 women in the official rankings. (The makeup of the actual draw will differ, but the exact qualifiers and wild cards typically don’t affect the results very much.) Reigning champ Simona Halep comes out on top, with a 22.2% chance of defending her title. Petra Kvitova is next, just above 10%, followed by Kiki Bertens, who narrowly missed double digits.

The forecast gives two more entrants a 5% chance at the title, five more a 3% or better probability, and another nine a 1% chance. That’s a total of 19 women (see below) with at least a 1-in-100 shot, including such underdogs as Anett Kontaveit and Petra Martic. Maria Sakkari, winner in Rabat and semi-finalist in Rome, is 20th favorite, just below the 1% threshold. There isn’t much to separate the players in the bottom half of this list, and when the draw dishes out shares of good and bad fortune, the order will surely shift.

This all seems … pretty wide-open. It’s certainly a shift from the French Open of 30 years ago, when a dominant Steffi Graf entered with a 68% probability of securing the title, one of only five players with better than a 1% chance. (The tennis gods scoffed at our future retro-forecasts: Arantxa Sanchez Vicario carried her 1.5% pre-tournament odds to the championship.)

The 19-strong gang of one-percenters is, indeed, a very recent development. In the previous 30 years, the average number of players going into the tournament with 1%-or-better title odds was 11.5, and it only reached 19 three times, two of which were 2017 and 2018. (The other was 2010, with a whopping 23 one-percenters, and not a single player above a 13% chance of winning.) As recently as 2004, only eight women had reason to be so optimistic before the first balls were struck.

The second-tier group of favorites–entrants with a 1% shot at the title, but not much more–is the most distinctive feature of recent French Opens, and it lends credence to the argument that women’s tennis is particularly deep these days. You may not take the chances of 17th-seeded Kontaveit too seriously, but she is more of a factor than similarly-seeded players were 15 years ago.

When we narrow our focus to competitors meeting higher thresholds, like 3% or 5% title-winning probabilities, the present era looks less novel. From 1989 to 2018, the typical field included 6.5 women with 3%-or-better chances, and 4.8 women at 5% or higher. This year’s group includes ten in the first category and five–roughly the historical average–in the second. Only the army of one-percenters sets the 2019 bracket apart from, say, the 1997 field, when nine women headed to Paris with a 3% shot, seven of them at 5% or better.

What has changed is the dominance of the player at the top of the list. The average favorite of the last three decades opened with a one-in-three chance of winning, while Halep hasn’t exceeded 23% in her three years as frontrunner. Here are the ten “weakest” Roland Garros favorites from 1989 to 2019:

Year  Favorite            Fave Odds     
2010  Venus Williams          12.9%     
2018  Simona Halep            19.1%  *  
2011  Caroline Wozniacki      22.0%     
2019  Simona Halep            22.2%     
2017  Simona Halep            23.0%     
2006  Justine Henin           23.3%  *  
2005  Justine Henin           23.4%  *  
2012  Victoria Azarenka       24.1%     
2008  Maria Sharapova         24.5%     
2009  Dinara Safina           24.7%

* Favorites who went on to win

The French Open has traditionally made the women’s field look deep, even when it wasn’t particularly so. The favorite has only claimed the trophy in 8 of the last 30 tournaments, a 27% mark that would almost qualify for the above list. Sanchez Vicario twice won with sub-2% pre-tourney odds, Anastasia Myskina’s 2004 title was a 0.8% shot, and Jelena Ostapenko entered the 2017 event as 27th favorite, behind Mona Barthel and Katerina Siniakova, with a 0.4% probability of winning.

Surprises, then, have always been part of the program in Paris. Without an overwhelming force at the top of the draw with a “1” next to her name, the field has finally caught up. No individual has a particularly good chance of going on a victory tour, but a staggering array of contenders have reason to hope for a magical fortnight.

The complete list of “favorites” sorted by chance of winning: Halep, Kvitova, Bertens, Pliskova, Ashleigh Barty, Angelique Kerber, Elina Svitolina, Caroline Wozniacki, Garbine Muguruza, Naomi Osaka, Sloane Stephens, Marketa Vondrousova, Madison Keys, Konta, Serena, Kontaveit, Caroline Garcia, Victoria Azarenka, and Martic.

Forecasting Future Felix With ATP Aging Patterns

Italian translation at settesei.it

It’s been an exceptional six weeks for Felix Auger-Aliassime. He broke into the top 100 with a runner-up performance on clay in Rio de Janeiro, won two matches each at Sao Paulo and Indian Wells (including an upset of Stefanos Tsitsipas), and raced to a semi-final at the Miami Masters, the youngest player ever to make the final four of that event. Four months away from his 19th birthday, his ranking is up to 33rd in the world, and he has few points to defend until June.

Felix is the youngest man in the top 100, and he’s reaching milestones early enough to draw comparisons with some of the best young players in the sport’s history. Will he follow in the footsteps of past wunderkinds such as Rafael Nadal and Lleyton Hewitt? To answer that question, let’s take a look at typical ATP aging patterns, what they say about when players hit their peaks, and what they can show us about the fate of the best 18 year olds.

The standard curve

Last week, I looked at WTA aging curves and found that women tend to peak around age 23 or 24, an age that has not changed even as the sport has gotten older. I also discovered that there is a surprisingly modest gap–about 70 Elo points–between 18-year-old performance and a woman’s peak level. The men’s results are different.

To calculate the average ATP aging curve, I found over 700 players who were born between 1960 and 1989 and played at least 20 tour-level, tour qualifying, or challenger-level matches in each of five seasons. Overall, peak age was 25, though the difference from age 24 to 27 is only a few Elo points, so small as to be negligible.

As the tour has gotten older, the men’s peak age has also increased. Of the nearly 300 players born between 1980 and 1989, peak age is 26-27, with ages 28 and 29 also within 10 Elo points of the age 26-27 peak. Plenty of players are peaking at older ages, and many of those who aren’t are remaining close to their best levels into their late twenties. The peak age could be even higher still–a few of the players in the 1980-89 cohort turn 30 this year, and could conceivably still improve on their career bests.

The following graph shows the trajectory of the average player (with peak year-end Elo set to 1,850) born in the 1960s and the pattern of the average player born in the 1980s:

It’s a long ascent from the performance level at age 18 to the typical peak, especially for more recent players. There’s even a hefty bit of selection bias that should inflate the level of 18 year olds, since only about 10% of the players in the overall sample qualified for a year-end Elo rating when they were 18. The ones who did were, in general, the best of the bunch.

Felix forward

Through the Miami semi-final, Auger-Aliassime’s Elo rating is 1,848. The average player in the entire dataset who played at least 20 matches in their age-18 season went on to add another 281 Elo points to their rating between the end of their age-18 season and their peak. In the narrower, more recent cohort of 1980-89 births, the players with year-end ratings as 18 year olds improved their Elos by a whopping 369 points before reaching their peaks.

Adding either of those numbers to Felix’s current rating gives us quite the rosy forecast:

Cohort   Current  Increase  Proj. Peak  
1960-89     1848       281        2129  
1980-89     1848       369        2217

There’s a bit of sleight of hand in how I’m doing this, since my study uses players’ year-end ratings, and I’m using Felix’s rating in April. However, there’s no natural law that says one artificial 12-month span is better than another, and Felix’s current age of 18.6 is roughly in the middle of the ages of the year-end 18-year-olds with whom I’m comparing him.

An Elo rating of 2,129 would be good enough for fourth place on the current list, behind only the big three. The rating of 2,217 is better than any of the big three can boast at the moment, and would be the fourth-best peak year-end rating among active players, again trailing only the big three. (And Andy Murray, if you consider him active.) Only 15 Open era players have managed year-end Elo peaks above 2,217.

No comparisons

It’s tough to say whether this method, of finding the typical difference between 18-year-old and peak Elo ratings, is adequate to handle the extremes. Some players peak earlier than average, and it stands to reason that the best young talents are more likely to do so. Boris Becker posted a whopping 2,212 Elo rating at the end of his age-18 season, which didn’t leave much room for improvement. He gained another 90 points before the end of his age-19 season, which was his career best.

Becker’s career path is not particularly helpful to our effort to forecast Felix’s, in part because the German was so unique, and also because his experience reflects such a different era. But even among less unique players, there are few useful comparables. No one born since 1987 managed a better age-18 Elo rating than Felix’s 1,848, and only a handful of active or recently-retired players even reached 1,750 by that age.

Lacking the data for a more precise approach, let’s repeat what I did for Bianca Andreescu last week, and see how the nearest 18-year-old comparisons fared. Of the players whose age-18 year-end Elos were closest to Felix’s 1,848, here are the 10 above him and the 10 below him on the list:

Player               BirthYr  18yo Elo  Incr  Peak Elo  
Stefan Edberg           1966      1916   350      2266  
John McEnroe            1959      1912   496      2408  
Guillermo Coria         1982      1909   145      2055  
Pat Cash                1965      1907   151      2058  
G. Perez Roldan         1969      1884    41      1925  
Andy Murray             1987      1878   465      2343  
Roger Federer           1981      1871   487      2359  
Thomas Enqvist          1974      1865   216      2081  
Rafael Nadal            1986      1862   452      2314  
Jim Courier             1970      1849   283      2132  
…                                                       
Jimmy Brown             1965      1834     0      1834  
Andy Roddick            1982      1815   291      2106  
Aaron Krickstein        1967      1812   246      2058  
Yannick Noah            1960      1812   299      2112  
Fabrice Santoro         1972      1805    85      1890  
Andreas Vinciguerra     1981      1803    16      1819  
Novak Djokovic          1987      1792   645      2436  
Sergi Bruguera          1971      1790   265      2055  
Thomas Muster           1967      1788   329      2117  
Dominik Hrbaty          1978      1779   133      1913

The average increase among this group is 270 Elo points, close to the overall average for players who qualified for a year-end Elo rating at age 18. The youngest members of this list are encouraging: the big four, Andy Roddick, and Andreas Vinciguerra. Most promising youngsters would happily take a two-in-three shot at having a career at the level of the big four.
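
The 270-point average can be checked directly from the Incr column of the table. The short sketch below does that and applies the same add-the-average logic to Felix's current rating. The variable names are illustrative, and the resulting projection is only as good as the assumption that Felix will develop like the average of these 20 players.

```python
from statistics import mean

# Gains in the "Incr" column of the table above, top to bottom.
increases = [350, 496, 145, 151, 41, 465, 487, 216, 452, 283,
             0, 291, 246, 299, 85, 16, 645, 265, 329, 133]

FELIX_CURRENT_ELO = 1848

avg_gain = mean(increases)
print(round(avg_gain))                      # 270
print(round(FELIX_CURRENT_ELO + avg_gain))  # 2118
print(min(increases), max(increases))       # 0 645
```

The spread matters as much as the average: the gains in this group run from 0 to 645 points.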

Perhaps the best comparison for Felix is a player who didn’t quite make that list, Alexander Zverev. The 21-year-old German posted a year-end Elo of 1,768 as an 18 year old, and had already boosted that number by more than 300 points at the end of his 2018 campaign. Zverev is only an approximate comparison, he’s just a single data point, and we don’t know where he’ll end up, but his experience is a decade more recent than those of Novak Djokovic, Murray, and Nadal.

Forecasting the career performance of young tennis players is an inexact science, at best. Potential outcomes for Auger-Aliassime range from teenage flameout to double-digit major winner. Based on the limited information he’s given us so far, the latter seems within reach. What we know for sure is that he’s playing better tennis than any 18 year old we’ve seen in a decade. If that’s not reason for optimism, I don’t know what is.

Nick Kyrgios is More Predictable Than We Think

Italian translation at settesei.it

There is a persistent belief among tennis fans and commentators that some players are particularly inconsistent. For today’s purposes, I’m talking about match-to-match results, the players who have a knack for upsetting higher-ranked opponents but are also particularly susceptible to losses against weaker players. We have a range of words for this, like unpredictable, dangerous, tricky, and the preferred term for Nick Kyrgios: mercurial.

So far in 2019, Kyrgios has provided a perfect example of the inconsistent type. After early losses to Jeremy Chardy and Radu Albot, he bounced back to win last week’s ATP 500 in Acapulco, knocking out Rafael Nadal, Stan Wawrinka, John Isner, and Alexander Zverev. There’s no question that the Australian possesses more talent than his ranking would suggest. This is a guy who has yet to crack the top ten, but holds a .500 record in completed matches against the Big 3, a feat managed by no other active player (minimum 5 matches, excepting Nadal and Novak Djokovic themselves).

He sounds inconsistent. His results look unpredictable. But compared to the uncertainty that comes with every tennis match between highly-ranked professionals, how does he stack up? As my headline suggests, it’s not as clear-cut as it seems.

Measuring predictability

Consider the opposite type, a player who reliably beats lower-ranked opponents and usually loses against his betters. Roberto Bautista Agut has this type of reputation. As we’ll see, the numbers bear it out, notwithstanding his Doha upset of Djokovic a couple of months ago. If someone really is so predictable, that should show up in a comparison of his pre-match forecasts to his results. For a Bautista Agut type, the forecasts would be particularly accurate, while for a Kyrgios type, the forecasts would be much less reliable.

We already have a metric for this. Brier Score measures the accuracy of forecasts, considering not just how often predictions proved correct, but how close they came. For instance, after Kyrgios beat Zverev in Saturday’s Acapulco final, those prognosticators who gave the Aussie a 90% chance of winning were “more” correct than those who gave him a 60% shot. On the other hand, too much confidence runs the risk of a worse Brier Score–if you’re always giving tennis favorites a 90% chance of winning, you’ll often be wrong. Brier Score is the average of the squared difference between the pre-match forecast (e.g. 90%) and the result (1 or 0, depending on whether the pick was correct).

Brier Scores for ATP forecasting hover around the 0.2 mark. A lower Brier Score is better, representing less difference between prediction and results, so if you can come in much lower than 0.2, you should be making money betting on matches. If you’re much higher than 0.2, you might as well be flipping a coin. If we use random, 50/50 pre-match predictions, the resulting Brier Score is 0.25.
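
For readers who want to compute this themselves, here is a minimal sketch of the calculation. The function name and input layout are illustrative: each forecast is the probability given to one player, and each outcome is 1 if that player won and 0 if he lost.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 results."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    total = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)

# A 90% forecast for the eventual winner beats a 60% forecast.
print(round(brier_score([0.9], [1]), 2))  # 0.01
print(round(brier_score([0.6], [1]), 2))  # 0.16

# Coin-flip forecasts always score 0.25, whatever the results.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25
```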

Brier-gios

If a player is truly unpredictable, the Brier Score for his matches should approach the 0.25 mark, and it should definitely exceed the tour-typical 0.2. To measure the reliability of pre-match forecasts for Kyrgios and other players, I used my surface-weighted Elo ratings for every completed tour-level main draw match since 2000 and generated percentage forecasts for each one. By this method, Zverev had a 67.4% probability of winning the Acapulco final.

So far in 2019, Kyrgios does look truly unpredictable. The Brier Score of his ten match results is 0.318, meaning that we’d have done better by simply flipping a coin to forecast the result of each of his matches. Even if we retroactively increase his chances of winning each match to account for the fact that he’s playing better than his Elo rating predicted, the Brier Score is 0.277, still worse than coin flips.

On the other hand, it’s just ten matches. Several other players have 2019 Brier Scores well over the 0.25 threshold, including Frances Tiafoe, Joao Sousa, Juan Ignacio Londero, and Felix Auger-Aliassime. In a handful of tournaments, you’ll always get a few oddball results, either because of marked improvements (as is likely with Auger-Aliassime) or extreme good or bad luck. Unless we’re willing to say that Sousa and Londero are remarkably unpredictable players, we shouldn’t draw the same conclusion based on Kyrgios’s last ten matches.

What you predict is what you get

The Brier Score for Elo-based forecasts of Kyrgios’s career matches at tour level is 0.219. That’s higher–and thus less predictable–than average, but not by that much. Of the 280 players with at least 100 tour-level matches this century, Kyrgios ranks 84th, more reliable than 30% of his peers. In 2017, his results were quite unpredictable, with a Brier Score of 0.244, but in 2015 and 2016 they generated a more pedestrian 0.210, and last year they looked downright predictable, at 0.177.

The Australian may be quite unpredictable in tactics, point-to-point performance, or on-court behavior, but his results just aren’t that unusual. The following table shows the 15 most unpredictable active players, as measured by Brier Score, along with Kyrgios, followed by the 15 most predictable active players:

Player                 Matches  Brier  
Lucas Pouille              189  0.247  
Andrey Rublev              106  0.245  
Benoit Paire               377  0.239  
Ivo Karlovic               650  0.239  
Stefanos Tsitsipas         100  0.232  
Karen Khachanov            154  0.231  
Peter Gojowczyk            102  0.231  
Federico Delbonis          225  0.227  
Marius Copil               108  0.227  
Damir Dzumhur              173  0.227  
Ernests Gulbis             420  0.226  
Pablo Cuevas               338  0.226  
Mischa Zverev              297  0.226  
Joao Sousa                 323  0.226  
Borna Coric                210  0.226  
...                                       
Nick Kyrgios               191  0.219  
...                                       
Matthew Ebden              171  0.188  
David Goffin               344  0.188  
Marin Cilic                684  0.186  
Richard Gasquet            770  0.183  
Tomas Berdych              911  0.182  
Milos Raonic               448  0.178  
David Ferrer              1048  0.177  
Jo Wilfried Tsonga         600  0.175  
Roberto Bautista Agut      384  0.172  
Kei Nishikori              517  0.167  
Juan Martin Del Potro      560  0.160  
Andy Murray                802  0.146  
Roger Federer             1350  0.121  
Novak Djokovic             951  0.117  
Rafael Nadal              1060  0.114 

Lucas Pouille’s results have been almost impossible to forecast. The Brier Score generated by his 2018 results was nearly 0.3, suggesting it would have been smarter to calculate a forecast and then bet against it! Ivo Karlovic also shows up among the less reliable players, though it’s not clear whether that’s due to his unusual game style. Isner, the only decent parallel we have, is as reliable as the tour in general, with a career Brier Score of 0.201. Reilly Opelka, the other towering ace machine in the ATP top 100, has defied the odds so far in 2019, but he hasn’t yet amassed enough data to draw any conclusions.

At the other end of the spectrum, the most reliable players are many of the best. That adds up: A dominant player not only wins most of the matches he should, but his performance also allows us to make more aggressive forecasts. Nadal often enters matches with a 90% or better probability of winning, and confident predictions like that–as long as the player converts them into wins–are what generate the lowest Brier Scores.
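
The arithmetic behind that point can be sketched in a few lines. Assume a perfectly calibrated forecast: a player given probability p wins exactly that share of the time. The expected Brier Score of such a forecast then depends only on p. The function name is illustrative, and perfect calibration is an idealization, not a claim about any real forecasting system.

```python
def expected_brier(p):
    """Expected Brier Score of a calibrated forecast p.

    The player wins with probability p (squared error (1 - p)^2)
    and loses with probability 1 - p (squared error p^2).
    """
    return p * (1.0 - p) ** 2 + (1.0 - p) * p ** 2

for p in (0.5, 0.6, 0.75, 0.9):
    print(f"{p:.2f} -> {expected_brier(p):.3f}")
```

The expression simplifies to p(1 - p), which is 0.25 for a coin flip and falls to 0.09 for a 90% favorite. A player who is regularly a heavy favorite, and wins about as often as forecast, will post a low Brier Score almost by construction.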

Consistent consistency results

We all tend to read too much into unusual results. Kyrgios has given us plenty of those, and we’ve repaid the favor by making him out to be even more of a wild card than he is. A couple of weeks ago, I took on a similar question and found that ATPers don’t really “play their way in” to tournaments, earning better or worse results in different rounds. This isn’t quite the same issue, but it all comes back to similar truths: Existing forecasts are pretty good, there’s always going to be a lot of randomness in the results, and the stories we invent to account for the randomness don’t really explain much at all.

Kyrgios is an immensely interesting player–I joked in yesterday’s podcast that readers should prepare themselves for a ten-part series–and digging into his point-by-point stats could reveal characteristics that are unique among tour players. That is still true. But at the match level, the likelihood that his contests will end in upsets isn’t unique at all–even if he is the proud new owner of a sombrero that says otherwise.

The Best Draw That Money Can Buy

Italian translation at settesei.it

Last week featured two events on the WTA calendar. First, both chronologically and by every conceivable ranking except for “most Hungarian,” was the Dubai Open, a Premier 5 event offering over $500,000 and 900 ranking points for the winner. The other was the Hungarian Open in Budapest, a WTA International tournament with $43,000 and 280 ranking points going to the champion. No top player would seriously consider going to Budapest, even before considering potential appearance fees and WTA incentives.

Fifteen of the top twenty ranked women went to Dubai, and the top seed in Budapest, defending champ Alison Van Uytvanck, was ranked 50th. Every Budapest entrant ranked in the top 72 got a top-eight seed, including a couple of players who would have needed to play qualifying just to earn a place in the Dubai main draw.

The rewards offered by the Dubai event and supported by the structure of the WTA tour make this an easy scheduling decision for many players. But at some point, if the rest of the field is zigging toward the Gulf, might it be better to zag toward Central Europe? Van Uytvanck would have been an underdog to reach even the third round of the richer event, yet she defended her title in Budapest. Marketa Vondrousova, who would have been stuck in Dubai qualifying, reached the Hungarian Open final. Opting for the smaller stage almost definitely proved the wise choice for those two women. Did other, better-ranked players leave money or ranking points on the table?

Motivations

Scheduling decisions depend on a lot of factors. Some women might prefer to play the event with the highest-quality field, both to test themselves against the best and to give themselves an opportunity for the circuit’s richest prizes. Others might head for the marquee events because of their doubles prowess: Timea Babos was part of the top-seeded doubles team in Dubai, but was the lowest-ranked direct entry in singles. Still others might choose to play closer to home or at tournaments they’ve enjoyed in the past.

For all that, ranking points should come first, with prize money also among the top considerations. Ranking points determine one’s ability to enter future events and to remain on tour. Prize money is needed to cover the vast expense of bankrolling a traveling support staff.

Dubai-versus-Budapest offers a fairly “pure” experiment, because both are played on similar surfaces and neither event is in the middle of a mini-circuit of events in a single region. Yes, Dubai immediately follows Doha, but that trip requires a flight, and most players headed back to Europe or North America after the tournament. Opting for one event over the other doesn’t substantially complicate anyone’s travel plans, like it would for an ATPer to mix and match destinations from the South American golden swing and the simultaneous European indoor circuit.

Revealed preferences

Let’s see which of the two main factors played a bigger role in scheduling decisions last week. To determine each player’s options, I tried to reconstruct as much as possible what information each woman had at her disposal six weeks earlier, on January 7th, when entry applications and stated preferences for Dubai and Budapest were due. I used the January 7th rankings to project how a player would be seeded at either event, and Elo ratings as of that date to forecast how far she would advance in each draw.

The major difficulty of this kind of simulation is the composition of the draws themselves. From our vantage point after the events, we know who opted for each draw as well as which players were unable to compete. In early January, none but the best-connected players would have known which of their peers would head in which direction, and no one at all could have known that Caroline Wozniacki would be a late withdrawal from Dubai, or that a viral illness would knock Kirsten Flipkens out of the Hungarian Open. Still, the resulting 2019 draws were very similar to what players could have predicted based on the player fields in 2018. So to simulate each player’s options, we’ll use the fields as they turned out to be.

Let’s start with Carla Suarez Navarro, the highest-ranked woman (at the January 7th entry deadline) who wasn’t seeded in Dubai. She ended up reaching the quarter-finals at the Premier event, in part because Kristina Mladenovic did her the favor of ousting Naomi Osaka from that section of the draw. For her efforts, Suarez Navarro grabbed 190 ranking points and almost $60,000. She would have needed to win the Budapest title to garner more points. And with a champion’s purse of “only” $43,000 in Hungary, she would have needed to rob a bank to improve on her Dubai prize money check.

However, that isn’t what Suarez Navarro should have anticipated taking home from Dubai. Sure, she should be optimistic about her own potential, but smart scheduling demands some degree of realism. I ran simulations of both the Dubai tournament (before the draw was made, so she doesn’t always end up in Osaka’s quarter) and the Budapest event with the Spaniard as the top seed and the rest of the field (minus last-in Arantxa Rus) unchanged. These forecasts suggest that Suarez Navarro only had a 12% chance of reaching the Dubai quarters, and that her expected ranking points in the Gulf were much lower:

Event     Points  Prize Money  
Dubai         76     $28.121   
Budapest     111     $15.384

(prize money in thousands of USD)

In all of these simulations, I’ve calculated points and prize money as weighted averages. Suarez Navarro had a 37% chance of a first-round loss, so that’s a 37% chance of one ranking point and first-round-loser prize money. And so on, for all of the possible outcomes at each event. For the Spaniard, her expected ranking points were nearly 50% higher as the top seed in Budapest. But because the Dubai prize pot is so much larger, her expected check was almost twice as big at the tournament she chose.
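The weighted-average step can be sketched in a few lines. Only the 37% first-round-loss probability, the single point for a first-round loser, and the 280 points for a Budapest-level champion come from the text. Every other probability and point value below is a placeholder chosen for illustration, so the result is not meant to reproduce the table.

```python
def expected_value(outcome_probs, payoffs):
    """Weighted average of payoffs, weighting each finishing
    position by the probability of ending the event there."""
    if abs(sum(outcome_probs.values()) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * payoffs[stage] for stage, p in outcome_probs.items())


# Probability of the tournament ending at each stage.
# Only the 0.37 is from the text; the rest are hypothetical.
exit_probs = {
    "R1 loss": 0.37,
    "R2 loss": 0.30,
    "QF loss": 0.18,
    "SF loss": 0.09,
    "F loss": 0.04,
    "title": 0.02,
}

# Points per finishing position. The 1 and the 280 are from the
# text; the values in between are illustrative placeholders.
points = {
    "R1 loss": 1,
    "R2 loss": 30,
    "QF loss": 60,
    "SF loss": 110,
    "F loss": 180,
    "title": 280,
}

expected_points = expected_value(exit_probs, points)
print(round(expected_points, 2))
```

Swapping in a prize money schedule for `points` gives the expected check in the same way, which is how the same set of round probabilities can favor one event on points and the other on money.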

Consistent incentives

The total purse in Dubai was more than eleven times bigger than the prize money on offer in Hungary, while the points differed by only a factor of three. Thus, it’s no surprise that Suarez Navarro’s incentives are representative of those faced by many more women. I ran the same simulations for 26 more players: All of the competitors who gained direct entry into Dubai but were unseeded, plus Bernarda Pera, who would have been seeded in Budapest but instead played qualifying in the Gulf.

The following table shows each player’s expected points and prize money for Dubai (D-Pts and D-Prize), along with the corresponding figures for Budapest (B-Pts and B-Prize):

Player                    D-Pts   D-Prize   B-Pts   B-Prize   
Dominika Cibulkova           96   $36.794     130   $18.291   
Lesia Tsurenko               84   $31.528     119   $16.695   
Carla Suarez Navarro         76   $28.121     111   $15.384   
Aliaksandra Sasnovich        75   $27.920     111   $15.364   
Dayana Yastremska            72   $26.716     107   $14.803   
Anastasia Pavlyuchenkova     72   $26.590     106   $14.721   
Barbora Strycova             67   $24.809     102   $14.096   
Donna Vekic                  66   $24.143     100   $13.717   
Katerina Siniakova           63   $23.157      95   $13.062   
Ekaterina Makarova           58   $21.543      90   $12.265   
                                                              
Player                    D-Pts   D-Prize   B-Pts   B-Prize   
Petra Martic                 57   $21.019      88   $11.960   
Su Wei Hsieh                 54   $19.863      84   $11.396   
Belinda Bencic               53   $19.813      84   $11.372   
Ajla Tomljanovic             53   $19.530      82   $11.181   
Shuai Zhang                  49   $18.350      77   $10.416   
Sofia Kenin                  46   $17.109      72    $9.659   
Ons Jabeur                   45   $17.077      71    $9.624   
Viktoria Kuzmova             45   $17.009      70    $9.432   
Alize Cornet                 44   $16.823      69    $9.280   
Saisai Zheng                 40   $15.436      62    $8.307   
                                                              
Player                    D-Pts   D-Prize   B-Pts   B-Prize   
Vera Lapko                   37   $14.618      57    $7.695   
Mihaela Buzarnescu           36   $14.465      56    $7.548   
Alison Riske                 35   $14.309      55    $7.445   
Kristina Mladenovic          34   $13.910      51    $6.969   
Timea Babos                  32   $13.354      48    $6.572   
Yulia Putintseva             32   $13.407      48    $6.484   
Bernarda Pera*               25   $11.830      36    $5.061

Every single player could have expected more points in Budapest and more money in Dubai. The ratios are all similar to Suarez Navarro’s. The one possible exception is Pera (hence the asterisk). My simulation assumed she came through qualifying to make the main draw, and calculated only her expected points and prize money from main draw matches. Yet simply qualifying for the main draw is worth 30 ranking points, plus whatever points a player earns by winning main draw matches. Pera was no lock to qualify, but she was favored, and usually a couple of lucky loser spots make the main draw even more achievable. It’s possible that if we ran all those scenarios, Pera is the one player for whom Dubai offered better hopes of prize money and points.

Loss aversion and game theory

It’s no accident that Van Uytvanck was one of the few players to choose the high-points, low-prize money route. She was defending 280 points from last year’s Hungarian Open, meaning that opting for a bigger check in Dubai would have a negative impact on her ranking. The thought of losing a couple hundred ranking points has a greater influence on behavior than the chance of gaining the same amount for a player who has few to defend.

For the majority of women who will face the same decision in 2020 without many points to defend, what should they do? Assuming, as I do, that they and their coaches will all carefully study this article, what happens if more top-70 players decide to chase ranking points and flock to the smaller event?

If the Budapest field gets stronger, each entrant’s expected points and prize money will decrease; if Dubai’s field weakens, each player there can anticipate a better chance of more points and even more money. As the entry system is currently structured, in which each player must state their preferences without knowledge of their peers’ choices, we can’t count on reaching an equilibrium. Even if every single player aimed solely to maximize ranking points, there wouldn’t be enough information available to reliably make the right choice. It’s conceivable, though unlikely, that a Budapest could attract a stronger field and end up offering lower expected prize money checks and ranking points.

But don’t fret, dear readers and schedule optimizers. There are external factors and there always will be. And in this case, virtually all of those factors pull players to the bigger money event. (Even Hungarian heroine Babos skipped her home tournament.) At least a half-dozen of the players listed above are doubles elites, making it likely they’ll choose the Premier event. Others–probably many others–will go where the money is, because they like money.

Even those who don’t play doubles and don’t like money will chase the biggest available pot of ranking points, not entirely unlike the way people play the lottery. The WTA offers a very limited set of opportunities to earn 900 points in a single week. You can get close to 900 points with three International championships, but there’s a finite number of weeks on the annual schedule–not to mention a limited number of matches in each player’s body! Lots of people stock up on lottery tickets despite unfavorable odds, and players will continue to enter higher-profile events even if their expected points are higher on smaller stages. The chance of a prestigious title, however slim, doesn’t show up in a purely actuarial calculation.

The success of Belinda Bencic–expected Dubai points, 53; expected Budapest points, 84; actual Dubai points, 900–will keep players chasing the big prizes. That’s good news for level-headed would-be optimizers. Those players willing to forgo the skyscrapers, the shopping malls, and the prize money next year aren’t about to lose this opportunity. Budapest will almost certainly remain a better option for players who want to improve their ranking.

Dominic Thiem, Tennys Sandgren, and Playing Your Way In

Dominic Thiem is one of the best clay-court players on earth, with eight titles and a Roland Garros final to his credit. But his impressive track record wasn’t worth much last night, when he lost his opening-round match in Rio de Janeiro. The straight-set defeat to 90th-ranked Laslo Djere calls to mind other first-match failures, such as Thiem’s loss to Martin Klizan last summer in Hamburg, or his truly gobsmacking upset at the hands of 222nd-ranked Ramkumar Ramanathan on grass in Antalya two years ago.

It’s also not the first time this season that a top seed has proven unable to live up to their billing. Two weeks ago, the No. 1 seeds in three different ATP events all lost their first matches. I dug a bit deeper and discovered that top seeds underperform by a modest amount at these smaller tournaments. Rio is technically a higher-profile event, but the result is the same: An elite player at a non-mandatory event, heading home early.

You’ll hear all sorts of theories for this sort of thing. In ATP 250s, when top seeds get a bye, it’s possible that the elites are in danger because their opponents have played their way into form. At any optional events, it’s possible that the top seeds are not particularly motivated, making the trip for a quick appearance fee and nothing more. Finally, there’s the old saw that some competitors need to get used to their surroundings. In other words, they need to “play their way in” to the tournament. It’s this last theory that I’d like to investigate.

Present and prepared

If a player needs time to get comfortable, we would expect him to underperform in the first round, and possibly continue playing below average to a lesser extent in the second round. The flip side of that is that the player would need to overperform in later rounds–if he didn’t, the earlier underperformance wouldn’t be below average, it would just be bad. These under- and over-performances are effects we can quantify.

Let’s start with Thiem. I went through his career results at the ATP level and broke his matches into several categories (some overlapping), like first match, second match, first match at a non-mandatory event, second-or-later match, finals, and so on. For each of those categories, I tallied up his results and compared them to expectations (Expected Wins, or “ExpWins” in the table), based on what Elo forecasted at the time. Here are Thiem’s results:

Category     Matches  ExpWins  Wins  
1st              141     94.3    94  
1st (small)       84     52.9    54  
1st/2nd          238    151.3   151  
2nd               97     59.9    60  
2nd+             203    117.7   118  
3rd               58     34.9    35  
3rd+             106     60.7    61  
4th               32     18.5    19  
Finals            17     10.2    10

The Austrian has been almost comically predictable. In 84 non-mandatory tournaments through last week, Elo expected that he would win his first match 53 times. He won 54. In all tournaments, he has won his first match 94 times, exactly in line with the Elo estimation. In the nine categories shown here, his performance was never more than 1.1 matches better or worse than expected. If he’s playing his way into tournaments, he’s doing it in a way that doesn’t show up in the results.
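An expected-wins tally of this kind can be sketched as the sum of pre-match win probabilities across a group of matches. The sketch below assumes the textbook Elo formula with a 400-point scale, so it may differ in detail from the surface-weighted ratings behind the table. The ratings and results are hypothetical.

```python
def elo_win_prob(rating, opp_rating):
    """Textbook Elo win probability on a 400-point scale (an assumption;
    the ratings behind the table may be computed differently)."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def expected_vs_actual(matches):
    """matches: iterable of (player_rating, opponent_rating, won) tuples.
    Returns (expected wins, actual wins) for that group of matches."""
    matches = list(matches)
    exp_wins = sum(elo_win_prob(r, opp) for r, opp, _ in matches)
    wins = sum(1 for _, _, won in matches if won)
    return exp_wins, wins


# Hypothetical first-match results for one player, not real data.
first_matches = [
    (2050, 1800, True),
    (2050, 1900, True),
    (2060, 1750, False),
    (2040, 2000, True),
]

exp_wins, wins = expected_vs_actual(first_matches)
print(round(exp_wins, 1), wins)
```

Running the same tally separately for each category (first matches, second matches, finals, and so on) produces one ExpWins and one Wins figure per row.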

What about Tennys?

Thiem has suffered some rough early-round upsets, but over the course of his career, he’s usually ended up on the winning side. Maybe we’d do better to focus on a true feast-or-famine player, someone who more often loses his first-round encounters, but is dangerous when he advances further.

A great recent example of such a player is Tennys Sandgren. The American raced to the quarter-finals of last year’s Australian Open, reached a final in Houston, and won a title in Auckland to start the 2019 season. Other than that, he rarely turns up on the tennis fan’s radar. He acknowledged his inconsistency on a recent Thirty Love podcast, explaining from a player’s perspective why he thinks his results are so erratic. Like Thiem, he lost easily in an opening match last night, winning only four games against Reilly Opelka in Delray Beach.

Sandgren’s round-by-round results are less predictable than Thiem’s, but even for an apparently extreme example of the go-big-or-go-home-early phenomenon, the numbers offer little support. Because Sandgren has played fewer tour events than Thiem, I included his Challenger results before separating his matches into the same categories:

Category     Matches  ExpWins  Wins  
1st              124     64.7    62  
1st (small)      113     60.2    60  
1st/2nd          186     96.4    98  
2nd               62     31.7    36  
2nd+             120     60.3    63  
3rd               35     17.3    15  
4th               15      7.3     9  
Finals             8      4.2     3

The American has underperformed a bit in his first matches and beaten expectations in his second rounders, but the effect disappears after two matches are in the books. In any case, none of the over- or under-performances are even close to statistically significant. His extra first-match losses have about a one-in-three probability of happening by chance, and his bonus second-match wins would occur about one time in six. There could be something interesting going on here, but the effects are small, and it’s very likely that we’re seeing nothing more than randomness.
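One rough way to put a probability on gaps like these is a binomial tail. The sketch below is a simplification and not necessarily the method used above: it assumes every match in a category carries the same win probability, namely ExpWins divided by matches, whereas the real per-match forecasts vary. The inputs are the first-match and second-match rows of Sandgren's table.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n matches at win probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    """Chance of k or fewer wins (used for an underperformance)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def prob_at_least(k, n, p):
    """Chance of k or more wins (used for an overperformance)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# First matches: 124 played, 64.7 expected wins, 62 actual wins.
first_match_tail = prob_at_most(62, 124, 64.7 / 124)

# Second matches: 62 played, 31.7 expected wins, 36 actual wins.
second_match_tail = prob_at_least(36, 62, 31.7 / 62)

print(round(first_match_tail, 2), round(second_match_tail, 2))
```

Under that equal-probability assumption, the two tails should land in the general neighborhood of the one-in-three and one-in-six figures, though an exact calculation would use each match's own forecast.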

Positive results, anyone?

So far, we’ve investigated two players who seemed likely to over- or under-perform in certain groups of matches. Yet we found nothing. The “playing your way in” theory will surely survive this blog post, but let’s make sure there aren’t players who embody it, even if Thiem and Sandgren don’t.

I went through the same steps for the other 98 men in this week’s top 100, grouping their matches into categories, tallying up Elo-based expected wins and actual wins, and calculating the probability that their results–above or below expectations–are due to chance. The result is 1,043 player-categories, from Novak Djokovic’s finals to Pedro Sousa’s first matches. (The number of player-categories isn’t a round number because not every player has matches in every category, like 6th matches or finals.)

Of those 1,043 player-categories, only 29 meet the usual standard of statistical significance, in that there is less than a 5% chance they can be explained by randomness. A familiar example is Gael Monfils’s record in finals. Even with last week’s title in Rotterdam, his eight wins are outweighed by 21 losses. But such cases are extremely rare. Since fewer than 3% of the player-categories meet the 5% threshold, it’s wrong to say that these categories represent real trends (like, perhaps, a psychological basis for Monfils’s inability to win tournaments). When we test over one thousand groups of matches, dozens of them should look like outliers by chance alone: a 5% threshold applied to 1,043 groups would flag about 52 even if nothing but randomness were at work.

In other words, there’s no statistical support for the claim that certain players are more or less effective in certain rounds. It’s always possible that a very small number of guys have certain characteristics along these lines, but among the 29 player-categories with particularly unlikely results, only Monfils’s finals record fits any kind of narrative I’ve heard before. Richard Gasquet has won 120 times–11 more than expected–in first matches at non-mandatory events. That overperformance is just as unlikely as Monfils’s letdown in finals, so maybe we should be talking about how assiduously he prepares for the start of each tournament, no matter the stakes?

It’s always possible that the top men do, in fact, play their way into tournaments. But based on this evidence, it’s only the case if everyone rounds their way into form at approximately the same rate. Maybe first rounders are lower in quality than semi-finals. But if we’re interested in predicting outcomes–even Thiem’s first-round results against journeymen–we’d do better to ignore the theories. Opening matches just aren’t that unique, even for the players who think they are.

Forecasting the Davis Cup Finals

It took more than a year to decide on a new format, but barely a week to make the draw. With 12 countries qualifying for the inaugural Davis Cup Finals in home-and-away ties earlier this month, the field of 18 is set. Using the ITF’s own system to rank countries, the 18 teams were divided into three “pots,” then assigned to the six round-robin groups that will kick off the tournament this November in Madrid.

The new format sounds complicated, but as round-robin events go, it’s easy enough to understand. Each of the six round-robin groups will send a winning team to the quarter-finals. Two second-place sides will also advance to the final eight, as determined by matches won, then sets won, and so on as necessary, until John Isner and Ivo Karlovic stand back to back to determine which one is really taller. From that point, it’s an eight-team knock-out tournament.

Here are the groups, as determined by yesterday’s draw, with seeded countries indicated:

  • Group A: France (1), Serbia, Japan
  • Group B: Croatia (2), Spain, Russia
  • Group C: Argentina (3), Germany, Chile
  • Group D: Belgium (4), Australia, Colombia
  • Group E: Great Britain (5), Kazakhstan, Netherlands
  • Group F: United States (6), Italy, Canada

The ITF ranking system considers the last four years of Davis Cup results, so Spain’s brief exit from the World Group makes the seedings a bit wonky. As it turns out, not only is it a top team (Croatia) who will have to deal with early ties against the Spaniards, the entire Group B trio constitutes a group of death. Russia would be an up-and-coming squad in any format, and it is clearly the most dangerous of the six lowest-ranked sides.

Madrid to Monte Carlo

Last week, I introduced a more accurate, predictive rating system for Davis Cup, involving surface-specific Elo ratings for the players likely to compete. Those rankings put Spain at the top, Croatia second, Russia fifth, and fourth-seeded Belgium 14th in the 18-team field.

Now that we have a draw, we can use those ratings to run Monte Carlo simulations of the entire Davis Cup Finals. As in my post last week, I’m estimating that singles players have a 75% chance of playing at any given opportunity and doubles players have an 85% chance. Those are just guesses–there’s no data involved in this step. Surely some teams are more fragile than others, perhaps because their stars are particularly susceptible to injury or just uninterested in the next event. I’ve excluded Andy Murray, but for the moment, I’m keeping Novak Djokovic and Alexander Zverev in the mix.

(We’re using Elo ratings for each individual player, which means the simulation is telling us what would be likely to happen if it were played today. Things will change between now and November, even if every eligible player shows up. A proper forecast that takes the time lag into account would probably give a slight boost for younger teams [whose players will have nine months to mature] and a penalty for older ones [who are more likely to be hit by injury]. And overall, it would shift all of the championship probabilities a bit toward the mean.)
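To give a flavor of the approach, here is a minimal sketch of a Monte Carlo simulation for a single three-rubber tie. It is not the full tournament model: it skips the round-robin standings and the knock-out bracket. The 75% and 85% availability figures come from the text. Everything else is an assumption made for illustration: the textbook Elo formula, the No. 2 versus No. 2 and No. 1 versus No. 1 singles pairings, the treatment of each doubles pairing as a single rating, and the team ratings themselves.

```python
import random

SINGLES_AVAILABLE = 0.75  # guess stated in the text
DOUBLES_AVAILABLE = 0.85  # guess stated in the text


def elo_win_prob(rating_a, rating_b):
    """Textbook Elo win probability for side A (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def lineup(ratings, p_available, slots, rng):
    """Best `slots` ratings among those who show up. If too few
    show up, pad with the weakest listed option."""
    ranked = sorted(ratings, reverse=True)
    present = [r for r in ranked if rng.random() < p_available]
    while len(present) < slots:
        present.append(ranked[-1])
    return present[:slots]


def simulate_tie(team_a, team_b, rng):
    """Play two singles rubbers and one doubles rubber.
    Returns True if team A wins at least two of the three."""
    s_a = lineup(team_a["singles"], SINGLES_AVAILABLE, 2, rng)
    s_b = lineup(team_b["singles"], SINGLES_AVAILABLE, 2, rng)
    d_a = lineup(team_a["doubles"], DOUBLES_AVAILABLE, 1, rng)[0]
    d_b = lineup(team_b["doubles"], DOUBLES_AVAILABLE, 1, rng)[0]
    rubbers = [(s_a[1], s_b[1]), (s_a[0], s_b[0]), (d_a, d_b)]
    wins_a = sum(rng.random() < elo_win_prob(a, b) for a, b in rubbers)
    return wins_a >= 2


def tie_win_prob(team_a, team_b, n_sims=10_000, seed=0):
    """Share of simulated ties won by team A."""
    rng = random.Random(seed)
    wins = sum(simulate_tie(team_a, team_b, rng) for _ in range(n_sims))
    return wins / n_sims


# Hypothetical squads: singles ratings per player, doubles ratings per pairing.
strong = {"singles": [2150, 2000, 1900], "doubles": [1950, 1850]}
weak = {"singles": [1900, 1800, 1750], "doubles": [1800, 1700]}

print(tie_win_prob(strong, weak))
```

A full version would repeat this for every tie in each group, rank the groups by ties and matches won, and then run the eight-team bracket, recording how far each country gets in each simulated tournament.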

Here are the results of 100,000 simulations of the draw, with percentages given for each country’s chance of winning their group, then reaching each of the knock-out rounds:

Country  Group     QF     SF      F      W  
ESP      46.1%  59.1%  41.9%  30.3%  19.3%  
FRA      54.2%  66.6%  40.6%  25.1%  14.6%  
AUS      74.5%  84.4%  46.0%  23.8%  12.1%  
USA      53.0%  65.5%  36.8%  19.7%  10.4%  
CRO      31.0%  43.0%  27.2%  17.8%   9.8%  
GER      52.5%  67.9%  39.7%  17.6%   7.7%  
RUS      22.9%  33.1%  19.5%  12.0%   6.1%  
SRB      33.0%  47.9%  24.1%  12.6%   6.0%  
GBR      66.8%  78.7%  35.9%  12.5%   4.4%  
ARG      39.7%  56.6%  28.6%  10.4%   3.8%  
ITA      24.3%  35.9%  14.6%   5.5%   2.1%  
CAN      22.7%  33.4%  13.1%   4.9%   1.8%  
JPN      12.8%  19.5%   7.2%   2.8%   0.9%  
BEL      20.3%  32.0%   8.5%   2.1%   0.6%  
NED      21.7%  35.5%   8.6%   1.7%   0.3%  
CHI       7.8%  12.9%   3.4%   0.6%   0.1%  
KAZ      11.5%  19.0%   3.2%   0.5%   0.1%  
COL       5.1%   8.9%   1.2%   0.1%   0.0%

Spain is our clear favorite, despite their path through the group of death. Five teams have a better chance of winning their group and reaching the quarters than the Spaniards do, but their chances in the single-elimination rounds make the difference. At the other extreme, Australia seems to be the biggest beneficiary of draw luck. My rankings put them sixth, and they landed in a group with Belgium (the lowest-rated seed) and Colombia (the weakest team in the field). Their good fortune makes them the most likely country to reach the final four, even if Spain and France have a better chance of advancing to the championship tie.

Less randomness, more Spain

What if we run the simulation one step earlier in the process? That is to say, ignore yesterday’s draw and see what each country’s chances were before their round-robin assignments were determined. For this simulation, we’ll keep the ITF’s seeds, so Spain is still a floater. Here’s how it looked ahead of the ceremony:

Country  Group     QF     SF      F      W  
ESP      63.0%  75.9%  52.9%  35.0%  22.6%  
FRA      56.8%  70.8%  43.9%  25.7%  14.5%  
CRO      55.5%  69.4%  42.2%  25.1%  13.5%  
USA      51.3%  65.6%  38.5%  19.8%  10.0%  
AUS      48.3%  62.9%  34.8%  17.7%   8.5%  
RUS      40.6%  53.5%  30.2%  15.8%   7.9%  
SRB      42.9%  55.8%  28.3%  13.5%   5.9%  
GER      42.0%  55.7%  27.3%  12.5%   5.4%  
ARG      35.9%  49.1%  20.9%   7.9%   2.8%  
ITA      33.6%  47.1%  19.2%   7.2%   2.5%  
GBR      34.9%  48.3%  20.3%   7.5%   2.5%  
CAN      24.5%  35.5%  14.3%   5.3%   1.9%  
JPN      19.8%  29.4%  10.6%   3.6%   1.1%  
BEL      20.9%  30.4%   7.5%   1.8%   0.4%  
NED       9.5%  15.5%   3.5%   0.7%   0.1%  
CHI       7.9%  13.3%   2.6%   0.4%   0.1%  
KAZ       8.4%  14.1%   2.1%   0.3%   0.0%  
COL       4.3%   7.5%   1.1%   0.2%   0.0%

With the “group of death” out of the picture, Croatia jumps from fifth to third, swapping places with Australia. The defending champs lost the most from the draw, while Spain suffered a bit as well.

Elo in charge

Another variation is to ignore the ITF rankings and generate the entire draw based on my Elo-based ratings. In this case, the top six seeds would be Spain, Croatia, France, USA, Russia, and Australia, in that order. Argentina and Great Britain would fall to the middle group, and Belgium would drop to the bottom third. Here’s how that simulation looks:

Country  Group     QF     SF      F      W  
ESP      71.6%  82.8%  57.3%  38.0%  24.1%  
FRA      64.6%  77.6%  45.8%  26.7%  14.4%  
CRO      63.1%  76.3%  45.8%  25.6%  13.6%  
USA      59.7%  73.3%  41.1%  20.2%  10.2%  
RUS      58.6%  71.2%  37.0%  19.7%   9.5%  
AUS      57.7%  71.4%  37.7%  17.7%   8.8%  
SRB      37.1%  53.0%  26.1%  12.1%   5.3%  
GER      35.3%  52.3%  24.5%  10.9%   4.6%  
ARG      28.0%  44.2%  17.5%   6.4%   2.2%  
ITA      27.4%  43.6%  16.9%   6.2%   2.1%  
GBR      27.0%  43.1%  16.5%   6.0%   2.0%  
CAN      26.7%  41.8%  16.0%   5.8%   2.0%  
JPN      15.9%  23.6%   8.1%   2.6%   0.8%  
BEL       9.4%  15.1%   3.9%   0.9%   0.2%  
NED       6.5%  10.8%   2.3%   0.5%   0.1%  
CHI       5.3%   9.0%   1.8%   0.3%   0.1%  
KAZ       3.2%   5.8%   0.9%   0.1%   0.0%  
COL       3.1%   5.2%   0.8%   0.1%   0.0%

The big winners in the Elo scenario are the Russians, who gain a seed and avoid a round-robin encounter with either Spain or Croatia. Australia gets a seed as well, but the benefit of protection from the powerhouses isn’t as valuable as the luck that shone on the Aussies in the actual draw.

Imagine a world with no rankings

Finally, let’s see what happens if we ignore the rankings altogether. It would be unusual for the tournament to take such an approach, but if there’s ever a time to have a tennis event with no seedings, this is it. The existing rankings are far too dependent on years-old results, leaving young teams at a disadvantage. And my system, while more accurate, doesn’t quite feel appropriate either. It is based on individual player ratings, and this is a team event.

Whatever the likelihood of a ranking-free draw in the Davis Cup future, here’s what a simulation looks like with completely random assignment of nations into round-robin groups:

Country  Group     QF     SF      F      W  
ESP      62.8%  75.4%  52.4%  34.8%  22.5%  
FRA      54.8%  68.6%  42.6%  25.0%  13.9%  
CRO      53.4%  67.2%  41.0%  23.6%  13.0%  
USA      48.8%  62.9%  35.9%  19.1%   9.7%  
RUS      47.9%  61.0%  34.8%  18.5%   9.3%  
AUS      47.1%  61.1%  34.1%  17.6%   8.5%  
SRB      41.5%  54.3%  28.0%  13.5%   6.1%  
GER      40.3%  53.6%  26.7%  12.3%   5.3%  
ARG      31.9%  44.9%  18.8%   7.2%   2.6%  
ITA      31.5%  44.2%  18.6%   7.1%   2.5%  
GBR      30.7%  43.4%  17.6%   6.5%   2.3%  
CAN      30.4%  42.7%  17.4%   6.4%   2.2%  
JPN      25.9%  36.4%  13.5%   4.6%   1.4%  
BEL      17.2%  25.9%   7.2%   1.8%   0.4%  
NED      12.5%  20.0%   4.6%   0.9%   0.2%  
CHI      10.4%  16.9%   3.5%   0.6%   0.1%  
KAZ       7.0%  11.8%   1.9%   0.3%   0.0%  
COL       5.9%   9.7%   1.5%   0.2%   0.0%

Round-robin formats do a decent job of surfacing the best teams, so the fully random approach doesn’t give us wildly different results than the seeded simulations. The main effect of the no-seed version is to give the weakest sides a slightly better chance at advancing past the group stage, since there is a better chance for them to avoid strong round-robin competition.

Madrid or Maldives redux

Some top players are likely to skip the event. Zverev has said he’ll be in the Maldives, and Djokovic has hinted he may miss the tournament as well. The new three-rubber format means that teams will suffer a bit less from the absence of a singles star, assuming he isn’t also one of the best doubles options. Still, both Germany and Serbia would much rather head to the party with a top-three singles player on their side.

Here are the results of the initial simulation–based on the actual draw–but without Djokovic or Zverev:

Country  Group     QF     SF      F      W  
ESP      46.5%  59.5%  44.0%  33.2%  21.3%  
FRA      68.2%  79.3%  49.6%  30.6%  17.8%  
AUS      74.3%  84.5%  46.1%  24.2%  12.6%  
USA      53.4%  66.2%  37.5%  20.4%  10.8%  
CRO      30.3%  42.5%  28.4%  19.6%  10.8%  
RUS      23.2%  33.6%  21.1%  13.8%   7.0%  
GBR      67.0%  79.0%  40.9%  14.6%   5.2%  
ARG      52.1%  66.9%  35.5%  12.9%   4.9%  
GER      36.4%  52.3%  23.3%   7.2%   2.2%  
ITA      24.2%  35.9%  14.5%   5.7%   2.2%  
CAN      22.4%  33.2%  13.4%   5.2%   2.0%  
JPN      19.4%  31.7%  11.5%   4.8%   1.6%  
BEL      20.5%  32.4%   8.6%   2.3%   0.6%  
SRB      12.4%  21.1%   6.0%   1.9%   0.5%  
NED      21.6%  35.5%   9.8%   2.0%   0.4%  
CHI      11.4%  18.5%   4.9%   0.9%   0.2%  
KAZ      11.3%  19.1%   3.8%   0.5%   0.1%  
COL       5.2%   9.0%   1.2%   0.2%   0.0%

Germany’s chances of winning the inaugural Pique Cup would fall from 7.7% to 2.2%, and Serbia’s odds would drop from 6.0% to 0.5%. Argentina and France, the seeded teams sharing groups with Germany and Serbia, respectively, would be the biggest gainers from such high-profile absences.

Anybody’s game

I’ve been skeptical of the new Davis Cup, and while I remain unconvinced that it’s an improvement, I find myself getting excited for the weeklong tennis hootenanny in Madrid. These simulations were even more encouraging. As always, the ranking and seeding isn’t the way I’d do it, but in this format, the differences are minimal. The event format will give us a chance to see plenty of tennis from every qualifying nation, and the high level of competition from most of these countries ensures that most teams have a shot at going all the way.

Top Seed Upsets in ATP 250s

Italian translation at settesei.it

In a typical week, no one would notice if Fabio Fognini, Karen Khachanov, and Lucas Pouille combined to go 0-3. This week is different, as those three men held the top seeds at the ATP events in Cordoba, Sofia, and Montpellier. After their first-round byes, each of them lost in the second round, to Aljaz Bedene, Matteo Berrettini, and Marcos Baghdatis, respectively. At least two of the top seeds pushed their opponents to three sets, while Fognini lasted only 71 minutes.

This is not the first time a trio of number one seeds have suffered first-match upsets in the same week. Amazingly, it’s not even the first such occurrence in this very week on the calendar. Two years ago, when the South American event was played in Quito, the results were the same: top seeds Marin Cilic, Ivo Karlovic, and Dominic Thiem all failed to win a match. Thiem’s vanquisher, Nikoloz Basilashvili, even extended the streak the following week, heading to Memphis and handing Karlovic his second straight second-round ouster.

Predictable upsets?

Focusing on these losses, it’s natural to wonder whether top seeds are particularly fragile in this sort of tournament. There’s certainly a logic to it. The number one seed at an ATP 250 is usually ranked in the top 20, and is the sort of player who might have considered taking the week off. He knows that more ranking points are available at slams and Masters, so winning a smaller event isn’t his highest priority. His opponent, on the other hand, is competing every chance he gets, and the points on offer at a smaller event could make a big difference in his standing. Further, he has already played–and won–his first-round match, so he might be performing better than usual, or the conditions might suit him particularly well.

Let’s put it to the test. Since 2010, not counting this week’s carnage, I found 267 non-Masters events at which a top seed got a first-round bye and completed his second-round match. (Additionally, there have been three retirements and one withdrawal; only one of those resulted in a loss for the top seed.) The number one seeds had a median rank of 10, and the underdogs had a median rank of 89. Based on my surface-weighted Elo ratings at the time of each match, the favorites should have won 81.5% of the time. That’s better than this week’s trio of top-seeded losers, who were 64% (Fognini), 80% (Khachanov), and 69% (Pouille) favorites.
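For readers unfamiliar with how a rating gap becomes a win probability, here is a minimal sketch using the standard Elo logistic formula. The 400-point scale is the conventional one and is an assumption here, as are the function name and the example ratings; the forecasts quoted above may include adjustments that this sketch ignores.

```python
def elo_win_prob(rating_a, rating_b, scale=400.0):
    """Probability that player A beats player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

# Hypothetical ratings, for illustration only: a 200-point favorite
print(round(elo_win_prob(1900, 1700), 3))
```

Averaging this probability over every match in a sample is one way to arrive at an expected win rate for the favorites.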

As it happened, the unseeded challengers were more successful than expected. The favorites won only 76.8% of those matches–a rate low enough that there is only a 3% probability it is due to chance alone. It’s not an overwhelming effect–certainly not enough that we should have predicted this week’s results–but it seems that a few of the top seeds are showing up unmotivated and a handful of the underdogs are playing better than expected.
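One rough way to sanity-check that 3% figure is an exact binomial tail. The sketch below assumes every match carried the same 81.5% win probability for the favorite, which is a simplification (the real per-match forecasts vary), and it rounds 76.8% of 267 to 205 wins. The function name is hypothetical.

```python
from math import comb

def binom_lower_tail(wins, n, p):
    """P(X <= wins) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(wins + 1))

n_matches = 267
expected_rate = 0.815                       # average Elo forecast for the favorites
observed_wins = round(0.768 * n_matches)    # 205 wins

p_value = binom_lower_tail(observed_wins, n_matches, expected_rate)
print(f"{observed_wins} wins of {n_matches}: one-sided p = {p_value:.3f}")
```

Under these assumptions, the one-sided tail probability lands in the neighborhood of the 3% quoted above.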

Riding the wave

What about the underdog winners? Once they’ve defeated the top seed, how many capitalize on the opportunity? Berrettini came back to beat Fernando Verdasco in his quarter-final match today, while Baghdatis and Bedene play later. My forecasts believe that, of the three, Bedene has the best chance of claiming a title, though still less than a one-in-five shot at doing so.

In our subset of 267 matches, the underdog won 66 of them. More than half the time, though, that was the end of the run. 38 of the 66 (58%) fell in the quarter-finals. Another 17 lost in the semis. Whatever works so well for these underdogs in the second round disappears afterward. In the 105 matches contested by these 66 men in the quarter-finals and beyond, Elo thinks they should have won 44.9% of them. Instead, they managed only 42.3%.

There’s still a bit of hope. Five men knocked out the top seed in the second round and went on to win the entire tournament. One of those was a challenger we’ve already mentioned: Estrella, who knocked out Karlovic and went on to hoist the trophy in Quito two years ago. Maybe there’s some magic in week six. This week’s trio of underdogs would surely love to think so.

Picking Favorites With Better Davis Cup Rankings

Yesterday, the ITF announced the seedings for the first new-look Davis Cup Finals, to be held in Madrid this November. The 18-country field was completed by the 12 home-and-away ties contested last weekend. Those 12 winners will join France, Croatia, Spain, and the USA (last year’s semi-finalists) along with the two wild cards, recent champions Argentina and Great Britain.

The six nations who skipped the qualifying round will make up five of the top six seeds. (Spain is 7th, while Belgium, who had to qualify, is 4th.) The preliminary round of the November event will feature six round-robin groups of three, each consisting of one top-six seed, a second country ranked 7-12, and a third ranked 13-18. Seeding really matters, as a top position (deserved or not!) guarantees that a side will avoid dangerous opponents like last year’s finalists France and Croatia. Even the difference between 12 and 13 could prove decisive, as a 7-through-12 spot ensures that a nation will steer clear of the always-strong Spaniards, who are seeded 7th.

The seeds are based on the Davis Cup’s ranking system, which relies entirely on previous Davis Cup results. While the formula is long-winded, the concept is simple: A country gets more points for advancing further each season, and recent years are worth the most. The last four years of competition are taken into consideration. It’s not how I would do it, but the results aren’t bad. Four or five of the top six seeds will field strong sides, and one of the exceptions–Great Britain–would have done so had Andy Murray’s hip cooperated. Spain is obviously misranked, but given the limitations of the Davis Cup ranking system, it’s understandable, as the 2011 champions spent 2015 and 2016 languishing outside the World Group.

We can do better

The Davis Cup rankings have several flaws. First, they rely heavily on a lot of old results. If we’re interested in how teams will compete in November, it doesn’t matter how well a side fared three or four years ago, especially if some of their best players are no longer in the mix. Second, they don’t reflect the change in format. Until last year, doubles represented one rubber in a best-of-five-match tie. A good doubles pair helped, but it wasn’t particularly necessary. Now, there are only two singles matches alongside the doubles rubber. The quality of a nation’s doubles team is more important than it used to be.
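To see why the doubles rubber carries more weight now, consider a sketch that computes the chance of winning a tie from per-rubber win probabilities. It assumes the rubbers are independent and uses hypothetical numbers: a side that is a coin flip in every singles rubber but a 70% favorite in doubles.

```python
from itertools import product

def tie_win_prob(rubber_probs):
    """Chance of winning a majority of rubbers, treating each as independent."""
    needed = len(rubber_probs) // 2 + 1
    total = 0.0
    # Enumerate every win/loss pattern and add up those that win the tie
    for outcome in product([0, 1], repeat=len(rubber_probs)):
        if sum(outcome) >= needed:
            prob = 1.0
            for won, p in zip(outcome, rubber_probs):
                prob *= p if won else 1 - p
            total += prob
    return total

old_format = tie_win_prob([0.5, 0.5, 0.5, 0.5, 0.7])  # four singles, one doubles
new_format = tie_win_prob([0.5, 0.5, 0.7])            # two singles, one doubles
print(f"best-of-five: {old_format:.3f}, best-of-three: {new_format:.3f}")
```

With these made-up inputs, the same doubles edge lifts the side from 50% to 57.5% under the old format and to 60% under the new one.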

Let’s see what happens to the rankings when we generate a more forward-looking rating system. Using singles and doubles Elo, I’m going to make a few assumptions:

  • Each country’s top two singles players have a 75% chance of participating (due to the possibility of injury, fatigue, or indifference), and if either one doesn’t take part, the country’s third-best player will replace him.
  • Same idea for doubles, but the top two doubles players have an 85% chance of showing up, to be replaced by the third-best doubles player if necessary.
  • The three matches are equally important. (This isn’t technically true–the third match is likely to be necessary less than half the time, though when it does decide the tie, it is twice as important as the other two matches.)
  • Andy Murray won’t play.

Those assumptions allow us to combine the singles and doubles Elo ratings of the best players of each nation. The result is a weighted rating for each side, one that has a lot of bones to pick with the official Davis Cup rankings.
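One possible way to turn those assumptions into a single number is sketched below. It is a simplification, not necessarily the exact calculation behind the table: each lineup slot is treated separately (ignoring the case where both top players are missing), a doubles team's rating is taken as the average of its two players, and all ratings in the example are made up.

```python
def expected_slot(first_choice, replacement, p_play):
    """Expected rating of a lineup slot when the first choice plays with probability p_play."""
    return p_play * first_choice + (1 - p_play) * replacement

def team_rating(singles, doubles, p_singles=0.75, p_doubles=0.85):
    """Weighted team rating from each country's three best singles and doubles Elo ratings."""
    s1, s2, s3 = sorted(singles, reverse=True)[:3]
    d1, d2, d3 = sorted(doubles, reverse=True)[:3]
    singles_1 = expected_slot(s1, s3, p_singles)
    singles_2 = expected_slot(s2, s3, p_singles)
    # Assumption: a doubles pair is rated as the average of its two players
    doubles_pair = (expected_slot(d1, d3, p_doubles) + expected_slot(d2, d3, p_doubles)) / 2
    # The three rubbers are weighted equally
    return (singles_1 + singles_2 + doubles_pair) / 3

# Hypothetical ratings, for illustration only
example = team_rating(singles=[2000, 1900, 1800], doubles=[1800, 1700, 1600])
print(round(example, 1))
```

Setting a player's participation probability to zero in a variant of this function is a simple way to model a confirmed absence.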

Forward-looking rankings

The following table shows the 18 countries at the Davis Cup finals along with the 12 losing qualifiers. For each team, I’ve listed their Davis Cup ranking and their finals seed (if applicable). To demonstrate my results, I’ve shown each nation’s weighted Elo rank and rating (Elo) and their hard-court Elo rank and rating (sElo). The table is sorted by hard-court Elo:

Country  DC Rank  Seed  Elo Rank   Elo  sElo Rank  sElo  
ESP            7     7         1  1936          1  1891  
CRO            2     2         2  1898          2  1849  
FRA            1     1         3  1880          3  1845  
USA            6     6         4  1876          4  1835  
RUS           21    17         7  1855          5  1827  
AUS            9     9         5  1857          6  1820  
SRB            8     8         8  1849          7  1808  
GER           11    11         6  1855          8  1799  
AUT           16              10  1800          9  1766  
ARG            3     3         9  1803         10  1755  
GBR            5     5        11  1796         11  1750  
SUI           24              14  1763         12  1749  
ITA           10    10        12  1780         13  1745  
CAN           14    13        13  1777         14  1744  
JPN           17    14        15  1735         15  1719  
BEL            4     4        17  1688         16  1673  
CZE           13              16  1712         17  1661  
NED           19    16        18  1685         18  1643  
BRA           28              20  1659         19  1638  
IND           20              21  1652         20  1621  
SVK           29              22  1645         21  1617  
CHI           22    18        19  1682         22  1609  
KAZ           12    12        26  1582         23  1574  
COL           18    15        24  1597         24  1551  
SWE           15              27  1570         25  1542  
BIH           27              28  1552         26  1540  
POR           26              23  1610         27  1535  
HUN           23              25  1583         28  1533  
UZB           25              29  1491         29  1489  
CHN           30              30  1468         30  1465

Spain is the comfortable favorite, regardless of whether we look at overall Elo or hard-court Elo. When the draw is conducted, we’ll see which top-six seed is unlucky enough to end up with the Spaniards in their group, and whether the hosts will remain the favorite.

The biggest mismatch between the Davis Cup rankings and my Elo-based approach is in our assessment of the Russian squad. Daniil Medvedev is up to sixth in my singles Elo ratings, with Karen Khachanov at 10th. Those ratings might be a little aggressive, but as it stands, Russia is the only country with two top-ten Elo singles players. Spain is close, with Rafael Nadal ranked 2nd and Roberto Bautista Agut 11th, and the hosts have the additional advantage of a deep reservoir of doubles talent from which to choose.

In the opposite direction, my rankings do not forecast good things for the Belgians. David Goffin has fallen out of the Elo top 20, and there are no superstar doubles players to pick up the slack. In a just world, Spain and Belgium will land in the same round-robin group–preferably one without the Russians as well.

Madrid or Maldives

The results I’ve shown assume that every top singles player has the same chance of participating. That’s certainly not the case, with high-profile stars like Alexander Zverev telling the press that they’ll be spending the week on holiday in the Maldives. Some teams are heavily dependent on one singles player who could make or break their chances with a decision or an injury.

As it stands, Germany is 8th in the surface-weighted Elo. If we take Zverev entirely out of the mix, they drop to a tie for 14th with Japan. It’s something the German side would prefer to avoid, but it’s not catastrophic, partly because the Germans were never among the favorites, and partly because Zverev could play only one singles rubber per tie and the doubles replacements are competent.

Even more reliant on a single player is the Serbian side, which qualified last weekend without the help of their most dangerous threat, Novak Djokovic. With Djokovic, the Serbs rank 7th–a case where my surface Elo ratings almost agree with the official rankings. But without the 15-time major winner, the Serbs fall down to a tie with Belgium in 16th place. While the Serbs are unlikely to take home the trophy regardless, Novak would make a huge difference.

The draw will take place next Thursday. We’ll check back then to see which sides have the best forecasts, nine months out from the showdown in Madrid.