Matteo Berrettini and Best-of-Five Puzzles

Matteo Berrettini is a man you don’t want to face in best-of-five. (Unless you’re Holger Rune, in which case you beat him in four sets yesterday.)

Some players seem to play better in best-of-five matches. Maybe they are fitter than average, or they take time to get into the rhythm of a match, or they are particularly good at managing their preparation to peak for grand slams. There are many plausible explanations. I suspect most fans assume that there’s some kind of “best-of-five” factor that makes certain players better or worse than usual at the majors.

I took a first crack at the question during the 2014 Australian Open. Then, Jo-Wilfried Tsonga stood out as a best-of-five specialist. With more data and better tools, it’s time to try again. Gill Gross and Alex Gruskin teed it up:

Challenge accepted!

Note that the stat here is awfully specific: hard-court matches since 2019. Matteo also lost to Holger Rune in the second round yesterday, so the numbers have slightly changed. We’ll come back to this particular puzzle in a bit.

First, it’s important to remember that many players should have better records in best-of-five than in best-of-three. It’s the same reason that in other sports, best-of-five (or best-of-seven) series are more likely to go to the favorite than more luck-bound best-of-threes. Given more sets, players have more time to turn things around. The stronger of the two competitors is more likely to do so.

A fantastic illustration of this comes, appropriately enough, from the (all-surface) career numbers of Berrettini himself:

Format      Set%   Win%  
Best of 5  62.1%  68.9%  
Best of 3  62.0%  63.8% 

The Italian wins sets at almost exactly the same rate, regardless of format. But the match win percentage is different. There’s still something to be explained: A player who wins 63.8% of best-of-three matches should, all else equal, win approximately 67% of best-of-five matches. Matteo has done better than that, but the format alone explains much of the gap.
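
The 63.8%-to-67% conversion can be reproduced with a rough sketch. It assumes every set is independent and won with the same probability, which is a simplification and not necessarily how the figure above was derived. The function names are made up for illustration.

```python
def match_prob_bo3(p):
    """Chance of winning a best-of-three, given set-win probability p."""
    # Win 2-0 (p^2) or 2-1 (2 * p^2 * (1 - p)).
    return p**2 * (3 - 2 * p)


def match_prob_bo5(p):
    """Chance of winning a best-of-five, given set-win probability p."""
    # Win 3-0, 3-1, or 3-2: p^3 * (1 + 3q + 6q^2) with q = 1 - p.
    return p**3 * (10 - 15 * p + 6 * p**2)


def set_prob_from_bo3(target, tol=1e-9):
    """Bisect for the set-win probability that yields a bo3 win rate."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if match_prob_bo3(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


implied_set_prob = set_prob_from_bo3(0.638)  # from the best-of-three row
implied_bo5 = match_prob_bo5(implied_set_prob)
print(f"implied set win rate: {implied_set_prob:.1%}")
print(f"implied best-of-five win rate: {implied_bo5:.1%}")
```

Under these assumptions the implied best-of-five rate comes out at about 67%, in line with the figure quoted above.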

Better at best of five?

Identifying players who outperform expectations requires that we define “expectations.” As usual, Elo makes it easy to do this. For every individual match, we can use the Elo ratings of the two men to generate probabilities that each will win.
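
For readers who want to see the mechanics, here is a minimal sketch using the textbook Elo formula. The ratings are hypothetical, and the actual calculation behind the numbers in this post may include adjustments (for surface or for the best-of-five format, say) that this version ignores.

```python
def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def expected_wins(matches):
    """Sum of pre-match win probabilities over (own, opponent) rating pairs."""
    return sum(elo_win_prob(own, opp) for own, opp in matches)


# Hypothetical pre-match ratings, for illustration only.
sample_matches = [(2000, 2000), (2000, 1600), (2000, 2100)]
exp = expected_wins(sample_matches)
print(f"expected wins in {len(sample_matches)} matches: {exp:.2f}")
```

Dividing a player's actual wins by this expected total gives the kind of ratio used in the tables below.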

For Berrettini at hard-court slams since 2019–excluding retirements and the 2025 Australian Open–I have him at 27-9, good for a 75% win rate. Based on pre-match Elo ratings, though, he “should” have gone just 22-14 (technically, 22.5-13.5), taking 62% of the contests.

That’s noteworthy but not astonishing: A player expected to win 62% of the time has about an 8% chance of winning at least 27 of 36 matches. Given the number of ATP tour regulars, it stands to reason that somebody would post a stat like that. We can’t cast aside the Italian’s case yet, because we haven’t talked about his best-of-three results. Again, I’m going to kick it down the page because I want to show you some other numbers first.

Before looking at the narrow set of 2019-24 hard-court numbers, let’s see how everybody fared at grand slams, on all surfaces, since 2000. Out of 154 men with at least 50 grand slam matches this century, here are the top dozen overperformers:

Player               W-L     W%   Exp%  Ratio  
Pablo Andujar      23-39  37.1%  27.9%   1.33  
Denis Istomin      34-41  45.3%  34.7%   1.31  
Frances Tiafoe     45-34  57.0%  46.2%   1.23  
Mario Ancic        40-20  66.7%  54.4%   1.23  
Victor Hanescu     29-35  45.3%  38.6%   1.17  
Karen Khachanov    59-30  66.3%  56.6%   1.17  
Simone Bolelli     25-32  43.9%  37.6%   1.17  
Leonardo Mayer     33-38  46.5%  40.1%   1.16  
Marat Safin        79-32  71.2%  61.5%   1.16  
Nick Kyrgios       54-28  65.9%  57.0%   1.16  
Bernard Tomic      40-35  53.3%  46.7%   1.14  
Matteo Berrettini  49-20  71.0%  62.6%   1.13

The first percentage is actual win percentage, followed by expected win rate (based on Elo ratings). The ‘Ratio’ column is simply the ratio of actual to expected. These are the guys who have played better at slams than their track records would have implied.

This ratio starts to identify overperformers, but we can go one step further. Sample size really counts here. It’s one thing to win seven of ten matches when you’re expected to win five. It’s wildly different to win 70 of 100 when you’re expected to win 50. The odds of the first are 17%, while the chances of the second are a fraction of one percent.
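
Those odds are upper-tail binomial probabilities. The sketch below treats every match as having the same win probability, which is a simplification: a calculation built on match-by-match Elo forecasts would differ somewhat. The function name is my own.

```python
from math import comb


def prob_at_least(wins, matches, p):
    """Chance of winning at least `wins` of `matches`, each with probability p."""
    return sum(
        comb(matches, k) * p**k * (1 - p) ** (matches - k)
        for k in range(wins, matches + 1)
    )


seven_of_ten = prob_at_least(7, 10, 0.5)
seventy_of_hundred = prob_at_least(70, 100, 0.5)
berrettini_like = prob_at_least(27, 36, 0.624)  # 27-9 against a 62.4% expectation

print(f"7+ of 10 at 50%:    {seven_of_ten:.1%}")
print(f"70+ of 100 at 50%:  {seventy_of_hundred:.4%}")
print(f"27+ of 36 at 62.4%: {berrettini_like:.1%}")
```

The first figure is exactly 176/1024, or about 17%. The third should land in the neighborhood of the 8% quoted earlier for Berrettini's 27-9 mark, though the constant-probability shortcut means it won't match exactly.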

Since this next metric accounts for sample size, I’ve expanded our view to the 334 men with at least 20 slam matches since 2000. Here are the twenty players who have most defied the odds with their overperformance in best-of-five:

Player                   Record     W%   Exp%  Ratio  Odds  
Novak Djokovic           364-45  89.0%  84.3%   1.06  0.4%  
Rafael Nadal             304-41  88.1%  82.9%   1.06  0.5%  
Tennys Sandgren           16-17  48.5%  27.6%   1.76  0.9%  
Marin Cilic              133-56  70.4%  62.4%   1.13  1.4%  
Stan Wawrinka            151-66  69.6%  62.4%   1.12  1.6%  
Marat Safin               79-32  71.2%  61.5%   1.16  2.2%  
Frances Tiafoe            45-34  57.0%  46.2%   1.23  3.5%  
Mario Ancic               40-20  66.7%  54.4%   1.23  3.6%  
Denis Istomin             34-41  45.3%  34.7%   1.31  3.7%  
Jo-Wilfried Tsonga       120-43  73.6%  66.8%   1.10  3.7%  
Karen Khachanov           59-30  66.3%  56.6%   1.17  4.0%  
Carlos Alcaraz            57-10  85.1%  76.5%   1.11  6.1%  
Nick Kyrgios              54-28  65.9%  57.0%   1.16  6.4%  
Tomas Martin Etcheverry   12-12  50.0%  33.1%   1.51  6.4%  
Lukasz Kubot              20-20  50.0%  37.3%   1.34  6.9%  
Pablo Andujar             23-39  37.1%  27.9%   1.33  7.3%  
Andrey Kuznetsov          18-21  46.2%  33.8%   1.37  7.4%  
Thomas Fabbiano           10-13  43.5%  27.7%   1.57  7.6%  
Matteo Berrettini         49-20  71.0%  62.6%   1.13  9.2%  
Joachim Johansson          15-8  65.2%  49.4%   1.32  9.5%

I’ll admit it, I mostly lowered the match minimum so that we could have a top five consisting of four slam winners and one Tennys Sandgren. Djokovic and Nadal don’t stand out in the “Ratio” category: They were expected to win lots of matches, and they did. But not only that, they slightly exceeded expectations for a very, very long time. There’s only a one-in-two-hundred chance that a player expected to win 83% of matches would win 88% over such a long stretch.

Enter the skeptic

Even highlighting these outlier performances–many of them in the hands of players we’d expect to see on the list–it’s not clear whether there’s really a best-of-five factor. As noted, we’re working with a population of over 300 players. Three of them gave us one-in-one-hundred performances. Fewer than 10% turned in one-in-ten performances. Isn’t that what we’d expect?

This isn’t a laboratory: We can’t run tests on Novak Djokovic to see if he would keep winning at the same rate in his next 410 best-of-five matches. We certainly can’t do it 100 times to be sure. We can, however, wring a bit more from the data we have.

If there is a special best-of-five skill–above and beyond a player’s general tennis ability–we’d expect players to show it with some consistency. (If they didn’t, could we call it a skill?) Here are two tests to check whether it’s a skill:

  1. Career halves: Split each player’s list of best-of-five matches into halves. Tommy Paul, for instance, went 13-13 in his first 26 best-of-five matches–worse than expected. Since then, he’s won 19 of 27–better than expected. If there’s a best-of-five skill, we’d expect those numbers to be persistent. Sometimes they are, but in general, they are not. Statistically, there’s virtually zero correlation.
  2. Odd and even matches: Maybe career halves are the wrong way to do it: Players improve and tendencies change with age. Instead, take each player’s list of best-of-fives and put them in two buckets, one for the first, third, fifth, etc. matches on the list, the other for the second, fourth, etc. Different tack, same results: no reliable relationship.
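
Both tests can be sketched in a few lines. This is an illustration, not the code behind the results above: it assumes each player's best-of-five matches are stored chronologically as (won, expected probability) pairs, measures overperformance as actual minus expected wins per match, and uses simulated results with no built-in best-of-five skill. All names are hypothetical.

```python
import random
from math import sqrt


def overperformance(matches):
    """Actual minus Elo-expected wins, per match played."""
    wins = sum(won for won, _ in matches)
    expected = sum(p for _, p in matches)
    return (wins - expected) / len(matches)


def split_halves(matches):
    """First half of the list vs. second half (53 matches -> 26 and 27)."""
    mid = len(matches) // 2
    return matches[:mid], matches[mid:]


def split_odd_even(matches):
    """1st, 3rd, 5th... matches vs. 2nd, 4th, 6th..."""
    return matches[0::2], matches[1::2]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def persistence(players, splitter):
    """Correlate overperformance in one bucket with the other, across players."""
    first, second = [], []
    for matches in players.values():
        a, b = splitter(matches)
        first.append(overperformance(a))
        second.append(overperformance(b))
    return pearson(first, second)


# Simulated players whose results follow the forecast exactly (no hidden skill).
rng = random.Random(0)
simulated = {}
for i in range(200):
    matches = []
    for _ in range(60):
        p = rng.uniform(0.3, 0.8)
        matches.append((1 if rng.random() < p else 0, p))
    simulated[f"player_{i}"] = matches

r_halves = persistence(simulated, split_halves)
r_odd_even = persistence(simulated, split_odd_even)
print(f"career halves: r = {r_halves:.3f}")
print(f"odd/even:      r = {r_odd_even:.3f}")
```

With no skill baked into the simulation, both correlations should hover near zero. A persistent best-of-five skill in real data would show up as a clearly positive value.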

To be clear, this doesn’t tell us that everyone’s results are a luck-driven mirage, or that no one has any noteworthy best-of-five skill. Across 350 or 400 matches, I’d bet that Djokovic and Nadal probably do. (Heuristic: If a trait is good, they probably have it.) But in general, a player who is winning more best-of-fives than expected is probably due for a correction. There’s no basis to expect the trend to continue.

The Berrettini double

With that bucket of cold water thrown on our dreams, let’s return to the head-scratcher we started with.

Hard-court matches since 2019. Here are the best-of-five overperformers, minimum 20 slam matches:

Player               Record     W%   Exp%  Ratio   Odds  
Frances Tiafoe        27-12  69.2%  49.6%   1.40   1.0%  
Matteo Berrettini      27-9  75.0%  62.4%   1.20   8.1%  
Adrian Mannarino      15-12  55.6%  41.1%   1.35   9.3%  
Alexei Popyrin        13-11  54.2%  39.9%   1.36  11.2%  
Taylor Fritz          27-12  69.2%  58.6%   1.18  11.6%  
Novak Djokovic         52-4  92.9%  87.2%   1.07  13.9%  
Rafael Nadal           31-5  86.1%  79.7%   1.08  23.4%  
Daniil Medvedev       56-11  83.6%  79.5%   1.05  25.6%  
Pablo Carreno Busta    21-9  70.0%  63.1%   1.11  27.9%  
Marin Cilic            17-8  68.0%  60.9%   1.12  30.7%

Holy Tiafoe! Elo would have predicted a 50% win rate, and instead he went 27-12. Berrettini comes next, but by this metric, it’s a distant second. Only Tiafoe really stands out in this sample.

But wait–there’s more to the Gross/Gruskin puzzle. The Italian has not only overperformed at slams, he has notably underperformed on hard courts elsewhere. Excluding Challengers, retirements, and Davis Cup, Berrettini’s record is even more mediocre than the one listed above: It works out to 42-42. Elo would have predicted a 58% win percentage, not a mere break-even rate.

Of the 35 players with at least 20 hard-court slam matches in this span, only David Goffin more severely underperformed in best-of-three. Only Goffin, Jannik Sinner, and Gael Monfils posted more unexpected numbers in best-of-three. Berrettini is as odd in best-of-three as he is in the longer format, just in the opposite direction.

Using the “Ratio” numbers, we can compare best-of-five over- (or under-) performance with best-of-three, for a kind of “super-ratio.” While Matteo is unique in the unexpectedness of his two numbers, Tiafoe still comes out ahead:

Player             bo5 Ratio  bo3 Ratio  bo5/bo3  
Frances Tiafoe          1.40       0.98     1.42  
Matteo Berrettini       1.20       0.86     1.40  
Adrian Mannarino        1.35       0.98     1.38  
Alexei Popyrin          1.36       1.07     1.27  
Marin Cilic             1.12       0.91     1.23  
David Goffin            1.04       0.86     1.21  
Daniel Evans            1.12       0.97     1.15  
Dominic Thiem           1.10       0.97     1.13  
Davidovich Fokina       1.14       1.01     1.13  
Taylor Fritz            1.18       1.05     1.13

The odds that Berrettini would give us such an unusual pair of stats are 0.3915%. Tiafoe’s number is 0.3969%. Let’s call it a tie.

Are we there yet?

After all this, I’m not sure that I’ve “explained” what’s going on here, per Gruskin’s request. We’ve seen that where best-of-five results differ from a player’s overall results, it’s mostly luck. I assume it’s the same with best-of-three. Maybe there are some additional factors in Berrettini’s case: Perhaps he’s more likely to play non-slams when he’s physically less than 100%.

There’s also this:

Tiebreak records are definitely luck-driven. These splits account for much of the difference in Matteo’s match-level results. A few points here or there, and we wouldn’t be having this conversation. Or, more likely, we’d be overreacting to unexpected numbers from somebody else.

Poor Hubi

One last thought. We’ve looked only at overperformers so far. Of course, there will always be underperformers as well. Hubert Hurkacz has disappointed a bit at slams: 34-25 before the Australian Open, compared to an Elo-expected 37-22. The subset of hard-court slams since 2019, where you’d expect the big-serving Pole to excel, has been far worse.

In a dozen majors, Hurkacz has gone 14-12, a 54% win rate compared to an expected mark of 73%. The odds of such a wide gap are 0.9%, slightly more extreme than Tiafoe’s happier results. In the same span, Hubi has outperformed in best-of-three, winning 63% of those matches instead of 58.5%. He is the anti-Berrettini.

We’ve learned today that outlying best-of-five records are probably not predictive of future results. For a statistician, such findings can be a bit disappointing. For Polish fans, though, it’s reason to rejoice. Hurkacz didn’t turn things around in Australia, winning one match and losing his second. Still, a correction remains in the cards. If apparent best-of-five specialists like Berrettini and Tiafoe can lose in the second round, a laggard like Hurkacz could–eventually–give us a deep run.

* * *


The 52-Week Ranking Forecast

A healthy Karolina Muchova is a top-tenner. Credit: Hameltion

What will the men’s and women’s ranking lists look like at the end of the 2025 season? A few days ago, I attempted to predict which players would crack the top 100. Today, we’re playing for bigger stakes: The names at the top of the table.

As with the top-100-breakthrough forecast, the most important inputs are current Elo rank and current ATP or WTA rank. Elo tells us how well someone is playing, and the official ranking tells us how well that translated into points. After all, ranking points are what will determine the list in a year’s time, too.

The cumulative ATP and WTA rankings reflect whether a player missed time in the previous year; while that isn’t always indicative of whether he or she will be absent again, injuries often recur and some pros have a hard time staying on court. The official ranking also gives some players a head start over others: The 32nd seed at the Australian Open is more likely to reach the second week than the best unseeded player, even if they have roughly the same skill level.

Age is crucial, as well. The younger the player, the more we expect him or her to improve over the course of the year. Later than the mid-20s, however, results trend (usually!) in the other direction.

I tested the usefulness of myriad other variables, including height, handedness, and surface preference. None unambiguously improved the model. I ended up using just one more input: last year’s Elo rank. Current ranks have more predictive value, but last year’s position helps, as it offers a clue as to whether a player’s current level is sustainable.

Enough chatter–let’s start with the forecast for the 2025 year-end women’s rankings:

YE 25    Player                     Age  YE 24  Elo 24  Elo 23  
1        Aryna Sabalenka           26.7      1       1       3  
2        Iga Swiatek               23.6      2       2       1  
3        Coco Gauff                20.8      3       3       2  
4        Qinwen Zheng              22.2      5       4       8  
5        Elena Rybakina            25.5      6       6       5  
6        Jasmine Paolini           29.0      4       9      28  
7        Jessica Pegula            30.9      7       8       4  
8        Paula Badosa              27.1     12       5      24  
9        Emma Navarro              23.6      8      16      53  
10       Mirra Andreeva            17.7     16      15      26  
11       Diana Shnaider            20.7     13      12     100  
12       Daria Kasatkina           27.7      9      19      16  
13       Karolina Muchova          28.4     22       7       6  
14       Barbora Krejcikova        29.0     10      22      14  
15       Marta Kostyuk             22.5     18      20      38  
16       Anna Kalinskaya           26.1     14      23      31  
17       Madison Keys              29.9     21      11      12  
18       Beatriz Haddad Maia       28.6     17      17      18  
19       Jelena Ostapenko          27.6     15      29      13  
20       Marketa Vondrousova       25.5     39      10       9  
21       Danielle Collins          31.0     11      31      22  
22       Linda Noskova             20.1     26      35      42  
23       Donna Vekic               28.5     19      27      41  
24       Liudmila Samsonova        26.1     27      26      11  
25       Leylah Fernandez          22.3     31      30      20  
                                                                
YE 25    Player                     Age  YE 24  Elo 24  Elo 23  
26       Victoria Azarenka         35.4     20      13      29  
27       Elina Svitolina           30.3     23      24      19  
28       Ons Jabeur                30.3     42      14       7  
29       Maria Sakkari             29.4     32      21      15  
30       Katie Boulter             28.4     24      33      62  
31       Amanda Anisimova          23.3     36      28       
32       Anastasia Potapova        23.8     35      36      36  
33       Emma Raducanu             22.1     56      18       
34       Yulia Putintseva          30.0     29      25      55  
35       Magdalena Frech           27.0     25      51      85  
36       Elise Mertens             29.1     34      37      33  
37       Xin Yu Wang               23.3     37      59      57  
38       Ekaterina Alexandrova     30.1     28      48      25  
39       Anastasia Pavlyuchenkova  33.5     30      32      35  
40       Marie Bouzkova            26.4     44      44      30  
41       Elina Avanesyan           22.3     43      60     131  
42       Lulu Sun                  23.7     40      56     182  
43       Peyton Stearns            23.2     47      53     113  
44       Katerina Siniakova        28.6     45      38      40  
45       Olga Danilovic            23.9     51      50      82  
46       Ashlyn Krueger            20.7     64      54      67  
47       Camila Osorio             23.0     59      49      56  
48       Dayana Yastremska         24.6     33     104      96  
49       Clara Tauson              22.0     50      83      64  
50       Karolina Pliskova         32.8     41      40      39

No big surprises here–that’s the nature of a model like this. Where players are predicted to move up or down, it’s usually because their Elo rank is notably higher or lower than their official position, like Muchova or Paolini. Mirra Andreeva, the youngest woman in the top 175, is expected to gradually work her way into the top ten.

Getting fuzzier

Of course, there’s considerable uncertainty. When we check in at the end of the 2025 season, we’ll find some substantial moves, like Paolini in 2024. We can get a better idea of that uncertainty by forecasting the likelihood that players reach certain thresholds.

Here is each top player’s probability of becoming the 2025 year-end number one:

Player              p(#1)  
Aryna Sabalenka     42.3%  
Iga Swiatek         32.6%  
Coco Gauff          21.1%  
Qinwen Zheng         6.9%  
Elena Rybakina       4.3%  
Jasmine Paolini      2.8%  
Jessica Pegula       2.4%  
Emma Navarro         0.9%  
Paula Badosa         0.9%  
Daria Kasatkina      0.9%  
Barbora Krejcikova   0.7%  
Mirra Andreeva       0.7%  
Diana Shnaider       0.5%  
Karolina Muchova     0.5%

This is not the list I would have made. Again, this type of model isn’t going to give you big surprises, and there’s no consideration for things like playing styles. Intuitively, a big breakthrough from Andreeva (or Shnaider) seems more likely than a belated push from Kasatkina, or even Pegula.

In any event, we get an idea of how much the ranking list can shuffle itself in a year’s time. Even beyond these 14 names, the model gives another 20 women at least a one-in-a-thousand chance to end the year at the top.

We can run a similar exercise to get the odds that each player ends the season in the top 5, 10, or 20:

Player                    p(top 5)  p(top 10)  p(top 20)  
Aryna Sabalenka              82.4%      95.8%      99.3%  
Iga Swiatek                  81.0%      94.9%      98.9%  
Coco Gauff                   75.5%      92.7%      98.3%  
Qinwen Zheng                 50.3%      80.3%      95.5%  
Elena Rybakina               32.5%      65.5%      90.3%  
Jessica Pegula               15.5%      42.0%      78.4%  
Paula Badosa                 15.2%      41.5%      81.7%  
Mirra Andreeva               13.7%      34.5%      68.3%  
Jasmine Paolini              13.1%      38.4%      77.7%  
Karolina Muchova             10.6%      30.2%      69.8%  
Diana Shnaider                8.8%      25.7%      64.6%  
Emma Navarro                  7.9%      24.0%      60.2%  
Marketa Vondrousova           6.6%      19.2%      53.8%  
Daria Kasatkina               5.8%      18.3%      49.6%  
Marta Kostyuk                 4.9%      14.9%      43.4%  
Madison Keys                  4.9%      15.8%      49.7%  
Barbora Krejcikova            4.2%      13.5%      40.2%  
Beatriz Haddad Maia           3.8%      12.1%      39.7%  
Anna Kalinskaya               3.5%      11.2%      35.8%  
Jelena Ostapenko              3.0%       9.4%      28.8%  
Leylah Fernandez              2.9%       8.5%      25.8%  
Liudmila Samsonova            2.8%       8.6%      27.0%  
Linda Noskova                 2.8%       8.2%      24.9%  
Ons Jabeur                    2.8%       8.7%      31.7%  
Maria Sakkari                 1.9%       6.1%      23.1%  
                                                          
Player                    p(top 5)  p(top 10)  p(top 20)  
Danielle Collins              1.9%       6.3%      22.5%  
Elina Svitolina               1.7%       5.7%      21.6%  
Donna Vekic                   1.7%       5.4%      21.1%  
Victoria Azarenka             1.6%       5.9%      28.2%  
Anastasia Potapova            1.5%       4.5%      15.8%  
Emma Raducanu                 1.5%       4.7%      21.6%  
Amanda Anisimova              1.1%       3.5%      15.4%  
Yulia Putintseva              1.0%       3.4%      15.1%  
Katie Boulter                 1.0%       3.3%      13.5%  
Marie Bouzkova                0.8%       2.4%       8.8%  
Elise Mertens                 0.8%       2.5%      10.1%  
Xin Yu Wang                   0.8%       2.3%       7.8%  
Ashlyn Krueger                0.8%       2.1%       7.3%  
Camila Osorio                 0.7%       2.0%       7.4%  
Ekaterina Alexandrova         0.7%       2.1%       7.9%  
Magdalena Frech               0.6%       2.0%       8.0%  
Katerina Siniakova            0.6%       2.0%       8.1%  
Olga Danilovic                0.6%       1.8%       6.8%  
Peyton Stearns                0.6%       1.7%       6.6%  
Anastasia Pavlyuchenkova      0.6%       1.9%       8.9%  
Elina Avanesyan               0.6%       1.7%       6.2%  
Clara Tauson                  0.5%       1.4%       4.3%  
Lulu Sun                      0.5%       1.5%       5.9%  
Eva Lys                       0.4%       1.2%       4.8%  
Elisabetta Cocciaretto        0.4%       1.2%       4.1% 

Most interesting to me in this table is where the columns diverge. Andreeva, with her unrealized potential, ranks higher on the top-5 list than by top-10 or top-20 probability. Azarenka, though she has little chance of returning to the top ten, is more likely than her list-neighbors to hang inside the top 20.

The same variation means that there are some new names in the table. Eva Lys, for instance, is forecast to land at #65 ahead of the 2026 season. But because she is young and has already posted multiple top-100 seasons by Elo rating, she has an outsized chance of a major breakout. The women who were displaced are either fringy veterans, like Pliskova, or those whose Elo ratings didn’t match their WTA rank, such as Yastremska.

(These forecasts are probably more accurate than the year-end-number-one table above. There haven’t been many year-end number ones, by definition, so there’s less data to draw upon.)

Long may Sinner reign

Now for the men. I’ve extended this list to 51 for obvious reasons:

YE 25  Player                  Age  YE 24  Elo 24  Elo 23  
1      Jannik Sinner          23.4      1       1       2  
2      Carlos Alcaraz         21.7      3       3       3  
3      Alexander Zverev       27.7      2       4       5  
4      Taylor Fritz           27.2      4       6      10  
5      Daniil Medvedev        28.9      5       5       4  
6      Novak Djokovic         37.6      7       2       1  
7      Holger Rune            21.7     13      10      12  
8      Jack Draper            23.0     15       8      19  
9      Casper Ruud            26.0      6      21      16  
10     Alex de Minaur         25.9      9      16      11  
11     Andrey Rublev          27.2      8      18       6  
12     Stefanos Tsitsipas     26.4     11      14       9  
13     Tommy Paul             27.6     12      11      18  
14     Hubert Hurkacz         27.9     16       9       8  
15     Grigor Dimitrov        33.6     10       7       7  
16     Ugo Humbert            26.5     14      17      13  
17     Lorenzo Musetti        22.8     17      20      50  
18     Arthur Fils            20.6     20      25      38  
19     Ben Shelton            22.2     21      22      17  
20     Sebastian Korda        24.5     22      15      22  
21     Tomas Machac           24.2     25      12      33  
22     Karen Khachanov        28.6     19      19      23  
23     Felix Auger Aliassime  24.4     29      28      15  
24     Frances Tiafoe         26.9     18      33      26  
25     Matteo Berrettini      28.7     34      13      14  
                                                           
YE 25  Player                  Age  YE 24  Elo 24  Elo 23  
26     Alexei Popyrin         25.4     24      27      75  
27     Jiri Lehecka           23.1     28      39      46  
28     Flavio Cobolli         22.7     32      30     136  
29     Alex Michelsen         20.4     41      35     134  
30     Jakub Mensik           19.3     48      37     119  
31     Mpetshi Perricard      21.5     31      43     192  
32     Francisco Cerundolo    26.4     30      36      25  
33     Matteo Arnaldi         23.9     37      48      31  
34     Sebastian Baez         24.0     27      67      40  
35     Brandon Nakashima      23.4     38      42      70  
36     Jordan Thompson        30.7     26      29      51  
37     Juncheng Shang         19.9     50      52       
38     Tallon Griekspoor      28.5     40      32      24  
39     Alejandro Tabilo       27.6     23      54     121  
40     Denis Shapovalov       25.7     56      34      34  
41     T M Etcheverry         25.5     39      58      65  
42     Alexander Bublik       27.5     33      59      44  
43     Davidovich Fokina      25.6     61      46      28  
44     Roman Safiullin        27.4     60      38      27  
45     Nicolas Jarry          29.2     35      63      20  
46     Nuno Borges            27.9     36      53      88  
47     Thanasi Kokkinakis     28.7     77      24      61  
48     Luciano Darderi        22.9     44     106     122  
49     Miomir Kecmanovic      25.3     54      65      71  
50     Jan Lennard Struff     34.7     42      26      35  
51     Joao Fonseca           18.4    145      45     

The men’s ranking model is more accurate than the women’s version, though that may be because it is built, in part, on the unusually stable Big Three/Big Four era. That stability might be gone, taking the reliability of this model with it. (The men’s model predicted the log of next year’s ranking with an adjusted r-squared of .631, compared to .580 for the women.) So again, if it looks boring, that’s the nature of the beast.

Still: We have Carlos Alcaraz taking back the number two spot, Holger Rune returning to the top ten, and Jack Draper following him in. In the other direction, we see Grigor Dimitrov’s age catching up to him, dropping five spots from his current position.

At the bottom of the list, we find Joao Fonseca bounding up nearly 100 ranking spots in a single season. That already feels conservative, less than one week into his season. All of these numbers are based on 2024 year-end rankings, yet Fonseca is up 18 places in the live rankings with his run to the Canberra Challenger final. He’d gain another 14 with a win tomorrow.

What about Novak?

The table above shows Novak Djokovic in 6th place, a prediction that aggregates a vast range of possibilities. Here are the odds of various players ending 2025 at the top of the list:

Player             p(#1)  
Jannik Sinner     56.4%  
Carlos Alcaraz    22.5%  
Novak Djokovic    14.6%  
Alexander Zverev   3.8%  
Daniil Medvedev    3.4%  
Taylor Fritz       1.3%  
Holger Rune        1.2%  
Jack Draper        1.2%  
Hubert Hurkacz     1.0%  
Grigor Dimitrov    0.7% 

No one else is even half as likely as Dimitrov to end the season ranked #1. Sinner is the clear favorite, with virtually every stat in his favor. Alcaraz is expected to improve. Djokovic, though, is the clear number three, far ahead of the other players above him in the previous table.

This is partly to be expected: He ended 2024 in second place on the Elo list. He didn’t play a full schedule, but he posted great results much of the time he played, and Alcaraz wasn’t consistent enough to capitalize on the veteran’s step back. Beyond that, remember that the model considers last year’s Elo rank as well. Twelve months ago, Djokovic still had a strong claim to be the best player in the world. His age counts against him, but he is one of only a few players in the 2025 field who has proven he can reach the top.

Novak’s 6th-place forecast, then, averages a disproportionately high probability of a resurgence with all the things that can happen to 37-year-old athletes. He’s more likely than, say, (projected) #5 Medvedev or #7 Rune to claim the top spot, but he’s also more likely to fall down the list due to injury or lack of interest.

Djokovic looks like less of an outlier when we see the chances of top-5, top-10, and top-20 finishes this year:

Player                  p(5)  p(10)  p(20)  
Jannik Sinner          95.6%  98.9%  99.8%  
Carlos Alcaraz         84.5%  95.7%  99.2%  
Alexander Zverev       61.7%  88.4%  97.5%  
Daniil Medvedev        38.5%  71.8%  92.6%  
Taylor Fritz           34.1%  72.0%  92.9%  
Novak Djokovic         32.4%  59.8%  86.4%  
Holger Rune            20.3%  52.8%  86.1%  
Jack Draper            15.6%  46.3%  82.2%  
Hubert Hurkacz          9.8%  29.9%  68.2%  
Andrey Rublev           9.8%  31.8%  70.8%  
Stefanos Tsitsipas      9.6%  31.6%  70.6%  
Alex de Minaur          9.5%  32.9%  72.1%  
Grigor Dimitrov         8.3%  27.0%  63.1%  
Casper Ruud             7.7%  31.1%  70.6%  
Tommy Paul              7.1%  26.8%  65.0%  
Ugo Humbert             5.3%  20.2%  56.9%  
Ben Shelton             4.8%  18.5%  55.8%  
Sebastian Korda         4.5%  17.8%  53.5%  
Tomas Machac            4.4%  18.3%  54.3%  
Arthur Fils             3.7%  17.0%  54.0%  
Lorenzo Musetti         3.4%  16.6%  52.3%  
Matteo Berrettini       2.4%   8.7%  32.2%  
Felix Auger Aliassime   2.1%   8.2%  32.7%  
Karen Khachanov         2.0%   8.8%  32.8%  
Frances Tiafoe          1.3%   6.3%  25.9%  
                                            
Player                  p(5)  p(10)  p(20)  
Jiri Lehecka            1.0%   5.0%  22.7%  
Alexei Popyrin          0.9%   5.4%  23.1%  
Francisco Cerundolo     0.8%   3.8%  17.3%  
Flavio Cobolli          0.7%   4.5%  20.7%  
Jakub Mensik            0.7%   4.1%  20.0%  
Alex Michelsen          0.7%   4.2%  20.2%  
Matteo Arnaldi          0.7%   3.0%  14.7%  
Tallon Griekspoor       0.6%   2.4%  11.0%  
Brandon Nakashima       0.5%   2.8%  13.9%  
Denis Shapovalov        0.5%   2.2%  10.5%  
Sebastian Baez          0.5%   2.5%  12.6%  
Mpetshi Perricard       0.5%   3.3%  16.2%  
Jordan Thompson         0.4%   2.3%  10.3%  
Davidovich Fokina       0.4%   1.5%   7.6%  
Roman Safiullin         0.4%   1.5%   7.1%  
Juncheng Shang          0.3%   2.0%  10.8%  
Nicolas Jarry           0.3%   1.2%   5.7%  
Thanasi Kokkinakis      0.3%   1.2%   5.7%  
Alexander Bublik        0.3%   1.3%   6.6%  
T M Etcheverry          0.3%   1.4%   7.1%  
Alejandro Tabilo        0.2%   1.5%   7.5%  
Jan Lennard Struff      0.2%   1.0%   4.3%  
Joao Fonseca            0.2%   1.0%   5.7%  
Nuno Borges             0.2%   1.0%   5.2%  
Miomir Kecmanovic       0.2%   0.9%   4.5%

The various models don’t quite agree: It can’t really be the case that if Djokovic cracks the top five (32.4% here), it’s nearly 50/50 whether he ends the season at number one. From outside of the models, we can be particularly skeptical, since we know that Novak isn’t likely to play a full schedule. Still, we can glean something from the juxtaposition: There’s not a lot of middle ground for the all-time great.

Again, it’s worth peeking at the bottom of the list. Fonseca makes this one, too, with a nearly 6% chance of a top-20 debut this year. (Actually, a debut is even more likely, since this is the stricter probability of a year-end top-20 finish.) It seems a bit crazy to say that the 18-year-old has the same top-20 chances as Nicolas Jarry. On the other hand, Fonseca leads Jarry on the Elo table by a healthy margin. He may already be the stronger player.

Few pros are likely to catapult up or down the rankings like Fonseca. Plenty will make moves that these models don’t foresee. With the information available at the beginning of the season, we can get a general sense of how things will change over the next twelve months. Now for the good part: We get to find out how the models were wrong.

* * *

The Pending Breakthroughs of 2025

Eva Lys, probably a top-100 player in 2025. Credit: Nuta Lucian

Every year, Challenger maven Damian Kust lists the players he thinks are likely to join the ATP top 100 in the coming year. He did a typically good job last year, picking 14 of the 20 players who reached the threshold in 2024. We can forgive him for missing Jacob Fearnley, who rose from 646th to the top 90 in less than twelve months.

I’ve yet to meet a forecast that I didn’t want to mathematically model, and this is no exception. An algorithm probably isn’t going to do better than Damian does, as it will miss all kinds of details accumulated by a full-time tour watcher. But the exercise will give us a better idea of what factors make it more or less likely that a player joins the top-100 club.

Let’s get straight to the forecast:

Rank  Kust  Player                ATP  Elo Rk   Age  p(100)  
1     3     Joao Fonseca          145      45  18.4   96.5%  
2     4     Learner Tien          122      74  19.1   92.4%  
3     1     Hamad Medjedovic      114      91  21.5   89.1%  
4     5     Nishesh Basavareddy   138      84  19.7   84.2%  
5     9     Raphael Collignon     121      97  23.0   82.5%  
6     8     Martin Landaluce      151      99  19.0   82.1%  
7     6     Jerome Kym            134     111  21.9   79.6%  
8           Leandro Riedi         135     108  22.9   71.9%  
9     15    Jaime Faria           123     146  21.4   69.0%  
10    7     Jesper de Jong        112     117  24.6   66.8%  
11    12    Tristan Boyer         133     116  23.7   64.0%  
12    2     Francesco Passaro     108     147  24.0   60.9%  
13          Harold Mayot          116     154  22.9   57.6%  
14    10    Alexander Blockx      203     102  19.7   56.8%  
15    16    Valentin Vacherot     140     110  26.1   55.2%  
16    11    N Moreno de Alboran   110     132  27.5   52.5%  
17          Lukas Klein           136     126  26.8   47.0%  
18    19    Elmer Moeller         160     160  21.5   37.4%  
19    18    Duje Ajdukovic        142     171  23.9   36.6%  
20          Terence Atmane        158     174  23.0   35.5%  
21          R A Burruchaga        156     177  22.9   28.1%  
22          Matteo Gigante        141     203  23.0   26.8%  
23    13    Vit Kopriva           130     150  27.5   26.3%  
24          Gustavo Heide         172     190  22.8   24.3%  
25          Coleman Wong          170     238  20.6   24.3%  
            …                                                
35    14    Mark Lajal            229     187  21.6   13.4%  
            …                                                
41    17    Dino Prizmic          292     167  19.4   10.6%  
42    20    James Trotter         193     175  25.4   10.4%

The table shows the 25 men who are most likely to make their top-100 debut this year, plus a few more from Damian’s list. I’ve included Damian’s rankings*, as well as each player’s year-end ATP ranking, year-end ranking on my Elo list, and their current age. The final column, “p(100),” is their probability of reaching the ranking milestone sometime in 2025.

* Damian points out that his numbering wasn’t intended as an explicit ranking, though he did end up picking the more obvious players first, with the long shots at the end.

The three columns between the players and their probabilities are the main components of the logistic-regression model. Age, unsurprisingly, is key. The younger the player, the more likely he’ll improve. Plus, the youngest men may have played limited schedules, causing their official rankings to underestimate their ability levels.

It’s a bit unusual to include both ATP rank and Elo rank, since they are simply different interpretations of the same underlying match results. In this case, though, it makes sense. Elo is better at predicting a player’s performance tomorrow, and it outperforms the official list as a way of predicting rankings a year from now. However, we’re trying to forecast ranking breakthroughs less than a year from now. If Fonseca has a good month Down Under, he’ll crack the top 100 in large part thanks to his eleven months’ worth of ranking points from 2024. In this model, then, the ATP ranking tells us how close a player is to the point total he needs. Elo tells us more about how likely he is to pile up the remaining wins.

A player’s existing stock of points turns out to be somewhat more important than his underlying skill level. The model weights ATP ranking about half-again as heavily as Elo rank.
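As a rough sketch of how a logistic-regression forecast like this turns its inputs into a probability, consider the following. Every coefficient here is a made-up placeholder, since the fitted values aren't published in this post, and using raw rank (rather than, say, log rank) is also an assumption. The only thing borrowed from the text is the ratio: the ATP-rank weight is set to 1.5 times the Elo-rank weight. Height is left out to keep the example short.

```python
import math

# Hypothetical coefficients, for illustration only (NOT the fitted model).
INTERCEPT = 6.0
W_ATP_RANK = -0.030  # per ranking place; 1.5x the Elo weight, per the text
W_ELO_RANK = -0.020  # per Elo-list place
W_AGE = -0.10        # per year of age; younger players score higher


def p_top100(atp_rank, elo_rank, age):
    """Logistic link: weighted sum of the inputs mapped to a 0-1 probability."""
    z = INTERCEPT + W_ATP_RANK * atp_rank + W_ELO_RANK * elo_rank + W_AGE * age
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical players who differ only in age.
print(p_top100(atp_rank=130, elo_rank=100, age=19.0))
print(p_top100(atp_rank=130, elo_rank=100, age=26.0))
```

With placeholder weights the printed numbers mean nothing in themselves. The point is the shape: a better rank on either list, or a younger age, pushes the probability up.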

There are innumerable other variables we could include. I tested a lot of them. The only other input I kept was height. Height is only a minor influence on top-100 breakthroughs, but it’s definitely better to be taller. De Jong, for instance, is five feet, eleven inches tall. He ranks eighth on the 2025 list when height is omitted, and falls to tenth when height is included.

This tallies with the Challenger-to-tour conversion stats I worked out for my recent post about Learner Tien. Both short players and left-handers have a harder time making the jump than their taller, right-handed peers. Those conversions don’t address quite the same thing, since it’s possible to crack the top 100 with little to no success at tour level–it just means winning lots of Challengers. For that reason, left-handedness is probably an advantage for players aiming to jump from, say, 122nd to the top 100, as Tien is now. The relationship between left-handedness and breakthrough likelihood was less clear-cut than height, though, so I left it out.

J-wow

Enough mechanics–back to the forecasts. Fonseca’s 96.5% probability might strike you as crazily high or outrageously conservative. It’s certainly confident, but then again the Brazilian is a special player. Barring injury–and immediate injury, at that–a breakthrough seems likely to happen soon.

Whether high or low, the Fonseca forecast is unusual. Like his forehand, it puts him in classy company. Going back to 2000, here are the players about whom the model would have been most optimistic:

Year  Player                 Rank  Elo    Age  p(100)  Y+1  
2021  Holger Rune             103    50  18.7   98.7%   10  
2020  Sebastian Korda         118    48  20.5   97.9%   38  
2024  Joao Fonseca            145    45  18.4   96.5%       
2010  Grigor Dimitrov         106    75  19.6   96.3%   52  
2020  Carlos Alcaraz          141    51  17.7   96.1%   32  
2018  Felix Auger Aliassime   108    89  18.4   95.8%   17  
2023  Hamad Medjedovic        113    66  20.5   95.4%  105  
2000  Andy Roddick            156    52  18.3   94.5%   14  
2020  Lorenzo Musetti         128    68  18.8   94.0%   57  
2019  Emil Ruusuvuori         123    64  20.7   94.0%   84

It’s not so remarkable that eight of the nine other players on the list succeeded in reaching the top 100. The forecast would have expected (at least) that. But even including Medjedovic’s disappointing finish to 2024, the average ranking of these nine guys at the end of the following season (“Y+1”) is 45. Three broke into the top 20. And Fonseca’s forecast places him ahead of most of them.
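The arithmetic is easy to check from the table. This snippet uses only the "Y+1" column for the nine players other than Fonseca. Note that it counts year-end rankings of 100 or better, which is a slightly different thing from reaching the top 100 at some point during the season.

```python
# Year-end rankings one season later ("Y+1"), copied from the table above.
y_plus_1 = {
    "Holger Rune": 10,
    "Sebastian Korda": 38,
    "Grigor Dimitrov": 52,
    "Carlos Alcaraz": 32,
    "Felix Auger Aliassime": 17,
    "Hamad Medjedovic": 105,
    "Andy Roddick": 14,
    "Lorenzo Musetti": 57,
    "Emil Ruusuvuori": 84,
}

average_rank = sum(y_plus_1.values()) / len(y_plus_1)
year_end_top_100 = sum(1 for rank in y_plus_1.values() if rank <= 100)
year_end_top_20 = sum(1 for rank in y_plus_1.values() if rank <= 20)

print(round(average_rank))   # 45
print(year_end_top_100)      # 8
print(year_end_top_20)       # 3
```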

Medjedovic’s near-miss was due in part to illness. It’s worth remembering that this model only predicts a single year; the young Serbian is still set up for a nice career. (Including, probably, a top-100 debut in 2025.) The model would have given Francisco Cerundolo a 90% chance of breaking through in 2021. He didn’t make it, yet he reached the top 20 a couple of years later. Fernando Gonzalez failed to convert an 80% chance in 2001, but after a few more years, he made the top ten.

Using a simple model–instead of the expert opinion of someone like Damian–exposes us to another type of error. The model is optimistic about the 2025 chances of 22-year-old Leandro Riedi, who possesses both official and Elo ranks on the cusp of the top 100. On paper, he’s a great pick. But he had knee surgery in September. Instead of defending points from two Challenger titles in January, he’s continuing to recover. He may ultimately surpass many of the other guys on the list, but even just regaining his pre-injury form this year is a big ask.

Waiting for Eva

Let’s run the same exercise for the women’s game. Unfortunately I don’t have enough height data, so we can’t use that. The resulting model is less predictive than the men’s forecast (even apart from the lack of player heights), but with year-end WTA rank, Elo rank, and age, it’s almost as good.

Patrick Ding took up the task of a Kust-style list for women. It’s unordered, so I’ve added a “Y” in the “PD” column next to his picks:

Rank  PD  Player                 WTA  Elo   Age  p(100)  
1     Y   Eva Lys                131   43  23.0   80.1%  
2     Y   Anca Todoni            118  100  20.2   74.9%  
3     Y   Maya Joint             116  173  18.7   65.8%  
4         Aoi Ito                126  109  20.6   65.4%  
5     Y   Marina Stakusic        125  131  20.1   62.3%  
6     Y   Polina Kudermetova     107  159  21.6   61.8%  
7     Y   Alina Korneeva         177   80  17.5   61.8%  
8     Y   Robin Montgomery       117  155  20.3   61.1%  
9     Y   Sara Bejlek            161   88  18.9   59.9%  
10        M Sawangkaew           130   94  22.5   58.8%  
11        Anastasia Zakharova    112  145  23.0   54.1%  
12    Y   Sijia Wei              134  135  21.1   49.9%  
13    Y   Celine Naef            153  124  19.5   48.8%  
14    Y   Antonia Ruzic          143  105  21.9   48.7%  
15        Maja Chwalinska        128  119  23.2   47.7%  
16    Y   Sara Saito             150  182  18.2   43.1%  
17        Alexandra Eala         148  162  19.6   41.6%  
18    Y   Darja Semenistaja      119  192  22.3   41.5%  
19    Y   Dominika Salkova       151  150  20.5   38.1%  
20        Talia Gibson           140  185  20.5   37.2%  
21        V Jimenez Kasintseva   156  170  19.4   36.3%  
22    Y   Ella Seidel            141  205  19.9   36.2%  
23    Y   Iva Jovic              189  157  17.1   33.8%  
24        Daria Snigur           139  161  22.8   32.0%  
25        Francesca Jones        152  106  24.3   31.5%  
26    Y   Solana Sierra          163  156  20.5   30.2%  
27    Y   Ena Shibahara          137  103  26.9   29.1%  
28        Lois Boisson           204   95  21.6   23.9%  
29        Elsa Jacquemot         159  191  21.7   21.8%  
30    Y   Taylah Preston         170  246  19.2   20.0%  
31    Y   Tereza Valentova       240  127  17.9   19.6%  
32        Elena Pridankina       186  201  19.3   18.9%  
33        Lola Radivojevic       185  186  20.0   18.9%  
34    Y   Oksana Selekhmeteva    176  176  22.0   16.8%  
35        Barbora Palicova       180  202  20.8   16.2%

This isn’t quite a fair fight with Patrick, because he made his picks in early October. Two of his choices (Suzan Lamens and Zeynep Sonmez) have already cleared the top-100 hurdle. He would presumably consider Ito more carefully now, since she reached a tour-level semi-final two weeks after he made his list. I should also note that Patrick picked two prodigies outside the top 300: Renata Jamrichova and Mia Ristic. My model didn’t consider players ranked that low. I had to draw the line somewhere, and Fearnley aside, single-year ranking leaps of that magnitude are quite rare.

The mechanics of the algorithm are pretty much the same as the men’s version. The women’s list looks a bit more chaotic, pitting players with strong Elo positions (such as Lys and Korneeva) against others who are close to 100 without the results that Elo would like to see (Joint, Kudermetova, etc).

Eva Lys is fascinating because this is her third straight year near the top of the list. She finished 2022 ranked 127th, standing 71st on the Elo table. Just short of her 21st birthday, that was good for a 76% chance of reaching the top 100 the following year–second on the list to Diana Shnaider. She rose as high as 112, but no further.

A year older, Lys was fourth on the 2023 list. Her WTA ranking of 136 and her nearly-unchanged Elo position of 72 worked out to a 67% chance of a 2024 breakthrough. Only three players–Brenda Fruhvirtova, Erika Andreeva, and Sara Bejlek–scored higher. She came within one victory of the milestone in September but finds herself back on the list for 2024.

Even beyond Lys’s 80% chance of finally making it in 2025, history is encouraging. I went back 25 years for this study, and only two other players would have been given a 50% or better chance of reaching the top 100 for three consecutive years. Stephanie Dubois was on the cusp from 2005 to 2007, finishing the third year ranked 106th. She finally made it in 2008. More recently, Wang Xiyu was within range from 2019-21. (Covid-19 cancellations and travel challenges didn’t help.) She not only cleared the hurdle in 2022, she did it with style, climbing to #50 by the end of that season.

The same precedents bode well for Bejlek, who had a 52% chance of breaking through in 2023, a 77% chance last year, and a 60% probability for 2025.

Mark your calendars

In twelve months, we can check back and see how the model fared against Damian and Patrick. The algorithm has the benefit of precision, and it is less likely to get overexcited about as-yet-unfulfilled potential. The flip side is that it doesn’t consider the innumerable quirks that might bear on the chances of a particular player.

For now, I’m betting on the humans.

* * *

Anna Kalinskaya At Her Peak

Also today: Upsets, (partly) explained; January 23, 1924

Anna Kalinskaya in the 2020 Fed Cup qualifying round. Credit: Nuță Lucian

Should we have seen this coming? Of all the surprises in the top half of the 2024 Australian Open women’s draw, Anna Kalinskaya’s run to the quarter-finals stands as one of the biggest. The 25-year-old was ranked 75th entering the tournament, and she had never reached the third round of a major in 13 previous main-draw attempts.

Had we looked closely before the tournament, we wouldn’t have found a title contender, exactly, but we would have identified Kalinskaya as about as dangerous as a 75th-ranked player could possibly be. She finished 2023 on a 9-1 run, reaching the final at the WTA 125 in Tampico, then winning the title at the Midland 125, where she knocked out the up-and-coming Alycia Parks in the semi-finals. 2024 started well, too: The Russian upset top-tenner Barbora Krejcikova in Adelaide, then almost knocked out Daria Kasatkina in a two-hour, 51-minute match two days later.

The only reason her official ranking is so low is that she missed nearly four months last summer with a leg injury that she picked up in the third round in Rome. Her two match wins at the Foro Italico pushed her up to 53rd in the world, just short of her career-best 51st, set in 2022. The Elo algorithm, which measures the quality of her wins rather than the number of tournaments she was healthy enough to play, reflects both her pre-injury successes and the more recent hot streak. Kalinskaya came to Melbourne as the 31st-ranked woman on the Elo list.

These alternative rankings put a different spin on her path through the Australian Open draw so far. Here are the results from her first four rounds, in which she appeared to be the underdog three times:

Don’t be fooled!

Elo has some adjustments to make:

Round  Opponent  Elo Rk  Elo vRk  
R16    Paolini       31       37  
R32    Stephens      31       50  
R64    Rus           31      107  
R128   Volynets      31      139

Kalinskaya was hardly an early favorite–Stephens did her the favor of taking out Kasatkina, and Anna Blinkova (who lost to Paolini) eliminated the third-seeded Elena Rybakina. But given how the draw worked out, seeing the Russian’s name in the quarter-finals wasn’t so unlikely after all.

More luck

Kalinskaya has a dangerous forehand and a solid backhand, but she isn’t an aggressive player by the standards of today’s circuit. Her 14 matches logged by the Match Charting Project average 4.2 strokes per point, and that skews low because it includes three meetings with Aryna Sabalenka. Yesterday’s fourth-round match against Paolini took 5.3 strokes per point, and the third-rounder with Stephens was similar.

By Aggression Score, the 25-year-old rates modestly below average, at -17 in rallies and -15 on returns. While she doesn’t have any weaknesses that prevent her from ending points earlier, she’s more comfortable letting the rally develop. When Paolini played along, the results were remarkable: 32 points reached seven shots or more yesterday, and Kalinskaya didn’t end any of them with an unforced error.

The downside of such a game style is that a lot of opponents won’t be so cooperative. Last fall, the Russian lost back-to-back-to-back matches against Ekaterina Alexandrova, Viktoria Hruncakova, and Ashlyn Krueger, three women who opt for big swings and short points. By contrast, consider the Rally Aggression Scores of the quartet Kalinskaya has faced in Melbourne:

Round  Opponent  AggScore  
R16    Paolini         -5  
R32    Stephens       -16  
R64    Rus            -59  
R128   Volynets       -38

Paolini and Stephens have roughly similar profiles to Kalinskaya’s own; Rus and Volynets are even more conservative.

This isn’t just a convenient narrative: Kalinskaya really is better against more passive players. She has played 118 career tour-level matches against women with at least 20 matches in the charting database. Sort them by Rally Aggression Score and separate them into four equal bins, and the Russian’s preferences become clear:

AggScore Range  Match Win%  
57 to 175            35.7%  
0 to 56              46.4%  
-27 to -1            50.0%  
-137 to -27          59.4%
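The sort-and-bin step behind that table can be sketched in a few lines. This assumes each match is stored as a pair of the opponent's Rally Aggression Score and whether Kalinskaya won. The function name and the sample rows are invented for illustration and are not her actual results.

```python
def win_pct_by_bin(matches, n_bins=4):
    """Sort matches by opponent score (most aggressive first), cut them
    into n_bins near-equal groups, and report each group's score range
    and match win percentage."""
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    size, extra = divmod(len(ordered), n_bins)
    rows, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        chunk = ordered[start:end]
        start = end
        if not chunk:
            continue
        scores = [score for score, _ in chunk]
        wins = sum(1 for _, won in chunk if won)
        rows.append((min(scores), max(scores), 100.0 * wins / len(chunk)))
    return rows


# Made-up sample: (opponent Rally Aggression Score, did the player win?)
sample = [(100, False), (80, False), (40, True), (20, False),
          (-5, True), (-10, False), (-50, True), (-90, True)]
for low, high, pct in win_pct_by_bin(sample):
    print(f"{low} to {high}: {pct:.1f}%")
```

Cutting into equal-sized groups, rather than equal-width score ranges, keeps the sample in each bin comparable, which matters when there are only 118 matches to go around.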

If the whole tour were as patient as she is, the Russian would already be a household name.

Alas, it’s rare to draw four straight players as conservative as the bunch Kalinskaya has faced in Melbourne. And having reached the quarter-finals, her luck has run out. Her next opponent is Qinwen Zheng, who has a career Aggression Score of 27 and upped that number in 2023. It could be worse–fellow quarter-finalists Sabalenka and Dayana Yastremska are triple-digit aggressors–but it is a different sort of challenge than she has faced at the tournament so far.

To win tomorrow, Kalinskaya will need to play as well as she has for the last few months, only a couple of shots earlier in the rally. Otherwise, Zheng will end points on her own terms, and thousands of potential new fans will be convinced that Kalinskaya really is just the 75th best player in the world.

* * *

Why are upsets on the rise?

Only four seeds, and two of the top eight, survived to the Australian Open women’s quarter-finals. Many of the top seeds lost early. This feels like a trend, and it isn’t new.

One plausible explanation is that the field keeps getting stronger. Top-level players now develop all over the world, and coaching and training techniques continue to improve. There are few easy, guaranteed matches, even if Iga Swiatek and Aryna Sabalenka usually(!) make it look that way. I believe this is part of the story.

Another component, I suspect, is the shift in playing styles. As I noted a couple of weeks ago when writing about Angelique Kerber, WTA rally lengths have steadily declined in the last decade. In 2013, the typical point lasted 4.7 strokes; it’s now around 4.3. Shorter points are caused by more risk-taking. Risks don’t always work out, full-power shots go astray, and the better-on-paper player doesn’t always win.

In 2019, I tested a similar theory about men’s results. I split players into quartiles based on Aggression Score and tallied the upset rate for every pair of player types. When two very aggressive players met, nearly 39% of matches resulted in upsets, compared to 25% when two very passive players met. The true gap isn’t quite that big: given the specific players involved, there should have been a few more upsets among the very aggressive group. But even after adjusting for that, it remained a substantial gap.

It stands to reason that the story would be the same for women. Instead of Aggression Score, I used average rally length. I doubt there’s much difference. I didn’t intend to change gears, I just got halfway through the project before checking what I did the first time.

The most aggressive quartile (1, in the table below) are players who average 3.6 shots per rally or less. The next group (2) ranges from 3.7 to 4.0, then (3) from 4.1 to 4.5, and finally (4) 4.6 strokes and up. The following table shows the frequency of upsets (Upset%) and how the upset rate compares to expectations (U/Exp) for each pair of groups:

Q1  Q2  Upset%  U/Exp  
1   1    40.7%   1.07  
2   1    36.2%   0.99  
2   2    35.7%   0.99  
3   1    35.1%   0.93  
3   2    35.5%   0.97  
3   3    40.9%   1.07  
4   1    37.6%   1.03  
4   2    36.6%   1.02  
4   3    34.6%   0.95  
4   4    34.7%   0.97

(If you look back to the 2019 study, you’ll notice that I did almost everything “backwards” this time — swapping 1 for 4 as the label for the most aggressive group, and calculating results as favorite winning percentages instead of upsets. Sorry about that.)
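For readers who want to reproduce this kind of table, here is one way the two columns could be computed. It assumes each match comes with a pre-match win probability for the favorite (from Elo, for example), so that the expected number of upsets is the sum of the underdogs' chances. The post doesn't spell out how the expectation was derived, so treat that part as an assumption. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def upset_table(matches):
    """matches: iterable of (quartile_a, quartile_b, p_favorite, favorite_won).
    Returns {(higher_q, lower_q): (upset_pct, actual_upsets / expected_upsets)}."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # matches, upsets, expected upsets
    for quartile_a, quartile_b, p_favorite, favorite_won in matches:
        # Order the pair so (1, 2) and (2, 1) land in the same row.
        pair = (max(quartile_a, quartile_b), min(quartile_a, quartile_b))
        counts = totals[pair]
        counts[0] += 1
        counts[1] += 0 if favorite_won else 1
        counts[2] += 1.0 - p_favorite
    table = {}
    for pair, (n, upsets, expected) in sorted(totals.items()):
        ratio = upsets / expected if expected > 0 else float("nan")
        table[pair] = (100.0 * upsets / n, ratio)
    return table


# Made-up rows, not real match data.
sample = [(1, 1, 0.6, False), (1, 1, 0.6, True), (1, 1, 0.8, True),
          (1, 1, 0.8, True), (1, 2, 0.7, False), (2, 1, 0.7, True)]
for pair, (upset_pct, ratio) in upset_table(sample).items():
    print(pair, f"{upset_pct:.1f}%", f"{ratio:.2f}")
```

A ratio above 1 means more upsets than the pre-match forecasts implied, and a ratio below 1 means fewer.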

Matches between very aggressive players do, in fact, result in more upsets than expected. It’s not an overwhelming result, partly because it’s only 7% more than expected, and partly because matches between third-quartile players–those with average rally lengths between 4.1 and 4.5–are just as unexpectedly unpredictable.

I don’t know what to make of the latter finding. I can’t think of any reasonable cause for that other than chance, which casts some doubt on the top-line result as well.

If the upset rate for matches between very aggressive players is a persistent effect, it would give us more upsets on tour today than we saw a decade ago. An increasing number of players fit the hyper-aggressive mold, so there are more matchups between them. The logic seems sound to me, though it may be the case that other sources of player inconsistency outweigh a woman’s particular risk profile.

* * *

January 23, 1924: Debuts and dropshots

Men’s tennis ruled at the early Australian Championships. The tournament had been held since 1905 (as the “Australasian” Championships), but there was no women’s singles until 1922. On January 23rd, midway through the 1924 edition, the press corps was preoccupied with the severity of Gerald Patterson’s sprained ankle and the question of whether Ian McInnes had been practicing.

James O. Anderson, the 1922 singles champion who would win the 1924 edition as well, introduced what was then–at least to the Melbourne Argus–an on-court novelty:

He has developed a new stroke since he last played in Melbourne, and it has proved successful. On the back of the court he makes a pretence of sending in a hard drive, but with a delicate flick of the wrist he drops the ball just over the net, leaving his opponent helpless 30 feet away.

A veritable proto-Alcaraz, was James O.

For the few fans who weren’t solely focused on Australia’s Davis Cuppers, a superstar was emerging before their eyes. Also on the 23rd, 20-year-old Daphne Akhurst made quick work of Violet Mather, advancing to the semi-finals in her first appearance at the Championships.

Akhurst wouldn’t go any further, unable to withstand the heavy forehand of Esna Boyd in the next round. But it was nonetheless a remarkable debut: She won both the women’s and the mixed doubles titles. The correspondent for the Melbourne Age, recapping the mixed final, could hardly contain his admiration:

Miss Akhurst–an artist to her finger tips–belied her delicate mid-Victorian appearance that suggested that she had slipped out of one of Jane Austen’s books by sifting out cayenne pepper strokes from a never-failing supply.

Daphne and Jack Willard–“who ran for every ball, and continued running after he played the ball”–defeated Boyd and Gar Hone in straight sets.

The pair of championships was a harbinger of things to come. Between 1925 and 1931, Akhurst would win five singles titles (losing only in 1927 when she withdrew), four more in the women’s doubles, and another three mixed. Only the customs of the day could stop her: She married in 1930 and retired a year later. Tragically, she died from pregnancy complications in 1933, at the age of 29.

Daphne is best known these days as the name on the Australian Open women’s singles trophy. For the next several years, there will be many more Akhurst centennials to celebrate.

* * *

Predicting Next Year’s Elo Ratings

I often illustrate the difference between Elo ratings and the traditional ATP and WTA ranking-point systems as follows: The official rankings tell you how good a player was six months ago. Elo estimates where they are today. For the purposes of tournament entry and so on, a 52-week average makes sense. But if you’re predicting the outcome of tomorrow’s match, you don’t want to assign the same weight to a year-old result that you give to yesterday’s news.

That said, Elo ratings are not explicitly predictive. They rely only on past results. They don’t recognize the fact that a player on a hot streak will probably cool off, or that a younger player is more likely to improve than an older one. If we want to look further ahead than tomorrow’s match, we need to take some of those additional factors into account.

Hence today’s project: Projecting Elo ratings one year in advance. Elo ratings tend to be a leading indicator of official rankings, so if we can get some idea of a player’s future in Elo terms, we can estimate–very approximately, I admit–his or her ATP or WTA ranking even further out.

I kept things simple. Each player’s forecast is based on four variables: Age, current Elo rating, rating one year ago, and rating two years ago. Current rating is by far the most important consideration. It accounts for over 70% of the men’s forecast and 80% of the women’s. Everything else is essentially a tweak. The two older ratings allow the forecast to make adjustments if the current rating is an outlier. By including player age, we account for the fact that players over 25 or 26 start–on average!–to decline, and the older they are, the sharper the decline.

Take Novak Djokovic as an example. His current Elo rating is 2,227, one year ago it was 2,145, and two years ago it was 2,186. Because his 2023 year-end rating was higher than 2021 or 2022, we’d expect a small step backwards. And because he’s 36 years old, the laws of physics might eventually slow him down. Put it all together, and the model projects his 2024 year-end Elo at 2,116. Excellent, but slightly more human, and a number that would’ve placed him third on this year’s list.
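To make the structure concrete, here is a toy version of a four-variable projection. It is only a sketch: the weights, the baseline rating, the age at which decline starts, and the quadratic age penalty are all placeholders chosen for illustration, not the fitted model. Reading "accounts for over 70%" as a 0.70 weight on the current rating is likewise an assumption.

```python
# Placeholder parameters, NOT the fitted values from the model in this post.
BASELINE_ELO = 1800.0  # assumed "middle of the pack" that ratings regress toward
WEIGHTS = {
    "current": 0.70,
    "one_year_ago": 0.15,
    "two_years_ago": 0.05,
    "baseline": 0.10,
}
DECLINE_STARTS_AT = 25.5         # assumed age where average decline begins
DECLINE_PER_YEAR_SQUARED = 0.5   # quadratic, so older means a sharper drop


def project_elo(age, current, one_year_ago, two_years_ago):
    """Blend the three ratings with a pull toward the baseline, then
    subtract an age penalty that grows with years past the peak."""
    blended = (
        WEIGHTS["current"] * current
        + WEIGHTS["one_year_ago"] * one_year_ago
        + WEIGHTS["two_years_ago"] * two_years_ago
        + WEIGHTS["baseline"] * BASELINE_ELO
    )
    years_past_peak = max(0.0, age - DECLINE_STARTS_AT)
    return blended - DECLINE_PER_YEAR_SQUARED * years_past_peak ** 2


# Djokovic's inputs from the paragraph above, run through the toy weights.
print(round(project_elo(36, 2227, 2145, 2186)))
```

Because the parameters are invented, the output shouldn't be read as a reproduction of the 2,116 figure. What carries over is the logic: an unusually high current rating gets pulled back toward the earlier ones, and age pushes the projection down further.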

Here is what the model predicts as the 2024 year-end top ten:

Rank  Player              2024 Elo  2023 Rank  2023 Elo  
1     Jannik Sinner           2144          2      2197  
2     Carlos Alcaraz          2137          3      2149  
3     Novak Djokovic          2116          1      2227  
4     Daniil Medvedev         2059          4      2104  
5     Alexander Zverev        2021          5      2024  
6     Andrey Rublev           1988          6      2020  
7     Stefanos Tsitsipas      1969          9      1974  
8     Holger Rune             1954         12      1936  
9     Hubert Hurkacz          1950          8      1983  
10    Grigor Dimitrov         1928          7      2011

As precise as that table looks, it is hard to predict the future. Here are the same ten players, with a 95% prediction interval shown:

The intervals demonstrate just how uncertain we are, with 12 months of tennis to play. If Jannik Sinner or Carlos Alcaraz hits the high end of his range, in the mid-2,300s, he’ll have established himself as a runaway number one. But if they surprise in the other direction, they’ll land below 2,000 and just barely stay in the top ten. Even these intervals don’t quite account for all the unknowns. There’s a nonzero chance that any of these guys will get hurt and miss most of the season, leaving them off the 2024 year-end list entirely.

I suspect, also, that a more sophisticated model would give a different range of outcomes for Djokovic. There are few precedents for his level of play at age 36, and he outperformed expectations in 2023. Had we run this model a year ago, it would’ve predicted a 2,071 Elo for him now. He beat that by more than 150 points, landing around the 85th percentile of the projection. But time is cruel. Since 1980, five out of six 36-year-olds have seen their Elo decline from the previous season. The average year-over-year change–including those few players who gained–is a loss of 45 points. It’s hard to bet against Djokovic, but at this point in his career, his downside almost certainly exceeds his upside.

Finally, let’s take a look at the projected 2024 top ten on the women’s side. It’s not nearly as juicy as the men’s forecast, as it barely differs from the 2023 list. As I mentioned above, a player’s current rating is a bigger factor in the women’s forecast than in the men’s: age matters less, and if a player’s rating jumps around from year to year, women are more likely to stay at their current level than bounce back to a previous one. The forecast:

Rank  Player               2024 Elo  2023 Rank  2023 Elo  
1     Iga Swiatek              2197          1      2237  
2     Cori Gauff               2100          2      2127  
3     Aryna Sabalenka          2062          3      2099  
4     Jessica Pegula           2035          4      2089  
5     Elena Rybakina           2024          5      2059  
6     Marketa Vondrousova      1977          8      2005  
7     Ons Jabeur               1976          7      2007  
8     Karolina Muchova         1965          6      2014  
9     Qinwen Zheng             1961          9      2000  
10    Liudmila Samsonova       1938         11      1959

You might have noticed in both the ATP and WTA lists that most ratings–at least for top-tenners–are projected to go down. There’s a small regression component in the model, meaning that every player is expected to pull a bit back toward the middle of the pack. That doesn’t mean they will, of course, but on average, that’s what happens.

Here are the prediction intervals for the women’s top ten:

The magnitude of the intervals is about the same as it was for the men. Iga Swiatek could launch into a peak-Serena-like stratosphere, or she could, conceivably, land at the fringes of the top ten. Liudmila Samsonova, bringing up the end of this list, might challenge for a place in the top three, or she could be scrambling to stay in the top 50.

One thing is certain: The 2024 year-end lists won’t actually look like this. The value of this sort of forecast, even when it is so approximate, lies in the context it gives us. A year from now, we’ll be talking about which players outperformed or underperformed their expectations. Projections like these help us pin down what, exactly, was a reasonable expectation in the first place.

* * *

I’ll be writing more about analytics and present-day tennis in 2024.

Is It Ever Better To Be Unseeded?

As draw-probability takes go, this one is pretty spicy:

Satisfyingly counterintuitive if true. Is it?

A few reasons for skepticism: As an unseeded player, you could get a top-eight seeded opponent in the first round. Or the second. Or, after upsetting a lower seed–you are almost guaranteed to get one in the first or second round–you could still end up with a top-eight seed in the third round. Going into the draw unseeded is hardly protection against a top-eight opponent.

I could theorize further, but why not just delve into the numbers?

The men’s draw

Let’s look at a few examples from the draw. The 25th seed is Nicolas Jarry, who was drawn to face Carlos Alcaraz in the third round (ouch!). His grass-court Elo (gElo)–the number I use to generate forecasts–is 1698.5. The closest unseeded player to him on the gElo list is Adrian Mannarino, who has a rating of 1700.8. In Elo terms, a difference of 2.3 points is basically just a rounding error.
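To put a number on "rounding error," here is a minimal sketch using the standard logistic Elo formula on a 400-point scale. I'm assuming that formula for illustration (the exact forecasting code isn't shown in this post), and the function name is made up.

```python
def elo_win_prob(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# gElo ratings quoted above
mannarino, jarry = 1700.8, 1698.5
p = elo_win_prob(mannarino, jarry)
print(f"Mannarino over Jarry: {p:.1%}")
```

Under that assumption, a 2.3-point edge makes Mannarino roughly a 50.3% favorite head-to-head, which is about as close to a coin flip as tennis gets.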

If Ricky’s theory is correct, on the morning of the draw, it was better to be Mannarino than Jarry. Except–oops!–Mannarino was drawn to face third-seed Daniil Medvedev in the second round.

How does all that good and bad luck shake out in the forecast? Jarry has a 7.5% chance of reaching the round of 16, 2.6% for the quarters, and 1.0% for the semis. Mannarino has 6.3% for R16, 3.2% for the quarters, and 1.1% for the semis. Those are awfully close, just like the near-identical gElo ratings would imply. The luck mostly washed out.

(If you look at my forecast after the tournament begins, the numbers will no longer be the same. That’s partly because every result has an effect on many other probabilities, and partly because the gElo ratings will slightly change when I add this week’s results from Eastbourne and Mallorca, which are not yet in the system.)

What about 26th seed Denis Shapovalov? Shapo has a gElo of 1675.1, roughly equal to unseeded Ugo Humbert’s 1676.1. Would it be better to be Ugo?

Shapovalov got lucky: His top-eight counterpart in the draw is Casper Ruud, a not-grass specialist who is barely rated higher than the Canadian. Shapo’s odds of going further than Ruud into the round of 16 are 25.3%. He has a 10.5% chance of making the quarters and a 3.4% shot at the semis.

Humbert was not so lucky. Like Jarry, he’s in Alcaraz’s section. He has a mere 3.5% shot at the fourth round, 1.1% for the quarters, and 0.4% for the finals. The way the cookie crumbled on draw day, it was much better to be Shapo than Ugo.

One more. Dan Evans is the 27th seed, with a gElo of 1693.1. The closest unseeded player in the draw is Sebastian Ofner, gElo-rated 1688.5. Evans lines up for a third-rounder with 8th-seed Jannik Sinner, who is much better than Ruud despite the number next to his name. Even with a tricky first-rounder against Quentin Halys and Sinner looming in the third, Evans’s chances of making the fourth round are 14.5%, along with 6.8% for the quarters and 3.2% for the semis.

By unseeded standards, Ofner got lucky. He drew almost-seeded Jiri Lehecka to open, but the seeds in his section are #18 Francisco Cerundolo and #16 Tommy Paul. With the benefit of that good fortune, his chances of lasting to the second week are 16.0%, with a 4.1% shot at the quarters and a 1.3% chance of a semi-final berth. By the numbers, I’d take Evans’s position over Ofner’s, though it’s pretty close.

So: three anecdotal comparisons, one saying it is definitely better to be the seed, one saying it’s marginally better, one saying it’s about even.

There’s one obvious counter-example. Tomas Martin Etcheverry, seeded 29th, landed in Novak Djokovic’s section. He has a mere 0.8% chance at the fourth round, 0.2% for the quarters, and everything else rounds down to zero. His own rating is part of the problem: He has little experience on grass.

The closest unseeded player in the draw to Etcheverry’s 1585.5 gElo is Daniel Altmaier at 1587.8. Altmaier ended up in the Sinner/Evans section, with an unseeded first-round opponent. His chances of reaching the fourth round are 4.8%, with a 1.5% chance of the quarter-finals.

So we can say one thing for sure: If you know you’ll be drawn to face Djokovic early, you might want to not do that.

The general solution

These are all anecdotes, and the forecasts are entirely dependent on this year’s actual Wimbledon draw. That doesn’t answer the question in any comprehensive way.

We can get closer to a general solution by running two simulations. First, forecast the 2023 Wimbledon field, with the actual seeds, without considering how the draw actually played out. So Etcheverry might have landed in Ruud’s section, or Mannarino might have drawn Djokovic in the first round.

Next, forecast the 2023 Wimbledon field, but instead of keeping the actual seeds, assign the 25th to 32nd seeds to the next eight players in the rankings. Instead of the 25th seed belonging to Jarry, we give it to Lehecka, and Jarry is unseeded, and so on.

By keeping the players constant and varying the seeds, we can see the effect of the seedings on 16 players: the actual seeds 25-32, and the “next eight” who just missed.
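The idea of holding players constant and varying the seeds can be sketched with a toy Monte Carlo. Everything here is a simplification I'm assuming for illustration: a 32-player field with made-up ratings, eight seeds, one seed per four-player section, and the standard Elo win formula. It is not the actual Wimbledon simulation, and all function names are hypothetical.

```python
import random

def win_prob(r_a, r_b):
    """Standard Elo expected score for a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def build_draw(n_players, seeds, rng):
    """Toy 32-slot draw: every 4-slot section holds exactly one seed.

    Adjacent sections pair one top-four seed with one of seeds 5-8, so a
    lower seed cannot meet a top-four seed before round three.
    """
    top, low = seeds[:4], seeds[4:]
    rng.shuffle(top)
    rng.shuffle(low)
    unseeded = [p for p in range(n_players) if p not in seeds]
    rng.shuffle(unseeded)
    draw = []
    for hi_seed, lo_seed in zip(top, low):
        draw += [hi_seed] + [unseeded.pop() for _ in range(3)]
        draw += [unseeded.pop() for _ in range(3)] + [lo_seed]
    return draw

def matches_won(draw, ratings, player, rng):
    """Play out one bracket and count how many matches `player` wins."""
    alive, wins = draw, 0
    while len(alive) > 1:
        alive = [a if rng.random() < win_prob(ratings[a], ratings[b]) else b
                 for a, b in zip(alive[::2], alive[1::2])]
        if player not in alive:
            return wins
        wins += 1
    return wins

def reach_prob(ratings, seeds, player, n_wins, trials=4000, seed=1):
    """Share of simulated draws in which `player` wins at least n_wins matches."""
    rng = random.Random(seed)
    hits = sum(
        matches_won(build_draw(len(ratings), seeds, rng), ratings, player, rng) >= n_wins
        for _ in range(trials)
    )
    return hits / trials

# Made-up ratings: 32 players spaced 15 Elo points apart, best first.
ratings = [2200 - 15 * i for i in range(32)]
bubble = 7  # the eighth-best player, i.e. the last one to get a seed
p_seeded = reach_prob(ratings, list(range(8)), bubble, n_wins=2)
p_unseeded = reach_prob(ratings, list(range(7)) + [8], bubble, n_wins=2)
print(f"reach round three: seeded {p_seeded:.1%}, unseeded {p_unseeded:.1%}")
```

In the second call the last seed goes to the next player down the list, and our bubble player is thrown in with the unseeded field. The same player, the same ratings, two different seed lists: that's the whole comparison.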

Here are the chances of those 16 men reaching the fourth round in the two scenarios, seeded and unseeded:

Player                       R16 Seed  R16 Un  
Nicolas Jarry                   15.3%   13.1%  
Denis Shapovalov                12.8%   11.0%  
Daniel Evans                    15.0%   12.8%  
Tallon Griekspoor               30.5%   28.1%  
Tomas Martin Etcheverry          6.1%    4.9%  
Nick Kyrgios                    20.6%   18.3%  
Alejandro Davidovich Fokina     12.8%   11.0%  
Ben Shelton                      4.4%    3.5%  
Jiri Lehecka                     9.7%    8.0%  
Matteo Berrettini               33.5%   30.9%  
Ugo Humbert                     13.2%   11.4%  
Andy Murray                     31.9%   29.4%  
Lorenzo Sonego                  19.8%   17.5%  
Miomir Kecmanovic                8.1%    6.5%  
Botic van de Zandschulp         14.0%   11.9%  
Adrian Mannarino                15.7%   13.6%

On average, these players have a 16.5% chance of lasting to the second week if they have a seed, 14.5% otherwise.
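For anyone who wants to check the arithmetic, here is a short sketch that recomputes those averages from the table above. The numbers are copied from the table; the variable names are mine.

```python
# R16 probabilities (percent) from the table above: (seeded, unseeded)
r16 = {
    "Nicolas Jarry": (15.3, 13.1),
    "Denis Shapovalov": (12.8, 11.0),
    "Daniel Evans": (15.0, 12.8),
    "Tallon Griekspoor": (30.5, 28.1),
    "Tomas Martin Etcheverry": (6.1, 4.9),
    "Nick Kyrgios": (20.6, 18.3),
    "Alejandro Davidovich Fokina": (12.8, 11.0),
    "Ben Shelton": (4.4, 3.5),
    "Jiri Lehecka": (9.7, 8.0),
    "Matteo Berrettini": (33.5, 30.9),
    "Ugo Humbert": (13.2, 11.4),
    "Andy Murray": (31.9, 29.4),
    "Lorenzo Sonego": (19.8, 17.5),
    "Miomir Kecmanovic": (8.1, 6.5),
    "Botic van de Zandschulp": (14.0, 11.9),
    "Adrian Mannarino": (15.7, 13.6),
}

avg_seeded = sum(s for s, _ in r16.values()) / len(r16)
avg_unseeded = sum(u for _, u in r16.values()) / len(r16)
# True only if every single player does better with a seed than without one
always_better = all(s > u for s, u in r16.values())
print(f"seeded {avg_seeded:.1f}%, unseeded {avg_unseeded:.1f}%, always better: {always_better}")
```

The seed is worth about two percentage points on average, and it helps every one of the 16 players, not just the group as a whole.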

The same thing holds if we care more about other achievements, like reaching the third round, the quarter-finals, or the semis:

            R32    R16    QF    SF  
Seeded    40.5%  16.5%  8.4%  3.8%  
Unseeded  28.7%  14.5%  6.9%  3.1%

It’s better to be seeded.

Going wide

This isn’t a truly general solution, because it is based solely on the 2023 Wimbledon men’s field. You might think of this group of players as top-heavy, which would make it more valuable to avoid the top seeds. But while Djokovic and Alcaraz are well ahead of the pack, the top eight as a whole is not overwhelmingly dominant–just think of Ruud on grass.

We could construct a variety of other draws with different mixes of ability levels. You could imagine a field in which the top eight players were all outstanding and the rest were not. An extreme example like that might change the results. We’ll save that for another day. In the meantime, players: Keep chasing those seeds.

* * *


Are Conditions Slower? Faster? Weirder?

Many players didn’t like the conditions at Roland Garros this year. The clay, apparently, was slower and heavily watered, at least on some courts. The balls were heavier than usual, especially when they had been in play for a little while and the clay began to stick to them.

Maybe the courts really did play differently. We could compare ace rate, rally length, or a few other metrics to see whether the French played slower this year.

I’m interested in a broader question. Were the conditions weirder? To put it another way, were they outside the normal range of variation on tour? We could be talking about anything that impacts play, including surface, balls, weather, you name it.

This is surprisingly easy to test. The weirder the conditions, the more unpredictable the results should be. If you don’t get the connection, think about really strange conditions, like playing in mud, or in the dark, or with rackets that have broken strings. In those situations, the factors that determine the winner of a match are so different than usual that they will probably seem random. At the very least, there will be more upsets. Holding a top ranking in “normal” tennis doesn’t mean as much in “dark” tennis or “broken string” tennis. While unusually heavy balls don’t rank up there with my hypotheticals, the idea is the same: The more you deviate from typical conditions, the less predictable the results.

We measure predictability by taking the Brier score of my Elo-based pre-match forecasts. Elo isn’t perfect, but it’s pretty good, and the algorithm allows us to compare seasons and tournaments against each other. Brier score tells us the calibration of a group of predictions: Were they correct? Did they have the right level of confidence? The lower the score, the better the forecast. Or put another way, for our purposes today: The lower the score, the more predictable the outcomes.
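As a quick illustration, the Brier score is just the mean squared difference between each forecast probability and what actually happened (1 if the forecasted player won, 0 if not). The forecasts and results below are hypothetical, chosen only to show how upsets push the score up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 results."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical win probabilities for four favorites, and two possible sets
# of results: every favorite wins, or two of the four are upset.
forecasts = [0.9, 0.7, 0.6, 0.8]
all_chalk = [1, 1, 1, 1]
two_upsets = [1, 0, 0, 1]

print(f"all favorites win: {brier_score(forecasts, all_chalk):.3f}")
print(f"two upsets:        {brier_score(forecasts, two_upsets):.3f}")
```

For reference, a forecaster who says 50/50 on every match scores exactly 0.25 no matter what happens, so anything comfortably below that means the ratings are telling us something.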

Conclusion: This year’s French wasn’t that weird. Here are the Brier scores for men’s and women’s completed main draw matches, along with several other measures for context:

Tourney(s)     Men  Women  
2023 RG      0.177  0.193  
2022 RG      0.174  0.189  
2021 RG      0.177  0.194  
2020 RG      0.200  0.230  
2000-23 RG   0.169  0.184  
00-23 Slams  0.171  0.182  
Min RG       0.133  0.152  
Max RG       0.214  0.230

(“Min RG” and “Max RG” show the lowest and highest tournament Brier scores for each gender at the French since 2000.)

Again, lower = more predictable. For both men and women, the 2023 French was no more upset-ridden than the 2021 edition, and it ran considerably closer to script than the zany Covid tournament in autumn 2020. The results this year were a bit more unpredictable than the typical major since 2000. But the metrics tell us that the outcomes were closer to the average than to the extremes.

However unusual the conditions at Roland Garros felt to the players, the weirdness didn’t cause the results to be any more random than usual. While adjustments were surely necessary, most players were able to make them, and to similar degrees. The best players–based on their demonstrated clay-court prowess–tended to win, about as often as they always do at the French.

Picking 32 Qualifiers

Australian Open qualifying starts in just a few hours. 128 men and 128 women stand three wins away from a spot in a grand slam main draw. Only 16 of each will remain at the end of the week.

Forecasting is particularly tricky during qualifying. Unlike most tournaments, where the top seeds far outrank the field, there’s little difference between a player on the fringes of the top 100 and one in the middle of the 200s. Andrej Martin, the top seed in the men’s qualifying draw, has the lowest hard-court Elo rating of the eight players in his section!

Let’s run through the 32 eight-player sections. I’ve posted pre-tournament forecasts for men and women. Keep in mind that these numbers don’t (yet) include any results from the week of January 3rd. For most players it doesn’t matter. For a few, like Melbourne semi-finalist Qinwen Zheng, it misses a major ranking boost.

To make things more interesting, let’s compare Elo’s preferences to those of two guys who pay more attention to Challenger-level tennis than I do, Alex Gruskin* and Damian Kust. At the end of the week, we’ll see how the experts fared against the machine. Unless, of course, they make the machine look bad, in which case I’ll delete this post and deny this ever happened.

Men’s qualifying draw

  1. Mikhail Kukushkin. Elo likes the veteran, giving him a 22.9% chance of qualifying. Damian picks NCAA star and 2021 breakout Nuno Borges (Elo: 13.7%), while Alex prefers big-hitting American Ernesto Escobedo (Elo: 16.9%, which will be higher after the algorithm includes EE’s challenger win this week). Top seed Andrej Martin could hardly be a longer shot.
  2. Mats Moraing (23.9%). Both of our experts like Dominic Stricker (10.8%), the 19-year-old Swiss. Damian acknowledges a bit of wishful thinking here.
  3. Maximilian Marterer (29.6%). Elo prefers alliterative German names. Damian agrees, while Alex goes with the high seed in the section, #3 Daniel Galan (12.7%).
  4. Gilles Simon (34.1%). Gilles Simon is playing grand slam qualifying! Damian and Alex are both too young to remember Simon’s prime, which explains their pick of Tomas Machac (23.4%).
  5. Joao Sousa (31.7%). Damian agrees. Alex boldly picks Geoffrey Blancaneaux (5.7%), the fifth favorite in the section according to Elo.
  6. Jiri Lehecka (23.9%). Another vote of confidence from Damian. Alex picks Michael Mmoh (11.7%) for the first-round upset of the higher-ranked Lehecka.
  7. Salvatore Caruso (28.2%). Shockingly, Alex is finally on board with an Elo pick. Damian prefers the top seed in the section, #7 Taro Daniel (23.2%).
  8. Quentin Halys (21.6%). The most even section we’ve seen so far. Damian concurs, calling him “underrated,” while Alex goes with Yannick Hanfmann (18.1%).
  9. Damir Dzumhur (27.9%). Both of our experts go with Rinky Hijikata (1.1%). Rinky is the hipster pick, but he did get broken four times by Maxime Cressy this week.
  10. Christopher Eubanks (30.5%). I really thought we’d see Alex agree with Elo here, since the algorithm finally picked an American. But no, Gruskin goes with the formerly mulleted JJ Wolf (25.1%). Damian prefers Roman Safiullin (5.8%), the surprise star of Russia’s ATP Cup squad. It worked for Aslan Karatsev.
  11. Hugo Grenier (31.8%). Damian agrees, while Alex goes with Juan Pablo Varillas (4.6%), a man who last won a main draw match on hard in 2019 at an ITF M15 in Cancun. Another “bold” pick from the intrepid podcaster.
  12. Jason Kubler (29.9%). We all agree!
  13. Frederico Ferreira Silva (23.4%). Alex goes with basically-tied-as-favorite Mitchell Krueger (23.1%), and Damian goes with a personal fave in Nicola Kuhn (6.8%).
  14. Alexandre Muller (24.1%). Both experts pick Jurij Rodionov (23.7%), the top seed in the section and practically a co-favorite per Elo.
  15. Cem Ilkel (20.6%). Damian correctly pegs this as a very balanced section–Ilkel is the least Elo-favored pick of the 16. Both Damian and Alex go with Zizou Bergs, a player well liked by humans, but apparently not by the machine (8.3%).
  16. Alejandro Tabilo (32.0%). We all agree! I’m guessing both experts were tired at this point, so we all just went with the top seed.

We all agreed on two picks, and we all picked different players in three sections. Of the rest, Damian and Alex voted the same way five times, Damian went with the Elo pick five times, and Alex agreed with Elo once.

Women’s qualifying draw

Damian focuses on the men’s game, so here we have only two sets of forecasts: Elo and Alex Gruskin’s picks, along with a few of my personal preferences where they differ from the algorithm.

The gap between the seeds and field is much greater in the women’s game, hence the much higher probabilities that many of the top seeds (and/or Elo’s choices) reach the main draw.

  1. Anna Kalinskaya (63.8%). Everyone’s on the same page here, even Nick Kyrgios.
  2. Martina Trevisan (47.6%). Alex picks the clear second favorite, Olga Govortsova (27.0%).
  3. Lin Zhu (45.6%). Again, Alex goes with the second fave, Anna Blinkova (25.5%).
  4. Nina Stojanovic (42.2%). I’ll be cheering for Caty McNally (27.7%), even if I wouldn’t put my money against Elo. Alex picks another American, Hailey Baptiste (8.4%).
  5. Mariam Bolkvadze (26.1%). Sometimes it seems that Elo is trolling us, like this pick of an unseeded Georgian. Alex goes with Bolkvadze’s first-round opponent, Irina Maria Bara (9.8%), so at least one of the choices will be eliminated quickly.
  6. Lesia Tsurenko (54.7%). Alex agrees. My sentimental fave, as always, is Kathinka von Deichmann (3.7%), who I know better than to actually pick.
  7. Katie Boulter (40.6%). And sometimes it feels like Gruskin is trolling us. In a section with Boulter and Christina McHale (26.8%), he goes with Francesca Di Lorenzo (5.1%).
  8. Kateryna Bondarenko (26.0%). A balanced section, where Alex goes with the top seed, Kamilla Rakhimova. If Damian had projected this draw, he’d surely make a wishful pick of Victoria Jimenez Kasintseva (6.4%), 16-year-old runner-up in Bendigo this week.
  9. Rebeka Masarova (32.8%). I can only assume Alex is drinking heavily by this point, as he picked Kurumi Nara (13.0%) over both Masarova and top seed Sara Errani (28.7%). My only pick is that Errani reaches at least double digits in underhand serves.
  10. Mihaela Buzarnescu (30.3%). Alex picks Jule Niemeier, who at 30.0% is Elo’s co-favorite. I’d love to see Miki launch a comeback in 2022, but she has a tricky first match against Bendigo champ Ysaline Bonaventure, and Niemeier is clearly the rising star here.
  11. Harriet Dart (44.7%). Alex agrees, and in an uninspiring section, I’m guessing some of Harriet’s competitors do too.
  12. Dalma Galfi (35.2%). The second-favorite is Stefanie Voegele (30.3%), and that’s the player both Alex and I expect to see playing in the main draw.
  13. CoCo Vandeweghe (35.0%). It’s an absolute blockbuster of a first-round match (by qualifying standards, anyway) between Vandeweghe and Qinwen Zheng (16.8%). As noted above, Zheng reached the semis in Melbourne, so Elo will think more highly of her as soon as those results are included. It probably won’t swing things all the way in her favor, though–CoCo also reached a semi at the ITF W60 in Bendigo. Meanwhile, Alex is now doing vodka shots and picks Mai Hontama (13.9%).
  14. Aleksandra Krunic (26.3%). Another very even section. Alex goes with Cristina Bucsa (17.2%), while to me it looks like it’s Anna-Lena Friedsam’s (19.3%) main-draw spot to lose.
  15. Elisabetta Cocciaretto (36.7%). Every once in a while someone tries to explain to me how players could manipulate Elo ratings, if it matters. I don’t really buy the argument, but if anyone could game the system, it’s Cocciaretto. She seems to be doing it already. I don’t understand why she’s the favorite here, and I’m not sure I would even pick her in the first rounder against Lara Arruabarrena. Alex goes with the safe pick here, top seed Nao Hibino (20.7%).
  16. Aliona Bolsova (30.2%). Tons of talent in the bottom section, with Viktoria Kuzmova (24.6%), last year’s discovery Francesca Jones (12.1%), and local slugger Destanee Aiava (2.4%). Alex takes the top seed here, Anastasia Gasanova (12.6%).

Qualifying really is anybody’s game. According to my traffic logs, Alex visits my Elo ranking pages even more often than the Russian spambots do, and we still only agree on 3 of 16 picks.

Thanks to Damian and Alex for letting me include their picks here.

* Full disclosure: Alex and I are both members of the board of directors of the Serena Williams Power Tennis Country Club. As tennis insiders, it’s only natural that we have a conflict of interest.

The Best at Getting Better

Here’s a stat you probably didn’t know*. Since the restart, the WTA top five in first-serve points won are Naomi Osaka, Serena Williams, Ashleigh Barty, Jennifer Brady, and … Maria Sakkari.

* unless you’ve been listening to me podcast lately.

The first four names are to be expected: Osaka, Williams, and Barty are probably the top three offensive players in the game, period, and Brady makes her money with big serving. Sakkari is the one who stands out. She does many things well, but I would never have thought to put her in this group, ahead of the likes of Karolina Pliskova, Aryna Sabalenka and, well, everybody else.

Sakkari’s first serve might be the best-kept secret in the women’s game, in large part because it hasn’t been around to keep secret for long. When she started playing tour events, her serve was quite weak, and it has only gradually improved since then. That’s what I marvel at. In six seasons at tour level, all with at least 18 matches played, here are her rates of first-serve points won:

Year     1st Win%  
2016        58.6%  
2017        59.7%  
2018        63.7%  
2019        65.2%  
2020        66.5%  
2021        69.9%

This probably doesn’t need further explanation. Winning fewer than 60% of first-serve points isn’t very good, 70% is excellent, and improving from one to the other is a massive accomplishment. But in case you’re not convinced, here’s the same progression along with percentile rankings, showing that Sakkari started her career better than only 13% of her peers, and this year is outperforming 93% of them:

Year     1st Win%  Percentile  
2016        58.6%          13  
2017        59.7%          20  
2018        63.7%          53  
2019        65.2%          67  
2020        66.5%          79  
2021        69.9%          93

Players can and do improve, but they usually retain the same relative strengths and weaknesses throughout their career. The Greek star has broken that mold, and there’s a natural follow-up question: Has there been anyone else like her?

Meet Kiki

Here’s the simple filter I used to identify players who had substantially improved this aspect of their game. For every player with a full season in which they won fewer than 60% of first-serve points (almost exactly the 20th percentile), I identified those who eventually recorded a full season in the top half of WTA players, roughly 63.3% or better.

From 2010 to 2021–yes, an awfully short span, due to the limited availability of historical WTA match stats–112 different players posted a sub-60% season. 26 of them went on to an above-average year. One example is Carla Suarez Navarro, who won 59.0% of first-serve points in 2010, and peaked at 64.0% (56th percentile) in 2016. That’s a respectable progression, but far from Sakkari’s standard.
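Here is a minimal sketch of that filter. The data layout and function name are assumptions for illustration, and the handful of season figures are ones quoted in this post (weak and strong seasons only), not the full data set. I'm reading "eventually" as "in a later season."

```python
WEAK, STRONG = 60.0, 63.3  # thresholds from the text, in percent

# (player, year) -> first-serve points won, in percent
seasons = {
    ("C Suarez Navarro", 2010): 59.0,
    ("C Suarez Navarro", 2016): 64.0,
    ("M Sakkari", 2016): 58.6,
    ("M Sakkari", 2021): 69.9,
    ("S Zheng", 2015): 53.2,
    ("S Zheng", 2019): 59.3,
}

def improvers(seasons, weak=WEAK, strong=STRONG):
    """Players with a sub-`weak` season followed later by one at `strong` or better."""
    had_weak_season = set()
    found = set()
    # Walk through the seasons in chronological order
    for (player, _year), pct in sorted(seasons.items(), key=lambda kv: kv[0][1]):
        if pct < weak:
            had_weak_season.add(player)
        elif pct >= strong and player in had_weak_season:
            found.add(player)
    return sorted(found)

print(improvers(seasons))
```

On this small sample the filter keeps Suarez Navarro and Sakkari and drops Zheng, whose best season still fell short of the top-half threshold.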

Here are the 10 players who improved on a sub-60% season to eventually manage a season of 65% or better, ranked by the best level they attained:

Player       Weak   1st%  %ile  Strong   1st%  %ile  
K Bertens    2015  59.5%    18    2019  71.9%    97  
M Sakkari    2016  58.6%    13    2021  69.9%    93  
D Kasatkina  2017  59.0%    15    2021  66.4%    78  
S Halep      2012  56.4%     3    2014  66.4%    78  
Y Shvedova   2011  59.4%    17    2016  66.1%    75  
A Cornet     2011  58.9%    14    2020  66.1%    75  
M Linette    2016  59.9%    21    2020  65.8%    73  
Y Wickmayer  2012  60.0%    22    2017  65.8%    72  
A Sasnovich  2016  58.4%    11    2018  65.1%    67  
S Stephens   2011  59.7%    19    2015  65.0%    66

Kiki Bertens wasn’t quite as bad as Sakkari at her worst, but she wasn’t getting much benefit from her first serve. Like the Greek, she had back-to-back seasons below 60%, but unlike Sakkari, her improvement was instant. She leapt from sub-60% in 2015 to almost 68% (86th percentile) a year later. You won’t be surprised to hear that her ranking catapulted upwards as well, from 104th at the end of 2015 to 22nd a year later.

Kiki’s several years since also bode well for Sakkari. Her first-serve winning percentage of 67.4% last year was her worst since crossing the 60% barrier. A slightly less optimistic story comes from Simona Halep, whose 78th percentile mark in 2014 remains her career best. Coming from such an abysmal starting point, it’s remarkable that Halep has improved as much as she has, but she remains firmly in the range of good-but-not-great in this dimension of her game.

Steady improvements

There’s no particular advantage to spreading out one’s gains over a half-decade, like Sakkari has. If she had been given the option of picking up eight percentage points in a single year, like Bertens did, she would’ve taken it.

Still, the fact that the Greek keeps marching upwards is what makes her ascent so fascinating to me. In the decade-plus of data available, no other woman has improved her first-serve win percentage for five years running. Only two players–Yulia Putintseva and Saisai Zheng–have enjoyed positive bumps for four consecutive seasons, and neither situation really compares. Zheng’s improvement took her from 53.2% in 2015 to 59.3% in 2019, and Putintseva rose from 57.9% in 2017 to 62.4% so far this year. While both are making the most of what they have, neither has fundamentally transformed the type of threat they bring on court the way that Sakkari has.
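Counting a streak like this is simple enough to sketch. The function name is made up; the input is Sakkari's season-by-season first-serve win percentage from the table earlier in the post.

```python
def longest_improving_streak(values):
    """Longest run of consecutive year-over-year increases."""
    best = current = 0
    for prev, nxt in zip(values, values[1:]):
        # Extend the run on an improvement, otherwise start over
        current = current + 1 if nxt > prev else 0
        best = max(best, current)
    return best

sakkari = [58.6, 59.7, 63.7, 65.2, 66.5, 69.9]  # 2016 through 2021
print(longest_improving_streak(sakkari))
```

Six seasons, five year-over-year gains, no steps backward.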

In search of a better comparison–any comparison–with this five-year streak of gains, I turned to the more extensive set of ATP match stats, which go back to 1991. In those three decades, I found exactly 10 players who improved in this department for five (or more) consecutive years. It’s a decidedly diverse group, with a few names you might recognize:

Player            Streak  Start %ile  End %ile  
Renzo Furlan           6           2        73  
Slava Dosedel          6           2        16  
Julien Benneteau       5          16        55  
Arnaud Clement         6          18        70  
Michael Chang          5          18        92  
Roger Federer          5          47        94  
Thomas Enqvist         5          58        94  
Boris Becker           6          79        99  
John Isner             7          82        98  
Marc Rosset            5          87        98 

The starting and ending percentiles indicate that this list includes players who began bad and ended a bit less bad, servebots who started great and eked even more out of their biggest weapon, and then a handful of Sakkari-esque figures who steadily went from considerably below average to far above it.

Michael Chang is the closest parallel of the group, even if we don’t have complete match stats for the first few years of his career. In 1991 he was one of the best returners in the game, but winning barely two thirds of his first serve points wasn’t enough to keep him in the top ten in an offense-dominated era. Five years later he was winning 77% of his first deliveries and ended the season at his peak ranking of #2. He couldn’t sustain the elite-level serving stats, but he did have a few more above-average years.

And then there’s Roger Federer. I’ll leave it to Sakkari fans to work out whether his presence on this list can tell us anything about her future.

Ave Maria

This is all just a long way of saying “wow!” There are other aspects of Sakkari’s game that she has improved, though none so consistently and dramatically. Once you start looking at year-to-year trends for individual stats, future projects start to multiply: identifying peak ages for different parts of the game, determining which stats are more or less likely to regress to the mean, finding which ones best predict ranking climbs, and so on.

We’ll get to some of those answers eventually. In the meantime, I’ll be watching Sakkari with new, better-informed eyes.

So, About Those Stale Rankings

Both the ATP and WTA have adjusted their official rankings algorithms because of the pandemic. Because many events were cancelled last year (and at least a few more are getting canned this year), and because the tours don’t want to overly penalize players for limiting their travel, they have adopted what is essentially a two-year ranking system. For today’s purposes, the details don’t really matter–the point is that the rankings are based on a longer time frame than usual.

The adjustment is good for people like Roger Federer, who missed 14 months and is still ranked #6. Same for Ashleigh Barty, who didn’t play for 11 months yet returned to action in Australia as the top seed at a major. It’s bad for young players and others who have won a lot of matches lately. Their victories still result in rankings improvements, but they’re stuck behind a lot of players who haven’t done much lately.

The tweaked algorithms reflect the dual purposes of the ranking system. On the one hand, they aim to list the best players, in order. On the other hand, they try to maintain other kinds of “fairness” and serve the purposes of the tours and certain events. The ATP and WTA computers are pretty good at properly ranking players, even if other algorithms are better. Because the pandemic has forced a bunch of adjustments, it stands to reason that the formulas aren’t as good as they usually are at that fundamental task.

Hypothesis

We can test this!

Imagine that we have a definitive list, handed down from God (or Martina Navratilova), that ranks the top 100 players according to their ability right now. No “fairness,” no catering to what tournament owners want, and no debates–this list is the final word.

The closer a ranking table matches this definitive list, the better, right? There are statistics for this kind of thing, and I’ll be using one called the Kendall rank correlation coefficient, or Kendall’s tau. (That’s the Greek letter τ, as in Τσιτσιπάς.) It compares lists of rankings, and if two lists are identical, tau = 1. If there is no correlation whatsoever, tau = 0. Higher tau, stronger relationship between the lists.

My hypothesis is that the official rankings have gotten worse, in the sense that the pandemic-related algorithm adjustments result in a list that is less closely related to that authoritative, handed-down-from-Martina list. In other words, tau has decreased.

We don’t have a definitive list, but we do have Elo. Elo ratings are designed for only one purpose, and my version of the algorithm does that job pretty well. For the most part, my Elo formula has not changed due to the pandemic*, so it serves as a constant reference point against which we can compare the official rankings.

* This isn’t quite true, because my algorithm usually has an injury/absence penalty that kicks in after a player is out of action for about two months. Because the pandemic caused all sorts of absences for all sorts of reasons, I’ve suspended that penalty until things are a bit more normal.

Tau meets the rankings

Here is the current ATP top ten, including Elo rankings:

Player       ATP  Elo  
Djokovic       1    1  
Nadal          2    2  
Medvedev       3    3  
Thiem          4    5  
Tsitsipas      5    6  
Federer        6    -  
Zverev         7    7  
Rublev         8    4  
Schwartzman    9   10  
Berrettini    10    8

I’m treating Federer as if he doesn’t have an Elo rating right now, because he hasn’t played for more than a year. If we take the ordering of the other nine players and plug them into the formula for Kendall’s tau, we get 0.778. The exact value doesn’t really tell you anything without context, but it gives you an idea of where we’re starting. While the two lists are fairly similar, with many players ranked identically, there are a couple of differences, like Elo’s higher estimate of Andrey Rublev and its swapping of Diego Schwartzman and Matteo Berrettini.
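If you want to reproduce that number, here is a small sketch of Kendall's tau computed by brute-force pair counting. It assumes no ties, in which case the common variants of tau agree; which variant was used for the full calculations isn't stated here. The function name is mine, and the ranks are the nine non-Federer rows of the table above.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau: (concordant - discordant) / total pairs. Assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both lists order this pair the same way
        elif direction < 0:
            discordant += 1  # the lists disagree on this pair
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Djokovic, Nadal, Medvedev, Thiem, Tsitsipas, Zverev, Rublev, Schwartzman, Berrettini
atp = [1, 2, 3, 4, 5, 7, 8, 9, 10]
elo = [1, 2, 3, 5, 6, 7, 4, 10, 8]
print(round(kendall_tau(atp, elo), 3))
```

Of the 36 possible pairs, the two lists disagree on four: Rublev against each of Thiem, Tsitsipas, and Zverev, plus Schwartzman against Berrettini. That leaves (32 - 4) / 36 = 0.778.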

Let’s do the same exercise with a bigger group of players. I’ll take the top 100 players in the ATP rankings who met the modest playing time minimum to also have a current Elo rating. Plug those lists into the formula, and we get 0.705.

This is where my hypothesis falls apart. I ran the same numbers on year-end ATP rankings and year-end Elo ratings all the way back to 1990. The average tau over those 30-plus years is about 0.68. In other words, if we accept that Elo ratings are doing their job (and they are indeed about as predictive as usual), it looks like the pandemic-adjusted official rankings are better than usual, not worse.

Here are the year-by-year tau values, with a tau value based on current rankings as the right-most data point:

And the same for the WTA, to confirm that the result isn’t just a quirk of the makeup of the men’s tour:

The 30-year average for women’s rankings is 0.723, and the current tau value is 0.764.

What about…

You might wonder if the pandemic is wreaking some hidden havoc with the data set. Remember, I said that I’m only considering players who meet the playing time minimum to have an Elo rating. For this purpose, that’s 20 matches over 52 weeks, which excludes about one-third of top-100 ranked men and closer to half of top-100 women. The above calculations still consider 100 players for year-end 2020 and today, but I had to go deeper in the rankings to find them. Thus, the definition of “top 100” shifts a bit from year-end 2019 to year-end 2020 to the present.

We can’t entirely address this problem, because the pandemic has messed with things in many dimensions. It isn’t anything close to a true natural experiment. But we can look only at “true” top-100 players, even if the length of the list is smaller than usual for current rankings. So instead of taking the top 100 qualifying players (those who meet a playing time minimum and thus have an Elo ranking), we take a smaller number of players, all of whom have top-100 rankings on the official list.
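The difference between the two selection rules is easier to see on a small scale. The sketch below reuses the top-ten table from earlier, with Federer marked as having no Elo ranking, and uses a cutoff of six instead of 100. The function names `first_n_qualifying` and `true_top_n` are invented for this illustration, and the input list is assumed to be sorted by official ranking.

```python
# (player, ATP rank, Elo rank or None), in official-ranking order,
# taken from the top-ten table. None means no current Elo rating.
ranked = [
    ("Djokovic", 1, 1), ("Nadal", 2, 2), ("Medvedev", 3, 3),
    ("Thiem", 4, 5), ("Tsitsipas", 5, 6), ("Federer", 6, None),
    ("Zverev", 7, 7), ("Rublev", 8, 4), ("Schwartzman", 9, 10),
    ("Berrettini", 10, 8),
]


def first_n_qualifying(players, n):
    """First n players, in official order, who have an Elo rating.

    This can reach below official rank n when some players don't qualify.
    """
    rated = [p for p in players if p[2] is not None]
    return rated[:n]


def true_top_n(players, n):
    """Players officially ranked n or better who also have an Elo rating.

    The list can be shorter than n, but nobody outside the top n gets in.
    """
    return [p for p in players if p[1] <= n and p[2] is not None]


deep = first_n_qualifying(ranked, 6)
strict = true_top_n(ranked, 6)
print([name for name, _, _ in deep])    # six players, reaching down to ATP #7
print([name for name, _, _ in strict])  # five players, all ranked #6 or better
```

The first rule always returns a full-length list but quietly redefines what the cutoff means. The second keeps the cutoff fixed and accepts a shorter list.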

The results point the same way. For men, the tau based on today’s rankings and today’s Elo ratings is 0.694 versus the historical average of 0.678. For women, it’s 0.721 versus 0.719.

Still, the rankings feel awfully stale. The key issue is one that Elo can’t help us solve. So far, we’ve been looking at players who are keeping active. But the really out-of-date names on the official lists are the ones who have stayed home. Should Federer still be #6? Heck if I know! In the past, if an elite player missed 14 months, Elo would knock him down a couple hundred points, and if that adjustment were applied to Fed now, it would push down tau. But there’s no straightforward answer for how the inactive (or mostly inactive) players should be rated.

What we’ve learned today

This is the part of the post where I’m supposed to explain why this finding makes sense and why we should have suspected it all along. I don’t think I can manage that.

A good way to think about this might be that there is a sort of tour-within-a-tour that is continuing to play regularly. Federer, Barty, and many others haven’t usually been part of it, while several dozen players are competing as often as they can. The relative rankings of that second group are pretty good.

It doesn’t seem quite fair that Clara Tauson is stuck just inside the top 100 while her Elo is already top-50, or that Rublev remains behind Federer despite an eye-popping six months of results while Roger sat at home. And for some historical considerations–say, weeks inside the top 50 for Tauson or the top 5 for Rublev–maybe it isn’t fair that they’re stuck behind peers who are choosing not to play, or who are resting on the laurels of 18-month-old wins.

But in other important ways, the absolute rankings often don’t matter. Rublev has been a top-five seed at every event he’s played since late September except for Roland Garros, the Tour Finals, and the Australian Open, despite never being ranked above #8. When the tour-within-a-tour plays, he is a top-five guy. The likes of Rublev and Tauson will continue to have the deck slightly stacked against them at the majors, but even that disadvantage will steadily erode if they continue to play at their current levels.

Believing in science as I do, I will take these findings to heart. That means I’ll continue to complain about the problems with the official rankings–but no more than I did before the pandemic.