20 > 21 > 20

Rafael Nadal has finally nosed his way into the lead. With his Australian Open title yesterday, he became the first man to 21 major singles titles, breaking away from the three-way tie at 20 with Novak Djokovic and Roger Federer.

For some people, leading the all-time grand slam race is enough to cement a player as the greatest of all time. A different crowd considers this year’s Australian Open tainted because Djokovic was not allowed to play. Still others think that Federer played some beautiful tennis, and they considered the matter concluded at least five years ago.

I belong to a fourth camp, which I can summarize with two positions:

  1. The grand slam race isn’t everything.
  2. If you do focus on grand slams, you must adjust the major count for the quality of opponents each player faced.

I’ve written about this before, first at The Economist, and then here at the blog. When I checked in 18 months ago, Nadal’s 20 majors were worth a bit more than Djokovic’s 17, which were themselves more impressive than Federer’s 20. The margins have always been slim between these three, and properly adjusting for quality of opponents makes things even tighter.

The update

Here’s how the adjustment works. For each slam that a player won, we take the Elo rating of all of his opponents, and work out the probability that the average Open Era grand slam winner would beat all of them. Once we have that number–which centers around 23%–we normalize it so that the value of an “average” major is 1.0.

When a major title requires facing down a lot of tough opponents, its rating is higher than 1.0, while a relatively easy one rates below 1.0. In the last few years, the numbers have drifted downward, because while the familiar names keep winning quite a bit, they haven’t needed to face each other as often as they used to.
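
Here is a minimal sketch of the first half of that calculation: the chance that a reference player beats every opponent in a draw, using the standard Elo win expectancy. The reference rating of 2100 for an "average" slam winner and both sets of opponent ratings are made-up placeholders, not the figures behind the numbers in this post. The final normalization step is left out because its exact form isn't spelled out here.

```python
from math import prod


def elo_win_prob(rating, opp_rating):
    """Standard Elo expectancy that `rating` beats `opp_rating`."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def path_probability(champ_rating, opp_ratings):
    """Chance of beating every opponent, treating matches as independent."""
    return prod(elo_win_prob(champ_rating, r) for r in opp_ratings)


# Hypothetical values, for illustration only.
AVG_SLAM_WINNER = 2100  # assumed rating of an "average" slam champion
easy_path = [1500, 1600, 1650, 1700, 1750, 1800, 1850]
hard_path = [1600, 1700, 1800, 1900, 2000, 2100, 2150]

p_easy = path_probability(AVG_SLAM_WINNER, easy_path)
p_hard = path_probability(AVG_SLAM_WINNER, hard_path)
print(f"easy path: {p_easy:.3f}, hard path: {p_hard:.3f}")
```

The tougher draw produces the smaller probability, which is what ends up as a slam value above 1.0 after normalization.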

You might disagree with the methodology, and that’s fine. But I find that most people end up making some sort of adjustment, even if they shy away from stats or only tweak the totals when it favors their idol. Some Djokovic fans want to downplay Nadal’s recent win, and it’s true that Novak’s absence lowered the quality of the draw. But surely Rafa’s title isn’t worth zero. He beat many excellent players, and there was no guarantee that Novak would advance through the draw–or that Rafa would lose if they met.

This approach allows us to avoid specific minefields and answer all the analogous questions about every slam. Considering the seven opponents that Nadal faced, his Melbourne title rates at 0.84, weaker than average, but more difficult than seven of his prior titles. Djokovic has not enjoyed as many “easy” paths to major titles, but his Wimbledon victory last summer rates at a mere 0.60, the second-weakest of his career and lower than all but one of Rafa’s. Sometimes players just get lucky, with or without a geopolitical brouhaha.

Nadal’s 21st title rates only a bit lower than Djokovic’s two other titles last year: 0.90 at the Australian and 0.93 at the French.

Here are the updated rankings for “adjusted slams,” along with a table showing how many easy, medium, and hard paths the Big Three have endured:

Player    Slams  Avg Score  Total  
Nadal        21       0.95   19.9  
Djokovic     20       1.01   20.1  
Federer      20       0.89   17.9  
                                   
Player     Easy     Medium   Hard  
Nadal         8          8      5  
Djokovic      6          7      7  
Federer       9         10      1

As if 21 and 20 weren’t close enough, this approach gives Djokovic 20.1 adjusted slams to Nadal’s 19.9. Again, you don’t have to agree with every step of my approach here to accept that we often think in terms of these kinds of adjustments, and that Djokovic has–on average–faced tougher roads to titles than Nadal, while Federer had it easier than both of them.

Players can’t control who they face, but as fans, we can appreciate who worked the hardest to achieve near-equivalent feats. Fingers crossed that both Novak and Rafa excel at Roland Garros, so they can fight it out on the court, not in some random guy’s spreadsheets.

Aslan Karatsev Isn’t Better Than Novak Djokovic, But…

What’s better, winning 15 of 17 matches, or going undefeated for 9?

Even if you know that the 15-2 guy is Aslan Karatsev in 2021, and the 9-0 guy is Novak Djokovic this year, there’s no obvious answer. Sure, Djokovic beat Karatsev easily, and Novak’s nine wins included a grand slam title. We know Djokovic is the better player–he’s got more than a decade of proof to support that claim–and no one in their right mind would take Karatsev’s last three months over Novak’s.

True as all of that is, it’s not the question I’m asking.

The player with the 15-2 record has two advantages over his 9-0 peer. First, he has more wins. (Mind-blowing stuff, I know.) Second and more importantly, he has more evidence of his current level, even if it includes two losses. The 9-0 guy could go undefeated for 17 matches… but he could also end up 11-6. His nine-match record simply doesn’t give us as much information.

Again, if you know which players I’m talking about, that doesn’t matter–we have 1,100 matches worth of information about Djokovic, most of which say that his 9-0 is business as usual. He might not win his next eight matches, but he’s certainly not going to lose more than a few of them.

The yElo light at the end of the tunnel

If you’ve been reading my last couple of posts, you know where I’m going with this.

Last week, I introduced the concept of yElo. The “y” stands for year, but it can be used for any unit of time shorter than an entire career. Instead of using every bit of available information, we look only at a designated time frame, such as the 2021 season. While maintaining our knowledge of other players (e.g. Andrey Rublev is a really tough opponent; Egor Gerasimov not so much), we treat each player as if we know nothing else about him.

So truly, we’re comparing Karatsev’s 15-2 with Djokovic’s 9-0, taking into account the quality of their competition.

Plug every ATPer’s 2021 season into the formula, and here are the yElo leaders, through last weekend’s finals in Dubai and Acapulco:

Rank  Player                  W-L  yElo  
1     Aslan Karatsev         15-2  2082  
2     Novak Djokovic          9-0  2081  
3     Daniil Medvedev        13-2  2061  
4     Andrey Rublev          15-3  2006  
5     Marton Fucsovics       14-4  2000  
6     Stefanos Tsitsipas     14-4  1983  
7     Alexander Zverev        9-4  1922  
8     Matteo Berrettini       8-2  1918  
9     Jeremy Chardy          13-6  1915  
10    Lloyd Harris           11-5  1878  
11    Jannik Sinner           9-4  1848  
12    Alexei Popyrin          9-3  1836  
13    Roberto Bautista Agut   8-7  1831  
14    Taylor Fritz            7-4  1830  
15    Sebastian Baez         14-1  1820  
16    Felix Auger Aliassime   8-4  1818  
17    Karen Khachanov         9-5  1810  
18    Mackenzie McDonald     11-5  1809  
19    Tomas Machac           10-3  1806  
20    Daniel Evans            6-3  1800

Yes, Karatsev really does outscore Djokovic. Barely.

We are accustomed to 52-week rankings and Elo ratings that carefully weigh an entire career’s worth of work. So this is a deeply weird list, with only a handful of players anywhere near where we’d expect. #15 and #19 are Challenger-level guys, for crying out loud!

Embrace the race

The official Race to Turin doesn’t look as bizarre as the yElo list, but imagine showing it to someone in December, with Karatsev 5th, Marton Fucsovics 7th, and Rafael Nadal outside the top 20. Both the Race and the yElo list are “wrong” in the traditional sense, but they tell us much more about the 2021 season than the old-fashioned rankings do.

Tennis’s relentless focus on the long view sucks some excitement out of the season. Think of virtually any team sport. A month into the season, some unheralded club has gotten off to a hot start, and at least in some quarters, that’s the story–can they keep it up? should we have seen this coming all along? Nobodies are cast in the role of front-runners, and established stars play the part of underdogs.

In tennis, nobodies are… well, nobodies who won a few matches lately. Superstars play the part of superstars who’ve been taking some time off. Sure, we know that Djokovic and Nadal are going to end up near the top of the rankings list in November, just like we know the Dodgers and Yankees will be in the playoffs. But that doesn’t mean we ought to take it as a foregone conclusion from day one. In baseball, as the saying goes, everybody’s in first place on Opening Day.

Embracing the race–focusing on which players are leading the pack at each point throughout the season–doesn’t have to mean throwing away longer-term rankings. The traditional calculations should still be used for tournament entries and (maybe) for seedings. Top players have earned as much, and tournament entry is a factor that isn’t present in the major team sports.

Everybody wants to know how the ATP will survive when the Big Three are out of the picture. Well, this is a start–pay attention to who’s winning in 2021. If we take yElo’s word for it, a virtual nobody emerged to overtake Djokovic for the #1 spot going into Miami! An Argentinian prospect is playing like a top-15 guy just by winning a bunch of Challengers! Jeremy Chardy is more than just a hitting partner for the other Frenchmen!

The stories are out there, just like they are every year. It’s a shame that they get buried by all the talk about players who won last year.

I’ve added men’s and women’s yElo ratings to the Tennis Abstract website, and they’ll be updated weekly.

The Best 22-Match yElo Streaks

Earlier this week I wrote about Garbine Muguruza’s outstanding start to the season, and I introduced a new method to quantify a player’s level in a relatively short time span. Instead of using traditional Elo, which takes into account everything we know about a player, my new metric, yElo, uses what we know about everyone else, but treats a player’s short-term performance as if it is all we know about her. The parameters for yElo, such as k-value, are the same as the ones I’ve arrived at to make “regular Elo” as predictive as possible.

In other words, we measure Muguruza’s 22 matches in 2021 as if she had never played a WTA event before. As we saw in my earlier post, this approach considers the strength of opponents each player faced, and it rates her 18-4 record as better than anyone else in 2021, including Naomi Osaka’s 10-0 start.*

* excluding walkovers, which I ignore for all versions of Elo and yElo.

Muguruza’s season start has been outstanding and it is definitely underrated by the official WTA rankings and maybe even by the race, but I don’t want to make too much of it–one title in five tournaments is hardly world-historical stuff. On the other hand, it’s a good way to get our feet wet with a new metric that I think will prove useful for a wide range of tennis comparisons.

Garbine vs Garbine

The Spaniard won majors in 2016 and 2017, and she briefly reached number one in the rankings in September of 2017. Those achievements belong on a Hall of Fame plaque over her recent Dubai title and Yarra River Classic final. But was she really playing better back then?

She was not! I ran the yElo formula for every 22-match sequence in Muguruza’s career. The best of the bunch–again, taken entirely out of context, as if we know nothing beyond those 22 matches–was a run late in 2015 when she reached the Wuhan final, won Beijing, then went undefeated in the WTA Finals round robin stage. Her yElo based on those 22 matches was 2172, narrowly better than her 2021 yElo of 2160.

The more memorable moments of her career don’t quite stack up:

yElo  W-L   Span                            
2172  17-5  2015 Wimb R16 - WTA Finals RR   
2160  18-4  2021 Abu Dhabi R64 - Dubai F    
2148  18-4  2017 Birmingham R32 - Cinci F   
2122  19-3  2017 Wimb R128 - USO R16 (#1)   
2084  17-5  2017 Miami R64 - Wimb F         
2076  16-6  2016 Doha QF - Roland Garros F 

I haven’t shown every 22-match sequence of her career, because that list is long and boring–the streaks heavily overlap with each other, and thus there are often tiny differences between them. But it is instructive to look at the time periods that ended at key moments.

The best of that bunch was the 22-match run ending with Muguruza’s 6-1 6-0 beatdown of Simona Halep at the 2017 Cincinnati final. That set the stage for her ascent to #1, though the ranking move didn’t happen until after the US Open. That streak is close to her current level. The 22 matches leading up to the official #1 takeover are a bit lower (she lost to Petra Kvitova at the US Open, which was less forgivable then than now), and the timespans ending with her two slam finals are still further down the list.

Don’t misunderstand–Muguruza was playing very well throughout all of these time periods. But when we crunch the numbers, we find that her current level is roughly on par with the best she’s ever played.
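
The "every 22-match sequence" search in this section is a sliding window over a career. Below is a rough sketch, not the code I actually ran: it assumes each window starts from a newbie rating of 1500, uses an assumed K-factor schedule of 250 / (n + 5) ** 0.4, and runs on a made-up toy career with a window of 5 so the output stays small.

```python
def yelo(results, start=1500.0):
    """Rate one window of (opponent_rating, won) results from scratch."""
    rating = start
    for n, (opp_rating, won) in enumerate(results, start=1):
        expected = 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))
        k = 250.0 / (n + 5) ** 0.4  # ASSUMPTION: illustrative K-factor schedule
        rating += k * ((1.0 if won else 0.0) - expected)
    return rating


def best_window(career, size=22):
    """Return (best_yelo, start_index) over all consecutive windows of `size`."""
    if len(career) < size:
        raise ValueError("career is shorter than the window size")
    scores = [(yelo(career[i:i + size]), i)
              for i in range(len(career) - size + 1)]
    return max(scores)


# Hypothetical toy career: 40 matches, losing every fourth one.
career = [(1700 + 10 * (i % 7), i % 4 != 0) for i in range(40)]
best, start = best_window(career, size=5)
print(f"best 5-match yElo: {best:.0f}, starting at match index {start}")
```

Because neighboring windows share all but one match, their scores differ only slightly, which is why the full list of streaks is so repetitive.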

Garbine vs the world

Metrics are a lot more informative once we gain some context. Many of you probably have a good sense of what regular Elo ratings mean–2100+ is outstanding, 2000+ is top ten-ish, 1900+ is approximately the top 20, and so on. We can piggyback on that for yElo. Muguruza’s 22-match yElo of 2160 this season really does mean that, fed only that very limited set of results, the Elo formula thinks her level is close to that of the best player in the world.

Well… the best player in the world right now. There’s no truly dominant force in women’s tennis at the moment, so we’re not seeing players at the top end of the all-time Elo scale. In regular Elo, peak Martina Navratilova and peak Steffi Graf topped 2600, more than 400 points above Osaka’s current rating of 2189. It will not surprise you, then, to learn that Navratilova, Graf, Serena Williams, Chris Evert, and many others put together 22-match runs* that make Muguruza’s 2021 season look positively pedestrian.

* yes, I know how ridiculous it is that this whole article is based on the arbitrary 22-match time span. We could do the same stuff with the more natural-sounding 20-match span, but there wouldn’t be an intuitive way to fit Muguruza’s current run into the discussion. And let’s face it, 20 is just as arbitrary as 22.

Out of my entire database on women’s tennis results going back to 1950 or so, about 100 women have enjoyed a 22-match run that outscores Muguruza’s best. The top of the list is the end of Navratilova’s 1983 season, which is worth a yElo of 2445. Close behind is Monica Seles, who reached 2438 with a streak starting at the end of 1992 and extending into the 1993 season. Three more women topped 2400, another 27 exceeded 2300, and 46 more put together 22 consecutive matches worth at least 2200.

Here are the 15 active women who’ve played at least as well as Muguruza for their best 22-match spans:

yElo  Player                W-L   Year(s)  
2389  Serena Williams       21-1  2001-02  
2386  Venus Williams        22-0  2000     
2335  Kim Clijsters         20-2  2002-03  
2332  Victoria Azarenka     22-0  2012     
2234  Vera Zvonareva        18-4  2008     
2217  Svetlana Kuznetsova   19-3  2004     
2217  Naomi Osaka           20-2  2019-20  
2209  Samantha Stosur       20-2  2010     
2205  Petra Kvitova         19-3  2011-12  
2205  Simona Halep          20-2  2018     
2196  Caroline Garcia       18-4  2017     
2186  Ashleigh Barty        19-3  2019     
2180  Angelique Kerber      18-4  2015-16  
2174  Carla Suarez Navarro  18-4  2015     
2172  Garbine Muguruza      17-5  2015

With the caveat that I haven’t spent much of my life thinking about the best 22-match runs in women’s tennis history, this seems like a credible list. I particularly like how yElo manages to consider strength of opponent to the point that an 18-4 run*, like Zvonareva’s in 2008, can outrank so many 20-2s. (Vera even beats a few 22-0s from the amateur era.)

* the link shows a few extra matches–the 18-4 run starts in the QFs of Guangzhou and ends in the Tour Finals semi-final. Note again that yElo skips retirements.

I hope you find the new yElo metric as interesting as I do. I’ll definitely be doing more with it, since I suspect it has value even outside the narrow context of one player and a single timespan of arbitrary length.

Repurposing Elo for Streaks, Seasons, and Garbine Muguruza

Elo is a fantastic tool for its explicit purpose: estimating the skill level of players based on available information. For instance, my WTA ratings currently rank Ashleigh Barty second. That seems plausible enough–it may be correct to give her the edge in a head-to-head matchup with everyone on tour except for Naomi Osaka. But with women pursuing such different schedules this season, a rating is only so useful.

For all of Barty’s or Osaka’s skill, is it right to say either one of them has had a better 2021 season than Garbine Muguruza? Osaka won the Australian Open, so she has a valid claim. Barty’s argument is a lot more tenuous, based on only eight victories. The Spaniard’s case writes itself–only a handful of players are up to double digits in wins this year, and Muguruza already has 18. How could we decide? If Elo is the smart version of the official rankings, what’s the smart version of the official race?

Starting fresh

The Elo algorithm itself offers a solution. A big part of the reason Muguruza is rated 4th on my current Elo list–and not higher–is her career before 2021. We had hundreds of matches worth of data on Garbine before January 1st, and it would be silly to throw all that away. Her 18-4 start is fantastic, but it doesn’t supersede everything that came before. It just gives us reason to update our rating.

Here’s where the ranking/race analogy is useful. The official rankings use a time span of 52 weeks (or more). The race restarts on January 1st. We could do the exact same thing with Elo, throwing away all results from the previous year and starting over, but that would be wasteful–it wouldn’t allow us to take into account whether players had faced particularly easy or tough draws, for instance.

The solution is to set Elo ratings back to zero (or 1500, in Elo parlance) one player at a time.

Take Muguruza. Instead of starting the year with a rating of 1981 and a history of several hundred matches, we pretend to know nothing about her. We give her a newbie’s rating of 1500 and a history of zero matches. Then we run the Elo algorithm to update her rating over the course of her 22 matches. First she faces Kristina Mladenovic (with her actual rating at the time of 1817), and improves to 1605. Then she beats Aliaksandra Sasnovich (and her rating of 1805), and improves to 1692. Repeat for each of her 2021 results, and the end result is a rating of 2160–almost 100 points higher than her current “real Elo” rating and within shouting distance of Osaka’s 2189.
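
The first two steps of that walk-through can be sketched in a few lines. The win expectancy is the standard Elo formula. The K-factor schedule is not given in this post, so the one below, 250 / (n + 5) ** 0.4 with n counting the current match, is an assumption, picked because it reproduces the 1605 and 1692 quoted above. Treat it as an illustration rather than the exact parameters behind the published ratings.

```python
def expected_score(rating, opp_rating):
    """Standard Elo expectancy that `rating` beats `opp_rating`."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def k_factor(match_number):
    # ASSUMPTION: decaying K-factor; this schedule is not stated in the post.
    return 250.0 / (match_number + 5) ** 0.4


def yelo(results, start=1500.0):
    """results: chronological list of (opponent_rating, won) pairs."""
    rating = start
    for n, (opp_rating, won) in enumerate(results, start=1):
        actual = 1.0 if won else 0.0
        rating += k_factor(n) * (actual - expected_score(rating, opp_rating))
    return rating


# Muguruza's first two 2021 matches, with opponent ratings as quoted above.
first_two = [(1817, True), (1805, True)]
after_one = yelo(first_two[:1])
after_two = yelo(first_two)
print(round(after_one), round(after_two))  # 1605 1692
```

Only the player being rated is reset; each opponent keeps her real rating at the time of the match.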

To compare players, work through the same steps for everybody else, calculating their current-season rating as if they played their first career match in January.

It’s worth taking a moment to think about exactly what we’re measuring. That outstanding 2160 rating is what you get if a complete unknown shows up with zero match experience, then goes on the 22-match run that has been Muguruza’s season so far. The difference between real-Garbine and fake-newbie-Garbine is that the real one has an extensive track record that tells us she’s always been good–but that she probably isn’t quite this good.

I call it … yElo

This approach is “Elo for seasons” or “year Elo”–yElo*. It doesn’t have to be limited to calendar years, as the same approach would be useful for comparing, say, 20-match segments. It allows us to take advantage of the Elo algorithm–and the well-informed ratings of other players–to measure partial careers.

* you can pronounce it like the color “yellow,” but I prefer to say it like Phil Dunphy from Modern Family answering the phone.

Muguruza’s 2160 rating sure looks good, so how does it stack up against the rest of the tour? Here’s the 2021 top 20, considering players with at least five match wins through the Dubai and Guadalajara finals last weekend:

Rank  Player                W-L  yElo  
1     Garbine Muguruza     18-4  2160  
2     Naomi Osaka          10-0  2094  
3     Jessica Pegula       15-5  2002  
4     Serena Williams       8-1  1997  
5     Elise Mertens        11-2  1971  
6     Karolina Muchova      7-1  1953  
7     Aryna Sabalenka      11-4  1943  
8     Iga Swiatek          10-3  1941  
9     Daria Kasatkina      10-4  1910  
10    Barbora Krejcikova   10-5  1905  
11    Shelby Rogers         9-4  1902  
12    Jil Teichmann         9-5  1899  
13    Anett Kontaveit       9-4  1897  
14    Jennifer Brady        9-4  1892  
15    Cori Gauff           11-5  1885  
16    Danielle Collins      9-4  1883  
17    Ashleigh Barty        8-2  1878  
18    Sara Sorribes Tormo   9-2  1867  
19    Ann Li                5-1  1864  
20    Simona Halep          6-2  1854 

Like any Race list in March, this isn’t really reflective of skill. But when we consider the small amount of data it has to work with for each player, it’s … pretty good?

Again, you can quibble over whether Osaka or Muguruza has had the better season, but this approach weighs the better winning percentage and stronger average opponent against the much higher absolute win count and gives us a credible answer. Muguruza’s additional evidence of good tennis playing puts her ahead of Osaka’s evidence of short-term unbeatability.

While yElo is basically just a toy–it certainly doesn’t have the same predictive value as regular Elo–this initial look makes me like it. The possibilities are endless, from more sophisticated race tracking, to ranking the greatest seasons of all time, to comparing a player’s current hot streak to what she’s done in the past. Stay tuned, as I’m sure I’ll have more yElo results to report in the future.

So, About Those Stale Rankings

Both the ATP and WTA have adjusted their official rankings algorithms because of the pandemic. Because many events were cancelled last year (and at least a few more are getting canned this year), and because the tours don’t want to overly penalize players for limiting their travel, they have adopted what is essentially a two-year ranking system. For today’s purposes, the details don’t really matter–the point is that the rankings are based on a longer time frame than usual.

The adjustment is good for people like Roger Federer, who missed 14 months and is still ranked #6. Same for Ashleigh Barty, who didn’t play for 11 months yet returned to action in Australia as the top seed at a major. It’s bad for young players and others who have won a lot of matches lately. Their victories still result in rankings improvements, but they’re stuck behind a lot of players who haven’t done much lately.

The tweaked algorithms reflect the dual purposes of the ranking system. On the one hand, they aim to list the best players, in order. On the other hand, they try to maintain other kinds of “fairness” and serve the purposes of the tours and certain events. The ATP and WTA computers are pretty good at properly ranking players, even if other algorithms are better. Because the pandemic has forced a bunch of adjustments, it stands to reason that the formulas aren’t as good as they usually are at that fundamental task.

Hypothesis

We can test this!

Imagine that we have a definitive list, handed down from God (or Martina Navratilova), that ranks the top 100 players according to their ability right now. No “fairness,” no catering to what tournament owners want, and no debates–this list is the final word.

The closer a ranking table matches this definitive list, the better, right? There are statistics for this kind of thing, and I’ll be using one called the Kendall rank correlation coefficient, or Kendall’s tau. (That’s the Greek letter τ, as in Τσιτσιπάς.) It compares lists of rankings, and if two lists are identical, tau = 1. If there is no correlation whatsoever, tau = 0. Higher tau, stronger relationship between the lists.

My hypothesis is that the official rankings have gotten worse, in the sense that the pandemic-related algorithm adjustments result in a list that is less closely related to that authoritative, handed-down-from-Martina list. In other words, tau has decreased.

We don’t have a definitive list, but we do have Elo. Elo ratings are designed for only one purpose, and my version of the algorithm does that job pretty well. For the most part, my Elo formula has not changed due to the pandemic*, so it serves as a constant reference point against which we can compare the official rankings.

* This isn’t quite true, because my algorithm usually has an injury/absence penalty that kicks in after a player is out of action for about two months. Because the pandemic caused all sorts of absences for all sorts of reasons, I’ve suspended that penalty until things are a bit more normal.

Tau meets the rankings

Here is the current ATP top ten, including Elo rankings:

Player       ATP  Elo  
Djokovic       1    1  
Nadal          2    2  
Medvedev       3    3  
Thiem          4    5  
Tsitsipas      5    6  
Federer        6    -  
Zverev         7    7  
Rublev         8    4  
Schwartzman    9   10  
Berrettini    10    8

I’m treating Federer as if he doesn’t have an Elo rating right now, because he hasn’t played for more than a year. If we take the ordering of the other nine players and plug them into the formula for Kendall’s tau, we get 0.778. The exact value doesn’t really tell you anything without context, but it gives you an idea of where we’re starting. While the two lists are fairly similar, with many players ranked identically, there are a couple of differences, like Elo’s higher estimate of Andrey Rublev and its swapping of Diego Schwartzman and Matteo Berrettini.
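
For anyone who wants to check the 0.778, here is a small sketch that computes Kendall's tau by counting concordant and discordant pairs. It is the simple no-ties version (tau-a), which is enough for this nine-player example. The function name is my own, and this isn't necessarily the code behind the larger calculations below.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a for two equal-length rankings with no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both lists order this pair the same way
        elif sign < 0:
            discordant += 1  # the lists disagree on this pair
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# The nine players from the table above, Federer excluded,
# in ATP order from Djokovic down to Berrettini.
atp = [1, 2, 3, 4, 5, 7, 8, 9, 10]
elo = [1, 2, 3, 5, 6, 7, 4, 10, 8]
tau = kendall_tau(atp, elo)
print(round(tau, 3))  # 0.778
```

Of the 36 possible pairs, the two lists disagree on four: Rublev against each of Thiem, Tsitsipas, and Zverev, plus Schwartzman against Berrettini. That gives (32 - 4) / 36.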

Let’s do the same exercise with a bigger group of players. I’ll take the top 100 players in the ATP rankings who met the modest playing time minimum to also have a current Elo rating. Plug in those lists to the formula, and we get 0.705.

This is where my hypothesis falls apart. I ran the same numbers on year-end ATP rankings and year-end Elo ratings all the way back to 1990. The average tau over those 30-plus years is about 0.68. In other words, if we accept that Elo ratings are doing their job (and they are indeed about as predictive as usual), it looks like the pandemic-adjusted official rankings are better than usual, not worse.

Here are the year-by-year tau values, with a tau value based on current rankings as the right-most data point:

And the same for the WTA, to confirm that the result isn’t just a quirk of the makeup of the men’s tour:

The 30-year average for women’s rankings is 0.723, and the current tau value is 0.764.

What about…

You might wonder if the pandemic is wreaking some hidden havoc with the data set. Remember, I said that I’m only considering players who meet the playing time minimum to have an Elo rating. For this purpose, that’s 20 matches over 52 weeks, which excludes about one-third of top-100 ranked men and closer to half of top-100 women. The above calculations still consider 100 players for year-end 2020 and today, but I had to go deeper in the rankings to find them. Thus, the definition of “top 100” shifts a bit from year-end 2019 to year-end 2020 to the present.

We can’t entirely address this problem, because the pandemic has messed with things in many dimensions. It isn’t anything close to a true natural experiment. But we can look only at “true” top-100 players, even if the length of the list is smaller than usual for current rankings. So instead of taking the top 100 qualifying players (those who meet a playing time minimum and thus have an Elo ranking), we take a smaller number of players, all of whom have top-100 rankings on the official list.

The results are the same. For men, the tau based on today’s rankings and today’s Elo ratings is 0.694 versus the historical average of 0.678. For women, it’s 0.721 versus 0.719.

Still, the rankings feel awfully stale. The key issue is one that Elo can’t help us solve. So far, we’ve been looking at players who are keeping active. But the really out-of-date names on the official lists are the ones who have stayed home. Should Federer still be #6? Heck if I know! In the past, if an elite player missed 14 months, Elo would knock him down a couple hundred points, and if that adjustment were applied to Fed now, it would push down tau. But there’s no straightforward answer for how the inactive (or mostly inactive) players should be rated.

What we’ve learned today

This is the part of the post where I’m supposed to explain why this finding makes sense and why we should have suspected it all along. I don’t think I can manage that.

A good way to think about this might be that there is a sort of tour-within-a-tour that is continuing to play regularly. Federer, Barty, and many others haven’t usually been part of it, while several dozen players are competing as often as they can. The relative rankings of that second group are pretty good.

It doesn’t seem quite fair that Clara Tauson is stuck just inside the top 100 while her Elo is already top-50, or that Rublev remains behind Federer despite an eye-popping six months of results while Roger sat at home. And for some historical considerations–say, weeks inside the top 50 for Tauson or the top 5 for Rublev–maybe it isn’t fair that they’re stuck behind peers who are choosing not to play, or who are resting on the laurels of 18-month-old wins.

But in other important ways, the absolute rankings often don’t matter. Rublev has been a top-five seed at every event he’s played since late September except for Roland Garros, the Tour Finals, and the Australian Open, despite never being ranked above #8. When the tour-within-a-tour plays, he is a top-five guy. The likes of Rublev and Tauson will continue to have the deck slightly stacked against them at the majors, but even that disadvantage will steadily erode if they continue to play at their current levels.

Believing in science as I do, I will take these findings to heart. That means I’ll continue to complain about the problems with the official rankings–but no more than I did before the pandemic.

How Much Does Naomi Osaka Raise Her Game?

You’ve probably heard the stat by now. When Naomi Osaka reaches the quarter-final of a major, she’s 12-0. That’s unprecedented, and it’s especially unexpected from a player who doesn’t exactly pile up hardware outside of the hard court grand slams.

It sure looks like Osaka finds another level as she approaches the business end of a major. Translated to analytics-speak, “she raises her game” can be interpreted as “she plays better than her rating implies.” That is certainly true for Osaka. She has won 16 of her 18 matches in the fourth round or later of a slam, often in matchups that didn’t appear to favor her. In her first title run, at the 2018 US Open, my Elo ratings gave her 36%, 53%, 46%, and 43% chances of winning her fourth-round, quarter-final, semi-final, and final-round matches, respectively.

Had Osaka performed at her expected level for each of her 18 second-week matches, we’d expect her to have won 10.7 of them. Instead, she won 16. The probability that she would have won 16 or more of the 18 matches is approximately 1 in 200. Either the model is selling her short, or she’s playing in a way that breaks the model.
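
Both numbers in that paragraph come from the same kind of calculation: given a win probability for each match, build up the distribution of total wins. Here is a minimal sketch. Since the 18 individual forecasts aren't listed here, the example uses only the four 2018 US Open forecasts quoted above, so it shows the mechanics, not the 10.7 or the 1-in-200.

```python
def win_count_distribution(probs):
    """dist[k] = chance of exactly k wins, given independent match forecasts."""
    dist = [1.0]  # before any match is played: zero wins with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)  # lose this match
            new[k + 1] += mass * p    # win this match
        dist = new
    return dist


def prob_at_least(probs, wins):
    """Chance of winning `wins` or more of the matches."""
    return sum(win_count_distribution(probs)[wins:])


# Fourth round, quarter-final, semi-final, final at the 2018 US Open.
us_open_2018 = [0.36, 0.53, 0.46, 0.43]
expected_wins = sum(us_open_2018)
p_all_four = prob_at_least(us_open_2018, 4)
print(f"expected wins: {expected_wins:.2f}")
print(f"chance of winning all four: {p_all_four:.3f}")
```

Feeding in all 18 pre-match forecasts and asking for 16 or more wins is the same call with a longer list.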

Estimating lift

Osaka’s results in the second week of slams are vastly better than the other 93% or so of her tour-level career. It’s possible that it’s entirely down to luck–after all, things with a 0.5% chance of happening have a habit of occurring about 0.5% of the time, not never. When those rare events do take place, onlookers are very resourceful when it comes to explaining them. You might believe Osaka’s claims about caring more on the big stage, but we should keep in mind that whenever the unlikely happens, a plausible justification often follows.

Recognizing the slim possibility that Osaka has taken advantage of some epic good luck but setting it aside, let’s quantify how good she’d have to be for such a performance to not look lucky at all.

That’s a mouthful, so let me explain. Going into her 18 second-week slam matches, Osaka’s surface-blended Elo rating has averaged 2,022. That’s good but not great–it’s a tick below Aryna Sabalenka’s hard-court Elo rating right now. Those modest ratings are how we come up with the estimate that Osaka should’ve won 10.7 of her 18 matches, and that she had a 1-in-200 shot of winning 16 or more.

2,022 doesn’t explain Osaka’s success, so the question is: What number does? We could retroactively boost her Elo rating before each of those matches by some amount so that her chance of winning 16-plus out of 18 would be a more believable 50%. What’s that boost? I used a similar methodology a couple of years ago to quantify Rafael Nadal’s feats at his best clay court events, another string of match wins that Elo can’t quite explain.

The answer is 280 Elo rating points. If we retroactively gave Osaka an extra 280 points before each of these 18 matches, the resulting match forecasts would mean that she’d have had a fifty-fifty chance at winning 16 or more of them. Instead of a pre-match average of 2,022, we’re looking at about 2,300, considerably better than anyone on tour right now. (And, ho hum, among the best of all time.) A difference of 280 Elo points is enormous–it’s the difference between #1 and #22 in the current hard-court Elo ratings.
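The boost search can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure behind the 280-point figure: it assumes the standard Elo win expectation on a 400-point scale, independent matches, and a search in 10-point steps. The function names and the toy ratings are hypothetical.

```python
def elo_win_prob(player_elo, opponent_elo):
    """Standard Elo expectation on a 400-point logistic scale (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - player_elo) / 400.0))


def prob_at_least(probs, wins):
    """Probability of at least `wins` wins across independent matches."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return sum(dist[wins:])


def find_boost(matches, wins, target=0.5, step=10, max_boost=1000):
    """Smallest boost (a multiple of `step`) that lifts P(at least `wins`) to `target`.

    `matches` is a list of (player_elo, opponent_elo) pairs.
    Returns None if no boost up to `max_boost` is enough.
    """
    for boost in range(0, max_boost + step, step):
        probs = [elo_win_prob(player + boost, opp) for player, opp in matches]
        if prob_at_least(probs, wins) >= target:
            return boost
    return None


# Hypothetical toy data: four matches against equally rated opponents.
toy_matches = [(2000, 2000)] * 4
toy_boost = find_boost(toy_matches, wins=4)
```

With real data, `matches` would hold the pre-match ratings for each second-week match and `wins` the actual win total.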

Osaka versus the greats

I said before that Osaka’s 12-0 is unprecedented. Her 16-2 in slam second weeks may not have quite the same ring to it, but compared to expectations based on Osaka’s overall tour-level performance, it is every bit as unusual.

Take Serena Williams, another woman who cranks it up a notch when it really matters. Her second-week record, excluding retirements, is 149-39, while the individual forecasts before each match would’ve predicted about 124-64. The chances of a player outperforming expectations to that extent are basically zero. I ran 10,000 simulations, and that’s how many times a player with Serena’s pre-match odds won 149 of the 188 matches. Zero.
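A simulation along those lines might look like the sketch below. It assumes independent matches and counts the simulated careers that reach at least the actual win total; the function name, the seed, and the default of 10,000 runs are choices made for this example rather than details taken from the original analysis.

```python
import random


def count_simulated_runs(probs, wins, n_sims=10_000, seed=0):
    """Count simulated careers with at least `wins` wins, given per-match win probabilities."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_sims):
        # Each match is an independent draw against its pre-match forecast.
        simulated_wins = sum(rng.random() < p for p in probs)
        if simulated_wins >= wins:
            hits += 1
    return hits
```

Calling it with a player's full list of pre-match forecasts and her actual win total returns the number of simulated careers that matched or beat the real one.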

For Serena to have had a 50% chance of winning 149 of the 188 second-week contests, her pre-match Elo ratings would’ve had to have been 140 points higher. That’s a big difference, especially on top of the already stellar ratings that she has maintained throughout her career, but it’s only half of the jump we needed to account for Osaka’s exploits. Setting aside the possibility of luck, Osaka raises her level twice as much as Serena does.

One more example. Monica Seles won 70 of her 95 second-week matches at slams, a marked outperformance of the 60 matches that Elo would’ve predicted for her. Like Osaka, her chances of having won 70 instead of 60 based purely on luck are about 1 in 100. But you can account for her actual results by giving her a pre-match Elo bonus of “only” 100 points.

The full context

I ran similar calculations for the 52 women who won a slam, made their first second-week appearance in 1958 or later, and played at least 10 second-week matches. They divide fairly neatly into three groups. 18 of them have career second-week performances that can easily be explained without recourse to good luck or level-raising. In some cases we can even say that they were unlucky or that they performed worse than expected. Ashleigh Barty is one of them: Of her 14 second-week matches, she was expected to win 9.9 but has tallied only 8.

Another 16 have been a bit lucky or slightly raised their level. To use the terms I introduced above, their performances can be accounted for by upping their pre-match Elo ratings by between 10 and 60 points. One example is Venus Williams, who has gone 84-43 in slam second weeks, about six wins better than her pre-match forecasts would’ve predicted.

That leaves 18 players whose second-week performances range from “better than expected” to “holy crap.” I’ve listed each of them below, with their actual wins (“W”), forecasted wins (“eW”), probability of winning their actual total given pre-match forecasts (“p(W)”), and the approximate number of Elo points (“Elo+”) which, when added to their pre-match forecasts, would explain their results by shifting p(W) up to at least 50%.

Player               M    W     eW   p(W)  Elo+  
Naomi Osaka         18   16   10.7   0.5%   280  
Billie Jean King   123   94   76.2   0.0%   160  
Sofia Kenin         10    7    4.7  10.6%   150  
Serena Williams    188  149  124.4   0.0%   140  
Evonne Goolagong    92   69   58.7   0.4%   130  
Jennifer Capriati   70   42   33.2   1.2%   110  
Monica Seles        95   70   60.2   1.2%   100  
Hana Mandlikova     75   49   41.7   3.1%   100  
Kim Clijsters       67   47   40.6   4.6%    90  
Justine Henin       74   55   48.9   6.3%    80  
Mary Pierce         55   28   22.4   6.9%    80  
Li Na               36   22   18.0  10.6%    80  
Steffi Graf        157  131  123.6   6.1%    70  
Maria Bueno         93   70   63.4   6.3%    70  
Garbine Muguruza    31   18   14.9  15.8%    70  
Mima Jausovec       32   18   15.0  15.9%    70  
Marion Bartoli      20   11    8.8  20.6%    70  
Sloane Stephens     24   12    9.7  20.8%    70

There are plenty of names here that we’d comfortably put alongside Williams and Seles as luminaries known for their clutch performances. Still, the difference between Osaka’s levels is on another planet.

Obligatory caveats

Again, of course, Osaka’s results could just be lucky. It doesn’t look that way when she plays, and the qualitative explanations add up, but … it’s possible.

Skeptics might also focus on the breakdown of the 52-player sample. In terms of second-week performance relative to forecasts, only one-third of the players were below average. That doesn’t seem quite right. The “average” woman outperformed expectations by about 30 Elo points.

There are two reasons for that. The first is that my sample is, by definition, made up of slam winners. Those players won at least four second-week matches, no matter how they fared in the rest of their careers. In other words, it’s a non-random sample. But that doesn’t have any relevance to Osaka’s case.

The second, more applicable, reason that more than half of the players look like outperformers is that any pre-match player rating is a measure of the past. Elo isn’t as much of a lagging indicator as, say, official tour rankings, but by its nature, it can only consider past results.

Any player who ascends to the top of the game will, at some point, need to exceed expectations. (If you don’t exceed expectations, you end up with a tennis “career” like mine.) To go from mid-pack to slam winner, you’ll have at least one major where you defy the forecasts, as Osaka did in New York in 2018. Osaka was an extreme case, because she hadn’t done much outside of the slams. If, for instance, Sabalenka were to win the US Open this year, she has done so well elsewhere that it wouldn’t be the same kind of shock, but it would still be a bit of a surprise.

In other words, almost every player to win a slam had at least one or two majors where they executed better than their previous results offered any reason to expect. That’s one reason why we find Sofia Kenin only two spots below Osaka on the list.

For Serena or Seles, the “rising star” effect doesn’t make much of a difference–those early tournaments are just a drop in the bucket of a long career. Yeah, it might mean they really only up their game by 110 Elo points instead of 130, but it doesn’t call their entire career’s worth of results into question. For Osaka or Kenin, the early results make up a big part of the sample, so this is something to consider.

It will be tougher for Osaka to outperform expectations as the expectations continue to rise. Much depends on whether she continues to struggle away from the big stages. If she continues to manage only one non-major title per year, she’ll keep her rating down and suppress those pre-match forecasts. (The predictions of major media pundits will be harder to keep under control.) Beating the forecasts isn’t necessarily something to aspire to–even though Serena does it, her usual level is so high that we barely notice. But if Osaka is going to alternate levels between world-class and merely very good, she could hardly do better than to bring out her best stuff when she does.

The Post-Covid Tennis World is Unpredictable. The Match Results Are Not.

Both the ATP and WTA patched together seasons in the second half of 2020, providing playing opportunities to competitors who had endured vastly different lockdowns–some who couldn’t practice for a while, some who came down with Covid-19, and others who got knee surgery.

When the tours came back, we didn’t know quite what to expect. I’m sure some of the players didn’t know, either. Yet when we take the 2020 season (plus a couple weeks of 2021) as a whole, what happened on court was pretty much what happened before. The Australian Open, with its dozens of players in hard quarantine for two weeks, may change that. But for about five months, players faced all kinds of other unfamiliar challenges, and they responded by posting results that wouldn’t have looked out of place in January 2020.

The Brier end

My usual metric for “predictability” is Brier Score, which measures both accuracy (did our pre-match favorite win?) and confidence (if we think four players are all 75% favorites, did three of them win?). Pre-match odds are determined by my Elo ratings, which are far from the final word, but are more than sufficient for these purposes. My tour-wide Brier Scores are usually in the neighborhood of 0.21, several steps better than the 0.25 Brier that results from pure coin-flipping. A lower score indicates more accurate forecasts and/or better calibrated confidence levels.
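For readers who want the arithmetic, here is a minimal sketch of a Brier Score calculation. Each forecast is the pre-match probability given to one player (here, the favorite), and each outcome is 1 if that player won and 0 otherwise. The function name is illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between each forecast probability and the 0/1 result."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Four 75% favorites, three of whom won: well calibrated.
four_favorites = brier_score([0.75] * 4, [1, 1, 1, 0])   # 0.1875

# Calling every match a coin flip always lands on 0.25.
coin_flips = brier_score([0.5] * 4, [1, 0, 1, 0])        # 0.25
```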

Here are the tour-wide Brier Scores for the ATP and WTA since the late-summer restart:

  • ATP: 0.213 (2017 – early 2020: 0.212)
  • WTA: 0.192 (2017 – early 2020: 0.212)

The ATP’s level of predictability is so steady that it’s almost suspicious, while the WTA has somehow been more predictable since the restart.

But we aren’t quite comparing apples to apples. The post-restart WTA was sparser than the pre-Covid women’s tour, and the post-restart ATP was closer to its pre-pandemic normal.

Let’s look at a few things that do line up. Most of the top players showed up for the main events of the restarted tour, such as the US Open, Roland Garros, Rome, “Cincinnati” (played in New York), and the men’s Masters event in Paris. Here are the 2019 and 2020 Brier Scores for each of those events:

Event          Men '19  Men '20  Women '19  Women '20  
Cincinnati       0.244    0.210      0.244      0.252  
US Open          0.210    0.167      0.178      0.186  
Roland Garros    0.163    0.199      0.191      0.226  
Rome             0.209    0.274      0.205      0.232  
Paris            0.226    0.199          -          -  
---
Total            0.204    0.202      0.198      0.218

(If you want even more numbers, I did similar calculations in August after Palermo, Lexington, and Prague.)

Three takeaways from this exercise:

  • Brier Scores are noisy. Any single tournament number can be heavily affected by a few major upsets.
  • Man, those ATP dudes were steady.
  • The WTA situation is more complicated than I thought.

Whether we look at the entire post-restart tour or solely the big events, the story on the ATP side is clear. Long layoffs, tournament bubbles, missing towelkids, Hawkeye Live … none of it had much effect on the status quo.

The predictability of the women’s tour is another thing entirely. The 12 top-level events between Palermo in August and Abu Dhabi in January were easier to forecast than a random sampling of a dozen tournaments from, say, 2018. But the four biggest events deviated from the script considerably more than they had in 2019 (or 2017 or 2018, for that matter).

From this, I offer a few tentative conclusions:

  • Big events, with their disproportionate number of star-versus-star matches, are a bit more predictable than other tournaments.
  • Accordingly, the post-restart WTA wasn’t as predictable as it first appeared. It was just lopsided in favor of tournaments that drew (most of) the top stars. Had the women’s tour featured a wider variety of events–which probably would’ve included a larger group of players, including some fringier ones–its post-restart Brier Score would’ve been higher. Perhaps even higher than the corresponding pre-Covid number.
  • Most tentative of all: The predictability of ATP and WTA match results might have itself been affected by the availability of tournaments. Top men were able to get into something like their usual groove, despite the weirdness of virus testing and empty stadiums. Most women never got a chance to play more than two or three weeks in a row.

Even six months after Palermo, the data is still limited. And by the time we have enough match results to do proper comparisons, some things will have gotten back to normal (hopefully!), complicating the analysis even further. That said, these findings are much clearer than my initial forays into post-restart Brier Scores in August. As for the Australian Open, quarantine and all, I’m forecasting a predictable tournament. At least for the men.

Not All Twenties Are Created Equal

The top of the all-time men’s grand slam ranking just got even more crowded. With his 13th Roland Garros title, Rafael Nadal has matched Roger Federer at the top of the list by securing his 20th major title. Novak Djokovic, Nadal’s final obstacle en route to the historic mark, remains within shouting distance with 17 slams.

The Roger-Rafa tie has spurred another (interminable, unresolvable) round of the (interminable, unresolvable) GOAT debate. Of course there’s much more to determining the best ever than the slam count. But the slam count is a big part of the conversation. If we’re going to keep doing this, we ought to at least recognize that not all major titles are created equal. And by extension, not all collections of twenty major titles are equivalent.

We all have intuitions about the difficulty of how a particular draw shakes out, with its typical mix of good and bad fortune. Nadal was lucky that he missed a few dangerous opponents in the early rounds, luckier still that he didn’t have to face Dominic Thiem in the semi-final, and unfortunate that he had to face down the next-best player in the draw, Djokovic, in the final. As it turned out, it didn’t really matter, but I think most of us would agree that Nadal’s achievement–staggering as it is–would look even better had he faced more than two players ranked in the top 70.

Stop dithering and start calculating

I’ve written about this before, and I’ve established a metric to quantify those intuitions. Take the surface-weighted Elo rating of each of a player’s opponents, and determine the probability that an average slam champion would beat those players. After a couple of steps to normalize the results, we end up with a single number for the path to each slam title. The larger the result, the more difficult the path, and an average slam works out to 1.0.
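The first step of that metric can be sketched like this. The normalization steps aren't spelled out in this post, so the sketch stops at the raw probability that a reference champion gets through all seven opponents. It assumes the standard Elo win expectation on a 400-point scale, and every rating below (including the "average champion" figure) is a hypothetical placeholder.

```python
def elo_win_prob(player_elo, opponent_elo):
    """Standard Elo expectation on a 400-point logistic scale (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - player_elo) / 400.0))


def path_probability(champion_elo, opponent_elos):
    """Probability that a player rated `champion_elo` beats every opponent in turn."""
    prob = 1.0
    for opp in opponent_elos:
        prob *= elo_win_prob(champion_elo, opp)
    return prob


AVG_CHAMPION_ELO = 2100  # hypothetical stand-in for the average slam champion

# Two made-up seven-match draws, one soft and one loaded.
soft_draw = [1700, 1750, 1800, 1850, 1900, 1950, 2000]
tough_draw = [1800, 1900, 1950, 2000, 2050, 2100, 2150]

p_soft = path_probability(AVG_CHAMPION_ELO, soft_draw)
p_tough = path_probability(AVG_CHAMPION_ELO, tough_draw)
```

A lower probability means a harder path. The difficulty score then rescales those probabilities so that an average slam comes out at 1.0.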

Nadal’s path was easier than the historical average. Aside from Djokovic, none of his opponents would have had more than an 8% chance of knocking out an average slam champion on clay. The exact result is 0.64, which is easier than almost nine-tenths of majors in the Open Era. Rafa has had three easier paths to his major titles, including the 2017 US Open, which scored only 0.33. That’s the easiest US Open, Wimbledon, or Roland Garros in a half-century.

Of course, he’s had his share of difficult paths, such as 2012 Roland Garros (1.36), when he faced several clay specialists and a peak-level Djokovic. Federer and Djokovic have gotten their own shares of lucky and unlucky draws over the years–that’s why we need a metric. You might have a better memory for this kind of thing than I do, but I don’t think any of us can weigh 57 majors with 7 opponents each and work out any meaningful results in our heads.

The tally

Sum up the difficulty of the title paths for these 57 slams, and here are the results:

Player    Slams  Avg Score  Total  
Nadal        20       0.95   19.0  
Djokovic     17       1.06   18.1  
Federer      20       0.89   17.9  
                                   
Player     Easy     Medium   Hard  
Nadal         7          8      5  
Djokovic      5          5      7  
Federer       9         10      1

The first table shows each player’s average score for the paths to his major titles, and the total number of “adjusted slams” that gives them. Nadal is in the lead with 19, and Djokovic and Federer follow in a near-tie, just above and below 18.

You might be surprised to see the implication that this is a slightly weak era, with average scores a bit below 1.0. That wasn’t the case a few years ago, but there has only been one above-average title path since 2016. The Big Three-or-Four has generally stayed out of each other’s way since then, and even when they do clash, as they did yesterday, the leading contenders for quarter-final or semi-final challenges failed to make it that far. The average score of the last 15 slam title paths is a mere 0.73, while the 16 before that (spanning 2013-16) averaged 1.20.

The second table paints with a broader brush, classifying all Open Era slam titles into thirds: “easy,” “medium” and “hard” paths to the championship. Anything below 0.89 rates as “easy,” anything above 1.14 is marked as “hard,” with the remainder left as “medium.”
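The classification rule is simple enough to write down directly. The sketch below uses the two cutoffs quoted above and the three path scores mentioned earlier in this post; the function and constant names are illustrative.

```python
EASY_BELOW = 0.89   # scores under this are "easy"
HARD_ABOVE = 1.14   # scores over this are "hard"


def classify_path(score):
    """Bucket a title-path difficulty score into easy / medium / hard."""
    if score < EASY_BELOW:
        return "easy"
    if score > HARD_ABOVE:
        return "hard"
    return "medium"


# Path scores quoted in this post.
labels = {
    "2017 US Open": classify_path(0.33),         # easy
    "2020 Roland Garros": classify_path(0.64),   # easy
    "2012 Roland Garros": classify_path(1.36),   # hard
}
```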

Djokovic is the leader in hard slams, with 7 of his 17 meriting that classification. Federer has racked up 10 medium slams, including several that score above 1.0, but only one that cleared the bar for the “hard” category. Nadal’s mix is more balanced.

Go yell at someone else

Hopefully these numbers have given you some new ammunition for your next Twitter fight. Some of you will froth at the mouth while insisting that players can’t control who they play. You’re right, but it doesn’t really matter. We can’t start giving out GOAT points for things that players didn’t do, like beat Thiem in the 2020 French Open semi-finals. All three of these guys were or are good enough at various points to have beaten some of the opponents they didn’t have to face. There are other approaches we could take to the GOAT debate that incorporate peak Elo ratings and longevity at various levels, but that’s not what we’re talking about when we count slams.

If we are going to focus so much on the slam count, we might as well acknowledge that Nadal’s 20 is better than Federer’s 20, and Djokovic’s 17 is awfully close to both of them.

The Post-Covid WTA is Drifting Back to Normal

In the two latest WTA events, we saw a mix of the expected and the unusual. Simona Halep, the heavy favorite in Prague, wound up with the title despite a couple of demanding three-setters in her first two rounds. The week’s other tournament, in Lexington, failed to follow the script. Serena Williams and Aryna Sabalenka, the big hitters at the top and bottom of the bracket, combined for three wins, with four unseeded players making up the semi-final field.

Last week I pointed out that Palermo–the tour’s initial comeback event–was so unpredictable that you would’ve been better off to treat each match as a coin flip than to use pre-layoff player strength ratings (such as Elo) to forecast outcomes. Such an upset-ridden event isn’t unheard of, even in pandemic-free times, but it is suggestive that the WTA rank-and-file haven’t quite returned to their usual form.

Prague and Lexington give us three times as much data to work with. Plus, we might theorize that Prague would be a little more predictable because so many players in that field also took part in the Palermo event, meaning that they have a little more recent match experience. While our sample of 93 main draw matches is still flimsy, it brings us a little closer to understanding how well traditional forecasts will handle this unusual time.

A thorny Brier patch

The metric I’m using to quantify predictability–or to put it another way, the validity of pre-layoff player ratings–is Brier Score, which takes into account both raw accuracy (did the forecast pick the right player to win?) and confidence level (was the forecast too strong, too weak, or just right?). Tour-level Brier Scores are usually in the range of 0.21, while a score of 0.25 means the predictions were no better than coin flips. A lower score represents more accurate predictions.

Here are the Brier Scores for Palermo, Lexington, and Prague, along with the average of the three, and the average of all WTA International events (on all surfaces) since 2017. (The scores are based on forecasts generated from my Elo ratings.) We might expect the first round to be different, since players are particularly rusty at that stage, so I’ve also broken out first round (“R32 Brier”) matches for each of the tournaments and averages in the table.

Tournament    Brier  R32 Brier  
Palermo       0.268      0.295  
Lexington     0.226      0.170  
Prague        0.212      0.247  
Comeback Avg  0.235      0.237  
Intl Avg      0.217      0.213

As we saw last week, the Palermo results truly defied expectations. More than half of the matches were upsets (according to my Elo ratings), with a particularly unpredictable first round.

That didn’t last. The Prague first round rated 0.247–just barely better than coin flips–but the messiness didn’t last beyond the first couple of days. The event’s overall Brier Score was 0.212, slightly better than the average WTA International. In other words, this group of 32 women, only recently returned from a months-long break, delivered results that were roughly as predictable as we would expect in the middle of a normal season.

The Lexington numbers are a bit more difficult to make sense of, but like Prague’s, they point to a post-coronavirus world that isn’t all that weird. The opening round closely followed the script, with a Brier Score of 0.170. Of the last 115 WTA International events, only 22 were more predictable. The forecast accuracy didn’t last, in large part because of Serena’s loss at the hands of Shelby Rogers. The rating for the entire tournament was 0.226, less predictable than usual, but much better than random guessing and closer to tour average than to the assumption-questioning Palermo numbers.

Revised estimates

We’re still early in the process of evaluating what to expect from players after the COVID-19 layoff. As more tournaments take place, we can identify whether players become more predictable with more matches under their belts. (Perhaps the Prague participants who skipped Palermo were more difficult to forecast, although Halep is an obvious counterexample.)

At this point, anything is possible. It could be that we will steadily drift back to business as usual. On the other hand, the new social-distancing-oriented rules–with few or no fans on site, nightlife limited to Netflix, players fetching their own towels, and new variations of on-court coaching–might work to the advantage of some women and the disadvantage of others. If that’s the case, Elo ratings will go through a novel period of adjustment as they shift to reflect which players thrive on the post-corona tour.

It’s too early to do much more than speculate about something as significant as that. But in the last week, we’ve seen forecasts go from wildly wrong (in Palermo) to not half bad (in Lexington and Prague). We’ve gained some confidence that for all the things that have obviously changed since March, our approach to player ratings may be one thing that largely remains the same.

Did Palermo Show the Signs of a Five-Month Pandemic Layoff?

Are tennis players tougher to predict when they haven’t played an official match for almost half a year? Last week’s WTA return-to-(sort-of)-normal in Palermo gave us a glimpse into that question. In a post last week I speculated that results would be tougher than usual to forecast for a while, necessitating some tweaks to my Elo algorithm. The 31 main draw matches from Sicily allow us to run some preliminary tests.

At first glance, the results look a bit surprising. Only two of the eight seeds reached the semifinals, and the ultimate champion was the unseeded Fiona Ferro. Two wild cards reached the quarters. Is that notably weird for a WTA International-level event? It doesn’t seem that strange, so let’s establish a baseline.

Palermo the unpredictable

My go-to metric for “predictability” is Brier Score, which measures the accuracy of percentage forecasts. It’s nice to pick the winner, but it’s more important to assign the right level of probability. If you say that 100 matches are all 60/40 propositions, your favorites should win 60 of the 100 matches. If they win 90, you weren’t nearly confident enough; if they win 50, you would’ve been better off flipping a coin. Brier Score encapsulates those notions into a single number, the lower the better. Roughly speaking, my Elo forecasts for ATP and WTA matches hover a bit above 0.2.
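The 60/40 example can be worked through in a few lines. This is only an illustration: it assumes every match gets the same forecast for the favorite, and the function name is made up for the purpose.

```python
def brier_for_uniform_forecast(forecast, favorite_wins, matches):
    """Brier Score when every favorite gets the same forecast probability."""
    win_penalty = (1 - forecast) ** 2    # squared error when the favorite wins
    loss_penalty = forecast ** 2         # squared error when the favorite loses
    upsets = matches - favorite_wins
    return (favorite_wins * win_penalty + upsets * loss_penalty) / matches


calibrated = brier_for_uniform_forecast(0.60, 60, 100)   # about 0.24
too_timid = brier_for_uniform_forecast(0.60, 90, 100)    # about 0.18
bolder = brier_for_uniform_forecast(0.90, 90, 100)       # about 0.09
overrated = brier_for_uniform_forecast(0.60, 50, 100)    # about 0.26
```

When the favorites win 90 times, the 60% forecasts score about 0.18, but 90% forecasts would have scored about 0.09, which is the sense in which 60% wasn't confident enough. When they win only 50, the score of about 0.26 is worse than the 0.25 a coin-flip forecast would earn.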

From 2017 through March 2020, the 975 completed matches at clay-court WTA International events had a collective Brier Score of 0.223. First round matches were a tiny bit more predictable, with R32’s scoring 0.219.

Palermo was a roller-coaster by comparison. The 31 main-draw matches combined for a Brier Score of 0.268. Of the 32 other events I considered, only last year’s Prague tourney was higher, generating a 0.277 mark.

The first round was more unpredictable still, at 0.295. On the other hand, the combination of a smaller per-event sample and the wide variety of first-round fields means that several tournaments were wilder for the first few days. 9 of the 32 others had a first-round Brier Score above 0.250, with four of them scoring higher–that is, worse–than Palermo did.

The Brier Score of shame

I mentioned the 0.250 mark because it is a sort of Brier Score of shame. Let’s say you’re predicting the outcome of a series of coin flips. The smart pick is 50/50 every time. It’s boring, but forecasting something more extreme just means you’re even more wrong half the time. If you set your forecast at 50% for a series of random events with a 50/50 chance of occurring, your Brier Score will be … 0.250.

Another way to put it is this: If your Brier Score is higher than 0.250, you would’ve been better off predicting that every match was 50/50. All the fancy forecasting went to waste.
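To see where 0.250 comes from, consider the expected Brier Score of a forecast when the true chance of the event is known. The sketch below is a small illustration with a made-up function name; for true coin flips, the expected score works out to 0.25 plus the squared distance of the forecast from 50%.

```python
def expected_brier(forecast, true_prob=0.5):
    """Expected Brier Score of a fixed forecast for an event with known true probability."""
    hit = true_prob * (1 - forecast) ** 2     # event happens
    miss = (1 - true_prob) * forecast ** 2    # event does not happen
    return hit + miss


# For genuine coin flips, anything bolder than 50% only adds to the score.
coin_flip_scores = {q: round(expected_brier(q), 3) for q in (0.5, 0.6, 0.7, 0.9)}
# 0.5 -> 0.25, 0.6 -> 0.26, 0.7 -> 0.29, 0.9 -> 0.41
```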

In Palermo, 17 of the 31 matches went the way of the underdog, at least according to my Elo formula. The Brier Scores were on the shameful side of the line. My earlier post–which advocated moderating all forecasts, at least a bit–didn’t go far enough. At least so far, the best course would’ve been to scrap the algorithm entirely and start flipping that coin.

Moderating the moderation

All that said, I’m not quite ready to throw away my Elo ratings. (At the moment, they pick Simona Halep and Aryna Sabalenka, my two favorite players, to win in Prague and Lexington. So there’s that.) 31 matches is a small sample, far from adequate to judge the accuracy of a system designed to predict the outcome of thousands of matches each year. As I mentioned above, Elo failed even worse at Prague last year, but because that tournament didn’t follow several months of global shutdowns, it wouldn’t have even occurred to me to treat it as more than a blip.

This time, a week full of forecast-busting surprises could well be more than a blip. Treating players as if they have exactly the abilities they had in March is probably the wrong way to do things, and it could be a very wrong way of doing things. We’ll triple the size of our sample in the next week, and expand it even more over the next month. It won’t help us pick winners right now, but soon we’ll have a better idea of just how unpredictable the post-COVID-19 tennis world really is.