You’ve probably heard the stat by now. When Naomi Osaka reaches the quarter-final of a major, she’s 12-0. That’s unprecedented, and it’s especially unexpected from a player who doesn’t exactly pile up hardware outside of the hard court grand slams.
It sure looks like Osaka finds another level as she approaches the business end of a major. Translated to analytics-speak, “she raises her game” can be interpreted as “she plays better than her rating implies.” That is certainly true for Osaka. She has won 16 of her 18 matches in the fourth round or later of a slam, often in matchups that didn’t appear to favor her. In her first title run, at the 2018 US Open, my Elo ratings gave her 36%, 53%, 46%, and 43% chances of winning her fourth-round, quarter-final, semi-final, and final-round matches, respectively.
Had Osaka performed at her expected level for each of her 18 second-week matches, we’d expect her to have won 10.7 of them. Instead, she won 16. The probability that she would have won 16 or more of the 18 matches is approximately 1 in 200. Either the model is selling her short, or she’s playing in a way that breaks the model.
Estimating lift
Osaka’s results in the second week of slams are vastly better than the other 93% or so of her tour-level career. It’s possible that it’s entirely down to luck–after all, things with a 0.5% chance of happening have a habit of occurring about 0.5% of the time, not never. When those rare events do take place, onlookers are very resourceful when it comes to explaining them. You might believe Osaka’s claims about caring more on the big stage, but we should keep in mind that whenever the unlikely happens, a plausible justification often follows.
Recognizing the slim possibility that Osaka has taken advantage of some epic good luck but setting it aside, let’s quantify how good she’d have to be for such a performance to not look lucky at all.
That’s a mouthful, so let me explain. Going into her 16 second-week slam matches, Osaka’s average surface-blended Elos have been 2,022. That’s good but not great–it’s a tick below Aryna Sabalenka’s hard-court Elo rating right now. Those modest ratings are how we come up with the estimate that Osaka should’ve won 10.7 of her 18 matches, and that she had a 1-in-200 shot of winning 16 or more.
2,022 doesn’t explain Osaka’s success, so the question is: What number does? We could retroactively boost her Elo rating before each of those matches by some amount so that her chance of winning 16-plus out of 18 would be a more believable 50%. What’s that boost? I used a similar methodology a couple of years ago to quantify Rafael Nadal’s feats at his best clay court events, another string of match wins that Elo can’t quite explain.
The answer is 280 Elo rating points. If we retroactively gave Osaka an extra 280 points before each of these 16 matches, the resulting match forecasts would mean that she’d have had a fifty-fifty chance at winning 14 or more of them. Instead of a pre-match average of 2,022, we’re looking at about 2,300, considerably better than anyone on tour right now. (And, ho hum, among the best of all time.) A difference of 280 Elo points is enormous–it’s the difference between #1 and #22 in the current hard-court Elo rating.
Osaka versus the greats
I said before that Osaka’s 12-0 is unprecedented. Her 16-2 in slam second weeks may not have quite the same ring to it, but compared to expectations based on Osaka’s overall tour-level performance, it is every bit as unusual.
Take Serena Williams, another woman who cranks it up a notch when it really matters. Her second-week record, excluding retirements, is 149-39, while the individual forecasts before each match would’ve predicted about 124-64. The chances of a player outperforming expectations to that extent are basically zero. I ran 10,000 simulations, and that’s how many times a player with Serena’s pre-match odds won 147 of the 185 matches. Zero.
For Serena to have had a 50% chance of winning 149 of the 188 second-week contests, her pre-match Elo ratings would’ve had to have been 140 points higher. That’s a big difference, especially on top of the already stellar ratings that she has maintained throughout her career, but it’s only half of the jump we needed to account for Osaka’s exploits. Setting aside the possibility of luck, Osaka raises her level twice as much as Serena does.
One more example. Monica Seles won 70 of her 95 second-week matches at slams, a marked outperformance of the 60 matches that Elo would’ve predicted for her. Like Osaka, her chances of having won 70 instead of 60 based purely on luck are about 1 in 100. But you can account for her actual results by giving her a pre-match Elo bonus of “only” 100 points.
The full context
I ran similar calculations for the 52 women who won a slam, made their first second-week appearance in 1958 or later, and played at least 10 second-week matches. They divide fairly neatly into three groups. 18 of them have career second-week performances that can easily be explained without recourse to good luck or level-raising. In some cases we can even say that they were unlucky or that they performed worse than expected. Ashleigh Barty is one of them: Of her 14 second-week matches, she was expected to win 9.9 but has tallied only 8.
Another 16 have been a bit lucky or slightly raised their level. To use the terms I introduced above, their performances can be accounted for by upping their pre-match Elo ratings by between 10 and 60 points. One example is Venus Williams, who has gone 84-43 in slam second weeks, about six wins better than her pre-match forecasts would’ve predicted.
That leaves 18 players whose second-week performances range from “better than expected” to “holy crap.” I’ve listed each of them below, with their actual wins (“W”), forecasted wins (“eW”), probability of winning their actual total given pre-match forecasts (“p(W)”), and the approximate number of Elo points (“Elo+”) which, when added to their pre-match forecasts, would explain their results by shifting p(W) up to at least 50%.
Player M W eW p(W) Elo+ Naomi Osaka 18 16 10.7 0.5% 280 Billie Jean King 123 94 76.2 0.0% 160 Sofia Kenin 10 7 4.7 10.6% 150 Serena Williams 188 149 124.4 0.0% 140 Evonne Goolagong 92 69 58.7 0.4% 130 Jennifer Capriati 70 42 33.2 1.2% 110 Monica Seles 95 70 60.2 1.2% 100 Hana Mandlikova 75 49 41.7 3.1% 100 Kim Clijsters 67 47 40.6 4.6% 90 Justine Henin 74 55 48.9 6.3% 80 Mary Pierce 55 28 22.4 6.9% 80 Li Na 36 22 18.0 10.6% 80 Steffi Graf 157 131 123.6 6.1% 70 Maria Bueno 93 70 63.4 6.3% 70 Garbine Muguruza 31 18 14.9 15.8% 70 Mima Jausovec 32 18 15.0 15.9% 70 Marion Bartoli 20 11 8.8 20.6% 70 Sloane Stephens 24 12 9.7 20.8% 70
There are plenty of names here that we’d comfortably put alongside Williams and Seles as luminaries known for their clutch performances. Still, the difference between Osaka’s levels is on another planet.
Obligatory caveats
Again, of course, Osaka’s results could just be lucky. It doesn’t look that way when she plays, and the qualitative explanations add up, but … it’s possible.
Skeptics might also focus on the breakdown of the 52-player sample. In terms of second-week performance relative to forecasts, only one-third of the players were below average. That doesn’t seem quite right. The “average” woman outperformed expectations by about 30 Elo points.
There are two reasons for that. The first is that my sample is, by definition, made up of slam winners. Those players won at least four second-week matches, no matter how they fared in the rest of their careers. In other words, it’s a non-random sample. But that doesn’t have any relevance to Osaka’s case.
The second, more applicable, reason that more than half of the players look like outperformers is that any pre-match player rating is a measure of the past. Elo isn’t as much of a lagging indicator as, say, official tour rankings, but by its nature, it can only consider past results.
Any player who ascends to the top of the game will, at some point, need to exceed expectations. (If you don’t exceed expectations, you end up with a tennis “career” like mine.) To go from mid-pack to slam winner, you’ll have at least one major where you defy the forecasts, as Osaka did in New York in 2018. Osaka was an extreme case, because she hadn’t done much outside of the slams. If, for instance, Sabalenka were to win the US Open this year, she has done so well elsewhere that it wouldn’t be the same kind of shock, but it would still be a bit of a surprise.
In other words, almost every player to win a slam had at least one or two majors where they executed better than their previous results offered any reason to expect. That’s one reason why we find Sofia Kenin only two spots below Osaka on the list.
For Serena or Seles, the “rising star” effect doesn’t make much of a difference–those early tournaments are just a drop in the bucket of a long career. Yeah, it might mean they really only up their game by 110 Elo points instead of 130, but it doesn’t call their entire career’s worth of results into question. For Osaka or Kenin, the early results make up a big part of the sample, so this is something to consider.
It will be tougher to Osaka to outperform expectations as the expectations continue to rise. Much depends on whether she continues to struggle away from the big stages. If she continues to manage only one non-major title per year, she’ll keep her rating down and suppress those pre-match forecasts. (The predictions of major media pundits will be harder to keep under control.) Beating the forecasts isn’t necessarily something to aspire to–even though Serena does it, her usual level is so high that we barely notice. But if Osaka is going to alternate levels between world-class and merely very good, she could hardly do better than to bring out her best stuff when she does.
I’ll add that in order to counter the selection bias, what is usually done by data scientists, is splitting the data into training and test set. Then we can learn a model based on the train data (how many Elo points should be added based on past performance), and test its accuracy on the (by definition) unbiased test set.