Handling Injuries and Absences With Tennis Elo

Italian translation at settesei.it

For the last year or so, every mention of my ATP and WTA Elo ratings has required some sort of caveat. Ratings don’t change while players are absent from the tour, so Serena Williams, Novak Djokovic, Andy Murray, Maria Sharapova, and Victoria Azarenka were all stuck at the top of their tour’s Elo rankings. When their layoffs started, they were among the best, and even a smattering of poor results (or a near season’s worth, in the case of Sharapova) isn’t enough to knock them too far down the list.

This is contrary to common sense, and it’s very different from how the official ATP and WTA rankings treat these players. Common sense says that returning players probably aren’t as good as they were before a long break. The official rankings are harsher, removing players entirely after a full year away from the tour. Serena probably isn’t the best player on tour right now (as Elo insisted during her time off), but she’s also much more of a threat than her WTA ranking of No. 454 implies. We must be able to do better.

Before we fix the Elo algorithm, let’s take a moment to consider what “better” means. Fans tend to get worked up about rankings and seedings, as if a number confers value on the player. The official rankings are, by design, backward-looking: They measure players based on their performance over the last 52 weeks, weighted by how the tour prioritizes events. (They are used in a forward-looking way, for tournament seedings, but the system is not designed to be predictive of future results.) In this way, the official rankings say, “this is how well she has played for the last year.” Whatever her ability or potential, Serena (along with Vika, Murray, and Djokovic) hasn’t posted many positive results this year, and her ranking reflects that.

Elo, on the other hand, is designed to be predictive. Out of necessity, it can only use past results, but it uses those results in a way to best estimate how well a player is competing right now–our best proxy for how someone will play tomorrow, or next week. Elo ratings–even the naive ones that said Serena and Novak are your current No. 1s–are considerably better at predicting match outcomes than are the official rankings. For my purposes, that’s the definition of “better”–ratings that offer more accurate forecasts and, by extension, the best approximation of each player’s level right now.

The time-off penalty

When players leave the tour for very long, they return–at least on average, and at least temporarily–at a lower level. I identified every layoff of eight weeks or longer in ATP history, taken by a player with an Elo rating of 1900 or above*. In their first matches back on tour, their pre-break Elo overestimated their chances of winning by about 25%. It varies a bit by the amount of time off: eight- to ten-week breaks resulted in an overestimation around 17%, while 30- to 52-week breaks meant Elo overestimated a player’s chances by nearly 50% upon return. There are exceptions to every rule, like Roger Federer at the 2017 Australian Open, and Rafael Nadal, who won 14 matches in a row after his two-month break this season, but in general, players are worse when they come back.

* I used the cutoff of 1900 because, below that level, some players are alternating between the ATP and Challenger tours. My Elo algorithm doesn’t include challenger results, so for lower-rated players, it’s not clear which timespans are breaks, and which are series of challenger events. Also, the eight-week threshold doesn’t count the offseason, so an eight-week layoff might really mean ~16 weeks between events, with the break including the offseason.

Translated into Elo terms, an eight-week break results in a drop of 100 Elo points, and a not-quite-one-year break, like Andy Murray’s current injury layoff, means a drop of 150 points. Making that adjustment results in an immediate improvement in Elo’s predictiveness for the first match after a layoff, and a small improvement in predictiveness for the first 20 matches after a break.
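A minimal sketch of how such a penalty might be applied. The two anchor points (eight weeks costs 100 points, roughly 52 weeks costs 150) come from the text above. Everything else is an assumption made for illustration: the straight-line interpolation between those points, the cap at 52 weeks, the zero penalty below eight weeks, and the function names. The standard Elo win-probability formula is used to show the effect on a forecast.

```python
def layoff_penalty(weeks_off: float) -> float:
    """Elo points to subtract after a layoff.

    Assumption: linear interpolation between the two figures quoted in
    the text (8 weeks -> 100 points, 52 weeks -> 150 points). The real
    curve may have a different shape.
    """
    if weeks_off < 8:
        return 0.0  # below the eight-week threshold, no penalty
    capped = min(weeks_off, 52.0)  # assumption: no extra penalty past a year
    return 100.0 + (capped - 8.0) * (150.0 - 100.0) / (52.0 - 8.0)


def win_probability(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Hypothetical ratings, chosen only to show the direction of the change.
pre_break = 2100.0
opponent = 1900.0
adjusted = pre_break - layoff_penalty(30)

print("forecast with pre-break rating:", round(win_probability(pre_break, opponent), 3))
print("forecast with penalized rating:", round(win_probability(adjusted, opponent), 3))
```

The penalized rating always yields a lower win probability for the returning player, which is the direction the overestimation figures above call for.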

Incorporating uncertainty

Elo is designed to always provide a “best estimate”–when a player is new on tour, we give him a provisional rating of 1500, and then adjust the rating after each match, depending on the result, the quality of the opponent, and how many matches our player has contested. That provisional 1500 is a completely ignorant guess, so the first adjustment is a big one. Over time, the size of a player’s Elo adjustments goes down, because we learn more about him. If a player loses his first-ever match to Joao Sousa, the only information we have is that he’s probably not as good as Sousa, so we subtract a lot of points. If Alexander Zverev loses to Sousa after more than 150 career matches, including dozens of wins over superior players, we’ll still dock Zverev a few points, but not as many, because we know so much more about him.
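To make the shrinking adjustment concrete, here is a rough sketch of a single Elo update. The expected-score formula is the standard one. The `k_factor` schedule and its constants are assumptions picked only because they decline with experience, and the ratings are hypothetical. Both players are given the same rating and the same opponent so that the only difference is how many matches each has played.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def k_factor(matches_played: int) -> float:
    """Size of the adjustment; shrinks as we learn more about a player.

    Assumption: this particular schedule and its constants are
    illustrative, not necessarily the ones behind the ratings above.
    """
    return 250.0 / (matches_played + 5) ** 0.4


def update(rating: float, opponent: float, won: bool, matches_played: int) -> float:
    """Return the player's new rating after one match."""
    actual = 1.0 if won else 0.0
    return rating + k_factor(matches_played) * (actual - expected_score(rating, opponent))


# Same rating, same opponent, same loss; only the track record differs.
newcomer = update(1800.0, 1800.0, won=False, matches_played=0)
veteran = update(1800.0, 1800.0, won=False, matches_played=150)

print("newcomer after the loss:", round(newcomer, 1))
print("veteran after the loss: ", round(veteran, 1))
```

The newcomer loses several times as many points as the veteran for the identical result, which mirrors the Sousa example.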

But after a layoff, we are a bit less certain that what we knew about a player is still relevant. Djokovic is a great example right now. If he lost six out of nine matches (as he did between the Australian Open fourth round and Madrid) without missing any time beforehand, we’d know it was a slump, but most of us would expect him to snap out of it. Elo would reduce his rating, but he’d remain near the top. Since he missed the second half of last season, however, we’re more skeptical–perhaps he’ll never return to his former level. Other cases are even more clear-cut, as when a player returns from injury without being fully healed.

Thus, after a layoff, it makes sense to alter how much we adjust a player’s Elo ratings. This isn’t a new idea–it’s the core concept behind Glicko, another chess rating system that expands on Elo. Over the years, I’ve tinkered with Glicko quite a bit, looking for improvements that apply to tennis, without much success. Changing the multiplier that determines rating adjustments (known as the k factor) doesn’t improve the predictiveness of tennis Elo on its own, but combined with the post-layoff penalties I described above, it helps a bit.

The nitty-gritty: After a layoff, I increase the multiplier by a factor of 1.5, and then gradually reduce it back to 1x over the next 20 matches. The flexible multiplier slightly improves the accuracy of Elo ratings for those 20 matches, though the difference is minor compared to the effect of the initial penalty.
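A sketch of one way the flexible multiplier could work. The 1.5x starting value and the 20-match window are from the paragraph above. The linear decay, the choice to start the boost at the first match back, the function names, and the base k of 32 are all assumptions for illustration.

```python
def layoff_multiplier(matches_since_return: int,
                      boost: float = 1.5,
                      window: int = 20) -> float:
    """Multiplier applied to the usual k factor after a layoff.

    Assumption: the boost decays in a straight line from `boost` at the
    first match back to 1.0 once `window` matches have been played.
    """
    if matches_since_return < 0:
        raise ValueError("matches_since_return must be non-negative")
    if matches_since_return >= window:
        return 1.0
    return boost - (boost - 1.0) * matches_since_return / window


def effective_k(base_k: float, matches_since_return: int) -> float:
    """The k factor actually used for a match during a comeback."""
    return base_k * layoff_multiplier(matches_since_return)


# Hypothetical base k of 32, shown at a few points in the comeback.
for n in (0, 5, 10, 20):
    print(n, "matches back -> k =", effective_k(32.0, n))
```

A larger k early in the comeback means the first few results move the rating more in either direction, so a returning player who wins right away climbs back faster.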

No more caveats*

* I thought it would be funny to put an asterisk after “no more caveats.”

Post-layoff penalties and flexible multipliers end up bringing down the current Elo ratings of the players who are in the middle of long breaks or have recently come back from them, giving us ranking tables that come closer to what we expect–and should do a better job of predicting the outcome of upcoming matches. These changes to the algorithm also have minor effects on the ratings of other players, because everyone’s rating depends on the rating of all of his or her opponents. So Taro Daniel’s Elo bounce from defeating Djokovic in Indian Wells doesn’t look quite as good as it did before I implemented the penalty.

On the ATP side, the new algorithm knocks Djokovic down to 3rd in overall Elo, Murray to 6th, Jo-Wilfried Tsonga to 21st, and Stan Wawrinka to 24th. That’s still quite high for Novak considering what we’ve seen this year, but remember that the Elo algorithm only knows about his on-court performances: A six-month break followed by a half-dozen disappointing losses. The overall effect is about a 200-point drop from his pre-layoff level; the “problem” is that his Elo a year ago reflected how jaw-droppingly good he had recently been.

The WTA results match my intuition even better than I hoped. Serena falls to 7th, Sharapova to 18th, and Azarenka to 23rd. Because of the flexible multiplier, a few early wins for Williams will send her quickly back up the rankings. Like Djokovic, she rates so high in part because of her stratospheric Elo rating before her time off. For her part, Sharapova still rates higher by Elo than she does in the official rankings. Despite the penalty for her one-year drug suspension, the algorithm still treats her prior success as relevant, even if that relevance fades a bit more every week.

Elo is always an approximation, and given the wide range of causes that will sideline a player, not to mention the spectrum of strategies for returning to the tour, any rating/forecasting system is going to have a harder time with players in that situation. That said, these improvements give us Elo ratings that do a better job of representing the current level of players who have missed time, and they will allow us to make superior predictions about matches and tournaments involving those players.

Under the hood

If you’re interested in some technical details, keep reading.

Before making these adjustments, the Brier score for Elo-based predictions of all ATP matches since 1972 was about 0.20. For all matches that involved at least one player with an Elo of 1900 or better, it was 0.17. (Not only are 1900+ players better, their ratings tend to be based on more data, which at least partly explains why the predictions are better. The lower the Brier score, the better.)
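For readers unfamiliar with the metric, the Brier score is the mean squared difference between each forecast probability and the actual outcome (1 for a win, 0 for a loss). A minimal sketch, with made-up forecasts and results used purely as sample input:

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up sample: forecast win probabilities for one player in four
# matches, and whether that player actually won (1) or lost (0).
sample_forecasts = [0.80, 0.65, 0.55, 0.90]
sample_outcomes = [1, 0, 1, 1]

print("sample Brier score:", round(brier_score(sample_forecasts, sample_outcomes), 4))
print("coin-flip baseline:", brier_score([0.5, 0.5, 0.5, 0.5], sample_outcomes))
```

A forecaster who always says 50% scores exactly 0.25 no matter what happens, so that is the baseline against which figures like 0.20 and 0.17 should be read.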

For the population of about 500 “first matches” after layoffs for qualifying players, the Brier score before these changes was 0.192. After implementing the penalty, it improved to 0.173.

For the 2nd through 20th post-comeback matches, the Brier score for the original algorithm was 0.195. After adding the penalty, it was 0.191, and after making the multiplier flexible, it fell a bit more to 0.190. (Additional increases to the post-layoff multiplier had negative results, pushing the Brier score back to about 0.195 when the 2nd-match multiplier was 2x.) I realize that’s a tiny change, and it very possibly won’t hold up in the future. But in looking at various notable players over the course of their comebacks, that’s the option that generated results that looked the most intuitively accurate. Since my intuition matched the best Brier score (however minuscule the difference), it seems like the best option.

Finally, a note on players with multiple layoffs. If someone misses six months, plays a few matches, then misses another two months, it doesn’t seem right to apply the penalty twice. There aren’t a lot of instances to use for testing, but the limited sample confirms this. My solution: If the second layoff is within two years of the previous comeback, combine the length of the two layoffs (here: eight months), find the penalty for a break of that length, and then apply the difference between that penalty and the previous one. Usually, that results in second-layoff penalties of between 10 and 50 points.
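The combined-layoff rule can be sketched as follows. The two-year window and the "penalty for the combined length minus the penalty already applied" logic are from the paragraph above. The `layoff_penalty` curve is a hypothetical linear interpolation between the 100-point and 150-point figures quoted earlier, and the conversion of months to weeks is approximate.

```python
def layoff_penalty(weeks_off: float) -> float:
    """Hypothetical penalty curve: 8 weeks -> 100 points, 52 weeks -> 150.

    Assumption: linear in between, capped at 52 weeks, zero below 8.
    """
    if weeks_off < 8:
        return 0.0
    capped = min(weeks_off, 52.0)
    return 100.0 + (capped - 8.0) * 50.0 / 44.0


def second_layoff_penalty(previous_weeks: float,
                          new_weeks: float,
                          weeks_since_comeback: float) -> float:
    """Penalty for a layoff that follows an earlier, already-penalized one."""
    if weeks_since_comeback > 104:  # more than two years: treat as unrelated
        return layoff_penalty(new_weeks)
    combined = layoff_penalty(previous_weeks + new_weeks)
    return combined - layoff_penalty(previous_weeks)


# The example from the text: about six months off (26 weeks), a few
# matches, then about two more months off (9 weeks).
print("second penalty:", round(second_layoff_penalty(26, 9, weeks_since_comeback=6), 1))
print("if applied in full:", round(layoff_penalty(9), 1))
```

Under these assumed numbers the second penalty is a small fraction of what a standalone two-month break would cost, which is the point of combining the layoffs.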
