Ivo Karlovic’s Survival and the Key to Aging in Men’s Tennis

Italian translation at settesei.it

Let’s just get this out of the way first: Ivo Karlovic is amazing. The Croatian didn’t play his first tour-level match until he was 22 years old, and he didn’t crack the top 100 for two years after that. Yet he eventually reached No. 14 in the world, won over 350 career matches, and claimed nine tour-level titles. Now, a few weeks shy of his 40th birthday, he’s coming off an ATP final in Pune, where he came within two points of ousting top-ten stalwart Kevin Anderson and ensured that he’ll remain in the top 100 through his milestone birthday next month.

The fact that Karlovic is one of the tallest men ever to play the game and that he holds a wide array of ace records is beside the point. (Though it’s certainly worthy of discussion, and I hope to dive into aging patterns and playing styles in a future post.) Yes, his first-strike brand of tennis, avoiding the bruising rallies that have worn down the likes of David Ferrer, may make it easier to compete at an advanced age. On the other hand, he remains one of the few men on tour to regularly serve-and-volley, a tactic that scores of younger, quicker men can’t execute effectively. He is, quite simply, one of a kind.

Despite his uniqueness, Karlovic represents an important aspect of men’s tennis in the 2010s. The ATP has gotten older since he broke in almost two decades ago, and ten men aged at least 33 are ranked higher than the Croatian. One of them, 37-year-old Roger Federer, remains one of the best players in the game. The average age of elite men’s tennis may be creeping back down, but it is still the golden age of 30-somethings.

Men like Karlovic and Federer have seemed to defy the usual logic of aging. Most sports have a reliable “peak age” at which players can be expected to perform their best. Up to that point, competitors are developing both physically and mentally; after the peak age, physical deterioration sets in and performance declines. There’s always plenty of variation around the average, but the overall trajectory–break in, rise, peak, fall, retire–is predictable enough.

In part, Karlovic has followed that path, just with a late start and a surprise second peak in his 30s. To compare year-to-year performances, I calculated each player’s dominance ratio (DR), a useful measure of overall performance calculated as the ratio of return points won to opponents’ return points won, and adjusted it for quality of competition. (The adjustment algorithm gets complicated; I first outlined how it controls for each player’s mix of opponents here.) 1.0 is average, and the typical range runs from about 0.8 (soon to head back to challengers) to 1.2 (big four territory). The following graph shows Ivo’s DR at each age, along with a smoother three-year moving average:

Karlovic hit his primary peak around age 31, a bit late but not entirely atypical for the era. Even if we ignore the surprise spike at age 36, he remained an average player (roughly speaking, a card-carrying member of the top 50) until age 35. In 2017 and 2018, we finally witnessed a downward trend, but if Ivo’s feat in Pune is any indication, he might be turning things around once again.
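As a rough illustration of the two calculations behind that graph, here is a minimal Python sketch of a raw dominance ratio and a three-year moving average. It assumes DR is computed from rates (share of return points won divided by the opponents’ share), it leaves out the competition adjustment entirely, and it centers the moving average on each age, averaging only the neighbors that exist at either end. The function names are hypothetical.

```python
def dominance_ratio(ret_won: int, ret_played: int,
                    opp_ret_won: int, opp_ret_played: int) -> float:
    """Raw (unadjusted) DR: share of return points won / opponents' share."""
    return (ret_won / ret_played) / (opp_ret_won / opp_ret_played)


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Centered moving average; at the ends, average only the available neighbors."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical season-by-season DRs for one player, ordered by age
season_drs = [1.0, 1.1, 0.9, 1.2]
print(moving_average(season_drs))
```

The smoothing only dampens single-season spikes (like Karlovic’s at 36); it does nothing to correct for quality of competition, which is the harder part of the real adjustment.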

Nearly every professional tennis player retires before they reach Karlovic’s current age, so we’ll never know what bonus peaks we missed. Of course, many of those retirement decisions are due to injury, so at least some of the Croatian’s late-career success must be credited to his ability to stay healthy enough to soldier on. Let’s look at an even more baffling aging pattern, one that belongs to a player who will almost definitely retire before seeing the kind of late-career decline that Karlovic experienced in 2017 and 2018. Here’s Federer:

By the measure of competition-adjusted DR, Federer’s best season came at age 34. Even if you don’t buy that, the overall trend is clear. He continues to play at or near his peak, past the age at which his peers become Davis Cup captains and have Tour Finals round-robin groups named after them.

Federer has been able to stay off the injured list for almost all of his 20 years on tour, and health–the simple fact of showing up for most tournaments–may be the most underrated skill in men’s tennis. The vast majority of players who don’t survive to post elite seasons in their mid- and late-30s aren’t slowly drifting down the ranking list, like a baseball player who plays every game in his 20s, then moves into more and more limited part-time roles as he ages. Instead, they drop out, perhaps because of a single career-ending injury, the general accumulation of nagging problems, or lack of desire to wholeheartedly pursue the sport at the expense of everything else.

The following graph shows the two ways in which players fail to maintain their previous level from one year to the next: Playing worse tennis (measured by competition-adjusted DR), or leaving the tour. The latter is defined by contesting fewer than 20 tour-level matches, something that any reasonably healthy player with a ranking in the top 100 should be able to manage. At every age, players drop out at a surprising clip, and around the late 20s, that rate begins to overtake the percentage of players who stay on tour but perform at a weaker level:

The “Leave tour” rates slightly overstate the number of disappearing players, since about one-quarter of them eventually return to the tour, like Andy Murray is trying to do in 2019. But even accounting for the number of comebacks, a hefty share of the players we expect to steadily decline are either forced off tour by injury or choose not to continue.
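A minimal sketch of how player-seasons might be sorted into the outcomes behind that graph. The 20-match cutoff comes from the text; the `Season` record, the rule that only seasons with at least 20 matches count as a starting point, and the function names are assumptions for illustration. The sketch does not account for players who later return, and in real data the final season on record would need to be excluded rather than counted as a departure.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

MIN_MATCHES = 20  # fewer tour-level matches than this counts as "leaving the tour"


@dataclass
class Season:  # hypothetical record
    player: str
    age: int
    matches: int    # tour-level matches contested that season
    adj_dr: float   # competition-adjusted dominance ratio


def outcome(prev: Season, nxt: Optional[Season]) -> str:
    """Classify what happened in the season after `prev`."""
    if nxt is None or nxt.matches < MIN_MATCHES:
        return "leave"
    return "decline" if nxt.adj_dr < prev.adj_dr else "maintain"


def rates_by_age(seasons: list[Season]) -> dict[int, dict[str, float]]:
    """Share of each outcome, grouped by the age of the starting season."""
    lookup = {(s.player, s.age): s for s in seasons}
    counts: dict[int, Counter] = defaultdict(Counter)
    for s in seasons:
        if s.matches < MIN_MATCHES:
            continue  # assumption: only full seasons serve as a starting point
        counts[s.age][outcome(s, lookup.get((s.player, s.age + 1)))] += 1
    return {
        age: {k: c[k] / sum(c.values()) for k in ("maintain", "decline", "leave")}
        for age, c in sorted(counts.items())
    }
```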

Selection bias

All of these disappearing players make it extremely difficult to construct an aging curve for men’s tennis. One common approach to measuring such a trajectory is to identify all the players who competed in consecutive seasons (say, their age-25 and age-26 campaigns), figure out how much better or worse they performed in the latter year, and average the differences. When we do that for ATP players born since 1970, the results are downright bizarre. The worst year-to-year change is from age 21 to age 22, when DR decreases by about 2.2%, even though we would expect youngsters to be developing their game for the better. The strongest year-to-year change is from age 30 to age 31, with an improvement of 4.0%, when we would expect a plateau or even a slight decline.
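The consecutive-seasons approach can be sketched as follows. This assumes each player’s season-level DR is stored by age and that the change is measured as a percentage (next DR divided by current DR, minus one); both the data layout and the function name are hypothetical. A player with no following season simply contributes nothing, which is exactly where the selection bias creeps in.

```python
from collections import defaultdict
from statistics import mean


def year_to_year_changes(
    dr_by_player: dict[str, dict[int, float]]
) -> dict[tuple[int, int], float]:
    """Average % change in DR from age N to N+1, using only players active at both ages."""
    changes: dict[int, list[float]] = defaultdict(list)
    for seasons in dr_by_player.values():
        for age, dr in seasons.items():
            nxt = seasons.get(age + 1)
            if nxt is None:
                continue  # dropped out (or no data): silently excluded
            changes[age].append(nxt / dr - 1)
    return {(age, age + 1): 100 * mean(v) for age, v in sorted(changes.items())}


# Hypothetical data: player "c" leaves after age 25 and never counts
sample = {"a": {25: 1.0, 26: 1.1}, "b": {25: 1.0, 26: 0.9, 27: 0.99}, "c": {25: 0.9}}
print(year_to_year_changes(sample))
```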

Because these ratios don’t include the players who drop out, most of the year-to-year ratios reflect an improvement:

Age      Year-to-year DR change  
19 to 20                  -1.7%  
20 to 21                  +0.9%  
21 to 22                  -2.2%  
22 to 23                  -0.3%  
23 to 24                  +1.5%  
24 to 25                  +1.1%  
25 to 26                  +0.7%  
26 to 27                  +1.5%  
27 to 28                  +1.2%  
28 to 29                  +3.5%  
29 to 30                  -0.8%  
30 to 31                  +4.0%  
31 to 32                  +2.6%  
32 to 33                  +0.7%  
33 to 34                  -0.5%  
34 to 35                  +3.0%  
35 to 36                  -0.4%

If we assembled these ratios into an aging curve, we’d see a line staggering upwards, as if we could expect players to continue improving for as long as they cared to compete.
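To see that upward stagger concretely, the percentages from the table can be compounded into a naive curve. This is only a sketch: the starting value of 1.0 at age 19 is arbitrary, and compounding assumes each change applies multiplicatively to the previous year’s level.

```python
# Year-to-year DR changes (%) from the table, keyed by the starting age
CHANGES = {
    19: -1.7, 20: +0.9, 21: -2.2, 22: -0.3, 23: +1.5, 24: +1.1,
    25: +0.7, 26: +1.5, 27: +1.2, 28: +3.5, 29: -0.8, 30: +4.0,
    31: +2.6, 32: +0.7, 33: -0.5, 34: +3.0, 35: -0.4,
}


def naive_aging_curve(changes: dict[int, float], base: float = 1.0) -> dict[int, float]:
    """Compound the survivors-only changes into a relative level for each age."""
    start = min(changes)
    curve = {start: base}
    level = base
    for age in sorted(changes):
        level *= 1 + changes[age] / 100
        curve[age + 1] = level
    return curve


for age, level in naive_aging_curve(CHANGES).items():
    print(age, round(level, 3))
```

After an early dip, the curve ends well above where it started, which is the "improve forever" artifact described above.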

However, things start to make sense when we acknowledge the selection bias and reframe our findings accordingly. It isn’t true to say that the average player steadily improves forever. But it is more believable to say this: The average player who remains healthy enough to play a full season and has the desire to compete full-time can expect to improve well into his 30s. The older a player gets, the less likely it is that the second claim applies to him.

As they say, half of success is just showing up. By age 39, most pro tennis players have long since started showing up somewhere else. By dint of sheer perseverance, a bit of luck, and one of the most dominant serves the world has ever seen, Karlovic has shown that tennis’s aging curve is even more flexible than we thought.

Dominic Thiem In Pressure Service Games


Dominic Thiem has good reason to be frustrated.

Italian translation at settesei.it

On Tuesday night, Rafael Nadal and Dominic Thiem delivered the match of the 2018 US Open thus far. After nearly five hours of play, nothing separated them as they battled their way to 5-5 in a fifth-set tiebreak. Nadal finally crept ahead by the narrowest of margins, sealing a victory by the unlikely score of 0-6 6-4 7-5 6-7(4) 7-6(5).

Both players had plenty of chances, and while Rafa prepares for a semi-final against Juan Martin del Potro, Thiem will have plenty of time to mull over the opportunities he missed. In the second set, he failed to hold in both of his last two service games, including the final frame of the set, at 4-5. In the third set, he took the lead by breaking Nadal in the seventh game, but failed to follow up his advantage, losing serve when he attempted to serve it out at 5-4. Two games later, he proved unable to hold serve to stay in the set at 5-6, though he forced Rafa to four deuces before finally giving way.

These three missed chances are hardly the entire story of the match, but they stick out in memory. Overall, Thiem served quite well, allowing Nadal only five breaks, about one per set. That’s 21 holds in 26 service games, an 81% hold rate, a significant achievement compared to the 66% that Nadal’s opponents have averaged against him on hard courts this year, or the paltry 52% that Rafa has allowed overall. The problem isn’t that the Austrian served badly–he didn’t–but that he weakened at the wrong times. Thiem broke Nadal more often than Rafa returned the favor–six to five–but because three of Thiem’s breaks came in the first, 6-0 set, his six breaks proved less valuable than Nadal’s five.

Bad day, or just bad?

Is this something Thiem does, or is it just something that he did, perhaps nudged over the edge by one of the greatest returners of all time? Too often, viewers–along with many of those paid to talk and write about tennis–see the latter and assume the former. Does Thiem make a habit of serving strong in lower-leverage games and then wilting when the pressure ratchets up?

If he does, it would make him an exception. I looked at “serving for the set” opportunities a few years ago and found that ATP players serve almost exactly as well when a hold would earn them the set as they do otherwise. The difference is a mere 0.7%, meaning that the “difficulty” of serving for the set translates into one additional break per 143 opportunities. The effect wasn’t any more noticeable when I narrowed the focus to situations in which the player led by only a single break, like Thiem’s dropped service game at 5-4 in the third set last night.
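The one-in-143 figure is simple arithmetic, shown here assuming the 0.7% gap is measured in percentage points of hold rate (the function name is hypothetical):

```python
def opportunities_per_extra_break(hold_gap: float) -> float:
    """If servers hold `hold_gap` less often (as a fraction of service games),
    one extra break occurs every 1 / hold_gap opportunities."""
    return 1 / hold_gap


print(round(opportunities_per_extra_break(0.007)))  # 143
```

The same conversion turns a 5-point gap into one flipped service game in twenty.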

Let’s look again, and pay specific attention to Thiem. My dataset of sequential point-by-point data, spanning most ATP tour matches between late 2011 and a few weeks ago, now covers over 400,000 service games, including 30,000 serving-for-the-set chances, over two-thirds of them with a lead of a single break. Over 1% of them have Thiem serving, so at least our sample size benefits from the Austrian’s strenuous schedule, even if it doesn’t do him any favors on the court. In other words, we’ve got a ton of data here, so if there is an effect, we should be able to find it.

Thiem’s missed chances included chances to both finish a set and stay in a set, so I’ve expanded our view to a variety of pressure situations. For each situation, I’ve calculated the hold rate for players in that position relative to their typical hold rate in those matches. (A player with a lot of serve-to-stay-in opportunities is probably on the losing end, with a lower hold rate than average, but this method should control for that.) A ratio of 1.0 means that the hold rate in the pressure situation is exactly the same as normal. A ratio above 1.0 means the hold rate is higher than usual, and below 1.0 signifies a lower hold rate–the lag many of us expect to see when the stakes get higher. Here are the ratios for a variety of situations, including serving for the set (plus a category for one-break leads), serving to stay in the set (also with one-break deficits identified), ties late in the set such as 4-4 and 5-5, and for comparison’s sake, low-pressure situations–“All Else”–which is a catch-all for everything not in the above categories.*

* Yes, it includes the famous seventh game, which I’ve previously shown isn’t particularly important, no matter what Bill Tilden said.

Situation          Examples  Hold% / Avg  
For-Set            5-4; 5-2        0.994  
- For-Set Close    5-3; 6-5        0.989    
To-Stay            4-5; 1-5        0.999  
- To-Stay Close    5-6; 3-5        0.969    
Tied Late          4-4; 5-5        0.953  
All Else           2-3; etc        1.003

The “serving-for-the-set” effect is almost exactly the same as what I found three years ago: a drop of a bit more than half a percent. This time, the impact of serving for the set with a single-break lead is a bit greater than I initially found, but it’s still small. We find servers struggling the most when serving to stay in the set while trailing by a single break, holding 3.1% less often than usual, and when serving at 4-4 and 5-5, when they hold almost 5% less often than expected. These are the most substantial effects I’ve seen, but keep in mind the magnitude–even a 5% difference means it only flips the outcome of one service game in twenty. It certainly matters, but it would be awfully hard to spot with the naked eye.
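One plausible way to compute ratios like those in the table, sketched under assumptions: each service game has already been labeled with a situation, and the server’s hold rate for the whole match serves as the baseline. The ratio is then observed holds divided by expected holds within each situation. The `ServiceGame` record and `pressure_ratios` function are hypothetical, not the code used for this post.

```python
from collections import defaultdict
from typing import NamedTuple


class ServiceGame(NamedTuple):  # hypothetical record
    situation: str          # e.g. "For-Set", "Tied Late", "All Else" (labeled upstream)
    held: bool              # did the server hold?
    match_hold_rate: float  # server's hold rate over the whole match (baseline)


def pressure_ratios(games: list[ServiceGame]) -> dict[str, float]:
    """Observed holds / expected holds per situation; 1.0 means 'same as normal'."""
    observed: dict[str, float] = defaultdict(float)
    expected: dict[str, float] = defaultdict(float)
    for g in games:
        observed[g.situation] += 1.0 if g.held else 0.0
        expected[g.situation] += g.match_hold_rate
    return {s: observed[s] / expected[s] for s in observed}


games = [
    ServiceGame("For-Set", True, 0.8),
    ServiceGame("For-Set", False, 0.8),
    ServiceGame("Tied Late", True, 0.5),
]
print(pressure_ratios(games))
```

Using each match’s own hold rate as the baseline is what keeps a trailing player’s naturally lower hold rate from being mistaken for pressure.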

The one percent

How does Thiem compare? Here is the same set of ratios for him, with separate columns for his career numbers (subject to the limitations of my dataset, which includes few matches before 2012) and for single-season figures from 2016, 2017, and 2018:

Situation        Career   2016   2017   2018  
For-Set           0.996  1.049  1.011  0.966  
- For-Set Close   0.984  1.078  1.008  0.887  
To-Stay           1.030  1.160  1.027  0.940  
- To-Stay Close   0.984  1.148  0.957  0.964  
Tied Late         0.984  0.976  0.991  0.889  
All Else          1.004  0.994  1.009  1.030

Thiem’s career numbers reveal little, just a player who is a tiny bit worse in high-leverage situations, though perhaps a little less affected by the pressure than his peers. The concern is his numbers so far this year, which are way down across the board. Each one of the categories represents a relatively small sample–for example, I have only 42 games in which he was serving for the set with a single break advantage–but taken together, the set of sub-1.0 ratios doesn’t point in an encouraging direction. We could never have forecast before last night’s match that Thiem would serve so well in general but so much weaker in the clutch, but there were subtle hints lurking in his 2018 performance.

A puzzle

I want to show you the same set of data, but for another player. In one way, it’s the opposite of Thiem’s: many more breaks in pressure situations over the course of the player’s career, but the opposite trend in the last few years, pointing toward more service holds:

Situation        Career   2016   2017   2018  
For-Set           0.929  0.931  1.200  1.077  
- For-Set Close   0.910  0.895  1.333  1.000  
To-Stay           1.026  1.077  1.083  1.061  
- To-Stay Close   0.929  1.100  1.167  1.044  
Tied Late         0.905  1.050  1.000  1.048  
All Else          1.011  1.013  1.024  1.013

Any ideas? It’s a bit of a trick question–you’re looking at the tour serving against Rafa. From 2012-15, Nadal absolutely shut down opposing servers starting at about 4-4. (He wasn’t as good–relative to his average, anyway–late in sets on his own serve.) Very few players or seasons show effects of greater than 5% in either direction, but Rafa’s opponents saw their hold rate dip by more than twice that in some seasons. Yet the story has been different for the last year or two, with Rafa himself becoming the underperformer in his late-set return games.

Again, we shouldn’t read too much into a single year of this data: The sample size is an issue, especially for a top player’s return games, because not many guys find themselves serving for a set against him. But had we looked at Nadal’s return record in pressure situations alongside Thiem’s recent serve performance, it would have made for a more complicated picture, one less likely to predict some of the crucial moments in last night’s match. In any given contest, there are simply too few key games for us to forecast their outcome with any success, especially when a let cord, an untimely distraction, or a missed line call could reverse the result. But that doesn’t mean we shouldn’t try to understand them. Unlucky, unclutch, or whatever else, Thiem could have flipped the outcome of the entire match by holding just one of those three games. The stakes could hardly be higher.

Handling Injuries and Absences With Tennis Elo

Italian translation at settesei.it

For the last year or so, every mention of my ATP and WTA Elo ratings has required some sort of caveat. Ratings don’t change while players are absent from the tour, so Serena Williams, Novak Djokovic, Andy Murray, Maria Sharapova, and Victoria Azarenka were all stuck at the top of their tour’s Elo rankings. When their layoffs started, they were among the best, and even a smattering of poor results (or nearly a season’s worth, in the case of Sharapova) isn’t enough to knock them too far down the list.

This is contrary to common sense, and it’s very different from how the official ATP and WTA rankings treat these players. Common sense says that returning players probably aren’t as good as they were before a long break. The official rankings are harsher, removing players entirely after a full year away from the tour. Serena probably isn’t the best player on tour right now (as Elo insisted during her time off), but she’s also much more of a threat than her WTA ranking of No. 454 implies. We must be able to do better.

Before we fix the Elo algorithm, let’s take a moment to consider what “better” means. Fans tend to get worked up about rankings and seedings, as if a number confers value on the player. The official rankings are, by design, backward-looking: They measure players based on their performance over the last 52 weeks, weighted by how the tour prioritizes events. (They are used in a forward-looking way, for tournament seedings, but the system is not designed to be predictive of future results.) In this way, the official rankings say, “this is how well she has played for the last year.” Whatever her ability or potential, Serena (along with Vika, Murray, and Djokovic) hasn’t posted many positive results this year, and her ranking reflects that.

Elo, on the other hand, is designed to be predictive. Out of necessity, it can only use past results, but it uses those results in a way designed to best estimate how well a player is competing right now–our best proxy for how someone will play tomorrow, or next week. Elo ratings–even the naive ones that said Serena and Novak are your current No. 1s–are considerably better at predicting match outcomes than are the official rankings. For my purposes, that’s the definition of “better”–ratings that offer more accurate forecasts and, by extension, the best approximation of each player’s level right now.

The time-off penalty

When players leave the tour for very long, they return–at least on average, and at least temporarily–at a lower level. I identified every layoff of eight weeks or longer in ATP history, taken by a player with an Elo rating of 1900 or above*. In their first matches back on tour, their pre-break Elo overestimated their chances of winning by about 25%. It varies a bit by the amount of time off: eight- to ten-week breaks resulted in an overestimation around 17%, while 30- to 52-week breaks meant Elo overestimated a player’s chances by nearly 50% upon return. There are exceptions to every rule, like Roger Federer at the 2017 Australian Open, and Rafael Nadal, who won 14 matches in a row after his two-month break this season, but in general, players are worse when they come back.

* I used the cutoff of 1900 because, below that level, some players are alternating between the ATP and Challenger tours. My Elo algorithm doesn’t include challenger results, so for lower-rated players, it’s not clear which timespans are breaks, and which are series of challenger events. Also, the eight-week threshold doesn’t count the offseason, so an eight-week layoff might really mean ~16 weeks between events, with the break including the offseason.

Translated into Elo terms, an eight-week break results in a drop of 100 Elo points, and a not-quite-one-year break, like Andy Murray’s current injury layoff, means a drop of 150 points. Making that adjustment results in an immediate improvement in Elo’s predictiveness for the first match after a layoff, and a small improvement in predictiveness for the first 20 matches after a break.
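The standard Elo expected-score formula shows what a penalty of this size means in win-probability terms. The text gives only two anchor points (about 100 points for an eight-week break and about 150 for a not-quite-one-year break), so the linear interpolation between them, the cap at 52 weeks, and the function names below are assumptions.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Standard Elo win probability for `rating` against `opp_rating`."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


def layoff_penalty(weeks: float) -> float:
    """Hypothetical penalty curve: 0 below 8 weeks, then linear from 100 points
    at 8 weeks to 150 points at 52 weeks (capped there). Assumed, not the author's."""
    if weeks < 8:
        return 0.0
    weeks = min(weeks, 52)
    return 100 + (weeks - 8) * (150 - 100) / (52 - 8)


# A 2100-rated player facing a 2000-rated opponent, before and after an 8-week layoff
before = expected_score(2100, 2000)
after = expected_score(2100 - layoff_penalty(8), 2000)
print(round(before, 3), round(after, 3))  # 0.64 0.5
```

In this example, a 100-point penalty turns a roughly 64% favorite into a coin flip.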

Incorporating uncertainty

Elo is designed to always provide a “best estimate”–when a player is new on tour, we give him a provisional rating of 1500, and then adjust the rating after each match, depending on the result, the quality of the opponent, and how many matches our player has contested. That provisional 1500 is a completely ignorant guess, so the first adjustment is a big one. Over time, the size of a player’s Elo adjustments goes down, because we learn more about him. If a player loses his first-ever match to Joao Sousa, the only information we have is that he’s probably not as good as Sousa, so we subtract a lot of points. If Alexander Zverev loses to Sousa after more than 150 career matches, including dozens of wins over superior players, we’ll still dock Zverev a few points, but not as many, because we know so much more about him.
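A sketch of an Elo update whose multiplier shrinks as a player accumulates matches. The decay formula, `250 / (matches + 5) ** 0.4`, is an assumed shape chosen for illustration; the text does not say which formula it uses.

```python
def k_factor(matches_played: int) -> float:
    """Assumed decay: big adjustments for newcomers, smaller ones as evidence piles up."""
    return 250 / (matches_played + 5) ** 0.4


def elo_update(rating: float, opp_rating: float, won: bool, matches_played: int) -> float:
    """Move the rating toward the result, scaled by the (shrinking) k factor."""
    expected = 1 / (1 + 10 ** ((opp_rating - rating) / 400))
    return rating + k_factor(matches_played) * ((1.0 if won else 0.0) - expected)


# Same matchup, same loss: a newcomer's rating moves far more than a veteran's.
print(1900 - elo_update(1900, 1900, False, 0))    # newcomer's drop
print(1900 - elo_update(1900, 1900, False, 150))  # veteran's drop
```

Holding the ratings equal isolates the effect of the multiplier; in the Sousa example above, the newcomer’s provisional 1500 would also change the expected score.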

But after a layoff, we are a bit less certain that what we knew about a player is still relevant. Djokovic is a great example right now. If he lost six out of nine matches (as he did between the Australian Open fourth round and Madrid) without missing any time beforehand, we’d know it was a slump, but most of us would expect him to snap out of it. Elo would reduce his rating, but he’d remain near the top. Since he missed the second half of last season, however, we’re more skeptical–perhaps he’ll never return to his former level. Other cases are even more clear-cut, as when a player returns from injury without being fully healed.

Thus, after a layoff, it makes sense to alter how much we adjust a player’s Elo ratings. This isn’t a new idea–it’s the core concept behind Glicko, another chess rating system that expands on Elo. Over the years, I’ve tinkered with Glicko quite a bit, looking for improvements that apply to tennis, without much success. Changing the multiplier that determines rating adjustments (known as the k factor) doesn’t improve the predictiveness of tennis Elo on its own, but combined with the post-layoff penalties I described above, it helps a bit.

The nitty-gritty: After a layoff, I increase the multiplier by a factor of 1.5, and then gradually reduce it back to 1x over the next 20 matches. The flexible multiplier slightly improves the accuracy of Elo ratings for those 20 matches, though the difference is minor compared to the effect of the initial penalty.
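The flexible multiplier might look something like this. The 1.5x starting value and the 20-match window come from the text; the straight-line taper and the function names are assumptions, since the text says only that the multiplier is reduced gradually.

```python
from typing import Optional


def layoff_multiplier(matches_since_return: int, boost: float = 1.5,
                      window: int = 20) -> float:
    """K multiplier after a layoff: `boost` for the first match back (0), tapering
    linearly (an assumption) to 1.0 once `window` matches have been played."""
    if matches_since_return >= window:
        return 1.0
    return boost - (boost - 1.0) * matches_since_return / window


def effective_k(base_k: float, matches_since_return: Optional[int]) -> float:
    """`None` means the player has no recent qualifying layoff."""
    if matches_since_return is None:
        return base_k
    return base_k * layoff_multiplier(matches_since_return)


print([layoff_multiplier(m) for m in (0, 5, 10, 15, 20)])
```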

No more caveats*

* I thought it would be funny to put an asterisk after “no more caveats.”

Post-layoff penalties and flexible multipliers end up bringing down the current Elo ratings of the players who are in the middle of long breaks or have recently come back from them, giving us ranking tables that come closer to what we expect–and should do a better job of predicting the outcome of upcoming matches. These changes to the algorithm also have minor effects on the ratings of other players, because everyone’s rating depends on the rating of all of his or her opponents. So Taro Daniel’s Elo bounce from defeating Djokovic in Indian Wells doesn’t look quite as good as it did before I implemented the penalty.

On the ATP side, the new algorithm knocks Djokovic down to 3rd in overall Elo, Murray to 6th, Jo-Wilfried Tsonga to 21st, and Stan Wawrinka to 24th. That’s still quite high for Novak considering what we’ve seen this year, but remember that the Elo algorithm only knows about his on-court performances: A six-month break followed by a half-dozen disappointing losses. The overall effect is about a 200-point drop from his pre-layoff level; the “problem” is that his Elo a year ago reflected how jaw-droppingly good he had recently been.

The WTA results match my intuition even better than I hoped. Serena falls to 7th, Sharapova to 18th, and Azarenka to 23rd. Because of the flexible multiplier, a few early wins for Williams will send her quickly back up the rankings. Like Djokovic, she rates so high in part because of her stratospheric Elo rating before her time off. For her part, Sharapova still rates higher by Elo than she does in the official rankings. Despite the penalty for her one-year drug suspension, the algorithm still treats her prior success as relevant, even if that relevance fades a bit more every week.

Elo is always an approximation, and given the wide range of causes that will sideline a player, not to mention the spectrum of strategies for returning to the tour, any rating/forecasting system is going to have a harder time with players in that situation. That said, these improvements give us Elo ratings that do a better job of representing the current level of players who have missed time, and they will allow us to make superior predictions about matches and tournaments involving those players.

Under the hood

If you’re interested in some technical details, keep reading.

Before making these adjustments, the Brier score for Elo-based predictions of all ATP matches since 1972 was about 0.20. For all matches that involved at least one player with an Elo of 1900 or better, it was 0.17. (Not only are 1900+ players better, their ratings tend to be based on more data, which at least partly explains why the predictions are better. The lower the Brier score, the better.)
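For reference, the Brier score is the mean squared difference between forecast probabilities and outcomes (1 for a win, 0 for a loss). A minimal sketch:

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error of win probabilities against results (1 = win, 0 = loss).
    Lower is better."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A forecaster who always says 50-50 scores 0.25, whatever happens
print(brier_score([0.5, 0.5, 0.5], [1, 0, 1]))  # 0.25
```

That 0.25 baseline puts the figures above in context: 0.20 and 0.17 are meaningful improvements over guessing.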

For the population of about 500 “first matches” after layoffs for qualifying players, the Brier score before these changes was 0.192. After implementing the penalty, it improved to 0.173.

For the 2nd through 20th post-comeback matches, the Brier score for the original algorithm was 0.195. After adding the penalty, it was 0.191, and after making the multiplier flexible, it fell a bit more to 0.190. (Additional increases to the post-layoff multiplier had negative results, pushing the Brier score back to about 0.195 when the 2nd-match multiplier was 2x.) I realize that’s a tiny change, and it very possibly won’t hold up in the future. But in looking at various notable players over the course of their comebacks, that’s the option that generated results that looked the most intuitively accurate. Since my intuition matched the best Brier score (however minuscule the difference), it seems like the best option.

Finally, a note on players with multiple layoffs. If someone misses six months, plays a few matches, then misses another two months, it doesn’t seem right to apply the penalty twice. There aren’t a lot of instances to use for testing, but the limited sample confirms this. My solution: If the second layoff is within two years of the previous comeback, combine the length of the two layoffs (here: eight months), find the penalty for a break of that length, and then apply the difference between that penalty and the previous one. Usually, that results in second-layoff penalties of between 10 and 50 points.
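The two-layoff rule can be sketched as below. It restates a hypothetical linear penalty curve (anchored at 100 points for eight weeks and 150 for 52 weeks, which is an assumption), measures layoffs in weeks, and treats a second layoff more than two years (104 weeks) after the previous comeback as independent.

```python
def layoff_penalty(weeks: float) -> float:
    """Hypothetical curve: 0 under 8 weeks, linear from 100 to 150 points between 8 and 52."""
    if weeks < 8:
        return 0.0
    weeks = min(weeks, 52)
    return 100 + (weeks - 8) * 50 / 44


def second_layoff_penalty(first_weeks: float, second_weeks: float,
                          weeks_since_comeback: float) -> float:
    """Penalty for a second layoff under the combine-and-subtract rule."""
    if second_weeks < 8:
        return 0.0  # not a qualifying layoff
    if weeks_since_comeback > 104:  # more than two years later: a fresh layoff
        return layoff_penalty(second_weeks)
    combined = layoff_penalty(first_weeks + second_weeks)
    return combined - layoff_penalty(first_weeks)


# Six months out, a few matches, then two more months out
print(round(second_layoff_penalty(26, 9, 10), 1))  # 10.2
```

With these made-up anchors, the second penalty lands near the low end of the 10-to-50-point range mentioned above; a steeper real curve would produce larger differences.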

Measuring Return Aggression

In the last couple of years, I’ve gotten a lot of mileage out of a metric called Aggression Score (AS), first outlined here by Lowell West. The stat is so useful due to its simplicity. The more aggressive a player is, the more she’ll rack up both winners and unforced errors. AS, then, is essentially the rate at which a player hits winners and unforced errors.

Yet one limitation lies in Aggression Score’s simplicity. It works best when winners and unforced errors move together, and when they are roughly similar. If someone is having a really bad day, her unforced errors might skyrocket, resulting in a higher AS, even if the root cause of the errors is poor play, not aggression. On the flip side, a locked-in player will see her AS increase by hitting more winners, even if those winners are more a reflection of good form than a high-risk tactic.

I’ve long wanted to extend the idea behind Aggression Score to return tactics, but when we narrow our view to the second shot of the rally, the simplicity of the metric becomes a handicap. On the return, the vast majority of “aggressive” shots are errors, so the results will be swamped by error rate, minimizing the role of return winners, which are a more reliable indicator. Using Match Charting Project data from women’s matches, 2010 to the present, returns result in errors 18% of the time, while they turn into winners (or they induce forced errors) less than one-third as often, 5.5% of the time. The appealingly simple Aggression Score formula, narrowed to consider only returns of serve, won’t do the job here.

Return aggression score

Let’s walk through a formula to measure return aggression, using last month’s Miami final between Sloane Stephens and Jelena Ostapenko as an example. Tallying up return points (excluding aces and service winners), along with return errors* and return winners** for both players from the match chart, we get the following:

Returner          RetPts  RetErr  RetWin  RetE%  RetW%  
Sloane Stephens       64       9       1  14.1%   1.6%  
Jelena Ostapenko      63      11       6  17.5%   9.5%

* “errors” are a combination of forced and unforced, because most return errors are scored as forced errors, and because the distinction between the two is so unreliable as to be meaningless. Some forced error returns are nearly impossible to make, so they don’t really belong in this analysis, but with the state of available data, it’ll have to do.

** throughout this post, I’ll use “winners” as short-hand for “winners plus induced forced errors” — that is, shots that were good enough to end the point.

These numbers make clear which of the two players is the aggressive one, and they confirm the obvious: Ostapenko plays much higher-risk tennis than Stephens does. In this case, Ostapenko’s rates are nearly equal to or above the tour averages of 17.8% and 5.5%, while both of Stephens’s are well below them.

The next step is to normalize the error and winner rates so that we can more easily see how they relate to each other. To do that, I simply divide each number by the tour average:

Returner          RetE%  RetW%  RetE+  RetW+  
Sloane Stephens   14.1%   1.6%   0.79   0.28  
Jelena Ostapenko  17.5%   9.5%   0.98   1.73

The last two columns show the normalized figures, which reflect how each rate compares to tour average, where 1.0 is average, greater than 1 means more aggressive, and less than 1 means less aggressive.

We’re not quite done yet, because, as Ostapenko and Stephens illustrate, return winner rates are much noisier than return error rates. That’s largely a function of how few there are. The gap between the two players’ normalized rates, 0.28 and 1.73, looks huge, but represents a difference of only five winners. If we leave return winner rates untouched, we’ll end up with a metric that varies largely due to movement in winner rates–the opposite problem from where we started.

To put winners and errors on a more equal footing, we can express both in terms of standard deviations. The standard deviation of the adjusted error ratio is 0.404, while the standard deviation of the adjusted winner ratio is 0.768. Subtracting the average (1.0) from each ratio and dividing by its standard deviation shrinks the winner figures roughly by half relative to the error figures. The resulting numbers tell us how many standard deviations a certain statistic is above or below the mean, and these final results give us winner and error rates that are finally comparable to each other:

Returner          RetE+  RetW+  RetE-SD  RetW-SD  
Sloane Stephens    0.79   0.28    -0.52    -0.93  
Jelena Ostapenko   0.98   1.73    -0.05     0.95

(Math-oriented readers might notice that the last two steps don’t need to be separate; we could just as easily think of these last two numbers as standard deviations above or below the mean of the original winner and error rates. I included the intermediate step to–I hope–make the process a bit more intuitive.)
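In code, the standard-deviation step subtracts the mean ratio (1.0) and divides by the spread of each ratio. This is a minimal sketch assuming the standard deviations given above (0.404 for errors, 0.768 for winners); the names are hypothetical. The second function illustrates the shortcut from the parenthetical: since the ratio is the raw rate divided by the tour average, the raw rate’s standard deviation is simply the ratio’s standard deviation times the tour average. Starting from two-decimal inputs, the last digit can occasionally differ from the table.

```python
ERR_RATIO_SD = 0.404  # std. dev. of normalized return-error ratios
WIN_RATIO_SD = 0.768  # std. dev. of normalized return-winner ratios

def sd_score(ratio: float, ratio_sd: float) -> float:
    """Standard deviations above (+) or below (-) the mean ratio of 1.0."""
    return (ratio - 1.0) / ratio_sd

def sd_score_from_rate(rate: float, tour_avg: float, ratio_sd: float) -> float:
    """Same score computed straight from a raw rate: the raw rate has
    mean tour_avg and standard deviation ratio_sd * tour_avg."""
    return (rate - tour_avg) / (ratio_sd * tour_avg)

# (RetE+, RetW+) from the table above
ratios = {"Sloane Stephens": (0.79, 0.28), "Jelena Ostapenko": (0.98, 1.73)}
for name, (e_plus, w_plus) in ratios.items():
    print(f"{name:18} RetE-SD {sd_score(e_plus, ERR_RATIO_SD):+.2f}  "
          f"RetW-SD {sd_score(w_plus, WIN_RATIO_SD):+.2f}")
```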

Our final stat, Return Aggression Score (RAS), is simply the average of those two rates measured in standard deviations:

Returner          RetE-SD  RetW-SD    RAS  
Sloane Stephens     -0.52    -0.93  -0.73  
Jelena Ostapenko    -0.05     0.95   0.45
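The final averaging step is short enough to sketch in a few lines (the function name is hypothetical). Starting from the rounded table values, Stephens comes out at -0.725, which the table reports as -0.73, and Ostapenko at +0.45.

```python
def return_aggression_score(err_sd: float, win_sd: float) -> float:
    """RAS: the simple average of the error and winner SD scores.

    Positive = more aggressive than tour average; negative = less.
    """
    return (err_sd + win_sd) / 2

# (RetE-SD, RetW-SD) pairs taken from the table above
sd_scores = {
    "Sloane Stephens": (-0.52, -0.93),
    "Jelena Ostapenko": (-0.05, 0.95),
}

for name, (err_sd, win_sd) in sd_scores.items():
    print(f"{name:18} RAS {return_aggression_score(err_sd, win_sd):+.3f}")
```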

Positive numbers represent more aggression than tour average; negative numbers less aggression. Ostapenko’s +0.45 figure is higher than about 75% of player-matches among the nearly 4,000 in the Match Charting Project dataset, though as we’ll see, it is far more conservative than her typical strategy. Stephens’s -0.73 mark is at the opposite position on the spectrum, higher than only one-quarter of player-matches. It is also lower than her own average, though it is higher than the -0.97 RAS she posted in the US Open final last fall.

The extremes

The first test of any new metric is whether the results actually make sense, and we need look no further than the top ten most aggressive player-matches for confirmation. Five of the top ten most aggressive single-match return performances belong to Serena Williams, and the overall most aggressive match is Serena’s 2013 Roland Garros semifinal against Sara Errani, which rates at 3.63–well over three standard deviations above the mean. The other players represented in the top ten are Ostapenko, Oceane Dodin, Petra Kvitova, Madison Keys, and Julia Goerges–a who’s who of high-risk returning in women’s tennis.

The opposite end of the spectrum includes another group of predictable names, such as Simona Halep, Agnieszka Radwanska, Caroline Wozniacki, Annika Beck, and Errani. Two of Halep’s early matches are lowest and third-lowest, including the 2012 Brussels final against Radwanska, in which her return aggression was 1.6 standard deviations below the mean. It’s not as extreme a mark as Serena’s performances, but that’s the nature of the metric: Halep returned 46 of 48 non-ace serves, and none of the 46 returns went for winners. It’s tough to be less aggressive than that.

The leaderboard

The Match Charting Project has shot-by-shot data on at least ten matches each for over 100 WTA players. Of those, here are the top ten, as ranked by RAS:

Player                    Matches  RetPts   RAS  
Oceane Dodin                   11     665  1.18  
Aryna Sabalenka                11     816  1.12  
Camila Giorgi                  19    1155  1.07  
Mirjana Lucic                  11     707  1.05  
Julia Goerges                  27    1715  0.94  
Petra Kvitova                  65    4142  0.90  
Serena Williams                91    5593  0.90  
Jelena Ostapenko               35    2522  0.88  
Anastasia Pavlyuchenkova       21    1180  0.78  
Lucie Safarova                 34    2294  0.77

We’ve already seen some of these names in our discussion of the highest single-match marks. When we average across contests, a few more players turn up with RAS marks over one full standard deviation above the mean: Aryna Sabalenka, Camila Giorgi, and Mirjana Lucic-Baroni.

Again, the more conservative players don’t look as extreme: Only Madison Brengle has a RAS more than one standard deviation below the mean. I’ve included the top 20 on this list because so many notable names (Wozniacki, Radwanska, Kerber) are between 11 and 20:

Player                Matches  RetPts     RAS  
Madison Brengle            11     702   -1.06  
Monica Niculescu           32    2099   -0.93  
Stefanie Voegele           12     855   -0.85  
Annika Beck                16    1181   -0.78  
Lara Arruabarrena          10     627   -0.72  
Johanna Larsson            14     873   -0.65  
Barbora Strycova           20    1275   -0.63  
Sara Errani                25    1546   -0.60  
Carla Suarez Navarro       36    2585   -0.55  
Svetlana Kuznetsova        27    2271   -0.55 

Player                Matches  RetPts     RAS  
Viktorija Golubic          16    1272   -0.53  
Agnieszka Radwanska        96    6239   -0.51  
Yulia Putintseva           22    1552   -0.51  
Caroline Wozniacki         80    5165   -0.50  
Christina McHale           11     763   -0.48  
Angelique Kerber           93    6611   -0.46  
Louisa Chirico             13     806   -0.44  
Darya Kasatkina            26    1586   -0.43  
Magdalena Rybarikova       12     725   -0.41  
Anastasija Sevastova       30    1952   -0.40

A few more notable names: Halep, Stephens and Elina Svitolina all count among the next ten lowest, with RAS figures between -0.30 and -0.36. The most “average” player among the game’s best is Victoria Azarenka, who rates at -0.08. Venus Williams, Johanna Konta, and Garbine Muguruza make up a notable group of aggressive-but-not-really-aggressive women between +0.15 and +0.20, just outside of the game’s top third, while Maria Sharapova, at +0.63, misses our first list by only a few places.

Unsurprisingly, these results track quite closely to overall Aggression Score figures, as players who adopt a high-risk strategy overall are probably doing the same when facing the serve. This metric, however, allows us to identify players–or even single matches–for which the two strategies don’t move in concert. Further, the approach I’ve taken here, to separate and normalize winners and errors, rather than treat them as an undifferentiated mass, could be applied to Aggression Score itself, or to other more targeted versions of the metric, such as a third-shot AS, or a backhand-specific AS.

As always, the more data we have, the more we can learn from it. Analyses like these are only possible with the work of the volunteers who have contributed to the Match Charting Project. Please help us continue to expand our coverage and give analysts the opportunity to look at shot-by-shot data, instead of just the basics published by tennis’s official federations.

Translating ATP Statistics Across Main Tour and Challenger Levels

Italian translation at settesei.it

What is the gap between the top-level ATP Tour and the lower-level ATP Challenger Tour? Some players pile up trophies in the minor leagues yet have a hard time converting that success to match wins on the big tour, while others struggle with the week-to-week grind of the challengers but excel when given opportunities on the larger stage.

Let’s take a look at a method that measures the difference between the skill level on the two tours. Once we can translate stats between levels, we can identify those players who are much better or worse than expected when they have the chance to compete against the best.

The algorithm I’ll use is almost identical to the one baseball analysts have used for decades to determine league equivalencies. For instance, we might find that a batting average of .300 in Triple-A (the highest minor league) is equivalent to .280 in the majors, meaning that, if a player is batting .300 in Triple-A, we’ll expect him to bat .280 in the majors. In tennis terms, it may be that a 10% ace rate in challengers is equivalent to an 8% ace rate on the main tour. Not every player will exhibit that precise drop in performance–some may even appear to get a little better–but on average, a league equivalency tells us what to expect when a player changes levels.

Here is the algorithm for league equivalencies, as applied to men’s tennis:

  1. Pick a stat to focus on. I’ll use Total Points Won (TPW) here.
  2. Neutralize that stat as much as possible. In baseball, that means controlling for the difference in parks; in tennis, it means controlling for competition. For the following, I’ve adjusted for each player’s quality of competition using a method I described about a year ago. Most players’ numbers are about the same after the adjustment, but a particularly easy or tough schedule means a bigger shift. For instance, Denis Shapovalov posted a TPW of 49.8% on the big tour last season, but because he played such high-quality competition, the adjustment bumps him up to 52.1%, 18th among tour regulars.
  3. Identify players who competed at both levels, and find their adjusted stats at each level. Shapovalov played 18 tour-level matches and 30 challenger-level matches last year, with adjusted TPW numbers of 52.1% and 54.4%, respectively.
  4. Calculate the ratio for each player. For Shapovalov last year, it was 1.044 (54.4 / 52.1).
  5. Finally, take a weighted average of every player’s ratio. The weight is determined by the minimum number of matches played at either level, so for Shapovalov, it’s 18. Using the minimum means that a player like Gleb Sakharov (1 ATP match, 37 challenger matches) can be included in the calculation, but has very little effect on the end result.
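Steps 3 through 5 can be sketched in Python as below. This is a minimal illustration, not the code behind these results: it assumes the competition adjustment in step 2 has already been applied, so each record carries adjusted TPW figures. The Shapovalov numbers come from the text; “Player A” is a hypothetical entry included only to show how the weighting works.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class PairedSeason:
    """One player's season at both levels (competition-adjusted TPW, in %)."""
    player: str
    atp_matches: int
    ch_matches: int
    atp_tpw: float
    ch_tpw: float

    @property
    def ratio(self) -> float:
        # Step 4: challenger-level TPW divided by tour-level TPW
        return self.ch_tpw / self.atp_tpw

    @property
    def weight(self) -> int:
        # Step 5: weight by the smaller of the two match counts
        return min(self.atp_matches, self.ch_matches)

def league_equivalency(seasons: Iterable[PairedSeason]) -> float:
    """Step 5: weighted average of every player's ratio."""
    seasons = list(seasons)
    total_weight = sum(s.weight for s in seasons)
    if total_weight == 0:
        raise ValueError("no paired matches to average")
    return sum(s.ratio * s.weight for s in seasons) / total_weight

seasons = [
    PairedSeason("Denis Shapovalov", atp_matches=18, ch_matches=30,
                 atp_tpw=52.1, ch_tpw=54.4),
    # Hypothetical player, included only to illustrate the weighting
    PairedSeason("Player A (hypothetical)", atp_matches=40, ch_matches=25,
                 atp_tpw=48.0, ch_tpw=53.0),
]

print(f"Shapovalov ratio: {seasons[0].ratio:.3f}, weight: {seasons[0].weight}")
print(f"Weighted equivalency: {league_equivalency(seasons):.3f}")
```

Weighting by the minimum, rather than the total, keeps a player with one tour match and dozens of challenger matches from dominating the average.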

Here are the results for the last six full seasons. Each ratio is the relationship between challenger-level TPW and tour-level TPW:

Year  Ratio  
2017  1.086  
2016  1.086  
2015  1.098  
2014  1.103  
2013  1.100  
2012  1.100

The average of these yearly equivalency factors is roughly the difference between a 52.5% TPW at challengers and a 48.0% TPW on the main tour. The shift from 2012-15 to 2016-17 may reflect the injuries that have sidelined the elites. With fewer elite players on court, the gap between the two tours narrows.
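Translating a stat from one level to the other is then a matter of dividing or multiplying by the equivalency factor. A minimal sketch using the six yearly ratios from the table (function names hypothetical). The average factor is about 1.0955, which maps a 52.5% challenger TPW to about 47.9% on tour, within rounding of the 48.0% figure above.

```python
from statistics import mean

# Yearly challenger-to-tour TPW ratios from the table above
YEARLY_RATIOS = {2017: 1.086, 2016: 1.086, 2015: 1.098,
                 2014: 1.103, 2013: 1.100, 2012: 1.100}

def challenger_to_tour(ch_tpw: float, ratio: float) -> float:
    """Expected tour-level TPW for a given challenger-level TPW."""
    return ch_tpw / ratio

def tour_to_challenger(atp_tpw: float, ratio: float) -> float:
    """Expected challenger-level TPW for a given tour-level TPW."""
    return atp_tpw * ratio

avg_ratio = mean(YEARLY_RATIOS.values())
print(f"average ratio: {avg_ratio:.4f}")
print(f"52.5% at challengers ~ {challenger_to_tour(52.5, avg_ratio):.1f}% on tour")
```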

Now that we know the difference between the levels, we can find the players who defy the usual patterns. Of the 100 players with the most “paired” matches–that is, with the most matches at both levels in the same years–here are the 20 with the lowest ratios. Low ratios mean less difference in performance between the two levels, so these guys are either overperforming at tour level or underperforming at challengers:

Player              ATP M  CH M  Min M  Ratio  
Matthew Ebden          62   140     39  0.982  
Jared Donaldson        68    78     37  1.030  
Jack Sock              81    45     38  1.039  
James Duckworth        53   156     53  1.042  
Andrey Rublev          56    79     42  1.047  
Vasek Pospisil         96    76     60  1.047  
Thiemo De Bakker       48    87     44  1.048  
Samuel Groth           84   133     58  1.049  
Michael Berrer         59   107     56  1.050  
Ruben Bemelmans        41   178     41  1.052  
Dustin Brown          120   173    111  1.055  
Benoit Paire          295    53     53  1.059  
Peter Gojowczyk        46   132     44  1.059  
Michael Russell        58    78     58  1.061  
Marius Copil           58   180     58  1.063  
Taylor Harry Fritz     59    44     41  1.065  
Jordan Thompson        38    88     38  1.066  
Illya Marchenko        56   116     37  1.066  
Tatsuma Ito            65   179     65  1.066  
Ryan Harrison         124    84     59  1.068

The middle columns show the total number of ATP matches, challenger matches, and “paired” matches between 2012 and 2017 (“Min M”) for each player. (The last number gives an indication of just how much data was available for the single-player calculation.) Aside from a few big-serving North Americans near the top of this list, I don’t see many commonalities. There are some youngsters, some veterans, and more big servers than not, but no clear pattern.

(Shapovalov doesn’t have enough paired matches to qualify, but his overall ratio is 1.035, good for third on this list.)

Here is the opposite list, the quintile of 20 players who have overperformed at challengers or underperformed on tour:

Player               ATP M  CH M  Min M  Ratio  
Florian Mayer          152    45     45  1.180  
Mikhail Youzhny         91    38     38  1.169  
Aljaz Bedene           144   121     80  1.160  
Filippo Volandri        62   101     62  1.158  
Robin Haase            194    71     71  1.157  
Tobias Kamke           102   144     73  1.155  
Adrian Mannarino       234   115     86  1.155  
Filip Krajinovic        36   167     36  1.148  
Albert Ramos           111    67     62  1.144  
Paul Henri Mathieu     147    96     82  1.141  
Kenny De Schepper       77   196     77  1.140  
Facundo Bagnis          45   197     45  1.136  
Pablo Cuevas           127    52     43  1.136  
Ivan Dodig              76    48     41  1.135  
Santiago Giraldo       146    70     56  1.135  
Paolo Lorenzi          204   191    124  1.135  
Thomaz Bellucci        162    44     44  1.134  
Albert Montanes        113   109     70  1.130  
Rogerio Dutra Silva     57   210     57  1.130  
Lukas Lacko            122   181    108  1.129

There are more clay-courters here than on the first list, and the very top of the ranking includes veterans who have mastered the challenger level, even if they still struggle to maintain a foothold on the main tour. I’ve had to exclude one player who belongs on this list: Gilles Muller broke my algorithm with his 45-9 challenger season in 2014. When I took him out of the 2014 calculations, the overall numbers changed very little, but it means no Muller here. Whatever his exact ratio, I can say that his tour-level performance hasn’t matched that 2014 run at challengers.

The bottoms of the two lists indicate that there isn’t that much variation between players. The middle 60% of players all have ratios between about 1.07 and 1.13, while the yearly averages hover around 1.09 and 1.10. Some players under consideration here have fewer than 50 “paired” matches over the six seasons, so a difference of a couple hundredths is far too small to support any conclusions.

Beyond suggesting what to expect from players when they move up from challengers to the main tour, the same algorithm could be applied to other pairs of levels, such as ITF Futures and challengers, or women’s ITFs and the WTA tour. It could even compare narrower levels, such as ITF $10,000 events with ITF $15,000s, or ATP 250s with ATP 500s. The method is a staple of analytics in other sports, and it has a place in tennis, as well.

The Power of One Point Per Thousand

Italian translation at settesei.it

Last week, I offered a method to rank smash-hitting skill. I measured the results in “points per 100”–the number of points a player could expect to gain or lose, relative to tour average, thanks to their ability hitting that one shot. The resulting figures were quite small: My calculations showed that Jo-Wilfried Tsonga has the game’s best smash, a shot worth 0.17 points per 100 above average, and 0.27 points per 100 above the weakest smash-hitting player I found, Pablo Cuevas.

That gap between best and worst of 0.27 per 100 gives us a rough maximum of how much difference a good or bad smash can make in a player’s game. The rate is roughly equivalent to one point out of 370. It sounds tiny, and since most players are closer to the average than they are to either of those extremes, the typical smash effect is smaller still.

However, it’s difficult to have any intuitive sense of how much one point is worth. In any given match, a single point, or even five points, isn’t going to make the difference. On the other hand, plenty of matches are so close that one or two points would flip the result. If an average player could train really hard in the offseason and develop a smash just as good as Tsonga’s, what would that extra 0.17 points per 100 mean for him in the win column? What about in the rankings?

This is a relatively straightforward question to answer once we’ve posed it. Over the course of a season, the best players win more points than their peers–obviously. Yet the margin isn’t that great. In 2017, no man won points at a higher clip than Rafael Nadal, who came out on top 55.7% of the time. That’s less than seven percentage points higher than the worst player in the top 50, Paolo Lorenzi, who won 49.1% of points. Nearly half of the top 50–22 players–won between 49.0% and 51.0% of total points, and another 15% fell between 51.0% and 52.0%.

Fixing total points won

These numbers are slightly misleading, though only slightly. The total points won stat (TPW) tends to cluster very close to the 50% mark because competitors face what, in other sports, we would call unbalanced schedules. If you win, you usually have to play someone better in the next round; win again, and an even stronger opponent awaits. This means that the 6.6-percentage-point gap between Nadal and Lorenzi is a bit wider than it sounds: Had the Italian faced the same set of opponents that Rafa did, he wouldn’t have managed to win 49.1% of points.

That problem, however, is possible to resolve. Earlier this year I shared an algorithm that analyzed return points won while controlling for opponent, by comparing how each pair of players fared in equivalent matchups. (That analysis hinted at the second-half breakthrough of return wizard Diego Schwartzman.) While we don’t know exactly what would happen if Lorenzi played Nadal’s exact schedule, we can use this common-opponent approach to approximate it. When we do so, we find that the 1st-to-50th, Nadal-to-Lorenzi spread is almost 10 percentage points; setting Rafa’s rate at a constant 55.7%, Lorenzi’s works out to a less neutral-sounding 46.2%. Many players remain packed in the 49%-to-51% range, but the overall spread is wider, because we control for tennis’s natural tendency to cancel out players’ wins with subsequent losses.

Even when we widen the pool of players to 71–everyone who played at least 35 tour-level matches this season–the ten-percentage-point spread remains. Lorenzi stays close to the bottom, a few places above Mikhail Youzhny, whose competition-adjusted rate of 45.7% ranks last, exactly ten percentage points below Rafa.

Think about what that means: In a typical ATP match, for every hundred points played, only ten are really up for grabs. That isn’t literally true, of course: There are plenty of matches in which one player wins 60% or more of total points. But on average, you can expect even the weakest tour regular to win 45 out of 100 points. In team sports analytics, this is what we might call “replacement level”–the skill level of a freely available minor leaguer or bench player. I don’t like importing the concept of replacement level for tennis, because in an individual sport you’re never really replacing one player with another. But at the most general level, it’s a useful way of thinking about this subject–just as even a minor league batter could hit .230 in the major leagues (as opposed to .000), so a fringey ATP player will win 45% of points, not 0%.

Points to wins

In team sports analytics, it’s common to say that some number of runs, or goals, or points is equal to one win. Thinking in terms of wins is a good way to value players: If you can say that upgrading your goalkeeper is worth two wins over your current option, it makes very clear what he brings to the table. Again, the metaphor is a bit strained when we apply it to tennis, but we can start thinking about things in the same way.

Another oddity in tennis is that players not only face very unequal competition, they also play widely different numbers of matches. The year-end top 50 contested anywhere from 35 matches up to more than 80; part of the variation is due to injury, but much is structural: The more matches you win, the more you play. Rafa managed his schedule by entering only a handful of optional events, yet only David Goffin played more matches. So we have another quirk to handle: In this case, let’s adopt the fiction that a tennis season is exactly 50 matches long. Rafa’s actual record was 67-11; scaled to a 50-match season, that’s roughly 43-7.
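The 50-match convention is just proportional scaling that preserves a player’s winning percentage. A minimal sketch (function name hypothetical), checked against Nadal’s 67-11 record, which scales to about 42.9-7.1, or roughly 43-7.

```python
from typing import Tuple

def scale_record(wins: int, losses: int, season_length: int = 50) -> Tuple[float, float]:
    """Rescale a win-loss record to a fixed-length season, keeping the win rate."""
    played = wins + losses
    if played == 0:
        raise ValueError("no matches played")
    factor = season_length / played
    return wins * factor, losses * factor

wins_50, losses_50 = scale_record(67, 11)  # Nadal, 2017
print(f"Nadal per 50 matches: {wins_50:.1f}-{losses_50:.1f}")
```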

Finally, we can look at the relationship between points and wins. Points, here, means the rate of total points won adjusted for competition. And wins is the number of victories in our hypothetical 50-match season. The relationship between points and wins is quite strong (r^2 = 0.75), though of course not exact. Roger Federer won matches at a higher rate than Nadal did, but by competition-adjusted total points won, Rafa trounced him, 55.7% to 53.5%. And as we’ve seen, Lorenzi is close to the bottom of our 71-player sample, despite hanging on to a ranking in the mid-40s. Luck, clutch play, and a host of other factors make the points-to-wins relationship imperfect, but it is nonetheless a healthy one.

It doesn’t take many points to boost one’s win total. An increase of only 0.367 points per 100 translates into one more win in a 50-match season. The average player contests 8,000 points per season, so we’re talking about only 29 more points per year. This puts my smash-skill conclusions in a new light: The spread between the best and the worst of 0.27 points per 100 seemed tiny, but now we see it’s worth nearly three-quarters of a win (0.27 / 0.367 ≈ 0.74) over the course of a 50-match season.

Wins to ranking places

Unless you’re nearing a round number and have a hankering for cake, even wins aren’t the currency that really matters in tennis. What counts is position on the ranking table. The relationship between wins and ranking position is another strong but imperfect one (r^2 = 0.63).

As we’ve seen, the middle of the ATP pack is tightly grouped together in total points won, with so many players hovering around the 50% mark, even when adjusted for competition. There’s not much to distinguish between these men in the win column, either: On average, an increase of 0.26 wins per 50 matches translates into a one-spot jump on the ranking computer. Put another way: If you win one more match, your ranking will improve by about four places. Again, these are not iron laws–in reality, it depends when and where that extra win occurs, and the corresponding ranking improvement could be anywhere from zero spots to 30. Still, knowing the typical result allows us to understand better the impact of each marginal win and, by extension, the value of winning a few more points.

One point per thousand

Combine these two relationships, and we get a new, conveniently round-numbered rule of thumb. If an increase in one ranking place requires 0.26 additional wins per 50 matches, and one additional win requires 0.367 extra points per 100, a little tapping at the calculator demonstrates that one ranking place is equal to about 0.095 points per 100. Round up a bit to 0.1 per 100, and we’re looking at one point per thousand.
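The chain of rules of thumb can be written out explicitly. This is a minimal sketch using the constants derived above (0.367 points per 100 per win, 0.26 wins per ranking place, 8,000 points per season); the function names are hypothetical. These are averages from imperfect relationships (r^2 of 0.75 and 0.63), so the output is a rough expectation, not a prediction for any individual player. The sketch reproduces the roughly 0.095-per-100 figure and the 29-point estimate, and puts the best-to-worst smash spread at about 2.8 ranking places.

```python
# Rules of thumb from the text (ATP, 2017)
POINTS_PER_100_PER_WIN = 0.367  # extra points per 100 for +1 win per 50 matches
WINS_PER_RANKING_PLACE = 0.26   # extra wins per 50 matches for +1 ranking place
POINTS_PER_SEASON = 8_000       # typical points contested by a tour regular

def wins_from_points(points_per_100: float) -> float:
    """Extra wins per 50-match season from a points-per-100 edge."""
    return points_per_100 / POINTS_PER_100_PER_WIN

def places_from_wins(wins: float) -> float:
    """Ranking places gained from extra wins per 50 matches."""
    return wins / WINS_PER_RANKING_PLACE

def places_from_points(points_per_100: float) -> float:
    return places_from_wins(wins_from_points(points_per_100))

points_per_place = POINTS_PER_100_PER_WIN * WINS_PER_RANKING_PLACE
print(f"one ranking place ~ {points_per_place:.4f} points per 100")
print(f"one win ~ {POINTS_PER_100_PER_WIN / 100 * POINTS_PER_SEASON:.0f} points per season")

smash_spread = 0.27  # best-to-worst smash spread, points per 100
print(f"smash spread ~ {wins_from_points(smash_spread):.2f} wins, "
      f"{places_from_points(smash_spread):.1f} ranking places")
```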

One extra point per thousand is a minuscule amount, the sort of difference we could never dream of spotting with the naked eye. Players regularly win entire tournaments without contesting so many points; even for Goffin, who served or returned more than 12,000 times this year, we’re talking about a dozen points. Yet think back to all of those players clustered between 49% and 52% of total points won; even when adjusted for competition, three men ended the 2017 season tied at exactly 50.4%, with less than one point per thousand separating the three of them.

The one part of the ranking table where one point per thousand is no more than a rounding error is the very top. Usually one player separates himself from the pack, and the top few distance themselves from the rest. This year is no different: The competition-adjusted gap between Nadal and Federer is a whopping 2.2% (22 points per thousand), while the next 2.2% takes us all the way from Fed through the entire top 10. The 2.2% after that, extending from 51.1% to 48.9%, covers another 20 players: spaced, on average, one point per thousand apart. For a player seeking to improve from 30th to 20th, the path is largely linear; from 5th to 3rd it is much less predictable–and probably steeper.

If this all sounds unnecessarily abstruse, I can only mention once again the example of my smash-skill findings. Now we know that the range of overhead-hitting ability among the game’s regulars is worth close to three places in the rankings. Imagine a similar type of conclusion for forehands, backhands, net approaches… it’s exciting stuff. While plenty of work lies ahead, this framework allows us to measure the impact of individual shots–perhaps even tactics–and translate that impact into ranking places, the ultimate currency of tennis.

Measuring the Best Smashes in Tennis

Italian translation at settesei.it: part 1, part 2

How can we identify the best shots in tennis? At first glance, it seems like a simple problem. Thanks to the shot-by-shot data collected for over 3,500 matches by the Match Charting Project, we can look at every instance of the shot in question and see what happened. If a player hits a lot of winners, or wins most of the ensuing points, he or she is probably pretty good at that shot. Lots of unforced errors would lead us to conclude the opposite.

A friend recently posed a more specific question: Who has the best smash in the men’s game? Compared to other shots such as, say, slice backhands, smashes should be pretty easy to evaluate. A large percentage of them end the point–in the contemporary men’s game (I discuss the women’s game later on), 69% are winners or induce forced errors–which reduces the problem to a straightforward one.

The simplest algorithm to answer my friend’s question is to determine how often each player ends the point in his favor when hitting a smash–that is, with a winner or by inducing a forced error. Call the resulting ratio “W/SM.” The Match Charting Project (MCP) dataset has at least 10 tour-level matches for 80 different men, and the W/SM ratio for those players ranges from 84% (Jeremy Chardy) all the way down to 30% (Paolo Lorenzi). Both of those extremes belong to players with relatively small samples; if we limit our scope to men with at least 90 recorded smashes, the range isn’t quite as wide. The best of the bunch is Jo-Wilfried Tsonga, at 79%, and the “worst” is Ivan Lendl, at 57%. That isn’t quite fair to Lendl, since smash success rates have improved quite a bit over the years, and Lendl’s rate is only a couple of percentage points below the average for the 1980s. Among active players with at least 90 smashes in the books, Stan Wawrinka brings up the rear, with a W/SM of 65%.

We can look at the longer-term effects of a player’s smashes without adding much complexity. It’s ideal to end the point with a smash, but most players would settle for winning the point. When hitting a smash, ATPers these days end up winning the point 81% of the time, ranging from 97% (Chardy again) down to 45% (Lorenzi again). Once again, Tsonga leads the pack of the bigger-sample-size players, winning the point 90% of the time after hitting a smash, and among active players, Wawrinka is still at the bottom of that subset, at 77%.
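All three rates are simple counts over a player’s smashes. This is a minimal sketch, assuming each charted smash has already been classified into one of four outcomes; the enum and its labels are hypothetical, not the Match Charting Project’s own coding, and the 20-smash sample is made up for illustration.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Sequence

class SmashOutcome(Enum):
    WINNER = "winner"          # smash winner or induced forced error
    ERROR = "error"            # smash missed, losing the point
    RALLY_WON = "rally_won"    # point continued; smash hitter won it later
    RALLY_LOST = "rally_lost"  # point continued; smash hitter lost it later

def smash_rates(outcomes: Sequence[SmashOutcome]) -> Dict[str, float]:
    """W/SM, E/SM and PTS/SM for one player's list of smashes."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("no smashes recorded")
    counts = Counter(outcomes)
    won = counts[SmashOutcome.WINNER] + counts[SmashOutcome.RALLY_WON]
    return {
        "W/SM": counts[SmashOutcome.WINNER] / n,
        "E/SM": counts[SmashOutcome.ERROR] / n,
        "PTS/SM": won / n,
    }

# Made-up sample of 20 smashes, for illustration only
sample = ([SmashOutcome.WINNER] * 14 + [SmashOutcome.ERROR] * 2
          + [SmashOutcome.RALLY_WON] * 3 + [SmashOutcome.RALLY_LOST])
for stat, value in smash_rates(sample).items():
    print(f"{stat:7} {value:.0%}")
```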

Here is a list of all players with at least 90 smashes in the MCP dataset, with their winners (and induced forced errors) per smash (W/SM), errors per smash (E/SM), and points won per smash (PTS/SM):

PLAYER              W/SM  E/SM  PTS/SM  
Jo-Wilfried Tsonga   78%    6%     90%  
Tomas Berdych        76%    6%     88%  
Pete Sampras         75%    7%     86%  
Roger Federer        73%    7%     86%  
Rafael Nadal         69%    7%     84%  
Milos Raonic         73%    9%     82%  
Andy Murray          67%    6%     82%  
Kei Nishikori        68%   11%     81%  
David Ferrer         71%    9%     81%  
Andre Agassi         67%    8%     80%  
Novak Djokovic       66%    9%     80%  
Stefan Edberg        62%   12%     78%  
Stan Wawrinka        65%   10%     77%  
Ivan Lendl           57%   13%     71%

These numbers give us a pretty good idea of who you should back if the ATP ever hosts the smash-hitting equivalent of baseball’s Home Run Derby. Best of all, the ranking doesn’t commit any egregious offenses against common sense: We’d expect to see Tsonga and Roger Federer near the top, and we’d know something was wrong if Novak Djokovic were too far from the bottom.

Smash opportunities

Still, we need to do better. Almost every shot made in a tennis match represents a decision made by the player hitting it: topspin or slice? backhand or run-around forehand? approach or stay back? Many smashes are obvious choices, but a large number are not. Different players make different choices, and to evaluate any particular shot, we need to subtly reframe the question. Instead of vaguely asking for “the best,” we’d be better served looking for the player who gets the most value out of his smash. While the two questions are similar, they are not the same.

Let’s expand our view to what we might call “smash opportunities.” Once again, smashes make our task relatively straightforward: We can define a smash opportunity simply as a lob hit by the opponent.* In the contemporary ATP, roughly 72% of lobs result in smashes–the rest either go for winners or are handled with a different shot. Different players have very different strategies: Federer, Pete Sampras, and Milos Raonic all hit smashes in 84% or more of opportunities, while a few other men come in under 50%. Nick Kyrgios, for instance, tried a smash in only 20 (41%) of 49 recorded opportunities. Of those players with more available data, Juan Martin del Potro elected to go for the overhead in 61 (54%) of 114 chances, and Andy Murray in 271 (62.6%) of 433.

* Using an imperfect dataset, it’s a bit more complicated; sometimes the shots that precede smashes are coded as topspin or slice groundstrokes. I’ve counted those as smash opportunities as well.
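SM/SMO is a plain ratio of smashes to smash opportunities. The sketch below (function name hypothetical) reproduces the percentages quoted above from the raw counts; per the footnote, the opportunity counts themselves depend on how the charted data codes the shot that precedes each smash.

```python
def smash_rate(smashes: int, opportunities: int) -> float:
    """SM/SMO: share of smash opportunities (opponent lobs) met with a smash."""
    if opportunities <= 0:
        raise ValueError("need at least one smash opportunity")
    return smashes / opportunities

# (smashes, smash opportunities) as quoted in the text
examples = {
    "Nick Kyrgios": (20, 49),
    "Juan Martin del Potro": (61, 114),
    "Andy Murray": (271, 433),
}
for name, (sm, smo) in examples.items():
    print(f"{name:22} SM/SMO {smash_rate(sm, smo):.1%}")
```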

Not all lobs are created equal, of course. With a large number of points, we would expect them to even out, but even then, a player’s overall style may affect the smash opportunities he sees. That’s a more difficult issue for another day; for now, it’s easiest to assume that each player’s mix of smash opportunities is roughly the same, though we’ll keep in mind the likelihood that we’ve swept some complexity under the rug.

With such a wide range of smashes per smash opportunity (SM/SMO), it’s clear that some players’ average smashes are more difficult than others’. Federer hits about half again as many smashes per opportunity as del Potro does, suggesting that Fed’s attempts are more difficult than Delpo’s; on those more difficult attempts, Delpo is choosing a different shot. The Argentine is very effective when he opts for the smash, winning 84% of those points, but it seems likely that his rate would not be so high if he hit smashes as frequently as Federer does.

This leads us to a slightly different question: Which players are most effective when dealing with smash opportunities? The smash itself doesn’t necessarily matter–if a player is equally effective with, say, swinging volleys, the lack of a smash would be irrelevant. The smash is simply an effective tool that most players employ to deal with these situations.

Smash opportunities don’t offer the same level of guarantee that smashes themselves do: In the ATP these days, players win 72% of points after being handed a smash opportunity, and 56% of the shots they hit result in winners or induced forced errors. Looking at these situations takes us a bit off-track, but it also allows us to study a broader question with more impact on the game as a whole, because smash opportunities represent a larger number of shots than smashes themselves do.

Here is a list of all the players with at least 99 smash opportunities in the MCP dataset, along with the rate at which they hit smashes (SM/SMO), the rate at which they hit winners or induced forced errors in response to smash opportunities (W/SMO), hit errors in those situations (E/SMO), and won the points when given lobs (PTW/SMO). Like the list above, players are ranked by the rightmost column, points won.

PLAYER              SM/SMO  W/SMO  E/SMO  PTW/SMO  
Jo-Wilfried Tsonga     80%    68%    13%      80%  
Roger Federer          84%    66%    13%      78%  
Pete Sampras           86%    68%    15%      78%  
Tomas Berdych          75%    66%    16%      76%  
Milos Raonic           85%    67%    14%      76%  
Novak Djokovic         81%    60%    13%      75%  
Kevin Anderson         66%    57%    12%      74%  
Rafael Nadal           74%    57%    16%      73%  
Andre Agassi           77%    62%    17%      73%  
Boris Becker           85%    59%    18%      72%  
Stan Wawrinka          79%    58%    15%      72%  
Kei Nishikori          72%    57%    17%      70%  
Andy Murray            63%    52%    15%      70%  
Dominic Thiem          66%    52%    11%      70%  
David Ferrer           71%    57%    17%      69%  
Pablo Cuevas           73%    54%    14%      67%  
Stefan Edberg          81%    52%    23%      65%  
Bjorn Borg             81%    41%    20%      63%  
JM del Potro           54%    48%    19%      60%  
Ivan Lendl             74%    45%    28%      59%  
John McEnroe           74%    43%    24%      56%

The order of this list has much in common with the previous one, with names like Federer, Sampras, and Tsonga at the top. Yet there are key differences: Djokovic and Wawrinka are particularly effective when they respond to a lob with something other than an overhead, while del Potro is the opposite, landing near the bottom of this ranking despite being quite effective with the smash itself.

The rate at which a player converts opportunities to smashes has some impact on his overall success rate on smash opportunities, but the relationship isn’t that strong (r^2 = 0.18). Other options, such as swinging volleys or mid-court forehands, also give players a good chance of winning the point.
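To see where a figure like that r^2 comes from, here is a minimal Python sketch that squares the Pearson correlation between the SM/SMO and PTW/SMO columns of the table above. It assumes the quoted value is a plain Pearson r^2 over these 21 players. With the rounded percentages it should land close to 0.18; any small gap would presumably come from rounding or a slightly different player pool.

```python
# SM/SMO and PTW/SMO for the 21 players in the table above (rounded percentages),
# in the same order as the table (Tsonga first, McEnroe last).
SM_SMO = [80, 84, 86, 75, 85, 81, 66, 74, 77, 85, 79, 72, 63, 66, 71, 73, 81, 81, 54, 74, 74]
PTW_SMO = [80, 78, 78, 76, 76, 75, 74, 73, 73, 72, 72, 70, 70, 70, 69, 67, 65, 63, 60, 59, 56]


def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


r2 = r_squared(SM_SMO, PTW_SMO)
print(f"r^2 = {r2:.2f}")
```

An r^2 in this range means smash frequency explains only a small share of the variation in points won after a lob, which is the point the paragraph makes.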

Smash value

Let’s get back to my revised question: Who gets the most value out of his smash? A good answer needs to combine how well he hits it with how often he hits it. Once we can quantify that, we’ll be able to see just how much a good or bad smash can impact a player’s bottom line, measured in overall points won, and how much a great smash differs from an abysmal one.

As noted above, the average current-day ATPer wins the point 81% of the time that he hits a smash. Let’s reframe that in terms of the probability of winning a point: When a lob is flying through the air and a player readies his racket to hit an overhead, his chance of winning the point is 81%–most of the hard work is already done, having generated such a favorable situation. If our player ends up winning the point, the smash improved his odds by 0.19 points (from 0.81 to 1.0), and if he ends up losing the point, the smash hurt his odds by 0.81 (from 0.81 to 0.0). A player who hits five successful smashes in a row has a smash worth about one total point: 5 multiplied by 0.19 equals 0.95.
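The arithmetic in the previous paragraph fits in a tiny function. This sketch assumes a single flat 81% pre-smash win probability for every smash, as in the text; `smash_ppa` is a hypothetical name, not part of any existing tool.

```python
def smash_ppa(won_point: bool, p_before: float = 0.81) -> float:
    """Point probability added by one smash.

    p_before is the chance of winning the point as the smash is struck
    (81% for the average current ATP player, per the text).
    """
    # A win moves the probability from p_before to 1.0; a loss moves it to 0.0.
    return (1.0 - p_before) if won_point else -p_before


five_in_a_row = sum(smash_ppa(True) for _ in range(5))
print(round(five_in_a_row, 2))     # 0.95
print(round(smash_ppa(False), 2))  # -0.81

# An exactly average smasher (81% won, 19% lost) nets zero PPA per smash.
average = 0.81 * smash_ppa(True) + 0.19 * smash_ppa(False)
```

Because an exactly average smasher nets zero, a player's total PPA measures only how far his smash departs from the tour norm.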

We can use this simple formula to estimate how much each player’s smash is worth, denominated in points. We’ll call that Point Probability Added (PPA). Finally, we need to take into account how often the player hits his smash. To do so, we’ll simply divide PPA by total number of points played, then multiply by 100 to make the results more readable. The metric, then, is PPA per 100 points, reflecting the impact of the smash in a typical short match. Most players have similar numbers of smash opportunities, but as we’ve seen, some choose to hit far more overheads than others. When we divide by points, we give more credit to players who hit their smashes more often.
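Scaling by points played can be sketched the same way. The counts below are hypothetical, chosen only to exercise the function, and the 81% baseline is again the tour-wide figure from the text.

```python
def smash_ppa_per_100(smashes_won: int, smashes_lost: int,
                      points_played: int, p_before: float = 0.81) -> float:
    """Total smash PPA, divided by points played and multiplied by 100."""
    total_ppa = smashes_won * (1.0 - p_before) - smashes_lost * p_before
    return total_ppa / points_played * 100


# Hypothetical player: 43 of 50 smashes won over 3,000 points played.
rate = smash_ppa_per_100(43, 7, 3000)
print(f"{rate:.3f} PPA per 100 points")                  # 0.083 PPA per 100 points
print(f"one extra point every {100 / rate:.0f} points")  # one extra point every 1200 points

# A rate of 0.17 per 100 converts the same way: roughly one point every 600.
interval_at_017 = 100 / 0.17
```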

The overall impact of the smash turns out to be quite small. Here are the 1990s-and-later players with at least 99 smash opportunities in the dataset along with their smash PPA per 100 points:

PLAYER                 SM PPA/100  
Jo-Wilfried Tsonga           0.17  
Pete Sampras                 0.11  
Tomas Berdych                0.11  
Roger Federer                0.10  
Rafael Nadal                 0.05  
Milos Raonic                 0.04  
Juan Martin del Potro        0.02  
Andy Murray                  0.01  
Kevin Anderson               0.01  
Kei Nishikori                0.00  
David Ferrer                 0.00  
Andre Agassi                 0.00  
Novak Djokovic              -0.02  
Stan Wawrinka               -0.07  
Dominic Thiem               -0.07  
Pablo Cuevas                -0.10

Tsonga reigns supreme, from the most basic measurement to the most complex. His 0.17 smash PPA per 100 points means that the quality of his overhead earns him about one extra point (compared to an average ATPer) every 600 points. That doesn’t sound like much, and rightfully so: He hits fewer than one smash per 50 points, and as good as Tsonga is, the average player has a very serviceable smash as well.

The list gives us an idea of the overall range of smash-hitting ability, as well. Among active players, the laggard in this group is Pablo Cuevas, at -0.1 points per 100, meaning that his subpar smash costs him one point out of every thousand he plays. It’s possible to be worse–in Lorenzi’s small sample, his rate is -0.65–but if we limit our scope to these well-studied players, the difference between the high and low extremes is barely 0.25 points per 100, or one point out of every 400.

I’ve excluded several players from earlier generations from this list; as mentioned earlier, the average smash success rate in those days was lower, so measuring legends like McEnroe and Borg using a 2010s-based point probability formula is flat-out wrong. That said, we’re on safe ground with Sampras and Agassi; the rate at which players convert smashes into points won has remained fairly steady since the early 1990s.

Lob-responding value

We’ve seen the potential impact of smash skill; let’s widen our scope again and look at the potential impact of smash opportunity skill. When a player is faced with a lob, but before he decides what shot to hit, his chance of winning the point is about 72%. Thus, hitting a shot that results in winning the point is worth 0.28 points of point probability added, while a choice that ends up losing the point translates to -0.72.
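The same bookkeeping works for lobs with a 72% baseline. A handy consequence of the formula, shown in this sketch, is that total PPA equals the points a player actually won minus the points an average player would be expected to win from the same number of opportunities. That identity is simple algebra rather than something stated in the article, and the player below is hypothetical.

```python
def ppa(won: int, lost: int, p_before: float) -> float:
    """Sum of per-situation PPA: +(1 - p) for each win, -p for each loss."""
    return won * (1.0 - p_before) - lost * p_before


def ppa_as_wins_above_average(won: int, opportunities: int, p_before: float) -> float:
    """Equivalent form: actual wins minus the wins an average player would expect."""
    return won - p_before * opportunities


# Hypothetical player: 150 smash opportunities, 114 of those points won (76%).
SMO_BASELINE = 0.72
direct = ppa(114, 36, SMO_BASELINE)
shortcut = ppa_as_wins_above_average(114, 150, SMO_BASELINE)
print(round(direct, 2), round(shortcut, 2))  # 6.0 6.0
```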

There are more smash opportunities than smashes, and more room to improve on the average (72% instead of 81%), so we would expect to see a bigger range of PPA per 100 points. Put another way, we would expect that lob-responding skill, which includes smashes, is more important than smash-specific skill.

It’s a modest difference, but it does look like lob-responding skill has a bigger range than smash skill. Here is the same group of players, still showing their PPA/100 for smashes (SM PPA/100), now also including their PPA/100 for smash opportunities (SMO PPA/100):

PLAYER                 SM PPA/100  SMO PPA/100  
Jo-Wilfried Tsonga           0.17         0.18  
Roger Federer                0.10         0.16  
Pete Sampras                 0.11         0.16  
Milos Raonic                 0.04         0.12  
Tomas Berdych                0.11         0.09  
Kevin Anderson               0.01         0.08  
Novak Djokovic              -0.02         0.07  
Rafael Nadal                 0.05         0.03  
Andre Agassi                 0.00         0.01  
Stan Wawrinka               -0.07         0.00  
Kei Nishikori                0.00        -0.03  
Andy Murray                  0.01        -0.03  
Dominic Thiem               -0.07        -0.05  
David Ferrer                 0.00        -0.06  
Pablo Cuevas                -0.10        -0.12  
Juan Martin del Potro        0.02        -0.19

Djokovic and Delpo draw our attention again as the players whose smash skills do not accurately represent their smash opportunity skills. Djokovic is slightly below average with smashes, but a few notches above the norm on opportunities; Delpo is a tick above average when he hits smashes, but dreadful when dealing with lobs in general.

As it turns out, we can measure the best smashes in tennis, both to compare players and to get a general sense of the shot’s importance. What we’ve also seen is that smashes don’t tell the entire story–we learn more about a player’s overall ability when we widen our view to smash opportunities.

Smashes in the women’s game

Contemporary women hit far fewer smashes than men do, and they win points less often when they hit them. Despite the differences, the reasoning outlined above applies just as well to the WTA. Let’s take a look.

In the WTA of this decade, smashes result in winners (or induced forced errors) 63% of the time, and smashes result in points won about 75% of the time. Both numbers are lower than the equivalent ATP figures (69% and 81%, respectively), but not dramatically so. Here are the rates of winners, errors, and points won per smash for the 14 women with at least 80 smashes in the MCP dataset:

PLAYER               W/SM  E/SM  PTS/SM  
Jelena Jankovic       73%    9%     83%  
Serena Williams       72%   13%     81%  
Steffi Graf           61%    9%     81%  
Svetlana Kuznetsova   70%   10%     79%  
Simona Halep          66%   11%     76%  
Caroline Wozniacki    61%   16%     74%  
Karolina Pliskova     62%   18%     74%  
Agnieszka Radwanska   54%   13%     74%  
Angelique Kerber      57%   15%     72%  
Martina Navratilova   54%   13%     71%  
Monica Niculescu      50%   15%     70%  
Garbine Muguruza      63%   19%     70%  
Petra Kvitova         59%   22%     68%  
Roberta Vinci         58%   14%     68%

Historical shot-by-shot data is less representative for women than for men, so it’s probably safest to assume that smash success rates have followed similar trends for women as for men. If that’s true, Steffi Graf’s era is similar to the present, while Martina Navratilova’s prime saw far fewer smashes going for winners or points won.

Where the women’s game really differs from the men’s is the difference between smash opportunities (lobs) and smashes. As we saw above, 72% of ATP smash opportunities result in smashes. In the current WTA, the corresponding figure is less than half that: 35%. Some of the single-player numbers are almost too extreme to be believed: In 12 matches, Catherine Bellis faced 41 lobs and hit 3 smashes; in 29 charted matches, Jelena Ostapenko saw 103 smash opportunities and tried only 10 smashes. A generation ago, the gender difference was tiny: Graf, Martina Hingis, Arantxa Sanchez Vicario, and Monica Seles all hit smashes in at least three-quarters of their opportunities. But among active players, only Barbora Strycova comes in above 70%.

Here are the smash opportunity numbers for the 17 women with at least 150 smash opportunities in the MCP dataset. SM/SMO is smashes per chance, W/SMO is winners (and induced forced errors) per smash opportunity, E/SMO is errors per opportunity, and PTW/SMO is points won per smash opportunity:

PLAYER                SM/SMO  W/SMO  E/SMO  PTW/SMO  
Maria Sharapova          12%    57%    11%      76%  
Serena Williams          55%    58%    18%      72%  
Steffi Graf              82%    52%    17%      71%  
Karolina Pliskova        47%    52%    16%      70%  
Simona Halep             14%    41%    11%      69%  
Carla Suarez Navarro     25%    33%     9%      69%  
Eugenie Bouchard         29%    50%    18%      68%  
Victoria Azarenka        35%    52%    17%      67%  
Angelique Kerber         39%    42%    14%      66%  
Garbine Muguruza         43%    51%    18%      66%  
Monica Niculescu         57%    41%    19%      65%  
Petra Kvitova            48%    50%    19%      65%  
Agnieszka Radwanska      44%    42%    18%      65%  
Johanna Konta            30%    47%    21%      64%  
Caroline Wozniacki       36%    44%    18%      64%  
Elina Svitolina          14%    38%    14%      63%  
Martina Navratilova      67%    42%    26%      58%

It’s clear from the top of this list that women’s tennis is a different ballgame. Maria Sharapova almost never opts for an overhead, but when faced with a lob, she is the best of them all. Next up is Serena Williams, who hits almost as many smashes as any active player on this list, and is nearly as successful. Recall that in the men’s game, there is a modest positive correlation between smashes per opportunity and points won per smash opportunity; here, the relationship is weaker, and slightly negative.

Because most women hit so few smashes, there isn’t quite as much to be gained by using point probability added (PPA) to measure WTA smash skill. Graf was exceptionally good, comparable to Tsonga in the value she extracted from her smash, but among active players, only Serena and Victoria Azarenka can claim a smash that is worth close to one point per thousand. At the other extreme, Monica Niculescu is nearly as bad as Graf was good, suggesting she ought to figure out a way to respond to more smash opportunities with her signature forehand slice.

Here is the same group of women (minus Navratilova, whose era makes PPA comparisons misleading), with their PPA per 100 points for smashes (SM PPA/100) and smash opportunities (SMO PPA/100):

PLAYER                SM PPA/100  SMO PPA/100  
Maria Sharapova             0.03         0.21  
Serena Williams             0.09         0.15  
Steffi Graf                 0.15         0.14  
Karolina Pliskova          -0.01         0.09  
Carla Suarez Navarro        0.04         0.08  
Simona Halep                0.00         0.07  
Eugenie Bouchard           -0.02         0.03  
Victoria Azarenka           0.08         0.00  
Angelique Kerber           -0.03        -0.02  
Garbine Muguruza           -0.07        -0.03  
Petra Kvitova              -0.07        -0.04  
Monica Niculescu           -0.13        -0.06  
Caroline Wozniacki         -0.01        -0.07  
Agnieszka Radwanska        -0.02        -0.07  
Johanna Konta              -0.12        -0.08  
Elina Svitolina             0.01        -0.09

The table is sorted by smash opportunity PPA, which tells us about a much more relevant skill in the women’s game. Sharapova’s lob-responding ability is well ahead of the pack, worth better than one point above average per 500, with Serena and Graf not far behind. The overall range among these well-studied players, from Sharapova’s 0.21 to Elina Svitolina’s -0.09, is somewhat smaller than the equivalent range in the ATP, but with such outliers as Sharapova here and Delpo on the men’s side, it’s tough to draw firm conclusions from small subsets of players, however elite they are.

Final thought

The approach I’ve outlined here to measure the impact of smash and smash-opportunity skills is one that could be applied to other shots. Smashes are a good place to start because they are so simple: Many of them end points, and even when they don’t, they often virtually guarantee that one player will win the point. While smashes are a bit more complex than they first appear, the complications involved in applying a similar algorithm to, say, backhands and backhand opportunities are considerably greater. That said, I believe this algorithm represents a promising entry point to these more daunting problems.

Measuring the Impact of the Serve in Men’s Tennis

By just about any measure, the serve is the most important shot in tennis. In men’s professional tennis, with its powerful deliveries and short points, the serve is all the more crucial. It is the one shot guaranteed to occur in every rally, and in many points, it is the only shot.

Yet we don’t have a good way of measuring exactly how important it is. It’s easy to determine which players have the best serves–they tend to show up at the top of the leaderboards for aces and service points won–but the available statistics are very limited if we want a more precise picture. The ace stat counts only a subset of those points decided by the serve, and the tally of service points won (or 1st serve points won, or 2nd serve points won) combines the effect of the serve with all of the other shots in a player’s arsenal.

Aces are not the only points in which the serve is decisive, and some service points won are decided long after the serve ceases to have any relevance to the point. What we need is a method to estimate how much impact the serve has on points of various lengths.

It seems like a fair assumption that if a server hits a winner on his second shot, the serve itself deserves some of the credit, even if the returner got it back in play. In any particular instance, the serve might be really important–imagine Roger Federer swatting away a weak return from the service line–or downright counterproductive–think of Rafael Nadal lunging to defend against a good return and hitting a miraculous down-the-line winner. With the wide variety of paths a tennis point can follow, though, all we can do is generalize. And in the aggregate, the serve probably has a lot to do with the outcome of a 3-shot rally. At the other extreme, a 25-shot rally may start with a great serve or a mediocre one, but by the time the point is decided, the effect of the serve has been canceled out.

With data from the Match Charting Project, we can quantify the effect. Using about 1,200 tour-level men’s matches from 2000 to the present, I looked at each of the server’s shots grouped by the stage of the rally–that is, his second shot, his third shot, and so on–and calculated how frequently it ended the point. A player’s underlying skills shouldn’t change during a point–his forehand is as good at the end as it is at the beginning, unless fatigue strikes–so if the serve had no effect on the success of subsequent shots, players would end the point equally often with every shot.

Of course, the serve does have an effect, so points won by the server end much more frequently on the few shots just after the serve than they do later on. This graph illustrates how the “point ending rate” changes:

On first serve points (the blue line), if the server has a “makeable” second shot (the third shot of the rally, “3” on the horizontal axis, where “makeable” is defined as a shot that results in an unforced error or is put back in play), there is a 28.1% chance it ends the point in the server’s favor, either with a winner or by inducing an error on the next shot. On the following shot, the rate falls to 25.6%, then 21.8%, and then down into what we’ll call the “base rate” range between 18% and 20%.

The base rate tells us how often players are able to end points in their favor after the serve ceases to provide an advantage. Since the point ending rate stabilizes beginning with the server’s fifth shot (the ninth shot of the rally) on first-serve points, we can pinpoint that stage of the rally as the moment–for the average player, anyway–when the serve is no longer an advantage.
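The point-ending rate, and the stage at which it stabilizes, can be sketched as follows. The tallies format, the function names, and the stabilization rule (the earliest server shot from which every later rate stays inside a chosen band) are assumptions for illustration. The first three rates come from the text; the later ones are invented to sit inside the 18% to 20% base-rate band.

```python
from typing import Dict, Optional, Tuple


def point_ending_rates(tallies: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Map server shot number to the share of makeable shots that ended the point in his favor.

    tallies[k] = (points_ended_in_server_favor, makeable_shots) for the server's k-th shot.
    Stages with no makeable shots are skipped.
    """
    return {k: ended / makeable
            for k, (ended, makeable) in sorted(tallies.items()) if makeable}


def rally_shot(server_shot: int) -> int:
    """The server's k-th shot is shot 2k - 1 of the rally (serve = 1, his 2nd shot = 3)."""
    return 2 * server_shot - 1


def stabilization_point(rates: Dict[int, float], low: float, high: float) -> Optional[int]:
    """Earliest server shot from which every later rate stays inside [low, high]."""
    shots = sorted(rates)
    for i, k in enumerate(shots):
        if all(low <= rates[j] <= high for j in shots[i:]):
            return k
    return None


# First-serve rates by server shot; shots 5-7 are hypothetical values in the base-rate band.
rates = {2: 0.281, 3: 0.256, 4: 0.218, 5: 0.190, 6: 0.185, 7: 0.195}
k = stabilization_point(rates, low=0.18, high=0.20)
print(k, rally_shot(k))  # 5 9
```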

As the graph shows, second serve points (shown with a red line) are a very different story. It appears that the serve has no impact once the returner gets the ball back in play. Even that slight blip with the server’s third shot (“5” on the horizontal axis, for the rally’s fifth shot) is no higher than the point ending rate on the 15th shot of first-serve rallies. This tallies with the conclusions of some other research I did six years ago, and it has the added benefit of agreeing with common sense, since ATP servers win only about half of their second serve points.

Of course, some players get plenty of positive after-effects from their second serves: When John Isner hits a second shot on a second-serve point, he finishes the point in his favor 30% of the time, a number that falls to 22% by his fourth shot. His second serve has effects that mirror those of an average player’s first serve.

Removing unforced errors

I wanted to build this metric without resorting to the vagaries of differentiating forced and unforced errors, but it wasn’t to be. The “point-ending” rates shown above include points that ended when the server’s opponent made an unforced error. We can argue about whether, or how much, such errors should be credited to the server, but for our purposes today, the important thing is that unforced errors aren’t affected that much by the stage of the rally.

If we want to isolate the effect of the serve, then, we should remove unforced errors. When we do so, we discover an even sharper effect. The rate at which the server hits winners (or induces forced errors) depends heavily on the stage of the rally. Here’s the same graph as above, only with opponent unforced errors removed:

The two graphs look very similar. Again, the first serve loses its effect around the 9th shot in the rally, and the second serve confers no advantage on later shots in the point. The important difference to notice is the ratio between the peak winner rate and the base rate, which is now just above 10%. When we counted unforced errors, the ratio between the peak and the base rate was about 3:2. With unforced errors removed, the ratio is close to 2:1, suggesting that when the server hits a winner on his second shot, the serve and the winner contributed roughly equally to the outcome of the point. It seems more appropriate to skip opponent unforced errors when measuring the effect of the serve, and the resulting 2:1 ratio jibes better with my intuition.

Making a metric

Now for the fun part. To narrow our focus, let’s zero in on one particular question: What percentage of service points won can be attributed to the serve? To answer that question, I want to consider only the server’s own efforts. For unreturned serves and unforced errors, we might be tempted to give negative credit to the other player. But for today’s purposes, I want to divvy up the credit among the server’s assets–his serve and his other shots–like separating the contributions of a baseball team’s pitching from its defense.

For unreturned serves, that’s easy. 100% of the credit belongs to the serve.

For second serve points in which the return was put in play, 0% of the credit goes to the serve. As we’ve seen, for the average player, once the return comes back, the server no longer has an advantage.

For first-serve points in which the return was put in play and the server won by his fourth shot, the serve gets some credit, but not all, and the amount of credit depends on how quickly the point ended. The following table shows the exact rates at which players hit winners on each shot, in the “Winner %” column:

Server's…  Winner %  W%/Base  Shot credit  Serve credit  
2nd shot      21.2%     1.96        51.0%         49.0%  
3rd shot      18.1%     1.68        59.6%         40.4%  
4th shot      13.3%     1.23        81.0%         19.0%  
5th+          10.8%     1.00       100.0%          0.0%

Compared to a base rate of 10.8% winners per shot opportunity, we can calculate the approximate value of the serve in points that end on the server’s 2nd, 3rd, and 4th shots. The results come out close to round figures, and since these are hardly laws of nature (and the sample of charted matches has its biases), we’ll go with round numbers. We’ll give the serve 50% of the credit when the server needed only two shots, 40% when he needed three shots, and 20% when he needed four shots. After that, the advantage conferred by the serve is usually canceled out, so in longer rallies, the serve gets 0% of the credit.
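The W%/Base and credit columns follow from two divisions. This sketch assumes the serve’s share is one minus the base rate divided by the winner rate, which reproduces the table to within rounding. Because the inputs here are the rounded percentages shown in the table, the 3rd- and 4th-shot credits come out a tenth or two of a percentage point away from the published ones.

```python
BASE_RATE = 0.108  # winner rate from the server's 5th shot onward, first serves
WINNER_RATES = {2: 0.212, 3: 0.181, 4: 0.133}  # server's 2nd, 3rd, 4th shots


def serve_credit(winner_rate: float, base_rate: float) -> float:
    """Serve's share of credit: the portion of the winner rate above the base rate."""
    shot_credit = base_rate / winner_rate  # equals 1 / (W% / Base)
    return 1.0 - shot_credit


for shot, w in WINNER_RATES.items():
    print(f"shot {shot}: W%/Base {w / BASE_RATE:.2f}, "
          f"serve credit {serve_credit(w, BASE_RATE):.1%}")
# shot 2: W%/Base 1.96, serve credit 49.1%
# shot 3: W%/Base 1.68, serve credit 40.3%
# shot 4: W%/Base 1.23, serve credit 18.8%
```

Rounding these to 50%, 40%, and 20% gives the multipliers used in the rest of the analysis.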

Tour averages

Finally, we can begin to answer the question, What percentage of service points won can be attributed to the serve? This, I believe, is a good proxy for the slipperier query I started with, How important is the serve?

To do that, we take the same subset of 1,200 or so charted matches, tally the number of unreturned serves and first-serve points that ended with various numbers of shots, and assign credit to the serve based on the multipliers above. Adding up all the credit due to the serve gives us a raw number of “points” that the player won thanks to his serve. When we divide that number by the actual number of service points won, we find out how much of his service success was due to the serve itself. Let’s call the resulting number Serve Impact, or SvI.
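Here is a sketch of the SvI tally using the rounded 50/40/20 multipliers. The input format is an assumption: unreturned serves (first and second) get full credit, returned first-serve points the server won are keyed by the server shot that ended them, and returned second-serve points get nothing, so they appear only in the denominator. The counts are hypothetical.

```python
# Serve's share of credit for returned first-serve points the server won,
# keyed by the server shot that ended the point. Longer points get 0.
FIRST_SERVE_CREDIT = {2: 0.5, 3: 0.4, 4: 0.2}


def serve_impact(unreturned_serves: int,
                 first_serve_wins_by_shot: dict,
                 service_points_won: int) -> float:
    """Share of service points won that is credited to the serve itself."""
    raw = float(unreturned_serves)  # 100% credit for every unreturned serve
    for shot, wins in first_serve_wins_by_shot.items():
        raw += wins * FIRST_SERVE_CREDIT.get(shot, 0.0)
    # Returned second serves earn no credit, so they enter only via service_points_won.
    return raw / service_points_won


# Hypothetical player: 300 unreturned serves and 900 service points won in total.
svi = serve_impact(300, {2: 120, 3: 60, 4: 40, 5: 30}, 900)
print(f"SvI = {svi:.1%}")  # SvI = 43.6%
```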

Here are the aggregates for the entire tour, as well as for each major surface:

         1st SvI  2nd SvI  Total SvI  
Overall    63.4%    31.0%      53.6%  
Hard       64.6%    31.5%      54.4%  
Clay       56.9%    27.0%      47.8%  
Grass      70.8%    37.3%      61.5%

Bottom line, it appears that just over half of service points won are attributable to the serve itself. As expected, that number is lower on clay and higher on grass.

Since about two-thirds of the points that men win come on their own serves, we can go even one step further: roughly one-third of the points won by a men’s tennis player are due to his serve.

Player by player

These are averages, and the most interesting players rarely hew to the mean. Using the 50/40/20 multipliers, Isner’s SvI is a whopping 70.8% and Diego Schwartzman’s is a mere 37.7%. As far from the middle as those are, they understate the uniqueness of these players. I hinted above that the same multipliers are not appropriate for everyone; the average player reaps no positive after-effects of his second serve, but Isner certainly does. The standard formula we’ve used so far credits Isner with an outrageous SvI, even without giving him credit for the “second serve plus one” points he racks up.

In other words, to get player-specific results, we need player-specific multipliers. To do that, we start by finding a player-specific base rate, for which we’ll use the winner (and induced forced error) rate for all shots starting with the server’s fifth shot on first-serve points and shots starting with the server’s fourth on second-serve points. Then we check the winner rate on the server’s 2nd, 3rd, and 4th shots on first-serve points and his 2nd and 3rd shots on second-serve points, and if the rate is at least 20% higher than the base rate, we give the player’s serve the corresponding amount of credit.
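The player-specific rule can be sketched like this. The 20% threshold is from the text; the assumption that the “corresponding amount of credit” is computed as in the tour-average table (one minus base rate over winner rate) is mine. The winner rates below are hypothetical, chosen to mimic a player whose third shot outperforms his second.

```python
THRESHOLD = 1.2  # a shot's winner rate must be at least 20% above the base rate


def player_multipliers(winner_rates: dict, base_rate: float) -> dict:
    """Serve credit per server shot for one player; 0 when the bump is under 20%."""
    multipliers = {}
    for shot, rate in winner_rates.items():
        if rate >= THRESHOLD * base_rate:
            multipliers[shot] = 1.0 - base_rate / rate
        else:
            multipliers[shot] = 0.0
    return multipliers


# Hypothetical first-serve winner rates, base rate 12%.
m = player_multipliers({2: 0.16, 3: 0.20, 4: 0.13}, base_rate=0.12)
print({shot: round(v, 2) for shot, v in m.items()})  # {2: 0.25, 3: 0.4, 4: 0.0}
```

One consequence of the 20% rule under this assumption: any nonzero multiplier is at least 1 - 1/1.2, or about 17%.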

Here are the resulting multipliers for a quartet of players you might find interesting, with plenty of surprises already:

                   1st serve              2nd serve       
                    2nd shot  3rd  4th     2nd shot  3rd  
Roger Federer            55%  50%  30%           0%   0%  
Rafael Nadal             31%   0%   0%           0%   0%  
John Isner               46%  41%   0%          34%   0%  
Diego Schwartzman        20%  35%   0%           0%  25%  
Average                  50%  40%  20%           0%   0%

Roger Federer gets more positive after-effects from his first serve than average, more even than Isner does. The big American is a tricky case, both because so few of his serves come back and because he is so aggressive at all times, meaning that his base winner rate is very high. At the other extreme, Schwartzman and Rafael Nadal get very little follow-on benefit from their serves. Schwartzman’s multipliers are particularly intriguing, since on both first and second serves, his winner rate on his third shot is higher than on his second shot. Serve plus two, anyone?

Using player-specific multipliers makes Isner’s and Schwartzman’s SvI numbers more extreme. Isner’s ticks up a bit to 72.4% (just behind Ivo Karlovic), while Schwartzman’s drops to 35.0%, the lowest of anyone I’ve looked at. I’ve calculated multipliers and SvI for all 33 players with at least 1,000 tour-level service points in the Match Charting Project database:

Player                 1st SvI  2nd SvI  Total SvI  
Ivo Karlovic             79.2%    56.1%      73.3%  
John Isner               78.3%    54.3%      72.4%  
Andy Roddick             77.8%    51.0%      71.1%  
Feliciano Lopez          83.3%    37.1%      68.9%  
Kevin Anderson           77.7%    42.5%      68.4%  
Milos Raonic             77.4%    36.0%      66.0%  
Marin Cilic              77.1%    34.1%      63.3%  
Nick Kyrgios             70.6%    41.0%      62.5%  
Alexandr Dolgopolov      74.0%    37.8%      61.3%  
Gael Monfils             69.8%    37.7%      60.8%  
Roger Federer            70.6%    32.0%      58.8%
Bernard Tomic            67.6%    28.7%      58.5%
Tomas Berdych            71.6%    27.0%      57.2%
Alexander Zverev         65.4%    30.2%      54.9%
Fernando Verdasco        61.6%    32.9%      54.3%
Stan Wawrinka            65.4%    33.7%      54.2%
Lleyton Hewitt           66.7%    32.1%      53.4%
Juan Martin del Potro    63.1%    28.2%      53.4%
Grigor Dimitrov          62.9%    28.6%      53.3%
Jo-Wilfried Tsonga       65.3%    25.9%      52.7%
Marat Safin              68.4%    22.7%      52.3%
Andy Murray              63.4%    27.5%      52.0%
Dominic Thiem            60.6%    28.9%      50.8%  
Roberto Bautista Agut    55.9%    32.5%      49.5%  
Pablo Cuevas             57.9%    28.9%      47.8%  
Richard Gasquet          56.0%    29.0%      47.5%  
Novak Djokovic           56.0%    26.8%      47.3%  
Andre Agassi             54.3%    31.4%      47.1%  
Gilles Simon             55.7%    28.4%      46.7%  
Kei Nishikori            52.2%    30.8%      45.2%  
David Ferrer             46.9%    28.2%      41.0%  
Rafael Nadal             42.8%    27.1%      38.8%  
Diego Schwartzman        39.5%    25.8%      35.0%

At the risk of belaboring the point, this table shows just how massive the difference is between the biggest servers and their opposites. Karlovic’s serve accounts for nearly three-quarters of his success on service points, while Schwartzman’s can be credited with barely one-third. Even those numbers don’t tell the whole story: Because Ivo’s game relies so much more on service points than Diego’s does, 54% of Karlovic’s total points won–serve and return–are due to his serve, while only 20% of Schwartzman’s are.

We didn’t need a lengthy analysis to show us that the serve is important in men’s tennis, or that it represents a much bigger chunk of some players’ success than others. But now, instead of asserting a vague truism–the serve is a big deal–we can begin to understand just how much it influences results, and how much weak-serving players need to compensate just to stay even with their more powerful peers.

The Negative Impact of Time on Court

Italian translation at settesei.it

With 96 men’s matches in the books so far at Roland Garros this year, we’ve seen only one go to the absolute limit, past 6-6 in the fifth set. Still, we’ve had our share of lengthy, brutal five-set fights, including three matches in the first round that exceeded the four-hour mark. The three winners of those battles–Victor Estrella, David Ferrer, and Rogerio Dutra Silva–all fell to their second-round opponents.

A few years ago, I identified a “hangover effect” after Grand Slam marathons, defined as those matches that reach 6-6 in the fifth. Players who emerge victorious from such lengthy struggles would often already be considered underdogs in their next matches–after all, elite players rarely need to work so hard to advance–but marathon winners underperform even when we take their underdog status into account. (Earlier this week, I showed that women suffer little or no hangover effect after marathon third sets.)

A number of readers suggested I take a broader look at the effect of match length. After all, there are plenty of slugfests that fall just short of the marathon threshold, and some of them, like Ferrer’s loss yesterday to Feliciano Lopez, 6-4 in the final set, are more physically testing than some matches that do reach 6-6. Match time still isn’t a perfect metric for potential fatigue–a four-hour match against Ferrer is qualitatively different from four hours on court with Ivo Karlovic–but it’s the best proxy we have for a very large sample of matches.

What happens next?

I took over 7,200 completed men’s singles matches from Grand Slams back to 2001 and separated them into groups by match time: one hour to 1:29, 1:30 to 1:59, and so on, up to a final category of 4:30 and above. Then I looked at how the winners of all those matches fared against their next opponents:

Prev Length   Matches  Wins  Win %  
1:00 to 1:29      448   275  61.4%  
1:30 to 1:59     1918  1107  57.7%  
2:00 to 2:29     1734   875  50.5%  
2:30 to 2:59     1384   632  45.7%  
3:00 to 3:29      976   430  44.1%  
3:30 to 3:59      539   232  43.0%  
4:00 to 4:29      188    64  34.0%  
4:30 and up        72    23  31.9%

The trend couldn’t be any clearer. If the only thing you know about a Slam matchup is how long the players spent on court in their previous match, you’d bet on the guy who recorded his last win in the shortest amount of time.
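The grouping behind the first table can be sketched in a few lines. This is a minimal illustration, not the code used for the study: it assumes a hypothetical list of `(previous_match_minutes, won_next_match)` pairs, one per Slam match winner who played another round, and it assumes matches under an hour are simply skipped because the table starts at 1:00.

```python
from collections import defaultdict


def bucket_label(minutes: int) -> str:
    """Map a match time in minutes to the 30-minute bins used in the table."""
    if minutes >= 270:
        return "4:30 and up"
    start = (minutes // 30) * 30
    end = start + 29
    return f"{start // 60}:{start % 60:02d} to {end // 60}:{end % 60:02d}"


def win_pct_by_length(records):
    """Return {bin label: (matches, wins, win %)} for next-round results.

    `records` is a hypothetical iterable of (previous_match_minutes, won_next)
    pairs; the field layout is an assumption for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [matches, wins]
    for minutes, won in records:
        if minutes < 60:
            continue  # assumption: the table starts at one hour
        tally = counts[bucket_label(minutes)]
        tally[0] += 1
        tally[1] += int(won)
    return {label: (n, w, 100.0 * w / n) for label, (n, w) in counts.items()}
```

For example, `bucket_label(75)` falls in the "1:00 to 1:29" bin and `bucket_label(269)` in "4:00 to 4:29", matching the bin edges in the table.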

Of course, we know a lot more about the players than that. Andy Murray spent 3:34 on court yesterday, but even with his clay-court struggles this year, we would favor him in the third round against most of the men in the draw. As I’ve done in previous studies, let’s account for overall player skill by estimating the probability of each player winning each of these 7,200+ matches. Here are the same match-length categories, with “expected wins” (based on surface-specific Elo, or sElo) shown as well:

Prev Length   Wins  Exp Wins  Exp Win %  Ratio  
1:00 to 1:29   275       258      57.5%   1.07  
1:30 to 1:59  1107      1058      55.2%   1.05  
2:00 to 2:29   875       881      50.8%   0.99  
2:30 to 2:59   632       657      47.5%   0.96  
3:00 to 3:29   430       445      45.6%   0.97  
3:30 to 3:59   232       244      45.3%   0.95  
4:00 to 4:29    64        77      41.2%   0.83  
4:30 and up     23        30      42.1%   0.76

Again, there’s not much ambiguity in the trend here. Expected winning percentage falls as previous match length rises, because better players spend less time on court: if you know a player beat his previous opponent in 1:14, you can infer that he’s a very good player. That inference is often wrong for an individual player, but in the aggregate, it holds up.

The “Ratio” column divides actual wins (from the first table) by expected wins, which is equivalent to dividing actual winning percentage by expected winning percentage. If previous match time had no effect, we’d expect to see ratios randomly hovering around 1. Instead, we see a nearly steady decline from 1.07 at the top–meaning that players coming off of short matches win 7% more often than their skill level would otherwise lead us to forecast–to 0.76 at the bottom, indicating that competitors tend to underperform following a battle of 4:30 or longer.
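Expected wins and the ratio can be sketched as follows. The details of sElo (surface blending, K-factors) aren’t described here, so this sketch assumes only the standard Elo expectation on a 400-point scale; the ratings and the `(player_rating, opp_rating, won)` tuple layout are hypothetical.

```python
def elo_win_prob(rating: float, opp_rating: float) -> float:
    """Standard Elo expectation: probability the first player wins."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def performance_ratio(results):
    """Compare actual and expected wins for one match-length bin.

    `results` is a hypothetical list of (player_rating, opp_rating, won)
    tuples. Expected wins are the sum of per-match Elo win probabilities;
    the ratio is actual wins divided by expected wins.
    """
    wins = sum(int(won) for _, _, won in results)
    expected = sum(elo_win_prob(r, o) for r, o, _ in results)
    return wins, expected, wins / expected
```

Applied to the first row of the table, 275 actual wins against 258 expected gives 275 / 258 ≈ 1.07. Because the expected-win column is rounded to whole numbers, recomputing a ratio from it can differ from the published figure in the second decimal (23 / 30 gives 0.77 rather than 0.76).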

It’s difficult to know whether we’re seeing a direct effect of time on court or a proxy for form. As good as surface-specific Elo ratings are, they don’t capture everything that could possibly predict the outcome of a match, especially micro-level considerations like a player’s comfort on a specific type of surface or at a certain tournament. sElo also needs a little time to catch up with players making fast improvements, particularly when they are very young. All this is to say that our correction for overall skill level will never be perfect.

Thus, a 75-minute win may improve a player’s chances by keeping him fresh for the next round … or it might tell us that–for whatever reason–he’s a stronger competitor right now than our model gives him credit for. One point in favor of the latter is that, at the most extreme, less time on court doesn’t help: Players don’t appear to benefit from advancing via walkover. That isn’t a slam-dunk argument–some commentators believe that walkovers could be detrimental due to the long resulting layoff at a Slam–but it does show us that less time on court isn’t always a positive.

Whatever the underlying cause, we can tweak our projections accordingly. Murray could be a little weaker than usual tomorrow after his lengthy battle yesterday with Martin Klizan. Albert Ramos, the only man to complete a second-rounder in less than 90 minutes, might be playing a bit better than his rating suggests. It’s certainly evident that match time has something to tell us even when players aren’t stretched to the breaking point of a marathon fifth set.

Angelique Kerber’s Unclutch Unforced Errors

Italian translation at settesei.it

It’s been a rough year for Angelique Kerber. Despite her No. 1 WTA ranking and place at the top of the French Open draw, she lost her opening match on Sunday against the unseeded Ekaterina Makarova. Adding insult to injury, the loss goes down in the record books as a lopsided-looking 6-2 6-2.

Andrea Petkovic chimed in with her diagnosis of Kerber’s woes:

She’s simply playing without confidence right now. It was tight, even though the scoreline was 2 and 2 but everyone who knows a thing about tennis knew that Angie made errors whenever it mattered because she’s playing without any confidence right now – errors she didn’t make last year.

This is one version of a common analysis: A player lost because she crumbled on the big points. While that probably doesn’t cover all of Kerber’s issues on Sunday–Makarova won 72 points to her 55–it is true that big points have a disproportionate effect on the end result. For every player who squanders a dozen break points yet still wins the match, there are others who falter at crucial moments and ultimately lose.

This family of theories–that a player over- or under-performed at big moments–is testable. For instance, I showed last summer that Roger Federer’s Wimbledon loss to Milos Raonic was due in part to his weaker performance on more important points. We can do the same with Kerber’s early exit.

Here’s how it works. Once we calculate each player’s probability of winning the match before each point, we can assign each point a measure of importance–I prefer to call it leverage, or LEV–that quantifies how much that single point could affect the outcome of the match: the difference between a player’s chance of winning the match if she wins the point and if she loses it. At 3-0, 40-0, it’s almost zero. At 3-3, 40-AD in the deciding set, it might be over 10%. Across an entire tournament’s worth of matches, the average LEV is around 5% to 6%.
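To make the definition concrete, here is a minimal sketch of leverage as "win probability after winning the point minus win probability after losing it." To stay short it swaps in a simpler target than the article’s: it measures leverage with respect to winning the current service game, not the match, and it assumes the server wins every point with the same probability `p`. Game-level numbers are therefore far larger than the match-level LEV figures quoted above. The function names are hypothetical.

```python
def game_win_prob(p: float, a: int, b: int) -> float:
    """Probability the server wins the game from a score of a points to b.

    Assumes each point is won by the server independently with probability p.
    """
    if a >= 4 and a - b >= 2:
        return 1.0  # server has won the game
    if b >= 4 and b - a >= 2:
        return 0.0  # returner has broken
    if a >= 3 and b >= 3:
        # Deuce region: closed-form chance of winning from deuce.
        deuce = p * p / (p * p + (1 - p) * (1 - p))
        if a == b:
            return deuce
        if a > b:  # advantage server
            return p + (1 - p) * deuce
        return p * deuce  # advantage returner
    return p * game_win_prob(p, a + 1, b) + (1 - p) * game_win_prob(p, a, b + 1)


def point_leverage(p: float, a: int, b: int) -> float:
    """Game-level leverage: win prob if the server wins the point minus if she loses it."""
    return game_win_prob(p, a + 1, b) - game_win_prob(p, a, b + 1)
```

With `p = 0.6`, a point at 40-0 (`point_leverage(0.6, 3, 0)`) has leverage of roughly 0.05, while a break point at 30-40 (`point_leverage(0.6, 2, 3)`) is worth about 0.69. A match-level version would chain the same idea through games and sets, which is what shrinks 40-0 at 3-0 to nearly zero.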

If Petko is right, we’ll find that the average LEV of Kerber’s unforced errors was higher than on other points. (I’ve excluded points that ended with the serve, since neither player had a chance to commit an unforced error.) Sure enough, Kerber’s 13 groundstroke UEs (that is, excluding double faults) had an average LEV of 5.5%, compared to 3.8% on points that ended some other way. Her UE points were 45% more important than non-UE points.
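The comparison above can be sketched as follows, assuming each point has already been assigned a LEV value. The `Point` record and its field names are hypothetical; the sketch also assumes that "points that ended with the serve" covers aces, unreturned serves, and double faults, so they are dropped before averaging.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Point:
    lev: float                # leverage of the point, e.g. 0.055 for 5.5%
    ended_on_serve: bool      # ace, unreturned serve, or double fault (assumed)
    player_rally_ue: bool     # point ended with the player's groundstroke UE


def ue_leverage_ratio(points):
    """Average LEV of the player's UE points divided by that of other rally points."""
    rally = [pt for pt in points if not pt.ended_on_serve]
    ue = [pt.lev for pt in rally if pt.player_rally_ue]
    other = [pt.lev for pt in rally if not pt.player_rally_ue]
    return mean(ue) / mean(other)
```

Plugging in Kerber’s averages, 5.5% / 3.8% ≈ 1.45, which is the "45% more important" figure; a ratio below 1 would mean a player’s errors came on less important points.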

Let’s put that number in perspective. Among the 86 women for whom I have point-by-point UE data for their first-round matches this week*, ten timed their errors even worse than Kerber did. Magdalena Rybarikova was the most extreme: Her eight UEs against Coco Vandeweghe were more than twice as important, on average, as the rest of the points in that match. Seven of the ten women with bad timing lost their matches, and two others–Agnieszka Radwanska and Marketa Vondrousova–committed so few errors (3 and 4, respectively) that it didn’t really matter. Only Dominika Cibulkova, whose 15 errors were about as badly timed as Kerber’s, suffered from unclutch UEs yet managed to advance.

* This data comes from the Roland Garros website. I aggregate it after each major and make it available here.

Another important reference point: Unforced errors are evenly distributed across all leverage levels. Our instincts might tell us otherwise–we might disproportionately recall UEs that came under pressure–but the numbers don’t bear it out. Because the tour-wide UE points are about as important as everything else, Kerber’s errors look just as badly timed when we compare her to the tour average.

They are also poorly timed when compared to her other recent performances at majors. Petkovic implied as much when she said her compatriot was making “errors she didn’t make last year.” Across her 19 matches at the previous four Slams, her UEs occurred on points that were 11% less important than non-UE points. Her errors came on relatively more important points in only 5 of the 19 matches, and even in those matches, her UE points were never more than 31% more important than her non-UE points; that high-water mark came in Melbourne this year against Tsurenko. Even that is better than her 45% on Sunday.

Across so many matches, a difference of 11% is substantial. Of the 30 players with point-by-point UE data for at least eight matches at the previous four majors, only three did a better job timing their unforced errors. Radwanska heads the list, with UE points 16% less important than her other points, followed by Timea Bacsinszky at 14% and Kiki Bertens at 12%. The other 26 players committed their unforced errors at more important moments than Kerber did.

As is so often the case in tennis, it’s difficult to establish whether a stat like this indicates a longer-term trend or is mostly noise. We don’t have point-by-point data for most of Kerber’s matches, so we can’t take the obvious next step of checking the rest of her 2017 matches for similarly unclutch performances. Instead, we’ll have to keep tabs on how well she limits UEs at big moments on those occasions where we have the data necessary to do so.