In the last couple of years, I’ve gotten a lot of mileage out of a metric called Aggression Score (AS), first outlined here by Lowell West. The stat is so useful due to its simplicity. The more aggressive a player is, the more she’ll rack up both winners and unforced errors. AS, then, is essentially the rate at which a player hits winners and unforced errors.
Yet one limitation lies in Aggression Score’s simplicity. It works best when winners and unforced errors move together, and when they are roughly similar. If someone is having a really bad day, her unforced errors might skyrocket, resulting in a higher AS, even if the root cause of the errors is poor play, not aggression. On the flip side, a locked-in player will see her AS increase by hitting more winners, even if those winners are more a reflection of good form than a high-risk tactic.
I’ve long wanted to extend the idea behind Aggression Score to return tactics, but when we narrow our view to the second shot of the rally, the simplicity of the metric becomes a handicap. On the return, the vast majority of “aggressive” shots are errors, so the results will be swamped by error rate, minimizing the role of return winners, which are a more reliable indicator. Using Match Charting Project data from 2010-present women’s tennis, returns result in errors 18% of the time, while they turn into winners (or they induce forced errors) less than one-third as often, 5.5% of the time. The appealingly simple Aggression Score formula, narrowed to consider only returns of serve, won’t do the job here.
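To make the imbalance concrete, here is a quick back-of-the-envelope check in Python. It assumes a naive "return AS" that simply adds the error rate and the winner rate; that simplification is mine, for illustration only, and the variable names are hypothetical.

```python
# Tour-average return outcomes quoted above (share of return points).
TOUR_RET_ERR = 0.18    # returns that end in an error
TOUR_RET_WIN = 0.055   # return winners plus induced forced errors

# Assumption: a naive return AS just adds the two rates together.
naive_return_as = TOUR_RET_ERR + TOUR_RET_WIN

# Share of that naive score that comes from errors rather than winners.
error_share = TOUR_RET_ERR / naive_return_as

print(f"naive return AS: {naive_return_as:.3f}")
print(f"share driven by errors: {error_share:.0%}")
```

Under that assumption, errors account for roughly three-quarters of the score, which is why the unadjusted formula mostly tracks error rate.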
Return aggression score
Let’s walk through a formula to measure return aggression, using last month’s Miami final between Sloane Stephens and Jelena Ostapenko as an example. Tallying up return points (excluding aces and service winners), along with return errors* and return winners** for both players from the match chart, we get the following:
Returner          RetPts  RetErr  RetWin  RetE%  RetW%
Sloane Stephens       64       9       1  14.1%   1.6%
Jelena Ostapenko      63      11       6  17.5%   9.5%
* “errors” are a combination of forced and unforced, because most return errors are scored as forced errors, and because the distinction between the two is so unreliable as to be meaningless. Some forced error returns are nearly impossible to make, so they don’t really belong in this analysis, but with the state of available data, it’ll have to do.
** throughout this post, I’ll use “winners” as short-hand for “winners plus induced forced errors” — that is, shots that were good enough to end the point.
These numbers make clear which of the two players is the aggressive one, and they confirm the obvious: Ostapenko plays much higher-risk tennis than Stephens does. In this case, Ostapenko’s rates are nearly equal to or above the tour averages of 17.8% and 5.5%, while both of Stephens’s are well below them.
The next step is to normalize the error and winner rates so that we can more easily see how they relate to each other. To do that, I simply divide each number by the tour average:
Returner          RetE%  RetW%  RetE+  RetW+
Sloane Stephens   14.1%   1.6%   0.79   0.28
Jelena Ostapenko  17.5%   9.5%   0.98   1.73
The last two columns show the normalized figures, which reflect how each rate compares to tour average, where 1.0 is average, greater than 1 means more aggressive, and less than 1 means less aggressive.
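The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the post: the function and constant names are hypothetical, and the tour averages are the 17.8% and 5.5% figures quoted above.

```python
TOUR_ERR_RATE = 0.178  # tour-average return error rate quoted above
TOUR_WIN_RATE = 0.055  # tour-average return winner rate quoted above

def normalized_rates(ret_pts, ret_err, ret_win):
    """Return (RetE+, RetW+): each rate divided by the tour average."""
    err_rate = ret_err / ret_pts
    win_rate = ret_win / ret_pts
    return err_rate / TOUR_ERR_RATE, win_rate / TOUR_WIN_RATE

# Counts from the Miami final match chart.
for name, pts, err, win in [("Sloane Stephens", 64, 9, 1),
                            ("Jelena Ostapenko", 63, 11, 6)]:
    e_plus, w_plus = normalized_rates(pts, err, win)
    print(f"{name}: RetE+ {e_plus:.2f}, RetW+ {w_plus:.2f}")
```

Run on the match counts, this reproduces the 0.79 / 0.28 and 0.98 / 1.73 figures in the table.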
We’re not quite done yet, because, as Ostapenko and Stephens illustrate, return winner rates are much noisier than return error rates. That’s largely a function of how few there are. The gap between the two players’ normalized rates, 0.28 and 1.73, looks huge, but represents a difference of only five winners. If we leave return winner rates untouched, we’ll end up with a metric that varies largely due to movement in winner rates–the opposite problem from where we started.
To put winners and errors on a more equal footing, we can express both in terms of standard deviations. The standard deviation of the adjusted error ratio is 0.404, while the standard deviation of the adjusted winner ratio is 0.768. Each ratio is centered on 1.0, so we subtract 1 and divide by the corresponding standard deviation, which roughly halves the spread of the winner number relative to the error number. For Stephens's error ratio, that is (0.79 - 1) / 0.404 = -0.52. The resulting numbers tell us how many standard deviations a certain statistic is above or below the mean, and these final results give us winner and error rates that are finally comparable to each other:
Returner          RetE+  RetW+  RetE-SD  RetW-SD
Sloane Stephens    0.79   0.28    -0.52    -0.93
Jelena Ostapenko   0.98   1.73    -0.05     0.95
(Math-oriented readers might notice that the last two steps don’t need to be separate; we could just as easily think of these last two numbers as standard deviations above or below the mean of the original winner and error rates. I included the intermediate step to–I hope–make the process a bit more intuitive.)
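For anyone who wants to check that equivalence, here is a small Python sketch. It assumes the ratio distribution is centered on 1.0 (which is what the table values imply) and relies on the fact that dividing a variable by a constant divides its standard deviation by the same constant. The function names are hypothetical.

```python
TOUR_ERR_RATE = 0.178   # tour-average return error rate (from the post)
ERR_RATIO_SD = 0.404    # SD of the normalized error ratio (from the post)

def two_step(rate, tour_rate, ratio_sd):
    """Normalize to tour average, then measure distance from 1.0 in SDs."""
    ratio = rate / tour_rate
    return (ratio - 1.0) / ratio_sd

def one_step(rate, tour_rate, ratio_sd):
    """Same result from the raw rate: the SD of the raw rate is the
    ratio SD scaled back up by the tour average."""
    raw_sd = ratio_sd * tour_rate
    return (rate - tour_rate) / raw_sd

stephens_err_rate = 9 / 64   # return errors / return points, Miami final
print(round(two_step(stephens_err_rate, TOUR_ERR_RATE, ERR_RATIO_SD), 2))
print(round(one_step(stephens_err_rate, TOUR_ERR_RATE, ERR_RATIO_SD), 2))
```

Both routes give -0.52 for Stephens's return errors, matching the RetE-SD column.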
Our final stat, Return Aggression Score (RAS), is simply the average of those two rates measured in standard deviations:
Returner          RetE-SD  RetW-SD    RAS
Sloane Stephens     -0.52    -0.93  -0.73
Jelena Ostapenko    -0.05     0.95   0.45
Positive numbers represent more aggression than tour average; negative numbers less aggression. Ostapenko’s +0.45 figure is higher than about 75% of player-matches among the nearly 4,000 in the Match Charting Project dataset, though as we’ll see, it is far more conservative than her typical strategy. Stephens’s -0.73 mark is at the opposite position on the spectrum, higher than only one-quarter of player-matches. It is also lower than her own average, though it is higher than the -0.97 RAS she posted in the US Open final last fall.
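Putting the steps together, the whole calculation fits in one short function. This is a sketch under the assumptions already stated: the tour averages and standard deviations are the ones quoted in this post, and `return_aggression_score` is a hypothetical name, not part of any published library.

```python
TOUR_ERR_RATE, TOUR_WIN_RATE = 0.178, 0.055   # tour averages from the post
ERR_RATIO_SD, WIN_RATIO_SD = 0.404, 0.768     # SDs from the post

def return_aggression_score(ret_pts, ret_err, ret_win):
    """RAS for one player-match, following the steps described above."""
    # Rate -> ratio to tour average -> distance from 1.0 in SD units.
    err_sd = ((ret_err / ret_pts) / TOUR_ERR_RATE - 1.0) / ERR_RATIO_SD
    win_sd = ((ret_win / ret_pts) / TOUR_WIN_RATE - 1.0) / WIN_RATIO_SD
    # RAS is the plain average of the two SD-unit figures.
    return (err_sd + win_sd) / 2

print(f"Stephens:  {return_aggression_score(64, 9, 1):+.2f}")
print(f"Ostapenko: {return_aggression_score(63, 11, 6):+.2f}")
```

With the Miami final counts, this returns -0.73 for Stephens and +0.45 for Ostapenko.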
The extremes
The first test of any new metric is whether the results actually make sense, and we need look no further than the top ten most aggressive player-matches for confirmation. Five of the top ten most aggressive single-match return performances belong to Serena Williams, and the overall most aggressive match is Serena’s 2013 Roland Garros semifinal against Sara Errani, which rates at 3.63–well over three standard deviations above the mean. The other players represented in the top ten are Ostapenko, Oceane Dodin, Petra Kvitova, Madison Keys, and Julia Goerges–a who’s who of high-risk returning in women’s tennis.
The opposite end of the spectrum includes another group of predictable names, such as Simona Halep, Agnieszka Radwanska, Caroline Wozniacki, Annika Beck, and Errani. Two of Halep’s early matches rank lowest and third-lowest, including the 2012 Brussels final against Radwanska, in which her return aggression was 1.6 standard deviations below the mean. It’s not as extreme a mark as Serena’s performances, but that’s the nature of the metric: Halep returned 46 of 48 non-ace serves, and none of the 46 returns went for winners. It’s tough to be less aggressive than that.
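The asymmetry follows from the arithmetic: rates can't go below zero, so the score has a floor but no comparable ceiling. The sketch below plugs Halep's Brussels counts into the formula, along with a hypothetical match with no return errors and no return winners at all. The function name `ras` is mine, and the constants are the tour averages and standard deviations quoted earlier.

```python
TOUR_ERR_RATE, TOUR_WIN_RATE = 0.178, 0.055   # tour averages from the post
ERR_RATIO_SD, WIN_RATIO_SD = 0.404, 0.768     # SDs from the post

def ras(ret_pts, ret_err, ret_win):
    """Return Aggression Score for one player-match."""
    err_sd = ((ret_err / ret_pts) / TOUR_ERR_RATE - 1.0) / ERR_RATIO_SD
    win_sd = ((ret_win / ret_pts) / TOUR_WIN_RATE - 1.0) / WIN_RATIO_SD
    return (err_sd + win_sd) / 2

# Halep, 2012 Brussels final: 48 return points, 2 missed, 0 winners.
halep = ras(48, 2, 0)

# Hypothetical floor: every return in play, none of them winners.
floor = ras(48, 0, 0)

print(f"Halep: {halep:.2f}, lowest possible: {floor:.2f}")
```

Halep's match comes out at about -1.60, and with these constants no match can score below about -1.89.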
The leaderboard
The Match Charting Project has shot-by-shot data on at least ten matches each for over 100 WTA players. Of those, here are the top ten, as ranked by RAS:
Player                    Matches  RetPts   RAS
Oceane Dodin                   11     665  1.18
Aryna Sabalenka                11     816  1.12
Camila Giorgi                  19    1155  1.07
Mirjana Lucic                  11     707  1.05
Julia Goerges                  27    1715  0.94
Petra Kvitova                  65    4142  0.90
Serena Williams                91    5593  0.90
Jelena Ostapenko               35    2522  0.88
Anastasia Pavlyuchenkova       21    1180  0.78
Lucie Safarova                 34    2294  0.77
We’ve already seen some of these names, in our discussion of the highest single-match marks. When we average across contests, a few more players turn up with RAS marks over one full standard deviation above the mean: Aryna Sabalenka, Camila Giorgi, and Mirjana Lucic-Baroni.
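A leaderboard like this can be built from per-match scores with a simple group-and-average. The sketch below assumes each player's figure is the plain mean of her single-match RAS values; the post doesn't spell out the exact aggregation, so treat that as my assumption. The `leaderboard` function is hypothetical, and the demo rows are made-up placeholders, not Match Charting Project data.

```python
from collections import defaultdict
from statistics import mean

def leaderboard(match_scores, min_matches=10):
    """match_scores: iterable of (player, ras) pairs, one per charted match.

    Returns (player, matches, average RAS) rows, most aggressive first,
    keeping only players with at least min_matches charted matches.
    """
    by_player = defaultdict(list)
    for player, ras in match_scores:
        by_player[player].append(ras)
    rows = [(player, len(scores), mean(scores))
            for player, scores in by_player.items()
            if len(scores) >= min_matches]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Made-up placeholder values, NOT Match Charting Project data.
demo = [("Player A", 1.2), ("Player A", 0.8), ("Player B", -0.5),
        ("Player B", -0.7), ("Player C", 2.0)]

# Threshold lowered to 2 so the tiny demo has something to show.
for player, n, avg in leaderboard(demo, min_matches=2):
    print(f"{player}: {n} matches, RAS {avg:+.2f}")
```

Player C is dropped for having too few matches, which is the same role the ten-match minimum plays in the tables here.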
Again, the more conservative players don’t look as extreme: Only Madison Brengle has a RAS more than one standard deviation below the mean. I’ve included the top 20 on this list because so many notable names (Wozniacki, Radwanska, Kerber) are between 11 and 20:
Player                Matches  RetPts    RAS
Madison Brengle            11     702  -1.06
Monica Niculescu           32    2099  -0.93
Stefanie Voegele           12     855  -0.85
Annika Beck                16    1181  -0.78
Lara Arruabarrena          10     627  -0.72
Johanna Larsson            14     873  -0.65
Barbora Strycova           20    1275  -0.63
Sara Errani                25    1546  -0.60
Carla Suarez Navarro       36    2585  -0.55
Svetlana Kuznetsova        27    2271  -0.55
Viktorija Golubic          16    1272  -0.53
Agnieszka Radwanska        96    6239  -0.51
Yulia Putintseva           22    1552  -0.51
Caroline Wozniacki         80    5165  -0.50
Christina McHale           11     763  -0.48
Angelique Kerber           93    6611  -0.46
Louisa Chirico             13     806  -0.44
Darya Kasatkina            26    1586  -0.43
Magdalena Rybarikova       12     725  -0.41
Anastasija Sevastova       30    1952  -0.40
A few more notable names: Halep, Stephens and Elina Svitolina all count among the next ten lowest, with RAS figures between -0.30 and -0.36. The most “average” player among the game’s best is Victoria Azarenka, who rates at -0.08. Venus Williams, Johanna Konta, and Garbine Muguruza make up a notable group of aggressive-but-not-really-aggressive women between +0.15 and +0.20, just outside of the game’s top third, while Maria Sharapova, at +0.63, misses our first list by only a few places.
Unsurprisingly, these results track quite closely to overall Aggression Score figures, as players who adopt a high-risk strategy overall are probably doing the same when facing the serve. This metric, however, allows us to identify players–or even single matches–for which the two strategies don’t move in concert. Further, the approach I’ve taken here, to separate and normalize winners and errors, rather than treat them as an undifferentiated mass, could be applied to Aggression Score itself, or to other more targeted versions of the metric, such as a third-shot AS, or a backhand-specific AS.
As always, the more data we have, the more we can learn from it. Analyses like these are only possible with the work of the volunteers who have contributed to the Match Charting Project. Please help us continue to expand our coverage and give analysts the opportunity to look at shot-by-shot data, instead of just the basics published by tennis’s official federations.