Shot-by-shot stats – Heavy Topspin

WTA Decisions From the Backhand Corner

Earlier this week I presented a lot of data about what happens when men face a makeable ball hit to their backhand corner. That post was itself a follow-up on a previous look at what happened when players of both genders attempted down-the-line backhands. You don’t need to read those two articles to know what’s going on in this one, but if you’re interested in the topic, you’ll probably find them worthwhile.

Decision-making in the backhand corner is one of the biggest differences between pro men and women. Let me illustrate in the nerdiest way possible, with bug reports from the code I wrote to assemble these numbers. My first stab at the code to aggregate player-by-player numbers for men failed because some men never hit a topspin backhand from the backhand corner. At least, not in any match recorded by the Match Charting Project. The offending player who generated those divide-by-zero errors was Sam Groth. In his handful of charted matches, he relied entirely on the slice, at least in those rare cases where rallies extended beyond the return of serve.

Compare with the bug that slowed me down in preparing this post. The problematic player this time was Evgeniya Rodina. In nine charted matches, she has yet to hit a forehand from the backhand corner. If your backhand is the better shot, why would you run around it? Of the nearly 200 players with at least five charted matches from the 2010s, Rodina is the only one with zero forehands. But she isn’t really an outlier. Another 23 women hit fewer than 10 forehands in all of their charted matches, including Timea Bacsinszky, who opted for the forehand only four times in 32 matches.
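Both bugs come down to dividing by a count that can be zero. Here is a minimal sketch of a guard, assuming made-up per-player tallies; the names `safe_share`, `bh_topspin`, and `bh_topspin_won` are illustrative and are not taken from the actual aggregation code:

```python
from typing import Optional


def safe_share(part: int, total: int) -> Optional[float]:
    """Return part / total, or None when there is nothing to divide by."""
    if total == 0:
        return None
    return part / total


# Hypothetical tallies of topspin backhands hit from the backhand corner.
players = {
    "slice-only player": {"bh_topspin": 0, "bh_topspin_won": 0},
    "typical player": {"bh_topspin": 200, "bh_topspin_won": 97},
}

for name, tally in players.items():
    rate = safe_share(tally["bh_topspin_won"], tally["bh_topspin"])
    print(name, "n/a" if rate is None else f"{rate:.1%}")
```

Returning `None` rather than zero keeps a player with no attempts from looking like a player who lost every point.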

Faced with a makeable ball in the backhand corner, men and women both hit a non-slice groundstroke about four-fifths of the time. But of those topspin and flat strokes, women stick with the backhand 94% of the time, compared to 82% for men.

A few WTA players seek out opportunities to run around their backhands, including Sam Stosur and Polona Hercog, both of whom hit the forehand 20% of the time they are pushed into the backhand corner. Ashleigh Barty also displays more Federer-like tactics than most of her peers, using the forehand 13% of the time. Yet most of the women with powerful forehands, like Serena Williams, have equal or better backhands, making it counter-productive to run around the shot. Serena hits a forehand only 1% of the time her opponent sends a makeable ball into her backhand corner.

Directional decisions

Backhand or forehand, let’s start by looking at which specific shots players chose. The Match Charting Project contains shot-by-shot logs of about 2,900 women’s matches from the 2010s, including 365,000 makeable balls hit to one player’s backhand corner. (“Makeable” is defined as a ball that either came back or resulted in an unforced error.)
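The “makeable” filter can be sketched as follows. The `Shot` record and its outcome labels are assumptions made for illustration; they are not the Match Charting Project's actual notation:

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical outcome labels: "in_play", "winner",
    # "unforced_error", or "forced_error".
    outcome: str


def is_makeable(shot: Shot) -> bool:
    """A ball is makeable if the reply came back or was an unforced error."""
    return shot.outcome in ("in_play", "winner", "unforced_error")


shots = [Shot("in_play"), Shot("forced_error"), Shot("unforced_error"), Shot("winner")]
makeable = [s for s in shots if is_makeable(s)]
print(len(makeable), "of", len(shots), "balls count as makeable")
```

Under this definition, forced errors drop out of the sample entirely.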

Here is the frequency with which players hit backhands and forehands in different directions from their backhand corner. I’ve included the ATP numbers for comparison:

BH Direction               WTA Freq  ATP Freq  
Down the line                 17.4%     17.4%  
Down the middle               35.2%     29.5%  
Cross-court                   47.3%     52.9%  
                                               
FH Direction               WTA Freq  ATP Freq  
Down the line (inside-in)     35.2%     35.1%  
Down the middle               16.2%     12.8%  
Cross-court (inside-out)      48.4%     51.8%

Once a forehand or backhand is chosen, there isn’t much difference between men and women. Women go up the middle a bit more often, which may partly be a function of using the topspin or flat backhand in defensive positions slightly more than men do. I’ve also observed that today’s top women are more likely to hit an aggressive shot down the middle than men are. The level of aggression and risk may be similar to that of a bullet aimed at a corner, but when we classify by direction, it looks a bit more conservative. That’s just a theory, however, so we’ll have to test that another day.

Point probability

Things get more interesting when we look at how these choices affect the likelihood of winning the point. On average, a woman faced with a makeable ball in her backhand corner has a 47.2% chance of winning the point. (For men, it’s 47.7%.) The serve has some effect on the potency of those shots toward the backhand corner. If the makeable ball was a service return–presumably weaker than the average groundstroke–the probability of winning the point is 48.2%. If the makeable ball is one shot later, an often-aggressive “serve-plus-one” shot, the chances of fighting back and winning the point are only 46.3%. It’s not a huge difference, but it is a reminder that the context of any given shot can affect these probabilities.

The various decisions available to players each have their own effect on the probability of winning the point, at least on average. If a woman chooses to hit a down-the-line backhand, her likelihood of winning the point increases to 53.0%. If she makes that shot, her odds rise to 68.4%.

The following table shows those probabilities for every decision. The first column of percentages, “Post-Shot,” indicates the likelihood of winning after making the decision–the 53.0% I just mentioned. The second column, “In-Play,” is the chance of winning if she makes that shot, like 68.4% for the down-the-line backhand.

Shot      Direction  Post-Shot  In-Play  
Backhand  (all)          48.5%    55.2%  
Backhand  DTL            53.0%    68.4%  
Backhand  Middle         44.6%    48.8%  
Backhand  XC             49.9%    55.8%  
                                         
Forehand  (all)          56.3%    56.1%  
Forehand  DTL (I-I)      61.4%    73.7%  
Forehand  Middle         45.7%    50.3%  
Forehand  XC (I-O)       56.2%    64.4%

The down-the-line shots are risky, so the gap between the two probabilities is a big one. There is little difference between Post-Shot and In-Play for down-the-middle shots, because they almost always go in. For the forehand probabilities, keep in mind that they are skewed by the selection of players who choose to use their forehands more often. Your mileage may vary, especially if you play like Rodina does.
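To make the two columns concrete, here is a rough sketch of how Post-Shot and In-Play could be computed from a list of attempts. The input format, a list of (landed in, won point) pairs, is an assumption for illustration and not the format of the charting data:

```python
def post_shot_and_in_play(attempts):
    """attempts: list of (landed_in, won_point) booleans for one shot type.

    Returns (post_shot, in_play); either is None if there is no data for it.
    """
    total = len(attempts)
    made = [won for landed_in, won in attempts if landed_in]
    post_shot = sum(won for _, won in attempts) / total if total else None
    in_play = sum(made) / len(made) if made else None
    return post_shot, in_play


# Hypothetical sample: 10 attempts, 7 land in (5 of those points won), 3 miss.
sample = [(True, True)] * 5 + [(True, False)] * 2 + [(False, False)] * 3
post_shot, in_play = post_shot_and_in_play(sample)
print(f"Post-Shot {post_shot:.1%}, In-Play {in_play:.1%}")

# A missed shot loses the point, so Post-Shot = make rate x In-Play.
# For the down-the-line backhand row, that implies this make rate:
implied_make_rate = 0.530 / 0.684
print(f"Implied make rate: {implied_make_rate:.1%}")
```

The ratio of the two columns is one way to read the riskiness of a shot straight off the table.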

Cautious recommendations

Looking at this table, you might wonder why a player would ever make certain shot selections. The likelihood of winning the point before choosing a wing or direction is 47.2%, so why go with a backhand down the middle (44.6%) when you could hit an inside-in forehand (61.4%)? It’s not the risk of missing, because that’s baked into the numbers.

One obvious reason is that it isn’t always possible to hit the most rewarding shot. Even the most aggressive men run around only about one-quarter of their backhands, suggesting that it would be impractical to hit a forehand on the remaining three-quarters of opportunities. That wipes out half of the choices I’ve listed. And even a backhand wizard such as Simona Halep can’t hit lasers down the line at will. The probabilities reflect what happened when players thought the shot was the best option available to them. Even though they were occasionally wrong, this is very, very far from a randomized controlled trial in which a scientist told players to hit a down-the-line backhand no matter what the nature of the incoming shot.

Another complication is one that I’ve already mentioned: The success rates for rarer shots, like inside-in forehands, reflect how things turned out for players who chose to hit them. That is, for players who consider them to be weapons. It might be amusing to watch Monica Niculescu hit inside-out topspin forehands at every opportunity, but it almost certainly wouldn’t improve her chances of winning. You only get those rosy forehand numbers if you can hit a forehand like Stosur does.

That said, the table does drive home the point that conservative shot selection has an effect on the probability of winning points. Some women are happy sending backhand after backhand up the middle of the court, and sometimes that’s all you can do. But when more options are available, the riskier choices can be more rewarding.

Player probabilities

Let’s wrap up for today by taking a player-by-player look at these numbers. We established that the average player has a 47.2% chance of winning the point when a makeable shot is arcing toward her backhand corner. Even though Tsvetana Pironkova’s number is also 47.2%, no player is average. Here are the top 14 players (minimum ten charted matches), ranked by the probability of winning a point from that position. I’ve also included the frequency with which they hit non-slice backhands:

Player                     Post-Shot  BH Freq  
Kim Clijsters                  53.4%    77.6%  
Na Li                          53.2%    87.5%  
Camila Giorgi                  52.9%    93.8%  
Patricia Maria Tig             52.1%    66.1%  
Simona Halep                   52.1%    83.6%  
Belinda Bencic                 51.5%    91.7%  
Dominika Cibulkova             51.3%    70.1%  
Veronika Kudermetova           50.9%    73.9%  
Jessica Pegula                 50.7%    73.7%  
Su-Wei Hsieh                   50.6%    81.8%  
Dayana Yastremska              50.6%    87.6%  
Anna Karolina Schmiedlova      50.3%    87.4%  
Serena Williams                49.9%    89.2%  
Sara Errani                    49.8%    70.0%

These numbers are from the 2010s only, so they don’t encompass the entire careers of the top two players on the list, Kim Clijsters and Li Na. It is particularly impressive that they make the cut, because their charted matches are not a random sample–they heavily tilt toward high-profile clashes against top opponents. The remainder of the list is a mixed bag of elites and journeywomen, backhand bashers and crafty strategists.

Next are the players with the best chances of winning the point after hitting a forehand from the backhand corner. I’ve drawn the line at 100 charted forehands, a minimum that limits our pool to about 50 players:

Player                Post-Shot  FH Freq  
Maria Sharapova           69.0%     4.1%  
Dominika Cibulkova        65.1%    10.5%  
Ana Ivanovic              64.7%    11.1%  
Yafan Wang                64.4%     8.8%  
Rebecca Peterson          63.4%    15.2%  
Simona Halep              63.1%     6.8%  
Carla Suarez Navarro      63.0%     7.7%  
Andrea Petkovic           62.3%     5.3%  
Christina McHale          61.9%    15.2%  
Anastasija Sevastova      61.3%     4.2%  
Petra Kvitova             60.8%     4.6%  
Caroline Garcia           60.7%     7.5%  
Misaki Doi                60.5%    17.0%  
Madison Keys              59.3%     9.3%  
Elina Svitolina           59.1%     3.9%

Maria Sharapova is the Gilles Simon of the WTA. (Now there’s a sentence I never thought I’d write!) Both players usually opt for the backhand, but are extremely effective when they go for the forehand. Kudos to Sharapova for her well-judged attacks, though it could be that she’s leaving some points on the table by not running around her backhand more often.

Next

As I wrote on Thursday, we’re still just scratching the surface of what can be done with Match Charting Project data to analyze tactics such as this one. A particular area of interest is to break down backhand-corner opportunities (or chances anywhere on the court) even further. The average point probability of 47.2% surely does not hold if we look at makeable balls that started life as, say, inside-out forehands. If some players are facing more tough chances, we should view those numbers differently.

If you’ve gotten this far, you must be interested. The Match Charting Project has accumulated shot-by-shot logs of nearly 7,000 matches. It’s a huge number, but we could always use more. Many up-and-coming players have only a few matches charted, and many interesting matches of the past (like most of those played by Li and Clijsters!) remain unlogged. You can help, and if you like watching and analyzing tennis, you should.

Weighing Options From the Backhand Corner

A few weeks ago, I offered a “first look” at the down-the-line backhand, with a stack of Match Charting Project-based stats showing how often players opted to play that shot, what happened when they did, how lefties differ from righties, and which players stood out thanks to the frequency or success of their down-the-line strikes.

Like Richard Gasquet returning a serve, we need to take a step back before we can move forward. Rather than continuing to focus solely on the down-the-line backhand, let’s expand our view to all shots played from the backhand corner. The DTL backhand is only one choice among many. A player in position to go down the line has the option of a cross-court shot or a more conservative reply up the middle. She also might run around the backhand entirely, taking aim with a forehand up the line (“inside-in”), down the middle, or cross-court (“inside-out”).

Every shot is a choice, and one of the roles of analytics is to analyze the pros and cons of decisions players make. Ideally, we would even be able to identify cases in which pros make poor choices and recommend better ones. We’re still many steps away from that, at least in any kind of systematic way. But thanks to the thousands of matches with shot-by-shot data logged by the Match Charting Project, we have plenty of raw material to help us get closer.

The first choice

In 2,700 charted men’s matches from the last decade (happy new year!), I isolated about 450,000 situations in which one player had a makeable ball in his backhand corner, excluding service returns. The definition of “makeable” is inherently a bit messy. For today’s purposes, a makeable ball is one that the player managed to return or one that turned into an unforced error. With ball-tracking data, we could be more precise, but for now we need to accept this level of imprecision.

Of the 450,000 makeable backhand-corner balls, players hit (non-slice) backhands 63.7% of the time and (non-slice) forehands 14.3% of the time. The remaining 22% were divvied up among slices, dropshots, and lobs, and we’ll set those aside for another day.

Here’s how 2010s men chose to aim their backhands from the backhand corner:

Down the line: 17.4%
Down the middle: 29.5%
Cross-court: 52.9%

And their forehands from the same position:

Down the line (inside-in): 35.1%
Down the middle: 12.8%
Cross-court (inside-out): 51.8%

The inside-in percentage is a bit surprising at first, though we need to keep in mind that it’s 35% of a relatively small number, accounting for only 5% of total shots from the backhand corner. Less surprising is the much higher frequency of shots going cross-court. Not only is that a safer, higher-percentage play, it directs the ball to the opponent’s backhand (unless he’s a lefty), which is typically his weaker side.
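The shares quoted above can be checked with a few lines of arithmetic. This sketch uses only the percentages stated in the text; the variable names are mine:

```python
# Shares of makeable backhand-corner balls, as quoted above.
bh_share = 0.637   # non-slice backhands
fh_share = 0.143   # non-slice forehands

# Everything else: slices, dropshots, and lobs.
other_share = 1 - bh_share - fh_share

# Inside-in forehands as a share of ALL shots from the backhand corner.
inside_in_of_all = 0.351 * fh_share

# Backhand share among non-slice groundstrokes only.
bh_among_non_slice = bh_share / (bh_share + fh_share)

print(f"other: {other_share:.1%}")
print(f"inside-in of all shots: {inside_in_of_all:.1%}")
print(f"backhand share of non-slice shots: {bh_among_non_slice:.1%}")
```

The last figure is the men's backhand share among topspin and flat strokes, the counterpart of the 94% quoted for women in the WTA post.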

Point probability

Shot selection is only a means to an end. More important than deploying textbook-perfect strategy is winning the point, and that’s where we’ll turn next.

The average ATPer has a 47.7% chance of winning the point when faced with a makeable ball in his backhand corner. Of course, any particular opportunity could be much better or worse than that. But again, without camera-based ball-tracking data, we can’t make more accurate estimates for specific chances. We can get some clues as to the range of probabilities by looking at how they vary at different stages of the rally. When a player has an opportunity for a “serve-plus-one” shot in the backhand corner–the third shot of the rally–his chances of winning the point are higher, at 51.1%. On the fourth shot of the rally, when pros are often still recovering from the disadvantage of returning, the chances of winning the point from that position are 45.4%. Context matters, in large part because context offers hints as to whether certain shots are better or worse than average.

So far, we have an idea of the probability of winning the point before making a choice. There are two ways of looking at the probability after choosing and hitting a shot: the odds of winning the point after hitting the shot, and the odds of winning the point after making the shot. The second number is obviously going to be better, because we simply filter out the errors. By excluding what could go wrong, it doesn’t give us the whole picture, but it does provide some useful information, showing which shots have the capacity to put opponents in the worst positions.

Here are the point probabilities for each of the shots we’re considering. For each choice, I’ve shown the probability of winning the point after hitting the shot (“Post-Shot”) and after making the shot (“In-Play”).

Shot      Direction  Post-Shot  In-Play  
Backhand  (all)          48.2%    54.2%  
Backhand  DTL            51.4%    64.6%  
Backhand  Middle         44.2%    48.2%  
Backhand  XC             49.5%    54.6%  
                                         
Forehand  (all)          55.1%    63.0%  
Forehand  DTL (I-I)      58.5%    69.0%  
Forehand  Middle         47.3%    52.0%  
Forehand  XC (I-O)       54.9%    61.9%

Forehands tend to do more to improve point-winning probability than backhands, though the down-the-middle forehand is less effective than a backhand to either corner. Again, this is context talking: A player who runs around a backhand just to hit a conservative forehand may have misjudged the angle or spin of the ball and felt forced to make a more defensive play. Still, it’s a relatively common tactic on slower clay courts (on clay, it is almost twice as common as the tour average), and it may be used too often.

The most dramatic differences between the two probabilities are on the down-the-line shots. Both forehand and backhand are aggressive, high-risk shots, something reflected in the winner and unforced error rates for each. 9% of all shots from the backhand corner are winners, and another 11% are unforced errors. Of down-the-line shots, 23% are winners and 19% are unforced errors. While the choice to go down the line isn’t superior to other options, both the forehand and backhand are devastating shots when they work.

Player by player

Let’s tentatively measure “effectiveness” in terms of increasing point probability. Setting aside the complexity of context, which won’t be the same for every player, the most effective pro is the one who makes the most of a certain class of opportunities.

Here are the 10 active players (of those with at least 20 charted matches) who do the most when faced with a makeable ball in their own backhand corner. Keep in mind that the average player has a 47.7% chance of winning the point from that position:

Player                Post-Shot  
Rafael Nadal              52.9%  
Diego Schwartzman         52.4%  
Novak Djokovic            52.3%  
Nikoloz Basilashvili      51.9%  
Andrey Rublev             51.8%  
Kei Nishikori             51.5%  
Gilles Simon              51.2%  
Pablo Cuevas              50.9%  
Alex de Minaur            50.0%  
Pablo Carreno Busta       49.6%

The Match Charting Project data might understate just how effective Rafael Nadal, Novak Djokovic, and Kei Nishikori are from their backhand corner, since a disproportionate number of their charted matches are against other top players. In any case, it is no surprise to see them here, along with such backhand warriors as Diego Schwartzman and Gilles Simon.

This list is limited to the tour regulars with at least 20 matches charted. One more name to watch out for is Thomas Fabbiano, with only 12 matches logged so far. In that limited sample, his point probability from the backhand corner is a whopping 59.2%. He isn’t quite that much of an outlier in reality, since his charted matches include contests against Ivo Karlovic, Reilly Opelka, and Sam Querrey, opponents whose ground games leave a bit to be desired. But his overall figure is so far off the charts that, even adjusting downward by a hefty margin, he appears to be one of the more dangerous players on tour from that position.
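The minimum-match cutoff is simple to express in code. Below is a sketch with invented records; the player labels and match counts are hypothetical, and only the 12-match, 59.2% entry mirrors a figure quoted above:

```python
def rank_players(stats, min_matches):
    """Keep players with enough charted matches, best win rate first."""
    eligible = [s for s in stats if s["matches"] >= min_matches]
    return sorted(eligible, key=lambda s: s["post_shot"], reverse=True)


# Hypothetical records for illustration only.
stats = [
    {"player": "Player A", "matches": 45, "post_shot": 0.529},
    {"player": "Player B", "matches": 12, "post_shot": 0.592},
    {"player": "Player C", "matches": 20, "post_shot": 0.496},
]

ranked = rank_players(stats, min_matches=20)
print([s["player"] for s in ranked])
```

With the threshold at 20 matches, the 12-match player is excluded despite having the highest rate, which is exactly the trade-off described above.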

Forehands and backhands

Let’s wrap up by looking at something a bit more specific. For backhands and forehands (without separating by direction), which players are most effective after hitting that shot from the backhand corner? We’re continuing to define effectiveness as winning as many points as possible after hitting the shot. I’ll also show how often each of the players opts for their effective shot, giving us a glimpse at tactical decisions, not just tactical success.

Here are the best backhands from the backhand corner. It was supposed to be a top ten list, but I think you’ll understand why I struggled to cut it off before listing the top 16 players, roughly one-fifth of the 75 players with at least 20 charted matches:

Player                 Post-Shot  BH Freq  
Diego Schwartzman          52.8%    74.0%  
Rafael Nadal               52.7%    64.7%  
Novak Djokovic             52.7%    76.1%  
Kei Nishikori              51.7%    74.0%  
Gilles Simon               51.4%    88.0%  
Andrey Rublev              51.1%    67.1%  
Pablo Carreno Busta        51.1%    75.3%  
Nikoloz Basilashvili       51.0%    75.0%  
Alexander Zverev           50.8%    75.1%  
Alex de Minaur             50.6%    74.8%  
Daniil Medvedev            50.6%    87.2%  
Juan Martin del Potro      50.3%    49.1%  
Pablo Cuevas               50.2%    60.6%  
Andy Murray                50.1%    65.0%  
Richard Gasquet            49.9%    75.8%  
Stan Wawrinka              49.8%    63.4%

The “BH Freq” column–for backhand frequency–really demonstrates the range of tactics used by different players. Gilles Simon and Daniil Medvedev opt for the topspin backhand almost every time, rarely slicing or running around the shot. At the opposite extreme, Juan Martin del Potro hits a topspin backhand less than half the time from that position. Perhaps because of his selectiveness–dealing with awkward positions by slicing–he is effective when he makes that choice.

Now the best forehands from the backhand corner:

Player                 Post-Shot  FH Freq  
Gilles Simon               63.1%     6.7%  
Rafael Nadal               61.9%    16.6%  
Benoit Paire               61.9%     1.5%  
Kei Nishikori              61.2%    10.4%  
Andrey Rublev              61.0%    20.1%  
Casper Ruud                60.8%    27.1%  
Marton Fucsovics           60.5%    16.3%  
Novak Djokovic             60.0%     9.7%  
Daniil Medvedev            59.8%     3.3%  
Pablo Cuevas               58.9%    20.9%  
Sam Querrey                58.2%    15.6%  
Felix Auger Aliassime      57.7%    16.0%

This list is more of a mixed bag, in part because there are so many fewer forehands from the backhand corner. Benoit Paire’s numbers are based on a mere 21 shots. I wouldn’t take his effectiveness seriously at all, but it’s always entertaining to see evidence of his uniqueness. At the opposite end of the spectrum is Casper Ruud, who runs around his backhand more than anyone else in the charting dataset except for Jack Sock and Joao Sousa. (Neither of whom is particularly effective, though presumably they do better by avoiding their backhands than they would by hitting them.)

One name you might have expected to see on the last list is Roger Federer. He’s around the 80th percentile in the forehand category, winning 56.9% of points when hitting a forehand from the backhand corner. He’s good, but not off the charts in this category. Like Nadal and Djokovic, he might look better if these numbers were adjusted for opponent, because so many of his charted matches are against fellow elites.

Next

There’s clearly a lot more to do here, including looking at probabilities for direction-specific shots, isolating the effect of certain opponents, and trying to control for more of the factors that aren’t explicitly present in the data. Not to mention extending the same framework to other shots from other positions on court. Stay tuned.

Tramlines and Wide Groundstrokes

The NextGen Finals are played on an unusual court, in that the surface is marked only for singles matches, leaving out the “tramlines” that define the doubles alleys. Virtually all tennis events include doubles, as well, so this is rarely an option. The ATP has skipped tramlines at season-ending events before, but at the end of the 2010s, the singles-only court is exclusive to the NextGen Finals.

One might reasonably wonder whether the unique paint job has any effect on play:

I’d be fascinated to know if less balls go ‘wide’ on a court without trams. Does it focus the mind more?
— Lisa (@furryyelloballs) November 6, 2019

I discussed this on a recent podcast with Erik Jonsson, and we tentatively concluded that tennis pros (even young ones) with thousands of hours of playing experience shouldn’t be affected by a tweak to the appearance of the court. But why speculate when we can look at some data?

The Match Charting Project, my volunteer-driven effort to log shot-by-shot records of professional tennis matches, notes various details about errors–forced or unforced, and “type”–net, deep, wide, or wide-and-deep. MCP contributors didn’t immediately take to the NextGen Finals–before this week, the 2018 final was the only charted match out of the 6,600 matches in the dataset–but 2019 was different. We now have shot-by-shot stats for 8 of the 15 matches played in Milan last week. (Big thanks to Carrie, who took charge of Alex de Minaur’s entire run to the final.)

Quantifying wide errors

We’re interested in the frequency of wide errors, which isn’t quite as simple as it sounds. I chose to focus only on groundstrokes, and I also excluded forced errors–shots on which the player might not have much control of the direction of the ball.

Here are three metrics we could use for the frequency of wide errors:

Wide errors per point
Wide errors per unforced error
Wide errors per “makeable” groundstroke–that is, groundstrokes that were either unforced errors or put in play

Wide errors per point is probably too crude, but it does have the advantage of simplicity. Wide errors per unforced error might have some value, telling us in what direction a player was most aggressive. The last, wide errors per makeable groundstroke, is probably the best representation of what we’re looking for, as it tells us how frequently a player tried to hit a shot and it went wide.
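The three metrics listed above could be computed along these lines. This is only a sketch under assumed inputs: the function name, its parameters, and the counts in the example are hypothetical and do not reproduce any figures from the charting data:

```python
def wide_error_rates(points, unforced_errors, wide_errors, groundstrokes_in_play):
    """Return the three wide-error rates for one player.

    points:                total points played
    unforced_errors:       unforced groundstroke errors of any type
    wide_errors:           unforced groundstroke errors that went wide
    groundstrokes_in_play: groundstrokes that landed in
    """
    # "Makeable" groundstrokes: unforced errors plus shots put in play.
    makeable = unforced_errors + groundstrokes_in_play
    return {
        "wide_per_point": wide_errors / points if points else None,
        "wide_per_ufe": wide_errors / unforced_errors if unforced_errors else None,
        "wide_per_makeable_gs": wide_errors / makeable if makeable else None,
    }


# Hypothetical counts for a single match.
rates = wide_error_rates(points=100, unforced_errors=20,
                         wide_errors=5, groundstrokes_in_play=380)
for name, value in rates.items():
    print(name, f"{value:.2%}")
```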

Here are de Minaur’s numbers for his five 2019 NextGen matches, along with his hard-court aggregates from 28 other charted matches in the last two years:

          Wide / Pt  Wide / UFE  Wide / GS  
NextGen        2.7%        1.5%      21.7%  
ATP Hard       3.0%        1.4%      21.4%

At least for Alex, the tramlines don’t seem to make much of a difference.

Let’s look at a slightly larger group of players. We have eight matches, which means 16 records of one match for a single player, including at least one for each of the eight guys who qualified for Milan. Here are the three wide-error rates for the NextGen Finals matches, along with the same players’ wide-error rates for other charted hard court matches in the last two years:

          Wide / Pt  Wide / UFE  Wide / GS  
NextGen        3.2%        1.8%      19.5%  
ATP Hard       3.2%        1.8%      23.1%

For our first two metrics, there is absolutely no effect. Tramlines or no tramlines, wide errors mark the end of 3.2% of points, and 1.8% of total unforced errors. (The 3.2% figure is per player, meaning that 6.4% of points were ended with a wide error.)

The third metric, though, is more interesting. On tour, these players make a wide error on 23.1% of their “makeable” groundstrokes. That number dropped by more than one-seventh, to 19.5%, on the tramline-free court in Milan. At the same time, the overall rate of unforced errors (not just wide errors) increased compared to the same players’ efforts on hard courts at other events.
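The “more than one-seventh” figure is a relative drop, not a difference in percentage points. A quick check using the two rates quoted above:

```python
tour_rate = 0.231    # wide errors per makeable groundstroke, other hard courts
milan_rate = 0.195   # same metric at the NextGen Finals

# Relative change: the drop as a fraction of the starting rate.
relative_drop = (tour_rate - milan_rate) / tour_rate

print(f"relative drop: {relative_drop:.1%}")
print("more than one-seventh:", relative_drop > 1 / 7)
```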

Deep mind

I see two possible explanations for such a substantial drop. First, we don’t have much data, and maybe it’s just a fluke of a small sample. Some of the difference can be traced to Ugo Humbert, who didn’t make a single wide error in his one charted NextGen Finals match. (Humbert’s usual wide-error rates are close to average.) Without a lot more matches played on tramline-free surfaces–not to mention charts of those matches–we won’t be able to draw a firm conclusion.

Second, it could be a real effect stemming from some aspect of the conditions in Milan. The lack of tramlines really might, as Lisa puts it, “focus the mind.”

Compared to other innovations trialed at the NextGen Finals, the singles-only court gets very little press. But unlike, say, the towel rack or the shot clock, it might just have a small effect on play.

Will a Back-To-Normal Federer Backhand Be Good Enough?

Italian translation at settesei.it

After Roger Federer’s 2017 triumph over Rafael Nadal at the Australian Open, I credited his narrow victory to his backhand. He came back from the injury that sidelined him for the second half of 2016 having strengthened that wing, ready with the tactics necessary to use it against his long-time rival. Since that time, he has beaten Nadal in five out of six meetings, suggesting that the new-and-improved weapon has remained a part of his game.

The Swiss is riding high after defeating Rafa once again in the Wimbledon semi-finals on Friday. But unlike in Melbourne two-and-a-half years ago, the backhand wasn’t responsible for the victory. In the Australian Open final, Federer’s stylish one-hander earned him 11 more points than in a typical contest, enough to flip the result in his favor. On Friday, Nadal had little reason to fear a Federer backhand that was only a single point better than average. The Swiss owes his semi-final result to some stellar play, but not from his backhand.

BHP redux

I’m deriving these numbers from a stat called Backhand Potency (BHP), which uses Match Charting Project shot-by-shot data to isolate the effect of each one of a player’s shots. The formula is straightforward:

[A]dd one point for a winner or an opponent’s forced error, subtract one for an unforced error, add a half-point for a backhand that set up a winner or opponent’s error on the following shot, and subtract a half-point for a backhand that set up a winning shot from the opponent. Divide by the total number of backhands, multiply by 100, and the result is net effect of each player’s backhand.

The average player hits about 100 backhands per match, so the final step of multiplying by 100 gives us an approximate per-match figure. BHP hands out up to 1.5 “points” per tennis point, since credit is given for both a winning shot and the shot that set it up. Thus, to translate BHP (or any other potency metric, like Forehand Potency, FHP) to points, multiply by two-thirds. In the 2017 Australian Open final, Federer’s backhand was worth +17 BHP, equal to about 11 points.
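The formula and the two-thirds conversion can be written out as a short sketch. The function and argument names are mine, and the tallies in the example are invented; only the arithmetic follows the description above:

```python
def backhand_potency(winners_or_forced, unforced, setups_for, setups_against,
                     total_backhands):
    """BHP: net backhand credit per 100 backhands.

    +1   for each winner or forced error induced
    -1   for each unforced error
    +0.5 for each backhand that set up a winning shot or opponent's error
    -0.5 for each backhand that set up a winning shot from the opponent
    """
    if total_backhands == 0:
        raise ValueError("no backhands to evaluate")
    raw = (winners_or_forced - unforced
           + 0.5 * setups_for - 0.5 * setups_against)
    return 100 * raw / total_backhands


def potency_to_points(potency):
    """Convert a potency figure to approximate points (multiply by 2/3)."""
    return potency * 2 / 3


# Hypothetical tallies for one match.
bhp = backhand_potency(winners_or_forced=20, unforced=10,
                       setups_for=8, setups_against=4, total_backhands=100)
print(f"BHP: {bhp:+.1f}")

# The +17 BHP quoted for the 2017 Australian Open final, in points.
print(f"{potency_to_points(17):.1f} points")
```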

On Friday, Roger’s backhand was worth only +1 BHP. The best thing we can say about that is that it didn’t hold him back–the sort of comment we might have made as he racked up wins for the first 15 years of his career.

The semi-final performance wasn’t an outlier. In a year-to-year comparison based on the available (admittedly incomplete) MCP data, the 2019 backhand looks an awful lot like the pre-injury backhand:

Year(s)     BHP  
1998-2011  +0.1  
2012       +0.4  
2013       -1.8  
2014       -1.1  
2015       +1.3  
2016       -0.3  
2017       +3.5  
2018       +1.3  
2019       +0.8

There are still good days, like Fed’s whopping +16 BHP against Kei Nishikori in this week’s quarter-finals. But when we tally up all the noise of good and bad days, effective and ineffective opponents, and fast and slow conditions, the net result is that the backhand just doesn’t rack up points the way it did two years ago.

The backhand versus Novak

Federer’s opponent in today’s final, Novak Djokovic, is known for his own rock-solid groundstrokes. Like Nadal did for many years, Djokovic is able to expose the weaker side of Federer’s baseline game. The Serbian has won the last five head-to-head meetings, and nine of the last eleven. In most of those, he reduces Roger’s backhand to a net negative:

Year  Tournament        Result  BHP/100  
2018  Paris             L         -11.0  
2018  Cincinnati        L         -11.0  
2016  Australian Open   L         -12.6  
2015  Tour Finals (F)   L          -4.8  
2015  Tour Finals (RR)  W          +0.7  
2015  US Open           L          +0.8  
2015  Cincinnati        W          -2.2  
2015  Wimbledon         L         -13.4  
2015  Rome              L         -12.2  
2015  Indian Wells      L          -5.0  
2015  Dubai             W          -5.9  
…                                        
2014  Wimbledon         L          -3.1  
2012  Wimbledon         W          +9.6

Out of 438 charted matches, Federer’s BHP was below -10 only 27 times. On nine of those occasions–and two of the five since Fed’s 2017 comeback–the opponent was Djokovic. Incidentally, Novak would do well to study how Borna Coric dismantles the Federer backhand, as Fed suffered his two worst post-injury performances (-20 at 2018 Shanghai, and -19 at 2019 Rome) against the young Croatian.

It is probably too much to ask for Federer to figure out how to beat Djokovic at his own game. The best he can do is minimize the damage by serving big and executing on the forehand. The Swiss has a career average of +9 Forehand Potency (FHP), but falls to only +4 FHP against Novak. In last year’s Cincinnati final, Djokovic reduced his opponent to an embarrassing -13 FHP, the worst of his career. It wasn’t a fluke: four of Fed’s five worst single-match FHP numbers have come against the Serb.

If Federer is to win a ninth Wimbledon title, he’ll need to rack up points on at least one wing–either his typical forehand, or the backhand in the way he did against Djokovic in the 2012 semi-final. Whichever one does the damage, he’ll also need the other one to remain steady. His forehand was plenty effective in the semi-final against Nadal, worth +12 FHP in that match. Against a player like Novak who defends even better on a fast surface, Federer will need to somehow tally similar results. It’s a lot to ask, and one thing is certain: No one would be able to complain that his 21st major title came cheaply.

Dayana Yastremska Hits Harder Than You

Italian translation at settesei.it

At the 2019 Australian Open, tennis balls have more to fear than ever before. Serena Williams is back and appears to be in top form, Maria Sharapova is playing well enough to oust defending champion Caroline Wozniacki, and Petra Kvitova has followed up her Sydney title with a stress-free jaunt through the first three rounds.

And then there are the youngsters. Hyper-aggressive 20-year-old Aryna Sabalenka crashed out in the third round against an even younger threat, Amanda Anisimova. But still in the draw, facing Serena on Saturday, is the hardest hitter of all, 18-year-old Ukrainian Dayana Yastremska. Watch a couple of Sabalenka matches, and you might wonder if we’ve reached the apex of aggression on the tennis court. Nope: Yastremska turns it up to 11.

When Lowell first introduced his aggression score metric a few years ago, Kvitova was the clear leader of the pack, the player who ended points–for good or ill–most frequently with the ball on her racket. Madison Keys wasn’t far behind, with Serena coming in third among the small group of players for which we had sufficient data. Since then, two things have changed: The Match Charting Project now has a lot more data on many more players, and a new generation of ball-bashers has threatened to make the rest of the tour look like weaklings in comparison.

The aggression score metric packs a lot of explanatory power in a simple calculation: It’s the number of point-ending shots (winners, unforced errors, or shots that induce a forced error from the opponent) divided by the number of shot opportunities. The resulting statistic ranges from about 10% at the lower extreme–Sara Errani’s career average is 11.6%–to 30%* at the top end. Individual matches can be even higher or lower, but no player with at least five charted matches sits outside of that range.

* Readers with a keen memory or a penchant for following links will notice that in Lowell’s original post, Kvitova’s aggregate score was 33% and Keys was also a tick above 30%. I’m not sure whether those were flukes that have since come back down with larger samples, or whether I’m using a slightly different formula. Either way, the ordering of players has remained consistent, and that’s the important thing.
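For readers who want to see the aggression score calculation spelled out, here is a minimal Python sketch. It follows only the definition given above (point-ending shots divided by shot opportunities); the function name, argument names, and the counts in the example are invented for illustration, and the formula in Lowell's original post may differ in its details.

```python
def aggression_score(winners, unforced_errors, induced_forced_errors, shot_opportunities):
    """Share of shot opportunities on which the player ended the point,
    for good or ill, with the ball on her racket."""
    if shot_opportunities <= 0:
        raise ValueError("need at least one shot opportunity")
    point_ending_shots = winners + unforced_errors + induced_forced_errors
    return point_ending_shots / shot_opportunities


# Hypothetical match: 20 winners, 25 unforced errors, 10 induced forced
# errors in 250 shot opportunities (made-up numbers, not charted data).
score = aggression_score(20, 25, 10, 250)
print(f"{score:.1%}")  # 22.0%
```

Note that unforced errors raise the score just as winners do, which is why the metric describes style rather than quality.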

Here are the top ten most aggressive WTA tour regulars of the 2010s before Sabalenka and Yastremska came along:

Rank  Player                      Agg 
1     Petra Kvitova             27.1%  
2     Julia Goerges             26.8%  
3     Serena Williams           26.8%  
4     Jelena Ostapenko          26.5%  
5     Camila Giorgi             26.0%  
6     Madison Keys              25.9%  
7     Coco Vandeweghe           25.9%  
8     Sabine Lisicki            25.6%  
9     Anastasia Pavlyuchenkova  24.0%  
10    Maria Sharapova           23.2%

All of these women rank among the top 15% of most aggressive players. They end points more frequently on their own racket than plenty of competitors we also consider aggressive, like Venus Williams (21.9%), Karolina Pliskova (21.6%), and Johanna Konta (22.3%). Ostapenko bridges the gap between the two generations; she wasn’t part of the discussion when aggression score was first introduced, though once she started winning matches, it was immediately clear that she’d challenge Kvitova at the top of this list.

Here’s the current top ten:

Rank  Player               Agg  
1     Dayana Yastremska  28.6%  
2     Aryna Sabalenka    27.6%  
3     Petra Kvitova      27.1%  
4     Julia Goerges      26.8%  
5     Serena Williams    26.8%  
6     Jelena Ostapenko   26.5%  
7     Viktoria Kuzmova   26.0%  
8     Camila Giorgi      26.0%  
9     Madison Keys       25.9%  
10    Coco Vandeweghe    25.9%

Yastremska, Sabalenka, and even Viktoria Kuzmova have elbowed their way into the top ten. Yastremska’s and Kuzmova’s places on this list might be a little premature, since their scores are based on only seven and nine matches, respectively. But Sabalenka’s pugnaciousness is well-documented: her Petra-topping score of 27.6% is an average across almost 30 matches.

Tennis tends to swing between extremes, with one generation developing skills to counteract the abilities of the previous one. It’s not yet clear whether the aggression of these young women will catapult them to the top–after all, Sabalenka won only five games today against Anisimova, whose aggression score is a more modestly high 23.0%. Perhaps as they gain experience, they’ll develop more well-rounded games and return Kvitova to her place at the top.

In the meantime, we have the privilege of watching some of the hardest hitters in WTA history battle it out. Tomorrow, Yastremska will contest her first third round at a major in a must-watch match against Serena. There will be fireworks, but it’s safe to say there won’t be much in the way of rallies.

Measuring the Best Smashes in Tennis

Italian translation at settesei.it: part 1, part 2

How can we identify the best shots in tennis? At first glance, it seems like a simple problem. Thanks to the shot-by-shot data collected for over 3,500 matches by the Match Charting Project, we can look at every instance of the shot in question and see what happened. If a player hits a lot of winners, or wins most of the ensuing points, he or she is probably pretty good at that shot. Lots of unforced errors would lead us to conclude the opposite.

A friend recently posed a more specific question: Who has the best smash in the men’s game? Compared to other shots such as, say, slice backhands, smashes should be pretty easy to evaluate. A large percentage of them end the point–in the contemporary men’s game (I discuss the women’s game later on), 69% are winners or induce forced errors–which reduces the problem to a straightforward one.

The simplest algorithm to answer my friend’s question is to determine how often each player ends the point in his favor when hitting a smash–that is, with a winner or by inducing a forced error. Call the resulting ratio “W/SM.” The Match Charting Project (MCP) dataset has at least 10 tour-level matches for 80 different men, and the W/SM ratio for those players ranges from 84% (Jeremy Chardy) all the way down to 30% (Paolo Lorenzi). Both of those extremes are represented by players with relatively small samples; if we limit our scope to men with at least 90 recorded smashes, the range isn’t quite as wide. The best of the bunch is Jo-Wilfried Tsonga, at 79%, and the “worst” is Ivan Lendl, at 57%. That isn’t quite fair to Lendl, since smash success rates have improved quite a bit over the years, and Lendl’s rate is only a couple percentage points below the average for the 1980s. Among active players with at least 90 smashes in the books, Stan Wawrinka brings up the rear, with a W/SM of 65%.

We can look at the longer-term effects of a player’s smashes without adding much complexity. It’s ideal to end the point with a smash, but most players would settle for winning the point. When hitting a smash, ATPers these days end up winning the point 81% of the time, ranging from 97% (Chardy again) down to 45% (Lorenzi again). Once again, Tsonga leads the pack of the bigger-sample-size players, winning the point 90% of the time after hitting a smash, and among active players, Wawrinka is still at the bottom of that subset, at 77%.
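Both of these per-smash rates are plain ratios. The sketch below shows one way to compute them in Python, along with the error rate used in the table that follows. The function name and dictionary keys are assumptions, and the example counts are invented to show the shape of the output, not drawn from the MCP dataset.

```python
def smash_rates(smashes, winners_or_forced, errors, points_won):
    """Per-smash rates: W/SM, E/SM and PTS/SM.

    smashes           -- total smashes hit
    winners_or_forced -- smashes that were winners or induced a forced error
    errors            -- smashes that were errors
    points_won        -- points the player eventually won after hitting a smash
    """
    if smashes <= 0:
        raise ValueError("need at least one smash")
    return {
        "W/SM": winners_or_forced / smashes,
        "E/SM": errors / smashes,
        "PTS/SM": points_won / smashes,
    }


# Invented counts for a player with 100 recorded smashes.
rates = smash_rates(smashes=100, winners_or_forced=78, errors=6, points_won=90)
print({name: f"{value:.0%}" for name, value in rates.items()})
```

PTS/SM is always at least as large as W/SM, since every point ended by a smash winner is also a point won.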

Here is a list of all players with at least 90 smashes in the MCP dataset, with their winners (and induced forced errors) per smash (W/SM), errors per smash (E/SM), and points won per smash (PTS/SM):

PLAYER              W/SM  E/SM  PTS/SM  
Jo-Wilfried Tsonga   78%    6%     90%  
Tomas Berdych        76%    6%     88%  
Pete Sampras         75%    7%     86%  
Roger Federer        73%    7%     86%  
Rafael Nadal         69%    7%     84%  
Milos Raonic         73%    9%     82%  
Andy Murray          67%    6%     82%  
Kei Nishikori        68%   11%     81%  
David Ferrer         71%    9%     81%  
Andre Agassi         67%    8%     80%  
Novak Djokovic       66%    9%     80%  
Stefan Edberg        62%   12%     78%  
Stan Wawrinka        65%   10%     77%  
Ivan Lendl           57%   13%     71%

These numbers give us a pretty good idea of who you should back if the ATP ever hosts the smash-hitting equivalent of baseball’s Home Run Derby. Best of all, they don’t commit any egregious offenses against common sense: We’d expect to see Tsonga and Roger Federer near the top, and we’d know something was wrong if Novak Djokovic were too far from the bottom.

Smash opportunities

Still, we need to do better. Almost every shot made in a tennis match represents a decision made by the player hitting it: topspin or slice? backhand or run-around forehand? approach or stay back? Many smashes are obvious choices, but a large number are not. Different players make different choices, and to evaluate any particular shot, we need to subtly reframe the question. Instead of vaguely asking for “the best,” we’d be better served looking for the player who gets the most value out of his smash. While the two questions are similar, they are not the same.

Let’s expand our view to what we might call “smash opportunities.” Once again, smashes make our task relatively straightforward: We can define a smash opportunity simply as a lob hit by the opponent.* In the contemporary ATP, roughly 72% of lobs result in smashes–the rest either go for winners or are handled with a different shot. Different players have very different strategies: Federer, Pete Sampras, and Milos Raonic all hit smashes in more than 84% of opportunities, while a few other men come in under 50%. Nick Kyrgios, for instance, tried a smash in only 20 of 49 recorded opportunities (41%). Of those players with more available data, Juan Martin Del Potro elected to go for the overhead in 61 of 114 chances (54%), and Andy Murray in 271 of 433 (62.6%).

* Using an imperfect dataset, it’s a bit more complicated; sometimes the shots that precede smashes are coded as topspin or slice groundstrokes. I’ve counted those as smash opportunities as well.
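The smashes-per-opportunity rate is a single division, shown here in Python with the three sets of counts quoted above. The function name is an assumption made for illustration.

```python
def smashes_per_opportunity(smashes, opportunities):
    """Fraction of smash opportunities (opponent lobs) answered with a smash."""
    if opportunities <= 0:
        raise ValueError("need at least one smash opportunity")
    return smashes / opportunities


# (smashes, opportunities) as quoted in the text above.
counts = {
    "Nick Kyrgios": (20, 49),
    "Juan Martin del Potro": (61, 114),
    "Andy Murray": (271, 433),
}

for player, (smashes, opportunities) in counts.items():
    rate = smashes_per_opportunity(smashes, opportunities)
    print(f"{player}: {rate:.0%}")
# Nick Kyrgios: 41%
# Juan Martin del Potro: 54%
# Andy Murray: 63%
```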

Not all lobs are created equal, of course. With a large number of points, we would expect them to even out, but even then, a player’s overall style may affect the smash opportunities he sees. That’s a more difficult issue for another day; for now, it’s easiest to assume that each player’s mix of smash opportunities is roughly equal, though we’ll keep in mind the likelihood that we’ve swept some complexity under the rug.

With such a wide range of smashes per smash opportunities (SM/SMO), it’s clear that some players’ average smashes are more difficult than others. Federer hits about half again as many smashes per opportunity as del Potro does, suggesting that Fed’s attempts are more difficult than Delpo’s; on those more difficult attempts, Delpo is choosing a different shot. The Argentine is very effective when he opts for the smash, winning 84% of those points, but it seems likely that his rate would not be so high if he hit smashes as frequently as Federer does.

This leads us to a slightly different question: Which players are most effective when dealing with smash opportunities? The smash itself doesn’t necessarily matter–if a player is equally effective with, say, swinging volleys, the lack of a smash would be irrelevant. The smash is simply an effective tool that most players employ to deal with these situations.

Smash opportunities don’t offer the same level of guarantee that smashes themselves do: In the ATP these days, players win 72% of points after being handed a smash opportunity, and 56% of the shots they hit result in winners or induced forced errors. Looking at these situations takes us a bit off-track, but it also allows us to study a broader question with more impact on the game as a whole, because smash opportunities represent a larger number of shots than smashes themselves do.

Here is a list of all the players with at least 99 smash opportunities in the MCP dataset, along with the rate at which they hit smashes (SM/SMO), the rate at which they hit winners or induced forced errors in response to smash opportunities (W/SMO), the rate at which they hit errors in those situations (E/SMO), and the rate at which they won the point when given a lob (PTW/SMO). Like the list above, players are ranked by the rightmost column, points won.

PLAYER              SM/SMO  W/SMO  E/SMO  PTW/SMO  
Jo-Wilfried Tsonga     80%    68%    13%      80%  
Roger Federer          84%    66%    13%      78%  
Pete Sampras           86%    68%    15%      78%  
Tomas Berdych          75%    66%    16%      76%  
Milos Raonic           85%    67%    14%      76%  
Novak Djokovic         81%    60%    13%      75%  
Kevin Anderson         66%    57%    12%      74%  
Rafael Nadal           74%    57%    16%      73%  
Andre Agassi           77%    62%    17%      73%  
Boris Becker           85%    59%    18%      72%  
Stan Wawrinka          79%    58%    15%      72%  
Kei Nishikori          72%    57%    17%      70%  
Andy Murray            63%    52%    15%      70%  
Dominic Thiem          66%    52%    11%      70%  
David Ferrer           71%    57%    17%      69%  
Pablo Cuevas           73%    54%    14%      67%  
Stefan Edberg          81%    52%    23%      65%  
Bjorn Borg             81%    41%    20%      63%  
JM del Potro           54%    48%    19%      60%  
Ivan Lendl             74%    45%    28%      59%  
John McEnroe           74%    43%    24%      56%

The order of this list has much in common with the previous one, with names like Federer, Sampras, and Tsonga at the top. Yet there are key differences: Djokovic and Wawrinka are particularly effective when they respond to a lob with something other than an overhead, while del Potro is the opposite, landing near the bottom of this ranking despite being quite effective with the smash itself.

The rate at which a player converts opportunities to smashes has some impact on his overall success rate on smash opportunities, but the relationship isn’t that strong (r^2 = 0.18). Other options, such as swinging volleys or mid-court forehands, also give players a good chance of winning the point.
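To make the r^2 figure concrete, here is a small Python sketch that computes the squared Pearson correlation between the SM/SMO and PTW/SMO columns of the table above. It uses the 21 rounded table values, which is an assumption: the exact set of players and the unrounded rates behind the quoted 0.18 aren't stated, so the result should land in the same neighborhood without necessarily matching to the second decimal.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy * sxy / (sxx * syy)


# SM/SMO and PTW/SMO percentages, in table order (Tsonga first, McEnroe last).
sm_per_smo = [80, 84, 86, 75, 85, 81, 66, 74, 77, 85, 79,
              72, 63, 66, 71, 73, 81, 81, 54, 74, 74]
ptw_per_smo = [80, 78, 78, 76, 76, 75, 74, 73, 73, 72, 72,
               70, 70, 70, 69, 67, 65, 63, 60, 59, 56]

print(round(r_squared(sm_per_smo, ptw_per_smo), 2))
```

With so few players, a single extreme row such as del Potro's (54% smashes, 60% points won) carries a lot of the relationship.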

Smash value

Let’s get back to my revised question: Who gets the most value out of his smash? A good answer needs to combine how well he hits it with how often he hits it. Once we can quantify that, we’ll be able to see just how much a good or bad smash can impact a player’s bottom line, measured in overall points won, and how much a great smash differs from an abysmal one.

As noted above, the average current-day ATPer wins the point 81% of the time that he hits a smash. Let’s reframe that in terms of the probability of winning a point: When a lob is flying through the air and a player readies his racket to hit an overhead, his chance of winning the point is 81%–most of the hard work is already done, having generated such a favorable situation. If our player ends up winning the point, the smash improved his odds by 0.19 points (from 0.81 to 1.0), and if he ends up losing the point, the smash hurt his odds by 0.81 (from 0.81 to 0.0). A player who hits five successful smashes in a row has a smash worth about one total point: 5 multiplied by 0.19 equals 0.95.

We can use this simple formula to estimate how much each player’s smash is worth, denominated in points. We’ll call that Point Probability Added (PPA). Finally, we need to take into account how often the player hits his smash. To do so, we’ll simply divide PPA by total number of points played, then multiply by 100 to make the results more readable. The metric, then, is PPA per 100 points, reflecting the impact of the smash in a typical short match. Most players have similar numbers of smash opportunities, but as we’ve seen, some choose to hit far more overheads than others. When we divide by points, we give more credit to players who hit their smashes more often.
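The PPA arithmetic can be sketched in a few lines of Python. The function names are assumptions made for illustration, not the code behind the tables; the 0.81 baseline and the five-smash example come from the text above.

```python
def point_probability_added(points_won, points_lost, baseline):
    """Total PPA for a set of shots hit from a situation with a known
    baseline chance of winning the point.

    Each point won adds (1 - baseline); each point lost subtracts baseline.
    """
    if not 0 <= baseline <= 1:
        raise ValueError("baseline must be a probability")
    return points_won * (1 - baseline) - points_lost * baseline


def ppa_per_100(points_won, points_lost, baseline, total_points_played):
    """PPA scaled to 100 points played, so that players who hit the shot
    more often get more credit (or blame)."""
    if total_points_played <= 0:
        raise ValueError("need at least one point played")
    ppa = point_probability_added(points_won, points_lost, baseline)
    return 100 * ppa / total_points_played


SMASH_BASELINE = 0.81  # average ATP win rate when hitting a smash

# Five successful smashes in a row: 5 * 0.19 = 0.95 points.
print(round(point_probability_added(5, 0, SMASH_BASELINE), 2))  # 0.95
```

A useful sanity check is that a player who wins exactly 81 of 100 smash points comes out at zero, since 81 × 0.19 and 19 × 0.81 cancel.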

The overall impact of the smash turns out to be quite small. Here are the 1990s-and-later players with at least 99 smash opportunities in the dataset along with their smash PPA per 100 points:

PLAYER                 SM PPA/100  
Jo-Wilfried Tsonga           0.17  
Pete Sampras                 0.11  
Tomas Berdych                0.11  
Roger Federer                0.10  
Rafael Nadal                 0.05  
Milos Raonic                 0.04  
Juan Martin del Potro        0.02  
Andy Murray                  0.01  
Kevin Anderson               0.01  
Kei Nishikori                0.00  
David Ferrer                 0.00  
Andre Agassi                 0.00  
Novak Djokovic              -0.02  
Stan Wawrinka               -0.07  
Dominic Thiem               -0.07  
Pablo Cuevas                -0.10

Tsonga reigns supreme, from the most basic measurement to the most complex. His 0.17 smash PPA per 100 points means that the quality of his overhead earns him about one extra point (compared to an average ATPer) every 600 points. That doesn’t sound like much, and rightfully so: He hits fewer than one smash per 50 points, and as good as Tsonga is, the average player has a very serviceable smash as well.

The list gives us an idea of the overall range of smash-hitting ability, as well. Among active players, the laggard in this group is Pablo Cuevas, at -0.1 points per 100, meaning that his subpar smash costs him one point out of every thousand he plays. It’s possible to be worse–in Lorenzi’s small sample, his rate is -0.65–but if we limit our scope to these well-studied players, the difference between the high and low extremes is barely 0.25 points per 100, or one point out of every 400.
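The "one point every N points" figures are just the reciprocal of the per-100 rate. A short Python sketch, with a function name chosen here for illustration:

```python
def points_per_extra_point(ppa_per_100):
    """How many points a player must play for his PPA to add up to one
    full point gained (or lost, if the rate is negative)."""
    if ppa_per_100 == 0:
        raise ValueError("an exactly average rate never adds up to a point")
    return 100 / abs(ppa_per_100)


print(round(points_per_extra_point(0.17)))   # 588, i.e. roughly every 600 points
print(round(points_per_extra_point(-0.10)))  # 1000
print(round(points_per_extra_point(0.25)))   # 400
```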

I’ve excluded several players from earlier generations from this list; as mentioned earlier, the average smash success rate in those days was lower, so measuring legends like McEnroe and Borg using a 2010s-based point probability formula is flat-out wrong. That said, we’re on safe ground with Sampras and Agassi; the rate at which players convert smashes into points won has remained fairly steady since the early 1990s.

Lob-responding value

We’ve seen the potential impact of smash skill; let’s widen our scope again and look at the potential impact of smash opportunity skill. When a player is faced with a lob, but before he decides what shot to hit, his chance of winning the point is about 72%. Thus, hitting a shot that results in winning the point is worth 0.28 points of point probability added, while a choice that ends up losing the point translates to -0.72.

There are more smash opportunities than smashes, and more room to improve on the average (72% instead of 81%), so we would expect to see a bigger range of PPA per 100 points. Put another way, we would expect that lob-responding skill, which includes smashes, is more important than smash-specific skill.

It’s a modest difference, but it does look like lob-responding skill has a bigger range than smash skill. Here is the same group of players, still showing their PPA/100 for smashes (SM PPA/100), now also including their PPA/100 for smash opportunities (SMO PPA/100):

PLAYER                 SM PPA/100  SMO PPA/100  
Jo-Wilfried Tsonga           0.17         0.18  
Roger Federer                0.10         0.16  
Pete Sampras                 0.11         0.16  
Milos Raonic                 0.04         0.12  
Tomas Berdych                0.11         0.09  
Kevin Anderson               0.01         0.08  
Novak Djokovic              -0.02         0.07  
Rafael Nadal                 0.05         0.03  
Andre Agassi                 0.00         0.01  
Stan Wawrinka               -0.07         0.00  
Kei Nishikori                0.00        -0.03  
Andy Murray                  0.01        -0.03  
Dominic Thiem               -0.07        -0.05  
David Ferrer                 0.00        -0.06  
Pablo Cuevas                -0.10        -0.12  
Juan Martin del Potro        0.02        -0.19

Djokovic and Delpo draw our attention again as the players whose smash skills do not accurately represent their smash opportunity skills. Djokovic is slightly below average with smashes, but a few notches above the norm on opportunities; Delpo is a tick above average when he hits smashes, but dreadful when dealing with lobs in general.

As it turns out, we can measure the best smashes in tennis, both to compare players and to get a general sense of the shot’s importance. What we’ve also seen is that smashes don’t tell the entire story–we learn more about a player’s overall ability when we widen our view to smash opportunities.

Smashes in the women’s game

Contemporary women hit far fewer smashes than men do, and they win points less often when they hit them. Despite the differences, the reasoning outlined above applies just as well to the WTA. Let’s take a look.

In the WTA of this decade, smashes result in winners (or induced forced errors) 63% of the time, and smashes result in points won about 75% of the time. Both numbers are lower than the equivalent ATP figures (69% and 81%, respectively), but not dramatically so. Here are the rates of winners, errors, and points won per smash for the 14 women with at least 80 smashes in the MCP dataset:

PLAYER               W/SM  E/SM  PTS/SM  
Jelena Jankovic       73%    9%     83%  
Serena Williams       72%   13%     81%  
Steffi Graf           61%    9%     81%  
Svetlana Kuznetsova   70%   10%     79%  
Simona Halep          66%   11%     76%  
Caroline Wozniacki    61%   16%     74%  
Karolina Pliskova     62%   18%     74%  
Agnieszka Radwanska   54%   13%     74%  
Angelique Kerber      57%   15%     72%  
Martina Navratilova   54%   13%     71%  
Monica Niculescu      50%   15%     70%  
Garbine Muguruza      63%   19%     70%  
Petra Kvitova         59%   22%     68%  
Roberta Vinci         58%   14%     68%

Historical shot-by-shot data is less representative for women than for men, so it’s probably safest to assume that trends in smash success rates are similar for men and women. If that’s true, Steffi Graf’s era is similar to the present, while Martina Navratilova’s prime saw far fewer smashes going for winners or points won.

Where the women’s game really differs from the men’s is the difference between smash opportunities (lobs) and smashes. As we saw above, 72% of ATP smash opportunities result in smashes. In the current WTA, the corresponding figure is less than half that: 35%. Some of the single-player numbers are almost too extreme to be believed: In 12 matches, Catherine Bellis faced 41 lobs and hit 3 smashes; in 29 charted matches, Jelena Ostapenko saw 103 smash opportunities and tried only 10 smashes. A generation ago, the gender difference was tiny: Graf, Martina Hingis, Arantxa Sanchez Vicario, and Monica Seles all hit smashes in at least three-quarters of their opportunities. But among active players, only Barbora Strycova comes in above 70%.

Here are the smash opportunity numbers for the 17 women with at least 150 smash opportunities in the MCP dataset. SM/SMO is smashes per chance, W/SMO is winners (and induced forced errors) per smash opportunity, E/SMO is errors per opportunity, and PTW/SMO is points won per smash opportunity:

PLAYER                SM/SMO  W/SMO  E/SMO  PTW/SMO  
Maria Sharapova          12%    57%    11%      76%  
Serena Williams          55%    58%    18%      72%  
Steffi Graf              82%    52%    17%      71%  
Karolina Pliskova        47%    52%    16%      70%  
Simona Halep             14%    41%    11%      69%  
Carla Suarez Navarro     25%    33%     9%      69%  
Eugenie Bouchard         29%    50%    18%      68%  
Victoria Azarenka        35%    52%    17%      67%  
Angelique Kerber         39%    42%    14%      66%  
Garbine Muguruza         43%    51%    18%      66%  
Monica Niculescu         57%    41%    19%      65%  
Petra Kvitova            48%    50%    19%      65%  
Agnieszka Radwanska      44%    42%    18%      65%  
Johanna Konta            30%    47%    21%      64%  
Caroline Wozniacki       36%    44%    18%      64%  
Elina Svitolina          14%    38%    14%      63%  
Martina Navratilova      67%    42%    26%      58%

It’s clear from the top of this list that women’s tennis is a different ballgame. Maria Sharapova almost never opts for an overhead, but when faced with a lob, she is the best of them all. Next up is Serena Williams, who hits almost as many smashes as any active player on this list, and is nearly as successful. Recall that in the men’s game, there is a modest positive correlation between smashes per opportunity and points won per smash opportunity; here, the relationship is weaker, and slightly negative.

Because most women hit so few smashes, there isn’t quite as much to be gained by using point probability added (PPA) to measure WTA smash skill. Graf was exceptionally good, comparable to Tsonga in the value she extracted from her smash, but among active players, only Serena and Victoria Azarenka can claim a smash that is worth close to one point per thousand. At the other extreme, Monica Niculescu is nearly as bad as Graf was good, suggesting she ought to figure out a way to respond to more smash opportunities with her signature forehand slice.

Here is the same group of women (minus Navratilova, whose era makes PPA comparisons misleading), with their PPA per 100 points for smashes (SM PPA/100) and smash opportunities (SMO PPA/100):

PLAYER                SM PPA/100  SMO PPA/100  
Maria Sharapova             0.03         0.21  
Serena Williams             0.09         0.15  
Steffi Graf                 0.15         0.14  
Karolina Pliskova          -0.01         0.09  
Carla Suarez Navarro        0.04         0.08  
Simona Halep                0.00         0.07  
Eugenie Bouchard           -0.02         0.03  
Victoria Azarenka           0.08         0.00  
Angelique Kerber           -0.03        -0.02  
Garbine Muguruza           -0.07        -0.03  
Petra Kvitova              -0.07        -0.04  
Monica Niculescu           -0.13        -0.06  
Caroline Wozniacki         -0.01        -0.07  
Agnieszka Radwanska        -0.02        -0.07  
Johanna Konta              -0.12        -0.08  
Elina Svitolina             0.01        -0.09

The table is sorted by smash opportunity PPA, which tells us about a much more relevant skill in the women’s game. Sharapova’s lob-responding ability is well ahead of the pack, worth better than one point above average per 500, with Serena and Graf not far behind. The overall range among these well-studied players, from Sharapova’s 0.21 to Elina Svitolina’s -0.09, is somewhat smaller than the equivalent range in the ATP, but with such outliers as Sharapova here and Delpo on the men’s side, it’s tough to draw firm conclusions from small subsets of players, however elite they are.

Final thought

The approach I’ve outlined here to measure the impact of smash and smash-opportunity skills is one that could be applied to other shots. Smashes are a good place to start because they are so simple: Many of them end points, and even when they don’t, they often virtually guarantee that one player will win the point. While smashes are a bit more complex than they first appear, the complications involved in applying a similar algorithm to, say, backhands and backhand opportunities, are considerably greater. That said, I believe this algorithm represents a promising entry point to these more daunting problems.

Measuring the Impact of the Serve in Men’s Tennis

By just about any measure, the serve is the most important shot in tennis. In men’s professional tennis, with its powerful deliveries and short points, the serve is all the more crucial. It is the one shot guaranteed to occur in every rally, and in many points, it is the only shot.

Yet we don’t have a good way of measuring exactly how important it is. It’s easy to determine which players have the best serves–they tend to show up at the top of the leaderboards for aces and service points won–but the available statistics are very limited if we want a more precise picture. The ace stat counts only a subset of those points decided by the serve, and the tally of service points won (or 1st serve points won, or 2nd serve points won) combines the effect of the serve with all of the other shots in a player’s arsenal.

Aces are not the only points in which the serve is decisive, and some service points won are decided long after the serve ceases to have any relevance to the point. What we need is a method to estimate how much impact the serve has on points of various lengths.

It seems like a fair assumption that if a server hits a winner on his second shot, the serve itself deserves some of the credit, even if the returner got it back in play. In any particular instance, the serve might be really important–imagine Roger Federer swatting away a weak return from the service line–or downright counterproductive–think of Rafael Nadal lunging to defend against a good return and hitting a miraculous down-the-line winner. With the wide variety of paths a tennis point can follow, though, all we can do is generalize. And in the aggregate, the serve probably has a lot to do with a 3-shot rally. At the other extreme, a 25-shot rally may start with a great serve or a mediocre one, but by the time the point is decided, the effect of the serve has been canceled out.

With data from the Match Charting Project, we can quantify the effect. Using about 1,200 tour-level men’s matches from 2000 to the present, I looked at each of the server’s shots grouped by the stage of the rally–that is, his second shot, his third shot, and so on–and calculated how frequently it ended the point. A player’s underlying skills shouldn’t change during a point–his forehand is as good at the end as it is at the beginning, unless fatigue strikes–so if the serve had no effect on the success of subsequent shots, players would end the point equally often with every shot.

Of course, the serve does have an effect, so points won by the server end much more frequently on the few shots just after the serve than they do later on. This graph illustrates how the “point ending rate” changes:

On first serve points (the blue line), if the server has a “makeable” second shot (the third shot of the rally, “3” on the horizontal axis, where “makeable” is defined as a shot that results in an unforced error or is put back in play), there is a 28.1% chance it ends the point in the server’s favor, either with a winner or by inducing an error on the next shot. On the following shot, the rate falls to 25.6%, then 21.8%, and then down into what we’ll call the “base rate” range between 18% and 20%.

The base rate tells us how often players are able to end points in their favor after the serve ceases to provide an advantage. Since the point ending rate stabilizes beginning with the server’s fifth shot (the rally’s ninth, after first serves), we can pinpoint that stage of the rally as the moment–for the average player, anyway–when the serve is no longer an advantage.
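The point ending rate described above can be sketched in a few lines. This is a minimal illustration, not the script behind the graph: the record layout (one pair per makeable shot by the server) and the function name `point_ending_rates` are assumptions, and the toy records are invented.

```python
from collections import defaultdict

def point_ending_rates(shots):
    """Rate at which each of the server's shots ends the point in his favor.

    shots: iterable of (server_shot_number, ended_point_for_server) pairs,
    one pair per makeable shot hit by the server (hypothetical layout).
    """
    made = defaultdict(int)    # makeable shots seen at each stage
    ended = defaultdict(int)   # of those, how many ended the point for the server
    for shot_number, ended_point in shots:
        made[shot_number] += 1
        if ended_point:
            ended[shot_number] += 1
    return {n: ended[n] / made[n] for n in sorted(made)}

# Toy records, invented for illustration only.
toy = [(2, True), (2, False), (2, False), (2, True),
       (3, True), (3, False), (3, False), (3, False)]
rates = point_ending_rates(toy)
print(rates)
```

With real data, the base rate would be the level at which these per-stage rates flatten out.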

As the graph shows, second serve points (shown with a red line) are a very different story. It appears that the serve has no impact once the returner gets the ball back in play. Even that slight blip with the server’s third shot (“5” on the horizontal axis, for the rally’s fifth shot) is no higher than the point ending rate on the 15th shot of first-serve rallies. This tallies with the conclusions of some other research I did six years ago, and it has the added benefit of agreeing with common sense, since ATP servers win only about half of their second serve points.

Of course, some players get plenty of positive after-effects from their second serves: When John Isner hits a second shot on a second-serve point, he finishes the point in his favor 30% of the time, a number that falls to 22% by his fourth shot. His second serve has effects that mirror those of an average player’s first serve.

Removing unforced errors

I wanted to build this metric without resorting to the vagaries of differentiating forced and unforced errors, but it wasn’t to be. The “point-ending” rates shown above include points that ended when the server’s opponent made an unforced error. We can argue about whether, or how much, such errors should be credited to the server, but for our purposes today, the important thing is that unforced errors aren’t affected that much by the stage of the rally.

If we want to isolate the effect of the serve, then, we should remove unforced errors. When we do so, we discover an even sharper effect. The rate at which the server hits winners (or induces forced errors) depends heavily on the stage of the rally. Here’s the same graph as above, only with opponent unforced errors removed:

The two graphs look very similar. Again, the first serve loses its effect around the 9th shot in the rally, and the second serve confers no advantage on later shots in the point. The important difference to notice is the ratio between the peak winner rate and the base rate; the base rate itself is now just above 10%. When we counted unforced errors, the ratio between peak and base rate was about 3:2. With unforced errors removed, the ratio is close to 2:1, suggesting that when the server hits a winner on his second shot, the serve and the winner contributed roughly equally to the outcome of the point. It seems more appropriate to skip opponent unforced errors when measuring the effect of the serve, and the resulting 2:1 ratio jibes better with my intuition.

Making a metric

Now for the fun part. To narrow our focus, let’s zero in on one particular question: What percentage of service points won can be attributed to the serve? To answer that question, I want to consider only the server’s own efforts. For unreturned serves and unforced errors, we might be tempted to give negative credit to the other player. But for today’s purposes, I want to divvy up the credit among the server’s assets–his serve and his other shots–like separating the contributions of a baseball team’s pitching from its defense.

For unreturned serves, that’s easy. 100% of the credit belongs to the serve.

For second serve points in which the return was put in play, 0% of the credit goes to the serve. As we’ve seen, for the average player, once the return comes back, the server no longer has an advantage.

For first-serve points in which the return was put in play and the server won by his fourth shot, the serve gets some credit, but not all, and the amount of credit depends on how quickly the point ended. The following table shows the exact rates at which players hit winners on each shot, in the “Winner %” column:

Server's…  Winner %  W%/Base  Shot credit  Serve credit  
2nd shot      21.2%     1.96        51.0%         49.0%  
3rd shot      18.1%     1.68        59.6%         40.4%  
4th shot      13.3%     1.23        81.0%         19.0%  
5th+          10.8%     1.00       100.0%          0.0%

Compared to a base rate of 10.8% winners per shot opportunity, we can calculate the approximate value of the serve in points that end on the server’s 2nd, 3rd, and 4th shots. The resulting numbers come out close to round figures, and because these are hardly laws of nature (and the sample of charted matches has its biases), we’ll go with round numbers. We’ll give the serve 50% of the credit when the server needed only two shots, 40% when he needed three shots, and 20% when he needed four shots. After that, the advantage conferred by the serve is usually canceled out, so in longer rallies, the serve gets 0% of the credit.
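For readers who want to check the arithmetic, here is a rough sketch of how the credit columns appear to follow from the winner rates. The formula is inferred from the table rather than stated outright: shot credit looks like the base rate divided by the winner rate, and serve credit is whatever remains. The function name `serve_credit` is made up for this example.

```python
# Winner rates per server shot, copied from the table above.
WINNER_RATE = {"2nd shot": 0.212, "3rd shot": 0.181, "4th shot": 0.133, "5th+": 0.108}
BASE_RATE = WINNER_RATE["5th+"]

def serve_credit(rate, base=BASE_RATE):
    """Share of a winner attributed to the serve (assumed formula: 1 - base/rate)."""
    if rate <= base:
        return 0.0
    return 1.0 - base / rate

for shot, rate in WINNER_RATE.items():
    credit = serve_credit(rate)
    rounded = round(credit * 10) * 10  # nearest 10 percentage points
    print(f"{shot}: ratio {rate / BASE_RATE:.2f}, "
          f"serve credit {credit:.1%}, rounded {rounded}%")
```

Run on the rounded rates shown in the table, this lands within a few tenths of a percentage point of the published serve credit column (the table was presumably built from unrounded rates) and rounds to the same 50/40/20 figures.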

Tour averages

Finally, we can begin to answer the question, What percentage of service points won can be attributed to the serve? This, I believe, is a good proxy for the slipperier query I started with, How important is the serve?

To do that, we take the same subset of 1,200 or so charted matches, tally the number of unreturned serves and first-serve points that ended with various numbers of shots, and assign credit to the serve based on the multipliers above. Adding up all the credit due to the serve gives us a raw number of “points” that the player won thanks to his serve. When we divide that number by the actual number of service points won, we find out how much of his service success was due to the serve itself. Let’s call the resulting number Serve Impact, or SvI.
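As a sketch of that tally, assuming the 50/40/20 multipliers and a hypothetical input layout (counts of points won, keyed by how many shots the server needed), the calculation might look like this. The function name `serve_impact` and the toy numbers are invented for illustration.

```python
# Credit given to the serve on first-serve points, by shots the server needed.
SERVE_CREDIT = {2: 0.5, 3: 0.4, 4: 0.2}

def serve_impact(unreturned, first_serve_wins_by_shots, service_points_won):
    """Serve Impact (SvI): serve-credited points divided by service points won.

    unreturned: service points won on unreturned serves (full credit).
    first_serve_wins_by_shots: {shots needed by server: points won} for
        first-serve points where the return came back.
    service_points_won: all service points won, including second-serve
        rallies, which earn the serve no credit under the average multipliers.
    """
    if service_points_won <= 0:
        raise ValueError("need at least one service point won")
    credited = float(unreturned)
    for shots_needed, count in first_serve_wins_by_shots.items():
        credited += SERVE_CREDIT.get(shots_needed, 0.0) * count
    return credited / service_points_won

# Toy tallies, not real match data.
svi = serve_impact(unreturned=30,
                   first_serve_wins_by_shots={2: 10, 3: 5, 4: 5, 5: 10},
                   service_points_won=100)
print(f"SvI = {svi:.1%}")
```

In this toy case the serve is credited with 30 + 5 + 2 + 1 = 38 of 100 points won.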

Here are the aggregates for the entire tour, as well as for each major surface:

         1st SvI  2nd SvI  Total SvI  
Overall    63.4%    31.0%      53.6%  
Hard       64.6%    31.5%      54.4%  
Clay       56.9%    27.0%      47.8%  
Grass      70.8%    37.3%      61.5%

Bottom line, it appears that just over half of service points won are attributable to the serve itself. As expected, that number is lower on clay and higher on grass.

Since about two-thirds of the points that men win come on their own serves, we can go even one step further: roughly one-third of the points won by a men’s tennis player are due to his serve.

Player by player

These are averages, and the most interesting players rarely hew to the mean. Using the 50/40/20 multipliers, Isner’s SvI is a whopping 70.8% and Diego Schwartzman’s is a mere 37.7%. As far from the middle as those are, they understate the uniqueness of these players. I hinted above that the same multipliers are not appropriate for everyone; the average player reaps no positive after-effects of his second serve, but Isner certainly does. The standard formula we’ve used so far credits Isner with an outrageous SvI, even without giving him credit for the “second serve plus one” points he racks up.

In other words, to get player-specific results, we need player-specific multipliers. To do that, we start by finding a player-specific base rate, for which we’ll use the winner (and induced forced error) rate for all shots starting with the server’s fifth shot on first-serve points and shots starting with the server’s fourth on second-serve points. Then we check the winner rate on the server’s 2nd, 3rd, and 4th shots on first-serve points and his 2nd and 3rd shots on second-serve points, and if the rate is at least 20% higher than the base rate, we give the player’s serve the corresponding amount of credit.
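One possible reading of that rule, sketched in code: the “corresponding amount of credit” is assumed here to be the same one-minus-base-over-rate share used for the tour averages, and the 20% cutoff is applied as a simple multiplier on the base rate. Both the function name `player_multiplier` and the example rates are hypothetical.

```python
def player_multiplier(rate, base, threshold=1.2):
    """Serve credit for one shot stage, or 0 if the rate barely beats the base.

    rate: player's winner (and induced forced error) rate at this stage.
    base: player's own base rate from later shots in the rally.
    threshold: rate must be at least this multiple of base (20% higher).
    """
    if base <= 0 or rate < threshold * base:
        return 0.0
    return 1.0 - base / rate  # assumed credit formula, as for the tour average

# Invented rates for illustration only.
print(player_multiplier(rate=0.20, base=0.10))  # well above the cutoff
print(player_multiplier(rate=0.11, base=0.10))  # under the cutoff, no credit
```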

Here are the resulting multipliers for a quartet of players you might find interesting, with plenty of surprises already:

                   1st serve              2nd serve       
                    2nd shot  3rd  4th     2nd shot  3rd  
Roger Federer            55%  50%  30%           0%   0%  
Rafael Nadal             31%   0%   0%           0%   0%  
John Isner               46%  41%   0%          34%   0%  
Diego Schwartzman        20%  35%   0%           0%  25%  
Average                  50%  40%  20%           0%   0%  

Roger Federer gets more positive after-effects from his first serve than average, more even than Isner does. The big American is a tricky case, both because so few of his serves come back and because he is so aggressive at all times, meaning that his base winner rate is very high. At the other extreme, Schwartzman and Rafael Nadal get very little follow-on benefit from their serves. Schwartzman’s multipliers are particularly intriguing, since on both first and second serves, his winner rate on his third shot is higher than on his second shot. Serve plus two, anyone?

Using player-specific multipliers makes Isner’s and Schwartzman’s SvI numbers more extreme. Isner’s ticks up a bit to 72.4% (just behind Ivo Karlovic), while Schwartzman’s drops to 35.0%, the lowest of anyone I’ve looked at. I’ve calculated multipliers and SvI for all 33 players with at least 1,000 tour-level service points in the Match Charting Project database:

Player                 1st SvI  2nd SvI  Total SvI  
Ivo Karlovic             79.2%    56.1%      73.3%  
John Isner               78.3%    54.3%      72.4%  
Andy Roddick             77.8%    51.0%      71.1%  
Feliciano Lopez          83.3%    37.1%      68.9%  
Kevin Anderson           77.7%    42.5%      68.4%  
Milos Raonic             77.4%    36.0%      66.0%  
Marin Cilic              77.1%    34.1%      63.3%  
Nick Kyrgios             70.6%    41.0%      62.5%  
Alexandr Dolgopolov      74.0%    37.8%      61.3%  
Gael Monfils             69.8%    37.7%      60.8%  
Roger Federer            70.6%    32.0%      58.8%  
                                                    
Player                 1st SvI  2nd SvI  Total SvI  
Bernard Tomic            67.6%    28.7%      58.5%  
Tomas Berdych            71.6%    27.0%      57.2%  
Alexander Zverev         65.4%    30.2%      54.9%  
Fernando Verdasco        61.6%    32.9%      54.3%  
Stan Wawrinka            65.4%    33.7%      54.2%  
Lleyton Hewitt           66.7%    32.1%      53.4%  
Juan Martin Del Potro    63.1%    28.2%      53.4%  
Grigor Dimitrov          62.9%    28.6%      53.3%  
Jo Wilfried Tsonga       65.3%    25.9%      52.7%  
Marat Safin              68.4%    22.7%      52.3%  
Andy Murray              63.4%    27.5%      52.0%  
                                                    
Player                 1st SvI  2nd SvI  Total SvI  
Dominic Thiem            60.6%    28.9%      50.8%  
Roberto Bautista Agut    55.9%    32.5%      49.5%  
Pablo Cuevas             57.9%    28.9%      47.8%  
Richard Gasquet          56.0%    29.0%      47.5%  
Novak Djokovic           56.0%    26.8%      47.3%  
Andre Agassi             54.3%    31.4%      47.1%  
Gilles Simon             55.7%    28.4%      46.7%  
Kei Nishikori            52.2%    30.8%      45.2%  
David Ferrer             46.9%    28.2%      41.0%  
Rafael Nadal             42.8%    27.1%      38.8%  
Diego Schwartzman        39.5%    25.8%      35.0%

At the risk of belaboring the point, this table shows just how massive the difference is between the biggest servers and their opposites. Karlovic’s serve accounts for nearly three-quarters of his success on service points, while Schwartzman’s can be credited with barely one-third. Even those numbers don’t tell the whole story: Because Ivo’s game relies so much more on service games than Diego’s does, it means that 54% of Karlovic’s total points won–serve and return–are due to his serve, while only 20% of Schwartzman’s are.

We didn’t need a lengthy analysis to show us that the serve is important in men’s tennis, or that it represents a much bigger chunk of some players’ success than others. But now, instead of asserting a vague truism–the serve is a big deal–we can begin to understand just how much it influences results, and how much weak-serving players need to compensate just to stay even with their more powerful peers.

Just How Aggressive is Jelena Ostapenko?

Italian translation at settesei.it

If you picked up only two stats about surprise Roland Garros champion Jelena Ostapenko, you probably heard that, first, her average forehand is faster than Andy Murray’s, and second, she hit 299 winners in her seven French Open matches. I’m not yet sure how much emphasis we should put on shot speed, and I instinctively distrust raw totals, but even with those caveats, it’s hard not to be impressed.

Compared to the likes of Simona Halep, Timea Bacsinszky, and Caroline Wozniacki, the last three women she upset en route to her maiden title, Ostapenko was practically playing a different game. Her style is more reminiscent of fellow Slam winners Petra Kvitova and Maria Sharapova, who don’t construct points so much as they destruct them. What I’d like to know, then, is how Ostapenko stacks up against the most aggressive players on the WTA tour.

Thankfully we already have a metric for this: Aggression Score, which I’ll abbreviate as AGG. This stat requires that we know three things about every point: How many shots were hit, who won it, and how. With that data, we figure out what percentage of a player’s shots resulted in winners, unforced errors, or her opponent’s forced errors. (Technically, the denominator is “shot opportunities,” which includes shots a player didn’t manage to hit after her opponent hit a winner. That doesn’t affect the results too much.) For today’s purposes, I’m calculating AGG without a player’s serves–both aces and forced return errors–so we’re capturing only rally aggression.
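A bare-bones version of the calculation, assuming we already have the four counts in hand (the function name `rally_aggression` and the numbers are invented for the example):

```python
def rally_aggression(winners, unforced_errors, induced_forced_errors,
                     shot_opportunities):
    """Rally-only Aggression Score: point-ending shots per shot opportunity.

    Serves (aces and forced return errors) are assumed to be excluded from
    every count before calling this function.
    """
    if shot_opportunities <= 0:
        raise ValueError("need at least one shot opportunity")
    point_ending = winners + unforced_errors + induced_forced_errors
    return point_ending / shot_opportunities

# Toy counts, not taken from any charted match.
agg = rally_aggression(winners=20, unforced_errors=25,
                       induced_forced_errors=15, shot_opportunities=200)
print(f"AGG = {agg:.3f}")
```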

The typical range of this version of AGG is between 0.1–very passive–and 0.3–extremely aggressive. Based on the nearly 1,600 women’s matches in the Match Charting Project dataset, Kvitova and Julia Goerges represent the aggressive end, with average AGGs around .275. We only have four Samantha Crawford matches in the database, but early signs suggest she could outpace even those women, as her average is at .312. At the other end of the spectrum, Madison Brengle is at 0.11, with Wozniacki and Sara Errani at 0.12. In the Match Charting data, there are single-day performances that rise as high as 0.44 (Serena Williams over Errani at the 2013 French Open) and fall as low as 0.06. In the final against Ostapenko, Halep’s aggression score was 0.08, half of her average of 0.16.

Context established, let’s see where Ostapenko fits in, starting with the Roland Garros final. Against Halep, her AGG was a whopping .327. That’s third highest of any player in a major final, behind Kvitova at Wimbledon in 2014 (.344) and Serena at the 2007 Australian Open (.328). (We have data for every Grand Slam final back to 1999, and most of them before that.) Using data from IBM Pointstream, which encompasses almost all matches at Roland Garros this year, Ostapenko’s aggression in the final was 7th-highest of any match in the tournament–out of 188 player-matches with the necessary data–behind two showings from Bethanie Mattek Sands, one each from Goerges, Madison Keys, and Mirjana Lucic … and Ostapenko’s first-round win against Louisa Chirico. It was also the third-highest recorded against Halep out of more than 200 Simona matches in the Match Charting dataset.

You get the picture: The French Open final was a serious display of aggression, at least from one side of the court. That level of ball-bashing was nothing new for the Latvian, either. We have charting data for her last three matches at Roland Garros, along with two matches from Charleston and one from Prague this clay season. Of those six performances, Ostapenko’s lowest AGG was .275, against Wozniacki in the Paris quarters. Her average across the six was .303.

If those recent matches indicate what we’ll see from her in the future, she will likely score as the most aggressive rallying player on the WTA tour. Because she played less aggressively in her earlier matches on tour, her career average still trails those of Kvitova and Goerges, but not by much–and probably not for long. It’s scary to consider what might happen as she gets stronger; we’ll have to wait and see how her tactics evolve, as well.

—

The Match Charting Project contains at least 15 matches on 62 different players–here is the rally-only aggression score for all of them:

PLAYER                    MATCHES  RALLY AGG  
Julia Goerges                  15      0.277  
Petra Kvitova                  57      0.277  
Jelena Ostapenko               17      0.271  
Madison Keys                   35      0.261  
Camila Giorgi                  17      0.257  
Sabine Lisicki                 19      0.246  
Caroline Garcia                15      0.242  
Coco Vandeweghe                17      0.238  
Serena Williams               108      0.237  
Laura Siegemund                19      0.235  
Anastasia Pavlyuchenkova       17      0.230  
Danka Kovinic                  15      0.223  
Kristina Mladenovic            28      0.222  
Na Li                          15      0.218  
Maria Sharapova                73      0.217  
                                              
PLAYER                    MATCHES  RALLY AGG  
Eugenie Bouchard               52      0.214  
Ana Ivanovic                   46      0.211  
Garbine Muguruza               57      0.210  
Lucie Safarova                 29      0.209  
Karolina Pliskova              42      0.207  
Elena Vesnina                  20      0.207  
Venus Williams                 46      0.205  
Johanna Konta                  31      0.205  
Monica Puig                    15      0.203  
Dominika Cibulkova             38      0.198  
Martina Navratilova            25      0.197  
Steffi Graf                    39      0.196  
Anastasija Sevastova           17      0.194  
Samantha Stosur                19      0.193  
Sloane Stephens                15      0.190  
                                              
PLAYER                    MATCHES  RALLY AGG  
Ekaterina Makarova             23      0.189  
Lauren Davis                   16      0.186  
Heather Watson                 16      0.185  
Daria Gavrilova                20      0.183  
Justine Henin                  28      0.183  
Kiki Bertens                   15      0.181  
Monica Seles                   18      0.179  
Svetlana Kuznetsova            28      0.174  
Timea Bacsinszky               28      0.174  
Victoria Azarenka              55      0.170  
Andrea Petkovic                24      0.166  
Roberta Vinci                  23      0.164  
Barbora Strycova               16      0.163  
Belinda Bencic                 31      0.163  
Jelena Jankovic                24      0.162  
                                              
PLAYER                    MATCHES  RALLY AGG  
Alison Riske                   15      0.161  
Angelique Kerber               83      0.161  
Flavia Pennetta                23      0.160  
Simona Halep                  218      0.160  
Carla Suarez Navarro           31      0.159  
Martina Hingis                 15      0.157  
Chris Evert                    20      0.152  
Darya Kasatkina                18      0.148  
Elina Svitolina                46      0.141  
Yulia Putintseva               15      0.137  
Alize Cornet                   18      0.136  
Agnieszka Radwanska            90      0.130  
Annika Beck                    16      0.126  
Monica Niculescu               25      0.124  
Caroline Wozniacki             62      0.122  
Sara Errani                    23      0.121

(A few of the match counts differ slightly from what you’ll find on the MCP home page. I’ve thrown out a few matches with too much missing data or in formats that didn’t play nice with the script I wrote to calculate aggression score.)

Benchmarks for Shot-by-Shot Analysis

Italian translation at settesei.it

In my post last week, I outlined what the error stats of the future may look like. A wide range of advanced stats across different sports, from baseball to ice hockey–and increasingly in tennis–follow the same general algorithm:

Classify events (shots, opportunities, whatever) into categories;
Establish expected levels of performance–often league-average–for each category;
Compare players (or specific games or tournaments) to those expected levels.
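The three steps above can be sketched generically. This is only an illustration under simple assumptions: each event arrives already classified, the expected level is the plain average across all players, and the names (`benchmark`, the category label, players “A” and “B”) are invented.

```python
from collections import defaultdict

def benchmark(events):
    """Compare each player's error rate to the all-player average by category.

    events: iterable of (player, category, is_error) tuples, where the
    category is the result of step one (classification).
    Returns {(player, category): player rate minus average rate}.
    """
    totals = defaultdict(lambda: [0, 0])      # category -> [errors, attempts]
    by_player = defaultdict(lambda: [0, 0])   # (player, category) -> same
    for player, category, is_error in events:
        for bucket in (totals[category], by_player[(player, category)]):
            bucket[0] += int(is_error)
            bucket[1] += 1
    # Step two: expected level of performance for each category.
    expected = {c: errs / n for c, (errs, n) in totals.items()}
    # Step three: each player relative to that expectation.
    return {key: errs / n - expected[key[1]]
            for key, (errs, n) in by_player.items()}

# Toy events, invented for illustration.
events = [("A", "FH cross", True), ("A", "FH cross", False),
          ("B", "FH cross", False), ("B", "FH cross", False)]
diffs = benchmark(events)
print(diffs)
```

A positive number means the player errs more often than average in that category; a negative number means less often.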

The first step is, by far, the most complex. Classification depends in large part on available data. In baseball, for example, the earliest fielding metrics of this type had little more to work with than the number of balls in play. Now, batted balls can be categorized by exact location, launch angle, speed off the bat, and more. Having more data doesn’t necessarily make the task any simpler, as there are so many potential classification methods one could use.

The same will be true in tennis, eventually, when Hawkeye data (or something similar) is publicly available. For now, those of us relying on public datasets still have plenty to work with, particularly the 1.6 million shots logged as part of the Match Charting Project.*

*The Match Charting Project is a crowd-sourced effort to track professional matches. Please help us improve tennis analytics by contributing to this one-of-a-kind dataset. Click here to find out how to get started.

The shot-coding method I adopted for the Match Charting Project makes step one of the algorithm relatively straightforward. MCP data classifies shots in two primary ways: type (forehand, backhand, backhand slice, forehand volley, etc.) and direction (down the middle, or to the right or left corner). While this approach omits many details (depth, speed, spin, etc.), it’s about as much data as we can expect a human coder to track in real-time.

For example, we could use the MCP data to find the ATP tour-average rate of unforced errors when a player tries to hit a cross-court forehand, then compare everyone on tour to that benchmark. Tour average is 10%, Novak Djokovic’s unforced error rate is 7%, and John Isner’s is 17%. Of course, that isn’t the whole picture when comparing the effectiveness of cross-court forehands: While the average ATPer hits 7% of his cross-court forehands for winners, Djokovic’s rate is only 6% compared to Isner’s 16%.

However, it’s necessary to take a wider perspective. Instead of shots, I believe it will be more valuable to investigate shot opportunities. That is, instead of asking what happens when a player is in position to hit a specific shot, we should be figuring out what happens when the player is presented with a chance to hit a shot in a certain part of the court.

This is particularly important if we want to get beyond the misleading distinction between forced and unforced errors. (As well as the line between errors and an opponent’s winners, which lie on the same continuum–winners are simply shots that were too good to allow a player to make a forced error.) In the Isner/Djokovic example above, our denominator was “forehands in a certain part of the court that the player had a reasonable chance of putting back in play”–that is, successful forehands plus forehand unforced errors. We aren’t comparing apples to apples here: Given the exact same opportunities, Djokovic is going to reach more balls, perhaps making unforced errors where we would call Isner’s mistakes forced errors.

Outcomes of opportunities

Let me clarify exactly what I mean by shot opportunities. They are defined by what a player’s opponent does, regardless of how the player himself manages to respond–or if he manages to get a racket on the ball at all. For instance, assuming a matchup between right-handers, here is a cross-court forehand:

Player A, at the top of the diagram, is hitting the shot, presenting player B with a shot opportunity. Here is one way of classifying the outcomes that could ensue, together with the abbreviations I’ll use for each in the charts below:

player B fails to reach the ball, resulting in a winner for player A (vs W)
player B reaches the ball, but commits a forced error (FE)
player B commits an unforced error (UFE)
player B puts the ball back in play, but goes on to lose the point (ip-L)
player B puts the ball back in play, presents player A with a “makeable” shot, and goes on to win the point (ip-W)
player B causes player A to commit a forced error (ind FE)
player B hits a winner (W)

As always, for any given denominator, we could devise different categories, perhaps combining forced and unforced errors into one, or further classifying the “in play” categories to identify whether the player is setting himself up to quickly end the point. We might also look at different categories altogether, like shot selection.
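To make the bookkeeping concrete, here is a small sketch that tallies the seven outcomes for a set of shot opportunities. It assumes each opportunity has already been labeled with one of the abbreviations above, and that the last three categories are the ones that count as points won for player B; the function names and the toy list are invented.

```python
from collections import Counter

# Outcome labels, ordered from worst to best for the player facing the shot.
OUTCOMES = ["vs W", "FE", "UFE", "ip-L", "ip-W", "ind FE", "W"]
WON = {"ip-W", "ind FE", "W"}  # outcomes in which player B wins the point

def outcome_shares(outcomes):
    """Fraction of shot opportunities that ended in each outcome."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {o: 0.0 for o in OUTCOMES}
    return {o: counts[o] / total for o in OUTCOMES}

def points_won_share(outcomes):
    """Fraction of opportunities that player B converted into points won."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return sum(counts[o] for o in WON) / total if total else 0.0

# Ten toy opportunities, invented for illustration.
toy = ["vs W", "FE", "UFE", "ip-L", "ip-L", "ip-L",
       "ip-W", "ip-W", "ind FE", "W"]
shares = outcome_shares(toy)
won = points_won_share(toy)
print(shares, won)
```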

In any case, the categories above give us a good general idea of how players respond to different opportunities, and how those opportunities differ from each other. The following chart shows–to adopt the language of the example above–player B’s outcomes based on player A’s shots, categorized only by shot type:

The outcomes are stacked from worst to best. At the bottom is the percentage of opponent winners (vs W)–opportunities where the player we’re interested in didn’t even make contact with the ball. At the top is the percentage of winners (W) that our player hit in response to the opportunity. As we’d expect, forehands present the most difficult opportunities: 5.7% of them go for winners and another 4.6% result in forced errors. Players are able to convert those opportunities into points won only 42.3% of the time, compared to 46.3% when facing a backhand, 52.5% when facing a backhand slice (or chip), and 56.3% when facing a forehand slice.

The above chart is based on about 374,000 shots: All the baseline opportunities that arose (that is, excluding serves, which need to be treated separately) in over 1,000 logged matches between two righties. Of course, there are plenty of important variables to further distinguish those shots, beyond simply categorizing by shot type. Here are the outcomes of shot opportunities at various stages of the rally when the player’s opponent hits a forehand:

The leftmost column can be seen as the results of “opportunities to hit a third shot”–that is, outcomes when the serve return is a forehand. Once again, the numbers are in line with what we would expect: The best time to hit a winner off a forehand is on the third shot–the “serve-plus-one” tactic. We can see that in another way in the next column, representing opportunities to hit a fourth shot. If your opponent hits a forehand in play for his serve-plus-one shot, there’s a 10% chance you won’t even be able to get a racket on it. The average player’s chances of winning the point from that position are only 38.4%.

Beyond the 3rd and 4th shot, I’ve divided opportunities into those faced by the server (5th shot, 7th shot, and so on) and those faced by the returner (6th, 8th, etc.). As you can see, by the 5th shot, there isn’t much of a difference, at least not when facing a forehand.

Let’s look at one more chart: Outcomes of opportunities when the opponent hits a forehand in various directions. (Again, we’re only looking at righty-righty matchups.)

There’s very little difference between the two corners, and it’s clear that it’s more difficult to make good on a shot opportunity in either corner than it is from the middle. It’s interesting to note here that, when faced with a forehand that lands in play–regardless of where it is aimed–the average player has less than a 50% chance of winning the point. This is a confusing instance of selection bias that crops up occasionally in tennis analytics: Because a significant percentage of shots are errors, the player who just placed a shot in the court has a temporary advantage.

Next steps

If you’re wondering what the point of all of this is, I understand. (And I appreciate you getting this far despite your reservations.) Until we drill down to much more specific situations–and maybe even then–these tour averages are no more than curiosities. It doesn’t exactly turn the analytics world upside down to show that forehands are more effective than backhand slices, or that hitting to the corners is more effective than hitting down the middle.

These averages are ultimately only tools to better quantify the accomplishments of specific players. As I continue to explore this type of algorithm, combined with the growing Match Charting Project dataset, we’ll learn a lot more about the characteristics of the world’s best players, and what makes some so much more effective than others.