Surface Speed Convergence Revisited

Grass courts before the convergence

For more than a decade, players and pundits have complained that surface speeds are converging. To oversimplify their gripes: Everything is turning into clay. Hard courts have gotten slower, even many of the indoor ones. Grass courts, once a bastion of quick-fire attacking tennis, have slowed down as well.

I’ve attempted to confirm or refute the notion a couple of times. In 2013, I used break rate and ace rate to see whether hard and clay courts were getting closer to each other. The results said no. Many readers complained that I was using the wrong metrics: rally length is a better indicator. I agree, but rally length wasn’t widely available at the time.

In 2016, I looked at rally length for grand slam finals and found some evidence of surface speed convergence. The phenomenon was much clearer in men’s tennis than women’s, a hint that it wasn’t all about the surface, but that tactics had changed and that the mix of players in slam finals skewed the data.

Now, the Match Charting Project contains shot-by-shot logs of more than 12,000 matches. We can always dream of more and better data, but we’re well past the point where we can take a more detailed look at how rally length has changed over the years on different surfaces.

Forecasting rally length

Start with a simple model to forecast rally length for a single match. You don’t need much, just the average rally length for each player, plus the surface. Men who typically play short points have more influence on rally length than those who play long ones. (This is worthy of a blog post of its own–maybe another day.) Call the average rally length of the shorter-point guy X and the average rally length of the longer-point guy Y.

Using data from the last seven-plus seasons, you can predict the rally length of a hard court match as follows:

  • X + (0.7 * Y) – 2.6

The numbers change a bit depending on gender and time span, but the general idea is always the same. The short-point player usually has about half-again as much influence on rally length as his or her opponent.

For men since 2016, we can get the clay court rally length by adding 0.16 to the result above. For grass courts, subtract 0.45 instead.

For example, take a hypothetical matchup between Carlos Alcaraz and Alexander Bublik. In charted matches, Alcaraz’s average rally length is 4.0 and Bublik’s is 3.2. The formula above predicts the following number of shots per point:

  • Hard: 3.39
  • Clay: 3.55
  • Grass: 2.94
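As a rough sketch, the forecast can be written as a short Python function. The function and variable names are hypothetical, and the coefficients are the rounded ones quoted above, so this is an illustration of the arithmetic rather than the code behind the post.

```python
# Rounded surface adjustments quoted in the post (men, 2016 to present).
SURFACE_ADJUSTMENT = {"hard": 0.0, "clay": 0.16, "grass": -0.45}


def forecast_rally_length(avg_a, avg_b, surface="hard"):
    """Forecast shots per point from two players' average rally lengths."""
    x = min(avg_a, avg_b)  # shorter-point player gets full weight
    y = max(avg_a, avg_b)  # longer-point player is weighted by 0.7
    return x + 0.7 * y - 2.6 + SURFACE_ADJUSTMENT[surface]


# Alcaraz (4.0) against Bublik (3.2), as in the example above.
for surface in ("hard", "clay", "grass"):
    print(surface, round(forecast_rally_length(4.0, 3.2, surface), 2))
```

With these rounded inputs the results land within about 0.01 of the figures listed above, presumably because the published coefficients are themselves rounded.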

The error bars on the surface adjustments are fairly wide, for all sorts of reasons. Courts are not identical just because their surfaces are given the same names. Other factors, like balls, influence how a match goes on a given day. Players adapt differently to changing surfaces. The usual dose of randomness adds even more variance to rally-length numbers.

Changing coefficients

These surface adjustments aren’t very big. A difference of 0.16 shots per point is barely noticeable, unless you’re keeping score. Given the variation within each surface, it means that rallies would be longer on some hard courts than some clay courts, even for the same pair of players.

That brings us back to the issue of surface speed convergence. 0.16 shots per point is my best attempt at quantifying the difference between hard courts and clay courts now–or, more precisely, for men between 2016 and the present. If surfaces have indeed converged, we would find a more substantial gap in older data.

That’s exactly what we see. I ran the same analysis for three other time periods: 1959-95, 1996-2005, and 2006-2015. The following graph shows the rally-length gap between surfaces for each of the four spans:

For example, in the years up to 1995, a pair of players who averaged 4 shots per point on a hard court would be expected to last 5 shots per point (4 + 1) on clay. They’d tally just 3.25 shots per point (4 – 0.75) on grass.

By the years around the turn of the century, the gap between hard courts and grass courts had narrowed to its present level. But the difference between hard and clay continued to shrink. The current level of 0.16 additional shots per point is only about one-sixth as much as the equivalent in the 1980s and early 1990s.

The graph implies that hard courts are constant over time. That’s just an artifact of how I set up this analysis, and it may not be true. It could be that clay courts have been more consistent, something that my earlier analysis suggested and that many insiders seem to believe. In that case, rather than a downward-sloping clay line and an upward-sloping grass line, the graph would show two upward-sloping lines reflecting longer rallies on non-clay surfaces.

Women, too

The women’s game has evolved somewhat differently than the men’s has, but the trends are broadly similar. Here is the same graph for women’s rally lengths across surfaces:

For the last two decades, there has been essentially no difference in point length between hard courts and clay courts. A gap remains between hard and grass, though, as in the men’s game, it is trending slightly downwards.

Why the convergence?

The obvious culprit here is the literal one: the surface. Depending on who you ask, tournament directors have chosen to slow down hard and grass surfaces because fans prefer longer rallies, because the monster servers of the turn of the century were boring, because slow surfaces favored the Big Four, or because they like seeing players puke on court after five hours of grueling tennis.

That’s probably part of it.

I would offer a complementary story. Racket technology and the related development of return skill essentially killed serve-and-volley tennis. Slower surfaces would have aided that process, but they weren’t necessary. In the 1980s, a top player like Ivan Lendl or Mats Wilander would use entirely different tactics depending on the surface, grinding on clay while serve-and-volleying indoors and on grass. Now, a Djokovic-Alcaraz match is roughly the same beast no matter the venue. If Alcaraz serve-and-volleyed on every point, Novak would have a far easier time competing on return points than the opponents of Lendl and Wilander ever did.

My best guess is that rally lengths have converged because of some combination of the two. I believe that conditions (surfaces, balls, etc) are the lesser of the two factors. But I don’t know how we could use the data we have to prove it either way.

In the end, it doesn’t particularly matter why. Much more than in my previous studies, we have enough rally-length data to see how players cope with different surfaces. The evidence is strong that, for whatever reason, hard-court tennis, clay-court tennis, and grass-court tennis are increasingly similar, a trend that began at least 25 to 30 years ago and shows no sign of reversing. Whether or not surfaces have converged, tactics have definitely done so.


Are American Players Screwed Once You Drag Them Into a Rally?

Long after retiring from tennis, Marat Safin remains quotable. The Russian captain at the ATP Cup had this to say to his charge, Karen Khachanov, during a match against Taylor Fritz:

This isn’t exactly testable. I don’t know how you’d quantify “shock-and-awe,” or how to identify–let alone measure–attempts to scare one’s opponent. Or screwed-ness, for that matter. But if we take “screwed” to mean the same as “not very likely to win,” we’ve got something we can check.

Many fans would agree with the general claim that American men tend to have big serves, aggressive game styles, and not a whole lot of subtlety. Certainly John Isner fits that mold, and Sam Querrey doesn’t deviate much from it. While Fritz is a big hitter who racks up his share of aces and second-shot putaways, his style isn’t so one-dimensional.

Taylor Fritz: not screwed

Using data from the Match Charting Project, I calculated some rally-length stats for the 70 men with at least 20 charted matches in the last decade. That includes five Americans (Fritz, Isner, Querrey, Steve Johnson, and Jack Sock) and most of the other guys we think of as ATP tour regulars.

Safin’s implied definition is that rallies of four shots or fewer are “shock-and-awe” territory, points that are won or lost within either player’s first two shots. Longer rallies are, supposedly, the points where the Americans lose the edge.

That is certainly the case for Isner. He wins only 40% of points when the rally reaches a fifth shot, by far the worst of these tour regulars. Compared to Isner, even Nick Kyrgios (44%) and Ivo Karlovic (45%) look respectable. The range of winning percentages extends as high as 56%, the mark held by Nikoloz Basilashvili. Rafael Nadal is, unsurprisingly, right behind him in second place at 54%, a whisker ahead of Novak Djokovic.

Fritz, at 50.2%, ranks 28th out of 70, roughly equal to the likes of Gael Monfils, Roberto Bautista Agut, and Dominic Thiem. Best of all–if you’re a contrarian like me, anyway–is that Fritz is 16 places higher on the list than Khachanov, who ranks 44th and wins 48.5% of points that last five shots or more.

More data

Here are 20 of the 70 players, including some from the top and bottom of the list, along with all the Americans and some other characters of interest. I’ve calculated each player’s percentage of points won for 1- or 2-shot rallies (serve and return winners), 3- or 4-shot rallies (serve- and return-plus-one points), and 5- or more-shot rallies. They are ranked by the 5- or more-shot column:

Rank  Player                 1-2 W%  3-4 W%  5+ W%  
1     Nikoloz Basilashvili    43.7%   54.1%  55.8%  
2     Rafael Nadal            52.7%   51.6%  54.3%  
3     Novak Djokovic          51.8%   54.6%  54.0%  
4     Kei Nishikori           45.5%   51.2%  53.9%  
11    Roger Federer           52.9%   54.9%  52.1%  
22    Philipp Kohlschreiber   50.1%   50.1%  50.7%  
28    Taylor Fritz            51.1%   47.2%  50.2%  
30    Jack Sock               49.0%   46.5%  50.2%  
31    Alexander Zverev        52.8%   50.3%  50.0%  
32    Juan Martin del Potro   53.8%   49.1%  50.0%  
34    Andy Murray             54.3%   49.5%  49.4%  
39    Daniil Medvedev         53.9%   50.4%  49.0%  
43    Stefanos Tsitsipas      51.4%   50.5%  48.6%  
44    Karen Khachanov         53.7%   48.1%  48.5%  
48    Steve Johnson           49.2%   48.8%  48.3%  
61    Sam Querrey             53.5%   48.0%  46.2%  
62    Matteo Berrettini       53.6%   49.3%  46.1%  
66    Ivo Karlovic            51.8%   43.9%  44.9%  
68    Nick Kyrgios            54.6%   47.4%  44.2%  
70    John Isner              52.3%   48.3%  40.2%
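For readers who want to reproduce this kind of split, here is a minimal Python sketch. It assumes a hypothetical input format, a list of (rally_length, won) pairs for one player, and it skips 0-shot points (double faults), since the post does not say how those are bucketed.

```python
from collections import defaultdict


def bucket(rally_length):
    """Map a rally length to one of the three categories in the table."""
    if rally_length <= 2:
        return "1-2"
    if rally_length <= 4:
        return "3-4"
    return "5+"


def win_pct_by_bucket(points):
    """points: iterable of (rally_length, won) pairs for a single player."""
    won = defaultdict(int)
    played = defaultdict(int)
    for rally_length, player_won in points:
        if rally_length < 1:
            continue  # assumption: 0-shot points (double faults) are left out
        label = bucket(rally_length)
        played[label] += 1
        won[label] += 1 if player_won else 0
    return {label: 100.0 * won[label] / played[label] for label in played}


# Made-up points, purely for illustration.
sample = [(1, True), (2, False), (3, True), (4, True),
          (5, False), (7, True), (9, True), (0, False)]
print(win_pct_by_bucket(sample))
```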

Fritz is one of the few players who win more than half of the shortest rallies and more than half of the longest ones. The first category can be the result of a strong serve, as is probably the case with Fritz, and is definitely the case with Isner. But you don’t have to have a big serve to win more than half of the 1- or 2-shot points. Nadal and Djokovic do well in that category (like they do in virtually all categories) in large part because they negate the advantage of their opponents’ serves.

Shifting focus from the Americans for a moment, you might be surprised by the players with winning percentages above 50% in all three categories. Nadal, Djokovic, and Roger Federer all make the cut, each with plenty of room to spare. The remaining two are the unexpected ones. Philipp Kohlschreiber is just barely better than neutral in both classes of short points, and a bit better than that (50.7%) on long ones. And Alexander Zverev qualifies by the skin of his teeth, winning very slightly more than half of his long rallies. (Yes, that 50.0% is rounded down, not up.) Match Charting Project data is far from complete, so it’s possible that with a different sample, one or both of the Germans would fall below the 50% mark, but the numbers for both are based on sizable datasets.

Back to Fritz, Isner, and company. Safin may be right that the Americans want to scare you with a couple of big shots. Isner has certainly intimidated his share of opponents with the serve alone. Yet Fritz, the player who prompted the comment, is more well-rounded than the Russian captain gave him credit for. Khachanov won the match on Sunday, and at least at this stage in their careers, the Russian is the better player. But not on longer rallies. Based on our broader look at the data, it’s Khachanov who should try to avoid getting dragged into long exchanges, not Fritz.

Match Charting Project Rally Stats: Glossary

I’m in the process of rolling out more stats based on Match Charting Project data across Tennis Abstract. This is one of several glossaries intended to explain those stats and point interested visitors to further reading.

At the moment, the following rally stats can be seen at a variety of leaderboards.

  • RallyLen – Average rally length. Not everyone counts shots exactly the same way, so I try to follow the closest thing there is to a consensus. The serve counts as a shot, but errors do not. Thus, a double fault is 0 shots, and an ace or unreturned serve is 1. A rally with a serve, four additional shots, and an error on an attempted sixth shot counts as 5.
  • RLen-Serve – Average rally length on service points.
  • RLen-Return – Average rally length on return points.
  • 1-3 W% – Winning percentage on points between one and three shots, inclusive. On the match-specific pages for each charted match, you can see winning percentages broken down by server. Click on “Point outcomes by rally length.”
  • 4-6 W% – Winning percentage on points between four and six shots, inclusive.
  • 7-9 W% – Winning percentage on points between seven and nine shots, inclusive.
  • 10+ W% – Winning percentage on points of ten shots or more.
  • FH/GS – Forehands per groundstroke. This stat counts all baseline shots from the forehand side (including slices, lobs, and dropshots), and divides by all baseline shots, to give an idea of how much each player is favoring the forehand side (or, perhaps, is pushed to one side by his or her opponent’s tactics).
  • BH Slice% – Backhand slice percentage. Of backhand-side groundstrokes (topspin, slices, dropshots, lobs), the percentage that are slices, including dropshots.
  • FHP/Match – Forehand Potency per match. FHP and BHP (Backhand Potency) are stats I invented to measure the effectiveness of particular groundstrokes. It adds, roughly, one point for a winner and one half point for the shot before a winner, and subtracts one point for an unforced error. On a per-match basis, the stat is influenced by the length of the match and the number of shots hit. Because each point can be counted 1.5 times in FHP (one for a forehand winner, one-half for a forehand that set it up), divide by 1.5 for a number of points that the forehand contributed to the match, above or below average. For instance, a FHP of +6 suggests that the player won 4 more points than he or she would have with a neutral forehand.
  • FHP/100 – Forehand potency per 100 forehands. The rate-stat version of FHP allows us to compare stats from different match lengths.
  • BHP/Match – Backhand Potency per match. Same as FHP, but for topspin backhands. I’ve occasionally calculated backhand-slice potency as well, but slices are not included in BHP itself.
  • BHP/100 – Backhand potency per 100 backhands. The rate-stat version of BHP.
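Two of the definitions above reduce to simple arithmetic, sketched here in Python. The function names and input formats are hypothetical. The potency formula follows the "roughly" stated weights (one point per winner, half a point per shot before a winner, minus one per unforced error), so it should be read as an approximation of the real stat.

```python
def rally_length(shots):
    """Count shots that land in play.

    shots: outcomes in order, "in" or "error", with the serve represented
    by the final serve attempt. Errors do not count as shots.
    """
    return sum(1 for outcome in shots if outcome == "in")


def potency(winners, setups, unforced_errors):
    """Approximate FHP/BHP: +1 per winner, +0.5 per setup shot, -1 per UE."""
    return winners + 0.5 * setups - unforced_errors


def points_contributed(potency_total):
    """Convert potency to points above or below a neutral stroke."""
    return potency_total / 1.5


print(rally_length(["error"]))               # double fault
print(rally_length(["in"]))                  # ace or unreturned serve
print(rally_length(["in"] * 5 + ["error"]))  # serve, four shots, missed sixth
print(points_contributed(6))                 # the FHP of +6 example
```

The three rally examples match the cases in the RallyLen entry (0, 1, and 5 shots), and an FHP of +6 converts to 4 points.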

Do Rallies Get Longer as Matches Progress?

Italian translation at settesei.it

Yesterday at the New York Open, Paolo Lorenzi battled through three sets to defeat Ryan Harrison. It was a notable result for a number of reasons, starting with the fact that Lorenzi is rarely seen on a hard court when there’s any other option. The 37-year-old Italian is one of the many men defying the aging curve these days, and with the victory, he’ll play at least one tour-level quarter-final for the eighth year in a row, despite not reaching his first until he was 30.

The way in which Lorenzi won the match was almost as unique as his career trajectory. Take a look at the average rally length per set:

Set  Avg Rally  
1          3.2  
2          4.0  
3          4.9

You probably don’t need me to tell you which set Harrison won. The opening frame was serve-dominated, typical of American indoor hard court events. As the match progressed, the points increasingly resembled the clay-court sparring that Lorenzi surely would have preferred.

Theorizing

The Lorenzi-Harrison match was extreme, but it tracks with what I believe to be the conventional wisdom. Throughout a match, players get better at reading their opponents’ games, cutting down on unreturned serves and making it more likely that each point will turn into a more protracted exchange. That’s the theory, anyway. There are some countervailing forces, such as fatigue, which work in the other direction, but in general we expect points to get longer.

Yesterday’s contest didn’t exactly follow that script, though. The rallies might have gotten longer because the two men better predicted each other’s shots, but it doesn’t show up so neatly in aces–Harrison hit aces on between 18% and 21% of his points in each set–or the more inclusive category of unreturned serves:

Set  Points  Unret%  
1        47   42.6%  
2        65   32.3%  
3        73   37.0%

While serve recognition may explain the rally length jump from set 1 to set 2, it goes in the opposite direction from set 2 to set 3. Yes, these are small samples, and yes, unreturned serves don’t tell the whole story. But there are signs that our initial theory is missing something.

More matches

As interesting as Lorenzi is, we’re going to need more players, and more data, to better understand what happens to serve returns and rally length over the course of a match. Let’s start with the main draw singles matches from the 2019 Australian Open. Not only are there a lot of them, but since they are best of five, we have an opportunity to see how these trends unfold over several sets per match.

For each match, I measured the average rally length and rate of unreturned serves for each set, and then made set-by-set comparisons for the length of the match. For instance, in Lorenzi-Harrison, rally length increased by 25% from set 1 to set 2. Then, for each set, I aggregated all the matches of sufficient length to figure out how much the tour as a whole was changing from one set to the next.

The results are considerably less eye-catching than those of the Lorenzi match. In the following table, the “Avg Rally” and “Unret%” columns show the change in ratio form: If the baseline rate in the first set is 1.0, the rally length in set 2 increases by 0.8% and the rate of unreturned serves goes up by 2.4%. I’ve also included example columns, showing realistic rally lengths and unreturned-serve rates for each set based on tournament averages of 3.2 shots per point and 34% of serves unreturned:

Set  Avg Rally  Ex Rally  Unret%  Ex Unret  
1            1      3.20       1     34.0%  
2        1.008      3.23   1.024     34.8%  
3        1.019      3.26   1.033     35.1%  
4        0.987      3.16   1.155     39.3%  
5        1.021      3.27   1.144     38.9% 

The set-to-set differences in rally length are barely enough to qualify for the name. The shift in the rate of unreturned serves, however, is much more striking, all the more so because it moves in the opposite direction from what we expected.* Perhaps fatigue–or strategic energy conservation–plays a bigger role than I thought, or servers gain more from familiarity with their opponent than returners do.

* You might wonder if the effect is an artifact of the data, that players who reach 4th and 5th sets are bigger servers. That may be true, but it’s not what we’re seeing here. I’m comparing the stats in each set to the previous set in the match itself, and then averaging the set-to-set changes, weighted by the number of points in the sets. A John Isner 5th set, then, is compared only to an Isner 4th set.
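The within-match comparison described in that note can be sketched in Python as follows. This is one reading of the method, not the original code: it assumes each set-to-set ratio is weighted by the number of points in the later set, and that the table's index is built by chaining those ratios from a set-1 baseline of 1.0. All names are hypothetical.

```python
def weighted_set_ratios(matches):
    """Average set-to-set ratios across matches, weighted by points.

    matches: list of matches, each a list of (value, points) per set in
    order, where value is e.g. average rally length or unreturned rate.
    Returns {set_number: weighted mean of value / previous-set value}.
    """
    numerator, weight = {}, {}
    for sets in matches:
        for i in range(1, len(sets)):
            prev_value, _ = sets[i - 1]
            value, points = sets[i]
            if prev_value <= 0:
                continue  # cannot form a ratio against an empty baseline
            set_no = i + 1
            numerator[set_no] = numerator.get(set_no, 0.0) + (value / prev_value) * points
            weight[set_no] = weight.get(set_no, 0.0) + points
    return {s: numerator[s] / weight[s] for s in sorted(numerator)}


def chain_index(step_ratios, baseline=1.0):
    """Chain per-set ratios into a cumulative index starting at set 1."""
    index = {1: baseline}
    level = baseline
    for set_no in sorted(step_ratios):
        level *= step_ratios[set_no]
        index[set_no] = level
    return index


# Lorenzi-Harrison: (average rally length, points) for each set.
lorenzi_harrison = [(3.2, 47), (4.0, 65), (4.9, 73)]
steps = weighted_set_ratios([lorenzi_harrison])
print(steps)
print(chain_index(steps, baseline=3.2))
```

Because each set is compared only with the previous set of the same match, a big server who reaches a fifth set does not inflate the fifth-set figure on his own. Passing a baseline such as 3.2 to chain_index mirrors how the example columns scale the index by a tournament average.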

WTA to the rescue

The results are completely different for women. Here is the same data for the 127 main draw women’s singles matches at the Australian Open:

Set  Avg Rally  Ex Rally  Unret%  Ex Unret  
1            1      3.40       1     27.0%  
2        1.035      3.52   0.974     26.3%  
3        1.103      3.75   0.915     24.7%

Still not as dramatic as Harrison-Lorenzi, but the trends are more marked than for the men. The number of unreturned serves drops quite a bit, and rally length increases by an amount that an attentive spectator might notice. Those two are related–if there are fewer unreturned serves, there are more shots per point, even if we only consider the second shot. Beyond that, there are more opportunities for longer exchanges. In any case, the set-by-set trends for women fit closer to the initial theory than the men’s results did.

As with every aggregate stat, I’m guessing that there is a huge amount of variation among players. Perhaps players who are particularly good in third sets really do return more serves or, as Lorenzi did, shift their tactics in the direction of a more favorable style of play. Looking at these types of numbers for individual competitors is a reasonable next step, but it’s one that will need to wait for another day.

Is Doubles As Entertaining As We Think?

For as long as I’ve been following tennis, there’s been a tension between the amount of doubles available to watch and the amount of doubles that fans say they want to watch. In-person spectators flock to doubles matches at grand slams and aficionados pass around GIFs of the most outrageous, acrobatic doubles points. Yet broadcasters almost always stick with singles, leaving would-be viewers chasing down online streams, often illegal ones.

There are some good reasons for that, foremost among them the marquee drawing power of the best singles players. Broadcasters are convinced that their audiences would rather watch a Fed/Rafa/Serena/Pova blowout than a potentially more entertaining one-on-one contest between unknowns, let alone a doubles match. And they’re probably right–at least, they’ve got ratings numbers to back them up. So we’re left with a small population of hipster doubles fans, confident that two-on-two is the good stuff, even if most of us rarely watch it.

It’s probably impossible to quantify entertainment value, but that doesn’t mean we shouldn’t try. What can the numbers tell us about the watchability of doubles?

Hip to be rectangular

There’s plenty of room for a diversity of preferences–one fan’s Monfils may be another fan’s Isner. But there are some general principles that seem to define entertaining tennis for most spectators. Winners are better than errors, for one. Long rallies are better than short ones, at least within reason. And you can never go wrong with more net play.

If net play were the only criterion, doubles would beat singles easily. But what about other factors? I started wondering about this while researching a recent post on gender differences in mixed doubles, when I came across a match in which every rally was four shots or fewer. For every brilliant reflex half-volley, doubles features a hefty dose of big serving and tactically high-risk returning. Especially in men’s doubles, that translates into a lot of team conferences and not very much shotmaking.

Let’s see some numbers. For each of the five main events at the 2019 Australian Open–men’s and women’s singles, men’s and women’s doubles, and mixed doubles–here is the average rally length, the percentage of points ended in three shots or fewer, and the percentage of points that required at least ten shots:

Event            Avg Rally  ≤3 Shots  10+ Shots  
Men's Singles          3.2     72.6%       5.1%  
Women's Singles        3.4     67.9%       5.4%  
Men's Doubles          2.5     81.6%       1.1%  
Women's Doubles        2.9     76.7%       2.4%  
Mixed Doubles          2.8     74.0%       1.8%

There's a family resemblance in these numbers, but it's clear that doubles points are shorter. Men's doubles is the most extreme, at 2.5 shots per point. By comparison, only 8% of the men's singles matches in the Match Charting Project database have an average rally length lower than that. More than four out of every five men's doubles points end by the third shot, and with barely one in one hundred points lasting to ten shots, you'd be lucky to sit through an entire match and see more than one such exchange.
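The three columns in that table are simple summaries of a list of per-point rally lengths. A minimal Python sketch, with hypothetical names and made-up sample data, might look like this. It counts every point it is given, including any 0-shot double faults, which is an assumption about how the table was built.

```python
def rally_summary(lengths):
    """Summarize per-point rally lengths for one event."""
    if not lengths:
        raise ValueError("need at least one point")
    n = len(lengths)
    return {
        "avg_rally": sum(lengths) / n,
        "pct_3_or_fewer": 100.0 * sum(1 for x in lengths if x <= 3) / n,
        "pct_10_plus": 100.0 * sum(1 for x in lengths if x >= 10) / n,
    }


# Ten made-up points, purely for illustration.
print(rally_summary([1, 1, 2, 3, 3, 4, 5, 12, 1, 8]))
```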

Quantity and quality

Shorter points are the nature of the format. Even recreational players can find it hard to keep the ball in play when half of each team is patrolling the net, looking for an easy putaway. Short-rally tennis can still be entertaining, as long as the quality of play offsets the unfavorable watching-to-waiting ratio.

I've mentioned my perception that men's doubles features a lot of unreturned serves. The numbers suggest that I spoke too soon. For the five events, here is the percentage of points in which the return doesn't come back in play:

Event            Unret%  
Men's Singles     31.7%  
Women's Singles   24.3%  
Men's Doubles     32.1%  
Women's Doubles   21.6%  
Mixed Doubles     29.3%

For men, singles and doubles are about the same. Perhaps the singles servers are a bit stronger, but the doubles returners are taking more chances, trying to avoid feeding weak returns to aggressive netmen. With women, you're more likely to see a return in play in a doubles match than in singles. Unless you're a connoisseur of powerful serves, you'll probably find higher rates of returns in play to be more enjoyable to watch.

The same applies to winners, compared to unforced errors. (Forced errors are a bit tricky--sometimes they are as exciting and indicative of quality as a winner; other times they're just an out-of-position unforced error.) Let's see what fraction of points end in various ways, for each of the five events:

Event            Unforced%  Forced%  Winner%  
Men's Singles        25.6%    16.2%    21.3%  
Women's Singles      28.9%    16.0%    23.4%  
Men's Doubles        12.8%    17.2%    29.9%  
Women's Doubles      20.9%    18.0%    32.1%  
Mixed Doubles        14.5%    17.0%    29.5%

Here, doubles is the clear winner. For both men and women, more doubles points than singles points end in winners, and fewer points end in unforced errors. Some of that reflects the much higher rate of net play, since it's easier to execute an unreturnable shot from just a few feet behind the net. There are a few more forced errors in doubles, perhaps representing failed attempts to handle volleys that almost went for winners, but no matter how we interpret them, the difference in forced errors is not enough to offset the differences in winners and unforced errors.

The hipsters weren't wrong

The numbers aren't as conclusive as I expected them to be. Yes, doubles points are shorter, but not so much so that the format is reduced to only serving and returning. (Though some men's matches are close.) As usual, our data has limitations, but the information available for each point suggests that there's plenty of high-quality, entertaining tennis to be seen on doubles courts, even if it's usually limited to four or five shots at a time.

Petra Kvitova’s Current Status: Low Risk, High Reward

Italian translation at settesei.it

For more than a decade, Petra Kvitova has been one of the most aggressive women in tennis. She aims for the corners, hits hard, and lets the chips fall where they may. Sometimes the results are ugly, like a 6-4 6-0 loss to Monica Niculescu in the 2016 Luxembourg final, but when it works, the rewards–two Wimbledon titles, for starters–more than make up for it.

She’s currently riding another wave of winners. Her 11-match win streak–which has involved the loss of only a single set–puts her one more victory away from a third major championship. The 28-year-old Czech has gotten this far by persisting with her big-hitting style, but with a twist: In Melbourne, she’s not missing very often. While she’s ending as many points as ever on her own racket, she’s missing less often than many of her more conservative peers.

In her last five matches at the Australian Open, from the second round through the semi-finals, 7.9% of her shots (including serves) have resulted in unforced errors. In the 88 Petra matches logged by the Match Charting Project, that’s the stingiest five-match stretch of her career. In charted matches since 2010, the average WTA player hits unforced errors on 8.0% of their shots. So Kvitova, the third-most aggressive player on tour, is somehow making errors at a below-average rate. It’s high-risk, high-reward tennis … without the risk.

And it isn’t because her go-for-broke tactics have changed. In Thursday’s semi-final against Danielle Collins, her aggression score–an aggregate measure of point-ending shots including winners, induced forced errors, and unforced errors–was 30.5%, the third-highest of all of her charted matches since her 2017 return to the tour. Her overall aggression score in Melbourne, 28.2%, is also higher than her career average of 27.1%.

In other words, she’s making fewer errors, and the missing errors are turning into point-ending shots in her favor. The following graph shows five-match rolling averages of winners (and induced forced errors) per shot and unforced errors per shot for all charted matches in Kvitova’s career:

Even with the winner and error rates smoothed out by five-match rolling averages, these are still some noisy trend lines. Still, some stories are quite clear. This month, Kvitova is hitting winners at close to her best-ever rate. Her average since the second round in Melbourne has been 20.3%, as high as anything she’s posted before with the exception of her 2014 Wimbledon title. (I’ve never tried to adjust winner totals for surface; it’s possible that the difference can be explained entirely by the grass.)
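The five-match rolling averages discussed above could be computed along these lines. This Python sketch uses a hypothetical per-match tuple of (winners including induced forced errors, unforced errors, total shots), and it pools shots across each window rather than averaging five per-match rates. The post does not say which approach was used, so treat that as an assumption.

```python
def rolling_rates(matches, window=5):
    """Rolling winner and unforced-error rates per shot.

    matches: chronological list of (winners, unforced_errors, shots),
    where winners includes induced forced errors.
    Returns one (winner_rate, error_rate) pair per full window.
    """
    rates = []
    for end in range(window, len(matches) + 1):
        chunk = matches[end - window:end]
        shots = sum(m[2] for m in chunk)
        winner_rate = sum(m[0] for m in chunk) / shots
        error_rate = sum(m[1] for m in chunk) / shots
        rates.append((winner_rate, error_rate))
    return rates


# Six made-up matches of 100 shots each, purely for illustration.
history = [(20, 8, 100)] * 5 + [(10, 10, 100)]
for winner_rate, error_rate in rolling_rates(history):
    print(round(winner_rate, 3), round(error_rate, 3))
```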

And most strikingly, this is as big a gap between winner rate and error rate as she’s achieved since her 2014 Wimbledon title run. In fact, between the second round and semi-finals at that tournament, she averaged 8.1% errors and 20.0% winners. Both of her numbers in Australia this year have been a tiny bit better.

Best of all, the error rate has–for the most part–seen a steady downward trend since 2016. The recent error spike is largely due to her three losses in Singapore last October and a bumpy start to this season in Brisbane. We can’t write those off entirely–perhaps Kvitova will always suffer through weeks when her aim goes awry–but she appears to have put them solidly behind her.

None of this is a guarantee that Petra will continue to avoid errors in Saturday’s final against Naomi Osaka. I could’ve written something about her encouraging error rates before the tour finals in Singapore last fall, and she failed to win a round-robin match there. And Osaka is likely to offer a stiffer challenge than any of Kvitova’s previous six opponents in Melbourne, even if her second serve doesn’t. That said, a stingy Kvitova is a terrifying prospect, one with the potential to end the brief WTA depth era and dominate women’s tennis.

Dayana Yastremska Hits Harder Than You

Italian translation at settesei.it

At the 2019 Australian Open, tennis balls have more to fear than ever before. Serena Williams is back and appears to be in top form, Maria Sharapova is playing well enough to oust defending champion Caroline Wozniacki, and Petra Kvitova has followed up her Sydney title with a stress-free jaunt through the first three rounds.

And then there are the youngsters. Hyper-aggressive 20-year-old Aryna Sabalenka crashed out in the third round against an even younger threat, Amanda Anisimova. But still in the draw, facing Serena on Saturday, is the hardest hitter of all, 18-year-old Ukrainian Dayana Yastremska. Watch a couple of Sabalenka matches, and you might wonder if we’ve reached the apex of aggression on the tennis court. Nope: Yastremska turns it up to 11.

When Lowell first introduced his aggression score metric a few years ago, Kvitova was the clear leader of the pack, the player who ended points–for good or ill–most frequently with the ball on her racket. Madison Keys wasn’t far behind, with Serena coming in third among the small group of players for which we had sufficient data. Since then, two things have changed: The Match Charting Project now has a lot more data on many more players, and a new generation of ball-bashers has threatened to make the rest of the tour look like weaklings in comparison.

The aggression score metric packs a lot of explanatory power in a simple calculation: It’s the number of point-ending shots (winners, unforced errors, or shots that induce a forced error from the opponent) divided by the number of shot opportunities. The resulting statistic ranges from about 10% at the lower extreme–Sara Errani’s career average is 11.6%–to 30%* at the top end. Individual matches can be even higher or lower, but no player with at least five charted matches sits outside of that range.

* Readers with a keen memory or a penchant for following links will notice that in Lowell’s original post, Kvitova’s aggregate score was 33% and Keys was also a tick above 30%. I’m not sure whether those were flukes that have since come back down with larger samples, or whether I’m using a slightly different formula. Either way, the ordering of players has remained consistent, and that’s the important thing.
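
For readers who want to see the arithmetic, here is a minimal sketch of the aggression score calculation as described above. The function name, the argument names, and the counts in the example are my own illustrations, not Match Charting Project fields or real match data, and as the footnote says, the exact formula may differ slightly from Lowell’s.

```python
def aggression_score(winners, unforced_errors, induced_forced_errors, shot_opportunities):
    """Fraction of shot opportunities on which the player ended the point.

    Point-ending shots are winners, unforced errors, and shots that
    induce a forced error from the opponent.
    """
    if shot_opportunities <= 0:
        raise ValueError("shot_opportunities must be positive")
    point_ending = winners + unforced_errors + induced_forced_errors
    return point_ending / shot_opportunities


# Invented counts for a single match, for illustration only.
score = aggression_score(
    winners=30,
    unforced_errors=25,
    induced_forced_errors=15,
    shot_opportunities=250,
)
print(f"{score:.1%}")  # 28.0%
```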

Here are the top ten most aggressive WTA tour regulars of the 2010s before Sabalenka and Yastremska came along:

Rank  Player                      Agg 
1     Petra Kvitova             27.1%  
2     Julia Goerges             26.8%  
3     Serena Williams           26.8%  
4     Jelena Ostapenko          26.5%  
5     Camila Giorgi             26.0%  
6     Madison Keys              25.9%  
7     Coco Vandeweghe           25.9%  
8     Sabine Lisicki            25.6%  
9     Anastasia Pavlyuchenkova  24.0%  
10    Maria Sharapova           23.2%

All of these women rank among the top 15% of most aggressive players. They end points more frequently on their own racket than plenty of competitors we also consider aggressive, like Venus Williams (21.9%), Karolina Pliskova (21.6%), and Johanna Konta (22.3%). Ostapenko bridges the gap between the two generations; she wasn’t part of the discussion when aggression score was first introduced, though once she started winning matches, it was immediately clear that she’d challenge Kvitova at the top of this list.

Here’s the current top ten:

Rank  Player               Agg  
1     Dayana Yastremska  28.6%  
2     Aryna Sabalenka    27.6%  
3     Petra Kvitova      27.1%  
4     Julia Goerges      26.8%  
5     Serena Williams    26.8%  
6     Jelena Ostapenko   26.5%  
7     Viktoria Kuzmova   26.0%  
8     Camila Giorgi      26.0%  
9     Madison Keys       25.9%  
10    Coco Vandeweghe    25.9%

Yastremska, Sabalenka, and even Viktoria Kuzmova have elbowed their way into the top ten. Yastremska’s and Kuzmova’s places on this list might be a little premature, since their scores are based on only seven and nine matches, respectively. But Sabalenka’s pugnaciousness is well-documented: her Petra-topping score of 27.6% is an average across almost 30 matches.

Tennis tends to swing between extremes, with one generation developing skills to counteract the abilities of the previous one. It’s not yet clear whether the aggression of these young women will catapult them to the top–after all, Sabalenka won only five games today against Anisimova, whose aggression score is a more modestly high 23.0%. Perhaps as they gain experience, they’ll develop more well-rounded games and return Kvitova to her place at the top.

In the meantime, we have the privilege of watching some of the hardest hitters in WTA history battle it out. Tomorrow, Yastremska will contest her first third round at a major in a must-watch match against Serena. There will be fireworks, but it’s safe to say there won’t be much in the way of rallies.

The Federer Backhand That Finally Beat Nadal

Italian translation at settesei.it

Roger Federer and Rafael Nadal first met on court in 2004, and they contested their first Grand Slam final two years later. The head-to-head has long skewed in Rafa’s favor: Entering yesterday’s match, Nadal led 23-11, including 9-2 in majors. Nadal’s defense has usually trumped Roger’s offense, but after a five-set battle in yesterday’s Australian Open final, it was Federer who came out on top. Rafa’s signature topspin was less explosive than usual, and Federer’s extremely aggressive tactics took advantage of the fast conditions to generate one opportunity after another in the deciding fifth set.

In the past, Nadal’s topspin has been particularly damaging to Federer’s one-handed backhand, one of the most beautiful shots in the sport–but not the most effective. The last time the two players met in Melbourne, in a 2014 semifinal the Spaniard won in straight sets, Nadal hit 89 crosscourt forehands, shots that challenge Federer’s backhand, nearly three-quarters of them (66) in points he won. Yesterday, he hit 122 crosscourt forehands, less than half of them in points he won. Rafa’s tactics were similar, but instead of advancing easily, he came out on the losing side.

Federer’s backhand was unusually effective yesterday, especially compared to his other matches against Nadal. It wasn’t the only thing he did well, but as we’ll see, it accounted for more than the difference between the two players.

A metric I’ve devised called Backhand Potency (BHP) illustrates just how much better Fed executed with his one-hander. BHP approximates the number of points whose outcomes were affected by the backhand: add one point for a winner or an opponent’s forced error, subtract one for an unforced error, add a half-point for a backhand that set up a winner or opponent’s error on the following shot, and subtract a half-point for a backhand that set up a winning shot from the opponent. Divide by the total number of backhands, multiply by 100*, and the result is the net effect of each player’s backhand. Using shot-by-shot data from over 1,400 men’s matches logged by the Match Charting Project, we can calculate BHP for dozens of active players and many former stars.

* The average men’s match consists of approximately 125 backhands (excluding slices), while Federer and Nadal each hit over 200 in yesterday’s five-setter.
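
The BHP recipe above can be sketched in a few lines. This is only an illustration of the formula as described: the function name, the argument names, and the counts in the example are invented, and a real calculation would first have to derive these counts from the shot-by-shot logs.

```python
def backhand_potency(winners_or_forced, unforced_errors, setups_for, setups_against,
                     total_backhands):
    """Backhand Potency per 100 backhands.

    +1   for each backhand winner or backhand that forces an error
    -1   for each unforced backhand error
    +0.5 for each backhand that sets up a winner or opponent's error on the next shot
    -0.5 for each backhand that sets up a winning shot from the opponent
    """
    if total_backhands <= 0:
        raise ValueError("total_backhands must be positive")
    net = (winners_or_forced
           - unforced_errors
           + 0.5 * setups_for
           - 0.5 * setups_against)
    return 100 * net / total_backhands


# Invented counts, for illustration only: net = 20 - 12 + 5 - 3 = 10 over 200 backhands.
print(backhand_potency(20, 12, 10, 6, 200))  # 5.0
```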

By the BHP metric, Federer’s backhand is neutral: +0.2 points per 100 backhands. Fed wins most points with his serve and his forehand; a neutral BHP indicates that while his backhand isn’t doing the damage, at least it isn’t working against him. Nadal’s BHP is +1.7 per 100 backhands, a few ticks below those of Murray and Djokovic, whose BHPs are +2.6 and +2.5, respectively. Among the game’s current elite, Kei Nishikori sports the best BHP, at +3.6, while Andre Agassi’s was a whopping +5.0. At the other extreme, Marin Cilic’s is -2.9, Milos Raonic’s is -3.7, and Jack Sock’s is -6.6. Fortunately, you don’t have to hit very many backhands to shine in doubles.

BHP tells us just how much Federer’s backhand excelled yesterday: It rose to +7.8 per 100 shots, a better mark than Fed has ever posted against his rival. Here are his BHPs for every Slam meeting:

Match       RF BHP  
2006 RG      -11.2  
2006 WIMB*    -3.4  
2007 RG       -0.7  
2007 WIMB*    -1.0  
2008 RG      -10.1  
2008 WIMB     -0.8  
2009 AO        0.0  
2011 RG       -3.7  
2012 AO       -0.2  
2014 AO       -9.9  
2017 AO*      +7.8 

* matches won by Federer

Yesterday’s rate of +7.8 per 100 shots equates to an advantage of +17 over the course of his 219 backhands. One unit of BHP is equivalent to about two-thirds of a point of match play, since BHP can award up to a combined 1.5 points for the two shots that set up and then finish a point. Thus, a +17 BHP accounts for about 11 points, exactly the difference between Federer and Nadal yesterday. Such a performance differs greatly from what Nadal has done to Fed’s backhand in the past: On average, Rafa has knocked his BHP down to -1.9, a bit more than Nadal’s effect on his typical opponent, which is a -1.7 point drop. In the 25 Federer-Nadal matches for which the Match Charting Project has data, Federer has only posted a positive BHP five times, and before yesterday’s match, none of those achievements came at a major.
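
The conversion from a BHP rate to points of match play is simple enough to check by hand, but here it is spelled out. The rate (+7.8), the backhand count (219), and the two-thirds conversion factor all come from the paragraph above; the function names are mine.

```python
def net_bhp(bhp_per_100, backhands):
    """Total BHP units accumulated over a given number of backhands."""
    return bhp_per_100 * backhands / 100


def bhp_to_points(bhp_units, points_per_unit=2 / 3):
    """Approximate points of match play, at about two-thirds of a point per BHP unit."""
    return bhp_units * points_per_unit


units = net_bhp(7.8, 219)       # about 17.1 BHP units
points = bhp_to_points(units)   # about 11.4 points
print(round(units), round(points))  # 17 11
```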

The career-long trend suggests that, next time Federer and Nadal meet, the topspin-versus-backhand matchup will return to normal. The only previous time Federer recorded a +5 BHP or better against Nadal, at the 2007 Tour Finals, he followed it up by falling to -10.1 in their next match, at the 2008 French Open. He didn’t post another positive BHP until 2010, six matches later.

Outlier or not, Federer’s backhand performance yesterday changed history. Using the approximation provided by BHP, had Federer brought his neutral backhand, Nadal would have won 52% of the 289 points played–exactly his career average against the Swiss–instead of the 48% he actually won. The long-standing rivalry has required both players to improve their games for more than a decade, and at least for one day, Federer finally plugged the gap against the opponent who has frustrated him the most.

Searching For Meaning in Distance Run Stats

Italian translation at settesei.it

For the last couple of years, some tennis broadcasts have featured “distance run” stats, tracking how far each player travels over the course of a point or a match. It’s a natural byproduct of all the cameras pointed at tennis courts. Especially in long rallies, it’s something that fans have wondered about for years.

As is often the case with new metrics, no one seems to be asking whether these new stats mean anything. Thanks to IBM (you never thought I’d write that, did you?), we have more than merely anecdotal data to play with, and we can start to answer that question.

At Roland Garros and Wimbledon this year, distance run during each point was tracked for players on several main courts. From those two Slams, we have point-by-point distance numbers for 103 of the 254 men’s singles matches. A substantial group of women’s matches is available as well, and I’ll look at those in a future post.

Let’s start by getting a feel for the range of these numbers. Of the available non-retirement matches, the shortest distance run was in Rafael Nadal’s first-round match in Paris against Sam Groth. Nadal ran 960 meters against Groth’s 923–the only match in the dataset with a total distance run under two kilometers.

At the other extreme, Novak Djokovic ran 4.3 km in his fourth-round Roland Garros match against Roberto Bautista Agut, who himself tallied a whopping 4.6 km. Novak’s French Open final against Andy Murray is also near the top of the list. The two players totaled 6.7 km, with Djokovic’s 3.4 km edging out Murray’s 3.3 km. Murray is a familiar face in these marathon matches, figuring in four of the top ten. (Thanks to his recent success, he’s also wildly overrepresented in our sample, appearing 14 times.)

Between these extremes, the average match features a combined 4.4 km of running, or just over 20 meters per point. If we limit our view to points of five shots or longer (a very approximate way of separating rallies from points in which the serve largely determines the outcome), the average distance per point is 42 meters.

Naturally, on the Paris clay, points are longer and players do more running. In the average Roland Garros match, the competitors combined for 4.8 km, compared to 4.1 km at Wimbledon. (The dataset consists of about twice as many Wimbledon matches, so the overall numbers are skewed in that direction.) Measured by the point, that’s 47 meters per point on clay and 37 meters per point on grass.

Not a key to the match

All that running may be necessary, but covering more distance than your opponent doesn’t seem to have anything to do with winning the match. Of the 104 matches, almost exactly half (53) were won by the player who ran farther.

It’s possible that running more or less is a benefit for certain players. Surprisingly, Murray ran less than his opponent in 10 of his 14 matches, including his French Open contests against Ivo Karlovic and John Isner. (Big servers, immobile as they tend to be, may induce even less running in their opponents, since so many of their shots are all-or-nothing. On the other hand, Murray outran another big server, Nick Kyrgios, at Wimbledon.)

We think of physical players like Murray and Djokovic as the ones covering the entire court, and by doing so, they simultaneously force their opponents to do the same–or more. In Novak’s ten Roland Garros and Wimbledon matches, he ran farther than his opponent only twice–in the Paris final against Murray, and in the second round of Wimbledon against Adrian Mannarino. In general, running fewer meters doesn’t appear to be a leading indicator of victory, but for certain players in the Murray-Djokovic mold, it may be.

In the same vein, combined distance run may turn out to be a worthwhile metric. For men who earn their money in long, physical rallies, total distance run could serve as a proxy for their success in forcing a certain kind of match.

It’s also possible that aggregate numbers will never be more than curiosities. In the average match, there was only a 125 meter difference between the distances covered by the two players. In percentage terms, that means one player outran the other by only 5.5%. And as we’ll see in a moment, a difference of that magnitude could happen simply because one player racked up more points on serve.

Point-level characteristics

In the majority of points, the returner does a lot more running than the server does. The server usually forces his opponent to start running first, and in today’s men’s game, the server rarely needs to scramble too much to hit his next shot.

On average, the returner must run just over 10% farther. When the first serve is put in play, that difference jumps to 12%. On second-serve points, it drops to 7%.

By extension, we would expect that the player who runs farther would, more often than not, lose the point. That’s not because running more is necessarily bad, but because of the inherent server’s advantage, which has the side effect of showing up in the distance run stats as well. That hypothesis turns out to be correct: The player who runs farther in a single point loses the point 56% of the time.

When we narrow our view to only those points with five shots or more, we see that running more is still associated with losing. In these longer rallies, the player who covered more distance loses 58% of the points.
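
Here is a rough sketch of how a figure like “the player who runs farther loses X% of points” could be computed, with an optional rally-length filter for the five-shots-or-more case. The tuple layout and the sample points are invented for illustration; they are not the format of the IBM data, and the output is not meant to reproduce the 56% or 58% figures.

```python
def share_lost_by_farther_runner(points, min_shots=1):
    """Fraction of points lost by whichever player ran farther.

    `points` is an iterable of (server_m, returner_m, rally_shots, server_won)
    tuples, a hypothetical layout. Points where both players ran the same
    distance, or with fewer than `min_shots` shots, are skipped.
    Returns None if no points qualify.
    """
    lost = total = 0
    for server_m, returner_m, rally_shots, server_won in points:
        if rally_shots < min_shots or server_m == returner_m:
            continue
        total += 1
        server_ran_farther = server_m > returner_m
        # The farther runner won if he was the server and the server won,
        # or he was the returner and the returner won.
        farther_runner_won = server_ran_farther == server_won
        if not farther_runner_won:
            lost += 1
    return lost / total if total else None


# Invented sample points: (server meters, returner meters, shots, server won?)
sample = [
    (8, 12, 3, True),
    (15, 20, 7, True),
    (22, 18, 9, True),
    (10, 14, 2, False),
    (30, 25, 6, False),
]
print(share_lost_by_farther_runner(sample))               # 0.6
print(share_lost_by_farther_runner(sample, min_shots=5))  # 2 of 3 qualifying points
```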

Some of the “extra” running in shorter points can be attributed to returning serve–and thus, we can assume that players are losing points because of the disadvantage of returning, not necessarily because they ran so much. But even in very long rallies of 10 shots or more, the player who runs farther is more likely to lose the point. Even at the level of a single point, my suggestion above, that physical players succeed by forcing opponents to work even harder than they do, seems valid.

With barely 100 matches of data–and a somewhat biased sample, no less–there are only so many conclusions we can draw about distance run stats. Two Grand Slams’ worth of show court matches is just enough to give us a general context for understanding these numbers and to hint at some interesting findings about the best players. Let’s hope that IBM continues to collect these stats, and that the ATP and WTA follow suit.

The Grass is Slowing: Another Look at Surface Speed Convergence

Italian translation at settesei.it

A few years ago, I posted one of my most-read and most-debated articles, called The Mirage of Surface Speed Convergence. Using the ATP’s data on ace rates and breaks of serve going back to 1991, it argued that surface speeds aren’t really converging, at least to the extent we can measure them with those two tools.

One of the most frequent complaints was that I was looking at the wrong data–surface speed should really be quantified by rally length, spin rate, or any number of other things. As is so often the case with tennis analytics, we have only so much choice in the matter. At the time, I was using all the data that existed.

Thanks to the Match Charting Project–with a particular tip of the cap to Edo Salvati–a lot more data is available now. We have shot-by-shot stats for 223 Grand Slam finals, including over three-fourths of Slam finals back to 1980. While we’ll never be able to measure anything like ITF Court Pace Rating for surfaces thirty years in the past, this shot-by-shot data allows us to get closer to the truth of the matter.

Sure enough, when we take a look at a simple (but until recently, unavailable) metric such as rally length, we find that the sport’s major surfaces are playing a lot more similarly than they used to. The first graph shows a five-year rolling average* for the rally length in the men’s finals of each Grand Slam from 1985 to 2015:

[Figure: five-year rolling average of rally length in the men’s finals of each Grand Slam, 1985 to 2015]

* since some matches are missing, the five-year rolling averages each represent the mean of anywhere from two to five Slam finals.
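
To make the footnote concrete, here is a minimal sketch of a five-year rolling average that tolerates missing finals. I’m assuming a trailing window (the five years ending in the plotted year); a centered window would work the same way. The function name and the rally lengths in the example are invented for illustration.

```python
def rolling_mean(values_by_year, year, window=5, min_count=2):
    """Mean of the values available in the `window` years ending at `year`.

    `values_by_year` maps a year to that final's average rally length.
    Years with no charted final are simply absent from the dict.
    Returns None when fewer than `min_count` finals fall in the window.
    """
    years = range(year - window + 1, year + 1)
    available = [values_by_year[y] for y in years if y in values_by_year]
    if len(available) < min_count:
        return None
    return sum(available) / len(available)


# Invented rally lengths for one tournament, with 2003 and 2005 missing.
finals = {2001: 3.0, 2002: 4.0, 2004: 5.0}
print(rolling_mean(finals, 2005))  # 4.0, the mean of three available finals
print(rolling_mean(finals, 2008))  # None, since only one final falls in 2004-2008
```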

Over the last decade and a half, the hard-court and grass-court slams have crept steadily upward, with average rally lengths now similar to those at Roland Garros, traditionally the slowest of the four Grand Slam surfaces. The movement is most dramatic in the Wimbledon grass, which for many years saw an average rally length of a mere two shots.

For all the advantages of rally length and shot-by-shot data, there’s one massive limitation to this analysis: It doesn’t control for player. (My older analysis, with more limited data per match, but for many more matches, was able to control for player.) Pete Sampras contributed to 15 of our data points, but none on clay. Andres Gomez makes an appearance, but only at Roland Garros. Until we have shot-by-shot data on multiple surfaces for more of these players, there’s not much we can do to control for this severe case of selection bias.

So we’re left with something of a chicken-and-egg problem. Back in the early ’90s, when Roland Garros finals averaged almost six shots per point and Wimbledon finals averaged barely two shots per point, how much of the difference was due to the surface itself, and how much to the fact that certain players reached the final? The surface itself certainly doesn’t account for everything–in 1988, Mats Wilander and Ivan Lendl averaged over seven shots per point at the US Open, and in 2002, David Nalbandian and Lleyton Hewitt topped 5.5 shots per point at Wimbledon.

Still, outliers and selection bias aside, the rally length convergence we see in the graph above reflects a real phenomenon, even if it is amplified by the bias. After all, players who prefer short points win more matches on grass because grass lends itself to short points, and in an earlier era, “short points” meant something more extreme than it does today.

The same graph for women’s Grand Slam finals shows some convergence, though not as much:

[Figure: five-year rolling average of rally length in the women’s finals of each Grand Slam]

Part of the reason that the convergence is more muted is that there’s less selection bias. The all-surface dominance of a few players–Chris Evert, Martina Navratilova, and Steffi Graf–means that, if only by historical accident, there is less bias than in men’s finals.

We still need a lot more data before we can make confident statements about surface speeds in 20th-century tennis. (You can help us get there by charting some matches!) But as we gather more information, we’re able to better illustrate how the surfaces have become less unique over the years.