Searching For Meaning in Distance Run Stats

Italian translation at settesei.it

For the last couple of years, some tennis broadcasts have featured “distance run” stats, tracking how far each player travels over the course of a point or a match. It’s a natural byproduct of all the cameras pointed at tennis courts. Especially in long rallies, it’s something that fans have wondered about for years.

As is often the case with new metrics, no one seems to be asking whether these new stats mean anything. Thanks to IBM (you never thought I’d write that, did you?), we have more than merely anecdotal data to play with, and we can start to answer that question.

At Roland Garros and Wimbledon this year, distance run during each point was tracked for players on several main courts. From those two Slams, we have point-by-point distance numbers for 103 of the 254 men’s singles matches. A substantial group of women’s matches is available as well, and I’ll look at those in a future post.

Let’s start by getting a feel for the range of these numbers. Of the available non-retirement matches, the shortest distance run was in Rafael Nadal’s first-round match in Paris against Sam Groth. Nadal ran 960 meters against Groth’s 923–the only match in the dataset with a total distance run under two kilometers.

At the other extreme, Novak Djokovic ran 4.3 km in his fourth-round Roland Garros match against Roberto Bautista Agut, who himself tallied a whopping 4.6 km. Novak’s French Open final against Andy Murray is also near the top of the list. The two players totaled 6.7 km, with Djokovic’s 3.4 km edging out Murray’s 3.3 km. Murray is a familiar face in these marathon matches, figuring in four of the top ten. (Thanks to his recent success, he’s also wildly overrepresented in our sample, appearing 14 times.)

Between these extremes, the average match features a combined 4.4 km of running, or just over 20 meters per point. If we limit our view to points of five shots or longer (a very approximate way of separating rallies from points in which the serve largely determines the outcome), the average distance per point is 42 meters.

Naturally, on the Paris clay, points are longer and players do more running. In the average Roland Garros match, the competitors combined for 4.8 km, compared to 4.1 km at Wimbledon. (The dataset consists of about twice as many Wimbledon matches, so the overall numbers are skewed in that direction.) Measured by the point, that’s 47 meters per point on clay and 37 meters per point on grass.

Not a key to the match

All that running may be necessary, but covering more distance than your opponent doesn’t seem to have anything to do with winning the match. Of the 104 matches, almost exactly half (53) were won by the player who ran farther.
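One rough way to check that a 53-of-104 split is indistinguishable from a coin flip is an exact two-sided binomial test. The sketch below is not part of the original analysis: it uses only the counts quoted above, and the function name is made up for illustration.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    probs = [
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small tolerance so that equally likely outcomes are always included.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Counts from the text: 53 of 104 matches won by the player who ran farther.
p_value = two_sided_binomial_p(53, 104)
print(f"p-value under a 50/50 null: {p_value:.2f}")
```

A p-value anywhere near 1 says the observed split is about as ordinary as a result can be if running farther has no relationship with winning.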

It’s possible that running more or less is a benefit for certain players. Surprisingly, Murray ran less than his opponent in 10 of his 14 matches, including his French Open contests against Ivo Karlovic and John Isner. (Big servers, immobile as they tend to be, may induce even less running in their opponents, since so many of their shots are all-or-nothing. On the other hand, Murray outran another big server, Nick Kyrgios, at Wimbledon.)

We think of physical players like Murray and Djokovic as the ones covering the entire court, and by doing so, they simultaneously force their opponents to do the same–or more. In Novak’s ten Roland Garros and Wimbledon matches, he ran farther than his opponent only twice–in the Paris final against Murray, and in the second round of Wimbledon against Adrian Mannarino. In general, running fewer meters doesn’t appear to be a leading indicator of victory, but for certain players in the Murray-Djokovic mold, it may be.

In the same vein, combined distance run may turn out to be a worthwhile metric. For men who earn their money in long, physical rallies, total distance run could serve as a proxy for their success in forcing a certain kind of match.

It’s also possible that aggregate numbers will never be more than curiosities. In the average match, there was only a 125-meter difference between the distances covered by the two players. In percentage terms, that means one player outran the other by only 5.5%. And as we’ll see in a moment, a difference of that magnitude could happen simply because one player racked up more points on serve.

Point-level characteristics

In the majority of points, the returner does a lot more running than the server does. The server usually forces his opponent to start running first, and in today’s men’s game, the server rarely needs to scramble too much to hit his next shot.

On average, the returner must run just over 10% farther. When the first serve is put in play, that difference jumps to 12%. On second-serve points, it drops to 7%.

By extension, we would expect that the player who runs farther would, more often than not, lose the point. That’s not because running more is necessarily bad, but because of the inherent server’s advantage, which has the side effect of showing up in the distance run stats as well. That hypothesis turns out to be correct: The player who runs farther in a single point loses the point 56% of the time.

When we narrow our view to only those points with five shots or more, we see that running more is still associated with losing. In these longer rallies, the player who covered more distance loses 58% of the points.
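The point-level tally behind percentages like these can be sketched as follows. The `Point` record and its field names are hypothetical (the actual layout of the IBM data isn't described here), and the four sample points are invented purely to exercise the function.

```python
from dataclasses import dataclass


@dataclass
class Point:
    server_m: float    # meters run by the server (hypothetical field)
    returner_m: float  # meters run by the returner (hypothetical field)
    server_won: bool   # True if the server won the point
    shots: int         # rally length in shots


def farther_runner_loss_rate(points, min_shots=0):
    """Share of points lost by whichever player ran farther.

    Points shorter than `min_shots`, and points where both players
    ran exactly the same distance, are skipped.
    """
    lost = total = 0
    for pt in points:
        if pt.shots < min_shots or pt.server_m == pt.returner_m:
            continue
        server_ran_farther = pt.server_m > pt.returner_m
        farther_runner_won = server_ran_farther == pt.server_won
        total += 1
        lost += not farther_runner_won
    return lost / total if total else float("nan")


# Invented sample data, not real match records.
sample = [
    Point(5.0, 9.0, True, 2),     # returner ran farther and lost
    Point(12.0, 15.0, False, 6),  # returner ran farther and won
    Point(20.0, 14.0, False, 8),  # server ran farther and lost
    Point(8.0, 8.0, True, 3),     # tie in distance, skipped
]
rate_all = farther_runner_loss_rate(sample)
rate_long = farther_runner_loss_rate(sample, min_shots=5)
print(rate_all, rate_long)
```

Passing `min_shots=5` gives the same restriction to longer rallies used above.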

Some of the “extra” running in shorter points can be attributed to returning serve, so we can assume that players are losing those points because of the disadvantage of returning, not necessarily because they ran so much. But even in very long rallies of 10 shots or more, the player who runs farther is more likely to lose the point. My suggestion above, that physical players succeed by forcing opponents to work even harder than they do, seems to hold even at the level of a single point.

With barely 100 matches of data–and a somewhat biased sample, no less–there are only so many conclusions we can draw about distance run stats. Two Grand Slams worth of show court matches is just enough to give us a general context for understanding these numbers and to hint at some interesting findings about the best players. Let’s hope that IBM continues to collect these stats, and that the ATP and WTA follow suit.

The Grass is Slowing: Another Look at Surface Speed Convergence

Italian translation at settesei.it

A few years ago, I posted one of my most-read and most-debated articles, called The Mirage of Surface Speed Convergence.  Using the ATP’s data on ace rates and breaks of serve going back to 1991, it argued that surface speeds aren’t really converging, at least to the extent we can measure them with those two tools.

One of the most frequent complaints was that I was looking at the wrong data–surface speed should really be quantified by rally length, spin rate, or any number of other things. As is so often the case with tennis analytics, we have only so much choice in the matter. At the time, I was using all the data that existed.

Thanks to the Match Charting Project–with a particular tip of the cap to Edo Salvati–a lot more data is available now. We have shot-by-shot stats for 223 Grand Slam finals, including over three-fourths of Slam finals back to 1980. While we’ll never be able to measure anything like ITF Court Pace Rating for surfaces thirty years in the past, this shot-by-shot data allows us to get closer to the truth of the matter.

Sure enough, when we take a look at a simple (but until recently, unavailable) metric such as rally length, we find that the sport’s major surfaces are playing a lot more similarly than they used to. The first graph shows a five-year rolling average* for the rally length in the men’s finals of each Grand Slam from 1985 to 2015:

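[Figure: mens_finals_rallies. Five-year rolling average rally length in the men’s finals of each Grand Slam, 1985 to 2015]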

* since some matches are missing, the five-year rolling averages each represent the mean of anywhere from two to five Slam finals.
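A rolling average that tolerates missing finals can be sketched like this. Two things are assumptions rather than details from the post: the window is trailing (the current year plus the four before it), and a year is reported only when at least two finals fall inside the window, matching the "two to five" note above. The rally lengths in the example are made up.

```python
def rolling_mean(values_by_year, window=5, min_count=2):
    """Trailing rolling mean over `window` years, ignoring missing years.

    Returns {year: mean} for years whose window holds at least
    `min_count` values.
    """
    out = {}
    first, last = min(values_by_year), max(values_by_year)
    for year in range(first, last + 1):
        vals = [
            values_by_year[y]
            for y in range(year - window + 1, year + 1)
            if y in values_by_year
        ]
        if len(vals) >= min_count:
            out[year] = sum(vals) / len(vals)
    return out


# Invented rally lengths for one tournament; 1987 is missing on purpose.
rally_length = {1985: 3.0, 1986: 5.0, 1988: 4.0}
smoothed = rolling_mean(rally_length)
print(smoothed)
```

A centered window would work just as well. What matters is that each plotted value may rest on as few as two matches, which is worth remembering when reading the early years of the graph.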

Over the last decade and a half, the hard-court and grass-court slams have crept steadily upward, with average rally lengths now similar to those at Roland Garros, traditionally the slowest of the four Grand Slam surfaces. The movement is most dramatic in the Wimbledon grass, which for many years saw an average rally length of a mere two shots.

For all the advantages of rally length and shot-by-shot data, there’s one massive limitation to this analysis: It doesn’t control for player. (My older analysis, with more limited data per match, but for many more matches, was able to control for player.) Pete Sampras contributed to 15 of our data points, but none on clay. Andres Gomez makes an appearance, but only at Roland Garros. Until we have shot-by-shot data on multiple surfaces for more of these players, there’s not much we can do to control for this severe case of selection bias.

So we’re left with something of a chicken-and-egg problem. Back in the early ’90s, when Roland Garros finals averaged almost six shots per point and Wimbledon finals averaged barely two shots per point, how much of the difference was due to the surface itself, and how much to the fact that certain players reached the final? The surface itself certainly doesn’t account for everything: in 1988, Mats Wilander and Ivan Lendl averaged over seven shots per point at the US Open, and in 2002, David Nalbandian and Lleyton Hewitt topped 5.5 shots per point at Wimbledon.

Still, outliers and selection bias aside, the rally length convergence we see in the graph above reflects a real phenomenon, even if it is amplified by the bias. After all, players who prefer short points win more matches on grass because grass lends itself to short points, and in an earlier era, “short points” meant something more extreme than it does today.

The same graph for women’s Grand Slam finals shows some convergence, though not as much:

[Figure: womens_finals_rallies. Five-year rolling average rally length in the women’s finals of each Grand Slam]

Part of the reason that the convergence is more muted is that there’s less selection bias. A few players (Chris Evert, Martina Navratilova, and Steffi Graf) dominated on every surface, so, if only by historical accident, the women’s finals are less skewed by who reached them than the men’s.

We still need a lot more data before we can make confident statements about surface speeds in 20th-century tennis. (You can help us get there by charting some matches!) But as we gather more information, we’re able to better illustrate how the surfaces have become less unique over the years.

The Effects (and Maybe Even Momentum) of a Long Rally

Italian translation at settesei.it

In yesterday’s quarterfinal between Simona Halep and Victoria Azarenka, a highlight early in the third set was a 25-shot rally that Vika finished off with a forehand winner. It was the longest point of the match, and moved her within a point of holding serve to open the set.

As very long rallies often do, the point seemed like it might represent a momentum shift. Instead, Halep sent the game back to deuce after a 10-stroke rally on the next point. If there was any momentum conferred by these two points, it disappeared as quickly as it arose. It took eight more points before Azarenka finally sealed the hold of serve.

Does a long rally tell us anything at all? Does it have predictive value for the next point, or even the entire game, or is it just highlight-reel fodder that is forgotten as soon as the umpire announces the score?

To answer those questions, I delved into the shot-by-shot data of the Match Charting Project, which now contains point-by-point accounts of nearly 1,100 matches. I identified the longest 1% of points–17 shots or longer for women, 18 shots for men–and analyzed what happened afterwards, looking for both fatigue and momentum effects.
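Picking a cutoff for "the longest 1% of points" is a small percentile problem, complicated slightly by the fact that rally lengths are integers with many ties. A minimal sketch, assuming all we have is a flat list of rally lengths (the function name and sample numbers are illustrative, not taken from the Match Charting Project):

```python
from collections import Counter


def long_rally_threshold(rally_lengths, top_share=0.01):
    """Smallest length L such that points of L shots or more make up
    at most `top_share` of all points. Returns None if no cutoff fits."""
    counts = Counter(rally_lengths)
    limit = top_share * len(rally_lengths) + 1e-9  # tolerance for float rounding
    at_or_above = 0
    threshold = None
    for length in sorted(counts, reverse=True):
        if at_or_above + counts[length] > limit:
            break
        at_or_above += counts[length]
        threshold = length
    return threshold


# Invented distribution of 1,000 points, for illustration only.
lengths = [3] * 600 + [5] * 300 + [10] * 90 + [18] * 6 + [25] * 4
cutoff = long_rally_threshold(lengths)
print(cutoff)
```

Because of ties, the group at or above the cutoff can be somewhat smaller than 1% of points, but never larger.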

The next point

There’s one clear effect of a long rally: The next point will be shorter than average. The 10-shot rally contested by Vika and Simona yesterday was an outlier: Women average 4.45 shots on the point after a long rally, while the overall average (controlled for server and first or second serve) is 4.85. Men average 4.03 shots on the following point, compared to an average of 4.64.

For women, fatigue is also a factor for the server. Following a long rally, women land only 61.3% of first serves, compared to an average of 64.6%. Men don’t exhibit the same fatigue effect; the equivalent numbers are 62.3% and 62.2%.

There’s more evidence of an immediate fatigue factor for women, as well. The players who win those long rallies are slightly better than their opponents, winning 50.7% of points on average. On the point immediately after a long rally, however, the rally winners win only 49%. It’s not obvious to me why this should be the case. Perhaps the player who won the long rally worked a bit harder than her opponent, maybe putting all of her remaining effort into a groundstroke winner, or finishing the point with a couple of athletic shots at the net.

In any case, there’s no equivalent effect for men.  After winning a long rally, players win 51.1% of their next points, compared to an expected 50.8%. That’s either a very small momentum effect or, more likely, a bit of statistical noise.

Both men and women double fault more often than usual after a long rally, though the effect is much greater for women. Immediately following these points, women double fault 4.7% of the time, compared to an average of 3.3%. Men double fault 4.5% of the time after a long rally, compared to an expected rate of 4.2%.

Longer-term momentum

Beyond a slight effect on the characteristics of the next point, does a long rally influence the outcome of the game? The evidence suggests that it doesn’t.

For each long rally, I identified whether the winner of the rally went on to win the game, as Vika did yesterday. I also combined the score after the long rally with the average rate of points won on the appropriate player’s serve to calculate the odds that, from such a score, the player who won the rally would go on to win the game. To use yesterday’s example, when Azarenka held game point at AD-40, her chances of winning the game were 77.6%.
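The expected-win calculation can be sketched with the standard model in which the server wins every point independently with the same probability. That independence assumption, and the function below, are illustrative: the post doesn't spell out the exact method beyond combining the score with the average rate of points won on serve.

```python
from functools import lru_cache


def game_win_prob(p: float, server_pts: int = 0, returner_pts: int = 0) -> float:
    """Probability that the server wins the game from a given point score.

    `p` is the server's chance of winning any single point, assumed
    constant and independent from point to point. Scores are counted
    in points won, so 40-30 is (3, 2) and AD-40 is (4, 3).
    """
    q = 1.0 - p
    # From deuce the server must win two points in a row before the returner does.
    deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(s: int, r: int) -> float:
        if s >= 4 and s - r >= 2:
            return 1.0
        if r >= 4 and r - s >= 2:
            return 0.0
        if s >= 3 and r >= 3:
            if s == r:
                return deuce
            # Advantage server or advantage returner.
            return p + q * deuce if s > r else p * deuce
        return p * win(s + 1, r) + q * win(s, r + 1)

    return win(server_pts, returner_pts)


# With evenly matched points, ad-in is worth exactly 75% to the server.
print(game_win_prob(0.5, 4, 3))
```

Plugging in a server's actual point-win rate and the score after each long rally gives the "expected" side of the comparison. The observed side is simply how often the rally winner went on to take the game.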

For both men and women, there is no significant effect. Women who won long rallies went on to win 66.2% of those games, while they would have been expected to win 65.7%. Men won 64.4% of those games, compared to an expected rate of 64.1%.

With a much larger dataset, these findings might indicate a very slight momentum effect. But limited to under 1,000 long-rally points for each gender, the differences represent only a few games that went the way of the player who won the long point.
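To get a feel for how small these gaps are, compare them with the sampling error of a proportion. This is a back-of-the-envelope sketch: it treats the expected rate as a single fixed probability (in reality it varies from game to game) and takes 1,000 as the sample size, which is only the upper bound quoted above.

```python
from math import sqrt


def z_score(observed: float, expected: float, n: int) -> float:
    """Gap between observed and expected win rates, in standard errors."""
    standard_error = sqrt(expected * (1 - expected) / n)
    return (observed - expected) / standard_error


N_ASSUMED = 1000  # assumption: the text only says "under 1,000"

z_women = z_score(0.662, 0.657, N_ASSUMED)
z_men = z_score(0.644, 0.641, N_ASSUMED)
extra_games_women = (0.662 - 0.657) * N_ASSUMED

print(f"women: z = {z_women:.2f}, about {extra_games_women:.0f} extra games")
print(f"men:   z = {z_men:.2f}")
```

Both gaps come out well under one standard error, which is why they read as noise rather than momentum. A smaller true sample size would make the standard error larger and the case for noise stronger.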

For now, we’ll have to conclude that the aftereffects of a long rally have a very short lifespan: barely one point for women, perhaps not even that long for men. These points may well have a greater effect on fans than they do on the players themselves.