How Much Is a Challenge Worth?

Italian translation at settesei.it

When the Hawkeye line-calling system is available, tennis players are given the right to make three incorrect challenges per set. As with any situation involving scarcity, there’s a choice to make: Take the chance of getting a call overturned, or make sure to keep your options open for later?

We’ve learned over the last several years that human line-calling is pretty darn good, so players don’t turn to Hawkeye that often. At the Australian Open this year, men challenged fewer than nine calls per match–well under three per set or, put another way, fewer than 1.5 challenges per player per set. Even at that low rate of fewer than once per thirty points, players are usually wrong. Only about one in three challenged calls is overturned.

So while challenges are technically scarce, they aren’t that scarce.  It’s a rare match in which a player challenges so often and is so frequently incorrect that he runs out. That said, it does happen, and while running out of challenges is low-probability, it’s very high risk. Getting a call overturned at a crucial moment could be the difference between winning and losing a tight match. Most of the time, challenges seem worthless, but in certain circumstances, they can be very valuable indeed.

Just how valuable? That’s what I hope to figure out. To do so, we’ll need to estimate the frequency with which players miss opportunities to overturn line calls because they’ve exhausted their challenges, and we’ll need to calculate the potential impact of failing to overturn those calls.

A few notes before we get any further.  The extra challenge awarded to each player at the beginning of a tiebreak would make the analysis much more daunting, so I’ve ignored both that extra challenge and points played in tiebreaks. I suspect it has little effect on the results. I’ve limited this analysis to the ATP, since men challenge more frequently and get calls overturned more often. And finally, this is a very complex, sprawling subject, so we often have to make simplifying assumptions or plug in educated guesses where data isn’t available.

Running out of challenges

The Australian Open data mentioned above is typical for ATP challenges. It is very similar to a subset of Match Charting Project data, suggesting that both challenge frequency and accuracy are about the same across the tour as they are in Melbourne.

Let’s assume that each player challenges a call roughly once every sixty points, or 1.7%. Given an approximate success rate of 30%, each player makes an incorrect challenge on about 1.2% of points and a correct challenge on 0.5% of points. Later on, I’ll introduce a different set of assumptions so we can see what different parameters do to the results.

Running out of challenges isn’t in itself a problem. We’re interested in scenarios when a player not only exhausts his challenges, but when he also misses an opportunity to overturn a call later in the set. These situations are much less common than all of those in which a player might want to contest a call, but we don’t care about the 70% of those challenges that would be wrong, as they wouldn’t have any effect on the outcome of the match.

For each possible set length, from 24-point golden sets up to 93-point marathons, I ran a Monte Carlo simulation, using the assumptions given above, to determine the probability that, in a set of that length, a player would miss a chance to overturn a later call. As noted above, I’ve excluded tiebreaks from this analysis, so I counted only the number of points up to 6-6. I also excluded all “advantage” fifth sets.

For example, the most common set length in the data set is 57 points, which occurred 647 times. In 10,000 simulations, a player missed a chance to overturn a call 0.27% of the time. The longer the set, the more likely that challenge scarcity would become an issue. In 10,000 simulations of 85-point sets, players ran out of challenges more than three times as often. In 0.92% of the simulations, a player was unable to challenge a call that would have been overturned.
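To make the setup concrete, here is a minimal Python sketch of this kind of simulation. It is not the code behind the numbers above: the function name, the single random draw per point, and the rule that a would-be correct challenge only counts as "missed" once three incorrect challenges have been used are all illustrative assumptions, so its output should only roughly track the figures quoted.

```python
import random

def missed_overturn_rate(n_points, p_wrong=0.012, p_right=0.005,
                         max_wrong=3, n_sims=10_000, seed=0):
    """Share of simulated sets in which one player runs out of challenges
    and then faces a call he could have overturned.

    p_wrong / p_right are the assumed per-point chances of an incorrect or a
    correct challenge; every point is treated as identical and independent.
    """
    rng = random.Random(seed)
    missed_sets = 0
    for _ in range(n_sims):
        wrong_used = 0
        for _ in range(n_points):
            u = rng.random()
            if wrong_used >= max_wrong:
                # Out of challenges: a would-be correct challenge is lost.
                if u < p_right:
                    missed_sets += 1
                    break
            elif u < p_wrong:
                # An incorrect challenge burns one of the three allowed.
                wrong_used += 1
            # A correct challenge made while challenges remain costs nothing.
    return missed_sets / n_sims

for length in (57, 85):
    print(length, missed_overturn_rate(length))
```

Because the event is so rare, 10,000 runs per set length leave a fair amount of noise; raising n_sims is the easiest way to steady the estimate.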

These simulations are simple, assuming that each point is identical. Of course, players are aware of the cap on challenges, so with only one challenge remaining, they may be less likely to contest a “probably correct” call, and they would be very unlikely to use a challenge to earn a few extra seconds of rest. Further, the fact that players sometimes use Hawkeye for a bit of a break suggests that what we might call “true” challenges–instances in which the player believes the original call was wrong–are a bit less frequent than the numbers we’re using. Ultimately, we can’t address these concerns without a more complex model and quite a bit of data we don’t have.

Back to the results. Taking every possible set length and the results of the simulation for each one, we find the average player is likely to run out of challenges and miss a chance to overturn a call roughly once every 320 sets, or 0.31% of the time. That’s not very often–for almost all players, it’s less than once per season.

The impact of (not) overturning a call

Just because such an outcome is infrequent doesn’t necessarily mean it isn’t important. If a low-probability event has a high enough impact when it does occur, it’s still worth planning for.

Toward the end of a set, when most of these missed chances would occur, points can be very important, like break point at 5-6. But other points are almost meaningless, like 40-0 in just about any game.

To estimate the impact of these missed opportunities, I ran another set of Monte Carlo simulations. (This gets a bit hairy–bear with me.) For each set length, for those cases when a player ran out of challenges, I found the average number of points at which he used his last challenge. Then, for each run of the simulation, I took a random set from the last few years of ATP data with the corresponding number of points, chose a random point between the average time that the challenges ran out and the end of the set, and measured the importance of that point.

To quantify the importance of the point, I calculated three probabilities from the perspective of the player who lost the point and, had he conserved his challenges, could have overturned it:

  1. his odds of winning the set before that point was played
  2. his odds of winning the set after that point was played (and not overturned)
  3. his odds of winning the set had the call been overturned and the point awarded to him.

(To generate these probabilities, I used my win probability code posted here with the assumption that each player wins 65% of his service points. The model treats points as independent–that is, the outcome of one point does not depend on the outcomes of previous points–which is not precisely true, but it’s close, and it makes things immensely more straightforward. Alert readers will also note that I’ve ignored the possibility of yet another call that could be overturned. However, the extremely low probability of that event convinced me to avoid the additional complexity required to model it.)

Given these numbers, we can calculate the possible effects of the challenge he couldn’t make. The difference between (2) and (3) is the effect if the call had been overturned and the point awarded to him. The difference between (1) and (2) is the effect if the call had been overturned and the point replayed. This is essentially the same concept as “leverage index” in baseball analytics.

Again, we’re missing some data–I have no idea what percentage of overturned calls result in each of those two outcomes. For today, we’ll say it’s half and half, so to boil down the effect of the missed challenge to a single number, we’ll average those two differences.

For example, let’s say we’re at five games all, and the returner wins the first point of the 11th game. The server’s odds of winning the set have decreased from 50% (at 5-all, love-all) to 43.0%. If the server got the call overturned and was awarded the point, his odds would increase to 53.8%. Thus, the win probability impact of overturning the call and taking the point is 10.8%, while the effect of forcing a replay is 7.0%. For the purposes of this simulation, we’re averaging these two numbers and using 8.9% as the win probability impact of this missed opportunity to challenge.
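The arithmetic in that example can be written out as a short Python sketch. The function name and the share_awarded parameter are hypothetical; the 50/50 split between awarded points and replays is the placeholder assumption described above, not measured data.

```python
def missed_challenge_impact(p_before, p_after_lost, p_if_awarded,
                            share_awarded=0.5):
    """Average win-probability cost of a challenge that could not be made.

    p_before      -- odds of winning the set before the point (1)
    p_after_lost  -- odds after the point was lost and not overturned (2)
    p_if_awarded  -- odds had the call been overturned and the point awarded (3)
    share_awarded -- assumed share of overturned calls that award the point
                     outright rather than forcing a replay
    """
    awarded_gain = p_if_awarded - p_after_lost   # (3) minus (2)
    replay_gain = p_before - p_after_lost        # (1) minus (2)
    return share_awarded * awarded_gain + (1 - share_awarded) * replay_gain

# The 5-all example: 50% before the point, 43.0% after, 53.8% if awarded.
impact = missed_challenge_impact(0.50, 0.430, 0.538)
print(f"{impact:.3f}")  # 0.089
```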

Back to the big picture. For each set length, I ran 1,000 simulations like what I’ve described above and averaged the results. In short sets under 40 points, the win probability impact of the missed challenge is less than five percentage points. The longer the set, the bigger the effect: Long sets are typically closer and the points tend to be higher-leverage. In 85-point sets, for instance, the average effect of the missed challenge is a whopping 20 percentage points–meaning that if a player more skillfully conserved his challenges in five such sets, he’d be able to reverse the outcome of one of them.

On average, the win probability effect of the missed challenge is 12.4 percentage points. In other words, better challenge management would win a player one more set for every eight times he didn’t lose such an opportunity by squandering his challenges.

The (small) big picture

Let’s put together the two findings. Based on our assumptions, players run out of challenges and forgo a chance to overturn a later call about once every 320 sets. We now know that the cost of such a mistake is, on average, a 12.4 percentage point win probability hit.

Thus, challenge management costs an average player one set out of every 2600. Given that many matches are played on clay or on courts without Hawkeye, that’s maybe once in a career. As long as the assumptions I’ve used are in the right ballpark, the effect isn’t even worth talking about. The mental cost of a player thinking more carefully before challenging might be greater than this exceedingly unlikely benefit.
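As a quick check on how the two findings combine, here is the multiplication as a few lines of Python. The variable names are illustrative, and the inputs are the rounded figures from the text, so the result is only good to about two significant digits.

```python
# Rounded inputs taken from the text above.
sets_per_missed_chance = 320   # one missed overturn roughly every 320 sets
avg_impact = 0.124             # average win-probability cost of a missed chance

# Chance that poor challenge management costs a player any given set.
p_set_lost = (1 / sets_per_missed_chance) * avg_impact
sets_per_lost_set = 1 / p_set_lost

print(f"{p_set_lost:.5f}")        # about 0.00039
print(round(sets_per_lost_set))   # about 2581, i.e. roughly one set in 2,600
```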

What if some of the assumptions are wrong? Anecdotally, it seems like challenges cluster in certain matches, because of poor officiating, bad lighting, extreme spin, precise hitting, or some combination of these. It seems possible that certain scenarios would arise in which a player would want to challenge much more frequently, and even though he might gain some accuracy, he would still increase the risk.

I ran the same algorithms for what seems to me to be an extreme case, almost doubling the frequency with which each player challenges, to 3.0%, and somewhat increasing the accuracy rate, to 40%.

With these parameters, a player would run out of challenges and miss an opportunity to overturn a call about six times more often–once every 54 sets, or 1.8% of the time. The impact of each of these missed opportunities doesn’t change, so the overall result also increases by a factor of six. In this extreme case, poor challenge management would cost a player the set 0.28% of the time, or once every 356 sets. That’s a less outrageous number, representing perhaps one set every second year, but it also applies to unusual sets of circumstances which are very unlikely to follow a player to every match.

It seems clear that three challenges is enough. Even in long sets, players usually don’t run out, and when they do, it’s rare that they miss an opportunity that a fourth challenge would have afforded them. The effect of a missed chance can be enormous, but such chances are so infrequent that players would see little or no benefit from tactically conserving challenges.

The Difficulty (and Importance) of Finding the Backhand

Italian translation at settesei.it

One disadvantage of some one-handed backhands is that they tend to sit up a little more when they’re hit crosscourt. That gives an opponent more time to prepare and, often, enough time to run around a crosscourt shot and hit a forehand, which opens up more tactical possibilities.

With the 700 men’s matches in the Match Charting Project database (please contribute!), we can start to quantify this disadvantage–if indeed it has a negative effect on one-handers. Once we’ve determined whether one-handers can find their opponents’ backhands, we can try to answer the more important question of how much it matters.

The scenario

Let’s take all baseline rallies between right-handers. Your opponent hits a shot to your backhand side, and you have three choices: drive (flat or topspin) backhand, slice backhand, or run around to hit a forehand. You’ll occasionally go for a winner down the line and you’ll sometimes be forced to hit a weak reply down the middle, but usually, your goal is to return the shot crosscourt, ideally finding your opponent’s backhand.

Considering all righty-righty matchups including at least one player among the last week’s ATP top 72 (I wanted to include Nicolas Almagro), here are the frequency and results of each of those choices:

SHOT    FREQ  FH REP  BH REP    UFE  WINNER  PT WON  
ALL             9.9%   68.1%  10.8%    5.8%   43.1%  
SLICE  11.9%   34.1%   49.5%   7.1%    0.6%   40.2%  
FH     44.9%    2.8%   69.0%  13.0%    9.8%   42.1%  
BH     43.3%   10.7%   72.2%   9.5%    3.1%   45.0%  
                                                     
1HBH   42.6%   12.0%   69.5%   9.3%    3.8%   44.2%  
2HBH   43.5%   10.0%   73.4%   9.6%    2.8%   45.4%

“FH REP” and “BH REP” refer to a forehand or backhand reply, and we can see just how much shot selection matters in keeping the ball away from your opponent’s forehand. A slice does a very poor job, while an inside-out forehand almost guarantees a backhand reply, though it comes with an increased risk of error.

The differences between one- and two-handed backhands aren’t as stark. One-handers don’t find the backhand quite as frequently, though they hit a few more winners. They hit drive backhands a bit less often, but that doesn’t necessarily mean they are hitting forehands instead. On average, two-handers hit a few more forehands from the backhand corner, while one-handers are forced to hit more slices.

One hand, many types

Not all one-handed backhands are created equal, and these numbers bear that out. Stanislas Wawrinka’s backhand is as effective as the best two-handers, while Roger Federer’s is typically the jumping-off point for discussions of why the one-hander is dying.

Here are the 28 players for whom we have at least 500 instances (excluding service returns) when the player responded to a shot hit to his backhand corner. For each, I’ve shown how often he chose a drive backhand or forehand, and the frequency with which he found the backhand–excluding his own errors and winners.

Player                 BH  BH FRQ  FIND BH%  FH FRQ  FIND BH%  
Alexandr Dolgopolov     2   45.7%     94.2%   43.3%     98.7%  
Kei Nishikori           2   51.1%     94.0%   38.9%     98.1%  
Andy Murray             2   41.0%     92.4%   46.5%     98.6%  
Stanislas Wawrinka      1   48.6%     92.1%   37.5%     98.0%  
Bernard Tomic           2   33.8%     91.7%   43.8%     97.9%  
Novak Djokovic          2   47.2%     91.7%   41.4%     98.5%  
Kevin Anderson          2   41.0%     91.5%   45.8%     96.6%  
Borna Coric             2   46.5%     90.7%   44.2%     96.9%  
Pablo Cuevas            1   41.9%     90.6%   54.5%     96.5%  
Marin Cilic             2   45.4%     89.7%   43.3%     97.2%  
                                                               
Player                 BH  BH FRQ  FIND BH%  FH FRQ  FIND BH%  
Tomas Berdych           2   41.6%     89.3%   44.2%     97.5%  
Pablo Carreno Busta     2   55.4%     87.8%   41.1%     93.5%  
Fabio Fognini           2   46.0%     87.4%   47.0%     96.1%  
Richard Gasquet         1   57.2%     87.3%   32.1%     96.8%  
Andreas Seppi           2   40.3%     87.2%   50.0%     93.9%  
Nicolas Almagro         1   53.6%     86.5%   39.3%     98.0%  
Dominic Thiem           1   38.5%     86.2%   50.0%     96.5%  
Gael Monfils            2   48.0%     85.3%   46.3%     85.3%  
David Ferrer            2   48.2%     84.9%   40.4%     97.1%  
Roger Federer           1   42.7%     84.8%   43.6%     94.5%  
                                                               
Player                 BH  BH FRQ  FIND BH%  FH FRQ  FIND BH%  
Gilles Simon            2   46.9%     84.6%   46.5%     94.6%  
David Goffin            2   45.4%     84.6%   45.7%     94.9%  
Roberto Bautista Agut   2   39.6%     83.3%   46.7%     98.4%  
Jo Wilfried Tsonga      2   43.5%     82.0%   44.5%     96.3%  
Grigor Dimitrov         1   41.4%     78.6%   39.4%     92.8%  
Milos Raonic            2   31.5%     63.5%   56.5%     94.3%  
Jack Sock               2   27.0%     62.5%   62.9%     96.3%  
Tommy Robredo           1   26.6%     56.1%   62.3%     88.4%

One-handers Wawrinka, Pablo Cuevas, and Richard Gasquet (barely) are among the top half of these players, in terms of finding the backhand with their own backhand. Federer and his would-be clone Grigor Dimitrov are at the other end of the spectrum.

Taking all 60 righties I included in this analysis (not just those shown above), there is a mild negative correlation (r = -0.16) between a player’s likelihood of finding the opponent’s backhand with his own and the rate at which he chooses to hit a forehand from that corner. In other words, the worse he is at finding the backhand, the more inside-out forehands he hits. Tommy Robredo and Jack Sock are the one- and two-handed poster boys for this, struggling more than any other players to find the backhand, and compensating by hitting as many forehands as possible.

However, Federer–and, to an even greater extent, Dimitrov–don’t fit this mold. The average one-hander runs around balls in their backhand corner 44.6% of the time, while Fed is one percentage point under that and Dimitrov is below 40%. Federer is perceived to be particularly aggressive with his inside-out (and inside-in) forehands, but that may be because he chooses his moments wisely.

Ultimate outcomes

Let’s look at this from one more angle. In the end, what matters is whether you win the point, no matter how you get there. For each of the 28 players listed above, I calculated the rate at which they won points for each shot selection. For instance, when Novak Djokovic hits a drive backhand from his backhand corner, he wins the point 45.4% of the time, compared to 42.3% when he hits a slice and 42.4% when he hits a forehand.

Against his own average, Djokovic is about 3.6% better when he chooses (or to think of it another way, is able to choose) a drive backhand. For all of these players, here’s how each of the three shot choices compares to his average outcome:

Player                 BH   BH W   SL W   FH W  
Dominic Thiem           1  1.209  0.633  0.924  
David Goffin            2  1.111  0.656  0.956  
Grigor Dimitrov         1  1.104  0.730  1.022  
Gilles Simon            2  1.097  0.922  0.913  
Tomas Berdych           2  1.085  0.884  0.957  
Pablo Carreno Busta     2  1.081  0.982  0.892  
Kei Nishikori           2  1.070  0.777  0.965  
Roberto Bautista Agut   2  1.055  0.747  1.027  
Stanislas Wawrinka      1  1.050  0.995  0.936  
Borna Coric             2  1.049  1.033  0.941  
                                                
Player                 BH   BH W   SL W   FH W  
Bernard Tomic           2  1.049  1.037  0.943  
Jack Sock               2  1.049  0.811  1.010  
Gael Monfils            2  1.048  1.100  0.938  
Fabio Fognini           2  1.048  0.775  0.987  
Milos Raonic            2  1.048  0.996  0.974  
Nicolas Almagro         1  1.046  0.848  0.964  
Kevin Anderson          2  1.038  1.056  0.950  
Novak Djokovic          2  1.036  0.966  0.969  
Andy Murray             2  1.031  1.039  0.962  
Roger Federer           1  1.023  1.005  0.976  
                                                
Player                 BH   BH W   SL W   FH W  
Richard Gasquet         1  1.020  0.795  1.033  
Andreas Seppi           2  1.019  0.883  1.008  
David Ferrer            2  1.018  0.853  1.020  
Alexandr Dolgopolov     2  1.010  1.010  0.987  
Marin Cilic             2  1.006  1.009  0.991  
Pablo Cuevas            1  0.987  0.425  1.048  
Jo Wilfried Tsonga      2  0.956  0.805  1.095  
Tommy Robredo           1  0.845  0.930  1.079
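For readers who want to see how ratios like these can arise, here is a hedged Python sketch using Djokovic’s numbers. The post does not say how each player’s average is weighted, so this assumes a frequency-weighted mean and treats the slice frequency as whatever is left after drive backhands and forehands; the dictionary names are illustrative. Under those assumptions, the ratios land within rounding of his row in the table.

```python
# Djokovic's points-won rates by shot choice, from the text above.
win_rate = {"backhand": 0.454, "slice": 0.423, "forehand": 0.424}

# Shot frequencies from his row in the earlier table; the slice share is an
# assumption (the remainder after drive backhands and forehands).
freq = {"backhand": 0.472, "forehand": 0.414}
freq["slice"] = 1 - freq["backhand"] - freq["forehand"]

# Assumed definition of "average outcome": frequency-weighted win rate.
average = sum(freq[shot] * win_rate[shot] for shot in win_rate)

# Each shot's result relative to the player's own average.
relative = {shot: win_rate[shot] / average for shot in win_rate}

for shot, ratio in relative.items():
    print(f"{shot:9s} {ratio:.3f}")
```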

In this view, Dimitrov–along with his fellow one-handed flame carrier Dominic Thiem–looks a lot better. His crosscourt backhand doesn’t find many backhands, but it is by far his most effective shot from his own backhand corner. We would expect him to win more points with a drive backhand than with a slice (since he probably opts for slices in more defensive positions), but it’s surprising to me that his backhand is so much better than the inside-out forehand.

While Dimitrov and Thiem are more extreme than most, almost all of these players have better results with crosscourt drive backhands than with inside-out (or inside-in) forehands. Only five–including Robredo but, shockingly, not including Sock–win more points after hitting forehands from the backhand corner.

It’s clear that one-handers do, in fact, have a slightly more difficult time forcing their opponents to hit backhands. It’s much less clear how much it matters. Even Federer, with his famously dodgy backhand and even more famously dominant inside-out forehand, is slightly better off hitting a backhand from his backhand corner. We’ll never know what would happen if Fed had Djokovic’s backhand instead, but even though Federer’s one-hander isn’t finding as many backhands as Novak’s two-hander does, it’s getting the job done at a surprisingly high rate.

Toward a Better Understanding of Return Effectiveness

Italian translation at settesei.it

The deeper the return, the better, right? That, at least, is the basis for many of the flashy graphics we see these days on tennis broadcasts, indicating the location of every return, often separated into “shallow,” “medium,” and “deep” zones.

In general, yes, deep returns are better than shallow ones. But return winners aren’t overwhelmingly deep, since returners can achieve sharper angles if they aim closer to the service line. There are plenty of other complicating factors as well: returns to the sides of the court are more effective than those down the middle, second-serve returns tend to be better than first-serve returns, and topspin returns result in more return points won than chip or slice returns.

While most of this is common sense, quantifying it is an arduous and mind-bending task. When we consider all these variables–first or second serve, deuce or ad court, serve direction, whether the returner is a righty or lefty, forehand or backhand return, topspin or slice, return direction, and return depth–we end up with more than 8,500 permutations. Many are useless (righties don’t hit a lot of forehand chip returns against deuce court serves down the T), but thousands reflect some common-enough scenario.

To get us started, let’s set aside all of the variables but one. When we analyze 600+ ATP matches in the Match Charting Project data, we have roughly 61,000 in-play returns coded in one of nine zones, including at least 2,000 in each. Here is a look at the impact of return location, showing the server’s winning percentage when a return comes back in play to one of the nine zones:

[Figure: server’s winning percentage by return zone, all in-play returns]

(“Shallow” is defined as anywhere inside the service boxes, while “Medium” and “Deep” each represent half of the area behind the service boxes. The left, center, and right zones are intended to indicate roughly where the return would cross the baseline, so for sharply angled shots, a return might bounce shallow near the middle of the court but be classified as a return to the forehand or backhand side.)

As we would expect, deeper returns work in favor of the returner, as do returns away from the center of the court. A bit surprisingly, returns to the server’s forehand side (if he’s a right-hander) are markedly more effective than those to the backhand. This is probably because right-handed returners are most dangerous when hitting crosscourt forehands, although left-handed returners are also more effective (if not as dramatically) when returning to that side of the court.

Let’s narrow things down just a little and see how the impact of return location differs on first and second serves. Here are the server’s chances of winning the point if a first-serve return comes back in each of the nine zones:

[Figure: server’s winning percentage by return zone, first-serve returns]

And the same for second-serve returns:

[Figure: server’s winning percentage by return zone, second-serve returns]

It’s worth emphasizing just how much impact a deep return can have. So many points are won with unreturnable serves–even seconds–that simply getting the ball back in play comes close to making the point a 50/50 proposition. A deep second-serve return, especially to a corner, puts the returner in a very favorable position. Consistently hitting returns like that is a big reason why Novak Djokovic essentially turns his opponents’ second serves against them.

The final map makes it clear how valuable it is to move the server away from the middle of the court. Think of it as a tactical first strike, forcing the server to play defensively instead of dictating play with his second shot. Among second-serve returns put in play, any ball placed away from the middle of the court–regardless of depth–gives the returner a better chance of winning the point than does a deep return down the middle.

For today, I’m going to stop here. This is just the tip of the iceberg, as there are so many variables that play some part in the effectiveness of various service returns. Ultimately, understanding the potency of each return location will give us additional insight into what players can achieve with different kinds of serve, which players are deadliest with certain types of returns, and how best to handle different returns with the server’s crucial second shot.

Measuring the Effectiveness of Backhand Returns

Italian translation at settesei.it

One-handed backhands can be beautiful, but they aren’t always the best tools for the return of serve. Some of the players with the best one-handers in the game must often resort to slicing backhand returns–Stanislas Wawrinka, for example, slices 68% of backhand first serve returns and 40% of backhand second serve returns, while Andy Murray uses the slice 41% and 3%, respectively.

Using the 650 men’s matches in the Match Charting Project database, I looked at various aspects of backhand serve returns to try to get a better sense of the trade-offs involved in using a one-handed backhand. Because the matches in the MCP aren’t completely representative of the ATP tour, the numbers are approximate. But given the size and breadth of the sample, I believe the results are broadly indicative of men’s tennis as a whole.

At the most general level, players with double-handed backhands are slightly better returners, putting roughly the same number of returns in play (about 56%) and winning a bit more often–46.9% to 45.7%–when they do so. The gap is a bit wider when we look at backhand returns put in play: 46.5% of points won to 44.7%. While the favorable two-hander numbers are influenced by the historically great returning of Novak Djokovic, two-handers still have an edge if we reduce his weight in the sample or remove him entirely.

Unsurprisingly, players realize that two-handed backhands are more effective returns, and they serve accordingly. The MCP divides serves into three zones–down the tee, body, and wide–and I’ve re-classified those as “to the forehand,” “to the body,” and “to the backhand” depending on the returner’s dominant hand and whether the point is in the deuce or ad court. While we can’t identify exactly where servers aimed those to-the-body serves, we can determine some of their intent from serves aimed at the corners.

Against returners with two-handed backhands, servers went for the backhand corner on 44.2% of first serves and 34.8% of second serves. Against one-handers, they aimed for the same spot on 47.3% of first serves and 40.9% of second serves. Looking at the same question from another angle, backhands make up 61.7% of the returns in play hit by one-handers compared to 59.0% for double-handers. It seems likely that one-handers more aggressively run around backhands to hit forehand returns, so this last comparison probably understates the degree to which servers aim for single-handed backhands.

When servers do manage to find the backhand side of a single-hander, they’re often rewarded with a slice return. On average, one-handers (excluding Roger Federer, who is overrepresented in this dataset) use the slice on 53.9% of their backhand first-serve returns and 32.3% of their backhand second-serve returns. Two-handers use the slice 20.5% of the time against firsts and only 2.5% of the time against seconds.

For both types of players, against first and second serves, slice returns are less effective than flat or topspin backhand returns. This isn’t surprising, either–defensive shots are often chosen in defensive situations, so the difference in effectiveness is at least partly due to the difference in the quality of the serves themselves. Still, since one-handers choose to go to the slice so much more frequently, it’s valuable to know how the types of returns compare:

Return Type   BH in play W% SL in play W% 
1HBH vs Firsts        43.3%         37.6% 
1HBH vs Seconds       46.0%         44.1% 
                        
2HBH vs Firsts        46.8%         36.2% 
2HBH vs Seconds       48.6%         41.9%

(Again, I’ve excluded Fed from the 1HBH averages.)

In three of the four rows, there’s a difference of several percentage points between the effectiveness of slice returns and flat or topspin returns, as measured by the ultimate outcome of the point. The one exception–second-serve returns by one-handers–reminds us that the slice can be an offensive weapon, even if it’s rarely used as one in the modern game. Some players–including Federer, Feliciano Lopez, Grigor Dimitrov, and Bernard Tomic–are more effective with slice returns than flat or topspin returns against either first or second serves.

However, these players are the exceptions, and in the theoretical world where we can set all else equal, a slice return is the inferior choice. All players have to hit slice returns sometimes, and many of those seem to be forced by powerful serving, but the fact remains: one-handers hit slices much more than two-handers do, and despite the occasional offensive opportunity, slice returns are more likely to hand the point to the server.

These differences are real, but they are still modest. A good returner with a one-handed backhand is considerably better than a bad returner with a two-hander, and it’s even possible to have a decent return game while hitting mostly slices. All that said, in the aggregate, a one-handed backhand is a bit of a liability on the return. It will take further research to determine whether other benefits–such as the sizzling down-the-line winners we’ve come to expect from the likes of Wawrinka and Richard Gasquet–outweigh the costs.

The Dreaded Deficit at the Tiebreak Change of Ends

Italian translation at settesei.it

Some of tennis’s conventional wisdom manages to be both blindingly self-evident and obviously wrong. Give pundits a basic fact (winning more points is good), add a dash of perceived momentum, and the results can be toxic.

A great example is the tiebreak change of ends. The typical scenario goes something like this: Serving at 2-3 in a tiebreak, a player loses a point on serve, going down a minibreak to 2-4. As the players change sides, a commentator says, “You really don’t want to go into this change of ends without at least keeping the score even.”

While the full rationale is rarely spelled out, the implication is that losing that one point–going from 2-3 to 2-4–is somehow worse than usual because the point precedes the changeover. Like the belief that the seventh game of the set is particularly important, this has passed, untested, into the canon.

Let’s start with the “blindingly self-evident” part. Yes, it’s better to head into the change of ends at 3-3 than it is at 2-4. In a tiebreak, every point is crucial. Based on a theoretical model and using sample players who each win 65% of service points, here are the odds of winning a tiebreak from various scores at the changeover:

Score  p(Win)  
1*-5     5.4%  
2*-4    21.5%  
3*-3    50.0%  
4*-2    78.5%  
5*-1    94.6%
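
Numbers like these can be reproduced with a short recursion. What follows is a minimal sketch and not necessarily the model used for the table: it assumes every point is independent and that each player wins a fixed share of his own service points. The function name `tiebreak_win_prob` and its parameters are illustrative.

```python
from functools import lru_cache


def tiebreak_win_prob(a, b, p_a, p_b, a_serves_next):
    """Chance that player A wins a first-to-seven tiebreak from the score a-b.

    p_a and p_b are the shares of service points won by A and B (assumed to
    be strictly between 0 and 1). Points are treated as independent, which
    is a simplifying assumption.
    """
    # Work out who served the first point from who serves the next one.
    # Service order by point index: one point for the first server, then
    # two points each, alternating.
    played = a + b
    first_server_turn = ((played + 1) // 2) % 2 == 0
    a_served_first = a_serves_next == first_server_turn

    # From 6-6 onward, points come in pairs with one serve each, so A must
    # win both points of a pair before B does.
    a_pair = p_a * (1 - p_b)
    b_pair = (1 - p_a) * p_b
    from_six_all = a_pair / (a_pair + b_pair)

    @lru_cache(maxsize=None)
    def win(x, y):
        if x == 7:
            return 1.0
        if y == 7:
            return 0.0
        if x == 6 and y == 6:
            return from_six_all
        k = x + y
        a_serving = (((k + 1) // 2) % 2 == 0) == a_served_first
        p_point = p_a if a_serving else 1 - p_b
        return p_point * win(x + 1, y) + (1 - p_point) * win(x, y + 1)

    return win(a, b)


# The 2*-4 row: the trailing player serves the seventh point.
print(f"{tiebreak_win_prob(2, 4, 0.65, 0.65, a_serves_next=True):.1%}")
```

With two 65% servers, the call above comes out at about 21.5%, in line with the 2*-4 row, and the same function gives an even 50% at 3*-3.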

It’s easy to sum that up: You really want to win that sixth point. (Or, at least, several of the points before the sixth.) On the other hand, compare that to the scenarios after eight points:

Score  p(Win)  
2*-6     2.6%  
3*-5    17.6%  
4*-4    50.0%  
5*-3    82.4%  
6*-2    97.4%

At the risk of belaboring the obvious, when the score is close, points become more important later in the tiebreak. The outcome at 4-4 matters more than at 3-3, which matters more than at 2-2, and so on. If players changed ends after eight points, we’d probably bestow some magical power on that score instead.

Real-life outcomes

So far, I’ve only discussed what the model tells us about win probabilities at various tiebreak scores. If the pundits are right, we should see a gap between the theoretical likelihood of winning a tiebreak from 2-4 and the number of times that players really do win tiebreaks from those scores. The model says that players should win 21.5% of tiebreaks from 2*-4; if the conventional wisdom is correct, we would find that players win even fewer tiebreaks when trying to come back from that deficit.

By analyzing the 20,000-plus tiebreaks in this dataset, we find that the opposite is true. Falling to 2-4 is hugely worse than reaching the change of ends at 3-3, but it isn’t worse than the model predicts–it’s a bit better.

To quantify the effect, I determined the likelihood that the player serving immediately after the changeover would win the tiebreak, based on each player’s service points won throughout the match and the model I’ve referred to above. By aggregating all of those predictions, together with the observed result of each tiebreak, we can see how real life compares to the model.

In this set of tiebreaks, a player serving at 2-4 would be expected to win 20.9% of the time. In fact, these players go on to win the tiebreak 22.0% of the time–a small but meaningful difference. We see an even bigger gap for players returning at 2-4. The model predicts that they would win 19.9% of the time, but they end up winning 22.1% of these tiebreaks.

In other words, after six points, the player with more points is heavily favored, but if there’s any momentum–that is, if either player has more of an advantage than the mere score would suggest–the edge belongs to the player trailing in the tiebreak.

Sure enough, we see the same effect after eight points. Serving at 3-5, players in this dataset have a 16.3% (theoretical) probability of winning the tiebreak, but they win 19.0% of the time. Returning at 3-5, their paper chance is 17.2%, and they win 19.5%.

There’s nothing special about the first change of ends, and there probably isn’t any other point in a tiebreak that is more crucial than the model suggests. Instead, we’ve discovered that underdogs have a slightly better chance of coming back than their paper probabilities indicate. I suspect we’re seeing the effect of front-runners getting tight and underdogs swinging more freely–an aspect of tennis’s conventional wisdom that has much more to recommend it than the idea of a magic score after the first six points of a tiebreak.

Does Serving First in a Tiebreak Give You an Edge?

Italian translation at settesei.it

Tiebreaks are so balanced, with frequently alternating servers and sides of the court, that it seems they must be fair. As far as I know, there is no commonly-cited conventional wisdom to the effect that the first server (or second server) in a tiebreak has any kind of advantage.

Let’s check. In a dataset of over 5,200 tiebreaks at ATP tour events, the first server won 50.8% of the time. Calculating each player’s service points won for the entire match and using those numbers to determine the likelihood that the first server would win a tiebreak, we get an estimate that those first servers should have won only 48.8% of them.

Two percentage points is a small gap, but here, it’s a meaningful one. It’s persistent across each of the three years most heavily represented in the dataset (2013-15), and it holds regardless of the set. While there might be some bias in the results of first-set tiebreaks, since better servers often choose to serve first and lesser servers choose to receive, the effect in each set favors the first server, and the impact of serving first is greater in the third set than in the first.
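
One rough way to gauge whether a gap of this size is more than noise is a normal approximation to the binomial. This is a back-of-the-envelope check, not the method used above. It assumes all tiebreaks are independent and share the same 48.8% win probability (which, if anything, overstates the variance), and it uses 5,200 as a round figure for "over 5,200".

```python
import math

n = 5200            # approximate number of ATP tiebreaks ("over 5,200")
expected = 0.488    # model's aggregate forecast for first servers
observed = 0.508    # first servers' actual win rate

# Standard error of a proportion under the normal approximation.
std_err = math.sqrt(expected * (1 - expected) / n)
z_score = (observed - expected) / std_err

print(f"standard error: {std_err:.4f}, z-score: {z_score:.2f}")
```

Under those assumptions the gap is a little under three standard errors, which fits the reading that the difference is small but not easily dismissed.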

However, this effect–at least in its magnitude–is limited to ATP results. A survey of 2,500 recent WTA tiebreaks shows that first servers have won 49.7% of tiebreaks, compared to 49.4% that they should have won. Women’s ITF matches and men’s futures matches return similar results. Running the same algorithm on 6,200 men’s Challenger-level tiebreaks confuses the issue even further: Here, first servers won 48.1% of tiebreaks, while they should have won 48.7%.

A byproduct of this research is the discovery that, for both genders and at multiple levels of the game, the first server in a tiebreak is, on average, the weaker player. At first glance, that doesn’t make a lot of sense: We think of tiebreaks as deciding sets when the two players are equal. And since the effect is present for the second and third sets as well as the first, this finding isn’t biased by players choosing who will serve first.

As it turns out, this result can be at least partially explained by another byproduct of my recent research. In my attempt to determine whether it’s particularly difficult to hold when serving for the set, I calculated the odds of holding serve at every score throughout a set, compared to how frequently players should have held. At most holds–including those with the set on the line–there aren’t any major discrepancies between actual hold rates and expected hold rates.

But I did find some small effects that are relevant here. In general, it is a bit harder to hold serve as the second server, at scores such as 3-4, 4-5, and 5-6, than as the first, at scores like 3-3, 4-4, and 5-5. For instance, in the ATP data, players hold serve at 4-4 exactly as often as we would expect them to, based on their rate of service points won throughout the match. But at 4-5, their performance drops to 1.4% below expectations. In the WTA data, while players underperform at 5-5 by 1.4%, they are far worse at 5-6, winning 5.2% less often than they should.

In other words, if two players of equal abilities stay on serve for the first several games of a set, the second server is a little more likely to crack, getting broken and losing the set. Thus, if neither player is broken (or the number of breaks is equal), the second server is likely to be just a little bit better.

That explains, at least in part, why second servers are favored on paper going into tiebreaks. What it doesn’t account for is the discovery that on the ATP tour, first servers overcome that paper advantage and win more than half of tiebreaks. For that, I don’t have a good answer.

Digging Out of the Holes of 0-40 and 15-40

In the men’s professional game, serving at 0-40 isn’t a death sentence, but it isn’t a good place to be. An average player wins about 65% of service points, and at that rate, his chance of coming back from 0-40 is just a little better than one in five.
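
The "one in five" figure can be checked with a small recursion over point scores. This is a sketch under the usual simplifying assumption that the server wins each point independently with the same probability. The function name `hold_prob` is illustrative, and it only handles scores up to deuce.

```python
from functools import lru_cache


def hold_prob(server_pts, returner_pts, p):
    """Chance the server wins the game from a given point score.

    Scores are counted in points won (0-40 is server_pts=0, returner_pts=3).
    p is the server's chance of winning any single point, assumed constant
    and independent from point to point.
    """
    q = 1 - p
    # From deuce the server must win two points in a row before the
    # returner does; this is the standard closed form.
    from_deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(s, r):
        if s == 3 and r == 3:
            return from_deuce
        if s == 4:
            return 1.0
        if r == 4:
            return 0.0
        return p * win(s + 1, r) + q * win(s, r + 1)

    return win(server_pts, returner_pts)


print(f"{hold_prob(0, 3, 0.65):.1%}")  # comeback from 0-40
```

For a 65% server this comes out at roughly 21%, while the same function puts his chance of holding from 0-0 at about 83%.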

Some players are better than others at executing this sort of comeback. Tommy Robredo, for instance, has come back from 0-40 nearly 60% more often than we’d expect, while Sam Querrey digs out of the 0-40 hole one-third less often than we would predict.

Measuring a player’s success rate in these scenarios isn’t simply a matter of counting up 0-40 games. That’s what we saw on the ATP official site last week, and it’s woefully inadequate. That article marvels at Ivo Karlovic’s “clutch” accomplishments from 0-40 and 15-40, when we could easily have guessed that Ivo would lead just about any serving category. Big serving isn’t clutch if it’s what you always do.

Statistics are only valuable in context, and that is particularly true in tennis. Simply counting 0-40 games and reporting the results hides a huge amount of potential insight. Whether a player wins or loses (a game, a set, a match, or a stretch of matches) is only the first question. To deliver any kind of meaningful analysis, we need to adjust those results for the competition and consider what we already know about the players we’re studying.

Rather than tear apart that article, though, let’s do the analysis correctly.

The number of times a player comes back from 0-40 or 15-40 isn’t what’s important. As we’ve seen, big servers will dominate those categories. That doesn’t tell us who is particularly effective (or, dare we say, “clutch”) in such a situation; it only identifies the best servers. What matters is how often players come back compared to how often we would expect them to, taking into consideration their serving ability.

Karlovic is an instructive example. Over the last few years–the time span available in this dataset of point-by-point match records–Ivo has gone down 0-40 56 times, holding 17 of those games, a rate of 30.4%. That’s third-best on tour, behind John Isner and Samuel Groth. But compared to how well we would expect Karlovic to serve, he’s only 7% better than neutral, right in the middle of the ATP pack.

Before diving into the results, a few more notes on methodology. For each 0-40 or 15-40 game, I calculated the server’s rate of service points won in that match. Since we would expect 0-40 games to occur more often in matches with good returners, in-match rates seem more accurate than season-long aggregates. Given the in-match rate of serve points won, I then determined the odds that the server would come back from the 0-40 or 15-40 score. For each game, then, we have a result (came back or didn’t come back) and an estimate of the comeback’s likelihood. Combining both numbers for all of a player’s service games tells us how effective he was at these scores.
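
The procedure described above can be sketched as follows. This is one plausible reading of the method, not the original code: the function names and the tuple layout are illustrative, and the closed form only covers 0-40 and 15-40, where the server must win every point until deuce.

```python
from collections import defaultdict


def comeback_prob(points_needed, p):
    """Chance of holding when the server needs `points_needed` straight
    points to reach deuce (3 from 0-40, 2 from 15-40)."""
    q = 1 - p
    from_deuce = p * p / (p * p + q * q)
    return p ** points_needed * from_deuce


def wins_over_expected(games):
    """games: iterable of (player, in_match_serve_rate, points_needed, held).

    Returns player -> actual comebacks divided by expected comebacks.
    """
    actual = defaultdict(float)
    expected = defaultdict(float)
    for player, serve_rate, points_needed, held in games:
        actual[player] += 1.0 if held else 0.0
        expected[player] += comeback_prob(points_needed, serve_rate)
    return {name: actual[name] / expected[name] for name in actual}


# Made-up records for illustration only, not taken from the dataset.
sample = [
    ("Player A", 0.68, 3, True),
    ("Player A", 0.61, 3, False),
    ("Player A", 0.70, 3, False),
    ("Player B", 0.64, 3, False),
    ("Player B", 0.66, 3, True),
]
print(wins_over_expected(sample))
```

A ratio above 1.0 means the player came back more often than his in-match serving would predict, and a ratio below 1.0 means less often.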

For 30 of the players best represented in the dataset, here are their results at 0-40, showing the number of games, the number of successful comebacks, the rate of successful comebacks, and the degree to which the player exceeded expectations from 0-40:

Player                  0-40  0-40 W  0-40 W%  W/Exp  
Tommy Robredo            110      30    27.3%   1.59  
Denis Istomin            114      26    22.8%   1.36  
John Isner                87      31    35.6%   1.34  
Guillermo Garcia-Lopez   161      29    18.0%   1.32  
Kevin Anderson           130      38    29.2%   1.28  
Bernard Tomic            110      24    21.8%   1.25  
Fernando Verdasco        141      30    21.3%   1.17  
Rafael Nadal             140      32    22.9%   1.15  
Kei Nishikori            122      23    18.9%   1.15  
Marin Cilic              125      26    20.8%   1.14  
                                                      
Player                  0-40  0-40 W  0-40 W%  W/Exp  
Jo-Wilfried Tsonga       124      29    23.4%   1.14  
Novak Djokovic           124      34    27.4%   1.12  
Andreas Seppi            145      24    16.6%   1.09  
Grigor Dimitrov          115      22    19.1%   1.08  
Philipp Kohlschreiber    146      28    19.2%   1.08  
Roger Federer            107      26    24.3%   1.07  
Ivo Karlovic              56      17    30.4%   1.07  
Santiago Giraldo         113      18    15.9%   1.06  
Alexandr Dolgopolov      141      25    17.7%   1.03  
Milos Raonic              82      23    28.0%   1.01  
                                                      
Player                  0-40  0-40 W  0-40 W%  W/Exp  
Tomas Berdych            149      30    20.1%   1.01  
Jeremy Chardy            122      21    17.2%   0.98  
Feliciano Lopez          136      26    19.1%   0.97  
Fabio Fognini            211      24    11.4%   0.97  
Mikhail Youzhny          155      18    11.6%   0.92  
David Ferrer             203      32    15.8%   0.89  
Richard Gasquet          152      25    16.4%   0.87  
Andy Murray              164      24    14.6%   0.80  
Gilles Simon             158      16    10.1%   0.72  
Sam Querrey               84      12    14.3%   0.68

As I mentioned above, Robredo has been incredibly effective in these situations, coming back from 0-40 30 times instead of the 19 times we would have expected. Some big servers, such as Isner and Kevin Anderson, are even better than their well-known weapons would lead us to expect, while others, such as Karlovic and Milos Raonic, aren’t noticeably more effective at 0-40 than they are in general.

Many of these extremes don’t hold up when we turn to the results from 15-40. Quite a few more games reach 15-40 than 0-40, so the more limited variation at 15-40 suggests that many of the extreme results from 0-40 can be ascribed to an inadequate sample. For instance, Robredo–our 0-40 hero–falls to neutral at 15-40. Here is the complete list:

Player                  15-40  15-40 W  15-40 W%  W/Exp  
John Isner                238      122     51.3%   1.33  
Milos Raonic              215       98     45.6%   1.18  
Feliciano Lopez           304      108     35.5%   1.17  
Jo-Wilfried Tsonga        301      119     39.5%   1.17  
Denis Istomin             304      101     33.2%   1.17  
Rafael Nadal              320      118     36.9%   1.16  
Ivo Karlovic              148       68     45.9%   1.15  
Kevin Anderson            338      132     39.1%   1.15  
Guillermo Garcia-Lopez    405      106     26.2%   1.14  
Andreas Seppi             396      113     28.5%   1.12  
                                                         
Player                  15-40  15-40 W  15-40 W%  W/Exp  
Bernard Tomic             273       86     31.5%   1.12  
Kei Nishikori             298       96     32.2%   1.10  
Novak Djokovic            348      132     37.9%   1.07  
Richard Gasquet           325      106     32.6%   1.07  
Roger Federer             281      109     38.8%   1.07  
Fernando Verdasco         306       94     30.7%   1.06  
Philipp Kohlschreiber     352      110     31.3%   1.06  
Andy Murray               431      135     31.3%   1.06  
Santiago Giraldo          331       86     26.0%   1.05  
Tomas Berdych             398      131     32.9%   1.05  
                                                         
Player                  15-40  15-40 W  15-40 W%  W/Exp  
Marin Cilic               357      109     30.5%   1.05  
Sam Querrey               244       78     32.0%   1.04  
Jeremy Chardy             300       91     30.3%   1.04  
Fabio Fognini             422       98     23.2%   1.03  
Tommy Robredo             285       78     27.4%   0.99  
Grigor Dimitrov           307       89     29.0%   0.99  
David Ferrer              498      138     27.7%   0.98  
Alexandr Dolgopolov       299       77     25.8%   0.95  
Mikhail Youzhny           339       77     22.7%   0.94  
Gilles Simon              426       93     21.8%   0.91

The big servers are better represented at the top of this ranking. Even though Isner is expected to come back from 15-40 nearly 40% of the time–better than almost anyone on tour–he exceeds that expectation by one-third, far more than anyone else considered here.

Finally, let’s look at comebacks from 0-30:

Player                  0-30  0-30 W  0-30 W%  W/Exp  
John Isner               338     229    67.8%   1.19  
Bernard Tomic            299     146    48.8%   1.15  
Grigor Dimitrov          342     166    48.5%   1.11  
Novak Djokovic           409     235    57.5%   1.10  
Santiago Giraldo         344     142    41.3%   1.10  
Fernando Verdasco        373     175    46.9%   1.10  
Rafael Nadal             376     194    51.6%   1.09  
Tomas Berdych            492     262    53.3%   1.09  
Tommy Robredo            296     132    44.6%   1.08  
Roger Federer            344     193    56.1%   1.08  
                                                      
Player                  0-30  0-30 W  0-30 W%  W/Exp  
Feliciano Lopez          326     161    49.4%   1.07  
Alexandr Dolgopolov      347     154    44.4%   1.07  
Marin Cilic              378     179    47.4%   1.06  
Jo-Wilfried Tsonga       357     185    51.8%   1.06  
Guillermo Garcia-Lopez   380     146    38.4%   1.06  
Ivo Karlovic             186     118    63.4%   1.04  
Philipp Kohlschreiber    395     185    46.8%   1.03  
Denis Istomin            314     135    43.0%   1.03  
Kei Nishikori            341     145    42.5%   1.03  
David Ferrer             529     227    42.9%   1.02  
                                                      
Player                  0-30  0-30 W  0-30 W%  W/Exp  
Kevin Anderson           361     181    50.1%   1.02  
Mikhail Youzhny          390     142    36.4%   1.00  
Andy Murray              419     185    44.2%   1.00  
Andreas Seppi            418     164    39.2%   0.99  
Jeremy Chardy            316     132    41.8%   0.99  
Milos Raonic             246     139    56.5%   0.99  
Fabio Fognini            478     153    32.0%   0.99  
Sam Querrey              292     131    44.9%   0.97  
Gilles Simon             442     155    35.1%   0.96  
Richard Gasquet          370     159    43.0%   0.95

Isner still stands at the top of the leaderboard, while Bernard Tomic and Grigor Dimitrov give us a mild surprise by filling out the top three. Again, as the sample size increases, the variation decreases even further, illustrating that, over the long term, players tend to serve about as well at one score as they do at any other.

Forecasting the Effects of Performance Byes in Beijing

To the uninitiated, the WTA draw in Beijing this week looks a little strange. The 64-player draw includes four byes, which were given to the four semifinalists from last week’s event in Wuhan. So instead of empty places in the bracket next to the top four seeds, those free passes go to the 5th, 10th, and 15th seeds, along with one unseeded player, Venus Williams.

“Performance byes”–those given to players based on their results the previous week, rather than their seed–have occasionally featured in WTA draws over the last few years. If you’re interested in their recent history, Victoria Chiesa wrote an excellent overview.

I’m interested in measuring the benefit these byes confer on the recipients–and the negative effect they have on the players who would have received those byes had they been awarded in the usual way. I’ve written about the effects of byes before, but I haven’t contrasted different approaches to awarding them.

This week, the beneficiaries are Garbine Muguruza, Angelique Kerber, Roberta Vinci, and Venus Williams. The top four seeds–the women who were atypically required to play first-round matches–are Simona Halep, Petra Kvitova, Flavia Pennetta, and Agnieszka Radwanska.

To quantify the impact of the various possible formats of a 64-player draw, I used a variety of tools: Elo to rate players and predict match outcomes, Monte Carlo tournament simulations to consider many different permutations of each draw, and a modified version of my code to “reseed” brackets. While this is complicated stuff under the hood, the results aren’t that opaque.

Here are three different types of 64-player draws that Beijing might have employed:

  1. Performance byes to last week’s semifinalists. This gives a substantial boost to the players receiving byes, and compared to any other format, has a negative effect on top players. Not only are the top four seeds required to play a first-round match, they are a bit more likely to play last week’s semifinalists, since the byes give those players a better chance of advancing.
  2. Byes to the top four seeds. The top four seeds get an obvious boost, and everyone else suffers a bit, as they are that much more likely to face the top four.
  3. No byes: 64 players in the draw instead of 60. The clear winners in this scenario are the players who wouldn’t otherwise make it into the main draw. Unseeded players (excluding Venus) also benefit slightly, as the lack of byes means that top players are less likely to advance.

Let’s crunch the numbers. For each of the three scenarios, I ran simulations based on the field without knowing how the draw turned out. That is, Kvitova is always seeded second, but she doesn’t always play Sara Errani in the first round. This approach eliminates any biases in the actual draw. To simulate the 64-player field, I added the four top-ranked players who lost in the final round of qualifying.
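
A stripped-down version of this kind of simulation might look like the sketch below. It uses the standard Elo expectation formula and redraws the bracket at random on every run. Unlike the simulations described here, it does not place seeds or byes, and the function names and parameters are illustrative assumptions rather than the code used for this analysis.

```python
import random


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def simulate_titles(ratings, n_sims=5000, seed=0):
    """Estimate each player's title chances over many random draws.

    ratings: dict of player -> Elo rating; the field size must be a power
    of two. No seeds or byes are placed, which a real draw would require.
    """
    rng = random.Random(seed)
    titles = {name: 0 for name in ratings}
    for _ in range(n_sims):
        field = list(ratings)
        rng.shuffle(field)  # a fresh random draw for every simulation
        while len(field) > 1:
            next_round = []
            for i in range(0, len(field), 2):
                a, b = field[i], field[i + 1]
                a_wins = rng.random() < elo_win_prob(ratings[a], ratings[b])
                next_round.append(a if a_wins else b)
            field = next_round
        titles[field[0]] += 1
    return {name: wins / n_sims for name, wins in titles.items()}


# Hypothetical eight-player field with made-up ratings.
demo_ratings = {f"Player {i + 1}": 2100 - 60 * i for i in range(8)}
print(simulate_titles(demo_ratings))
```

Tracking the round in which each player exits, rather than only the champion, would give the per-round probabilities needed to compare draw formats.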

To compare the effects of each draw type on every player, I calculated “expected points” based on their probability of reaching each round. For instance, if Halep entered the tournament with a 20% chance of winning the event with its 1,000 ranking points, she’d have 200 “expected points,” plus her expected points for the higher probabilities (and lower number of points) of reaching every round in between. It’s simply a way of combining a lot of probabilities into a single easier-to-understand number.
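
As a sketch of that calculation: given a player's chance of reaching each stage, the chance of finishing exactly at a stage is the difference between consecutive entries, and expected points are the weighted sum of the awards. The 1,000 points for the title come from the text. The other awards and all of the probabilities below are placeholders for illustration, not the WTA points table or real forecasts.

```python
def expected_points(reach_prob, points_for_finish):
    """Combine round-by-round probabilities into one expected-points figure.

    reach_prob[i] is the chance of reaching stage i, ordered from the first
    round to the title, so it starts at 1.0 and never increases.
    points_for_finish[i] is the award for a run that ends at stage i.
    """
    total = 0.0
    for i, points in enumerate(points_for_finish):
        # Chance of finishing exactly at stage i: reach it, but go no further.
        reach_next = reach_prob[i + 1] if i + 1 < len(reach_prob) else 0.0
        total += (reach_prob[i] - reach_next) * points
    return total


# Stages: R64, R32, R16, QF, SF, F, title (hypothetical values throughout,
# apart from the 1,000 points for the title).
reach = [1.00, 0.90, 0.75, 0.60, 0.45, 0.30, 0.20]
points = [10, 65, 120, 215, 390, 650, 1000]
print(round(expected_points(reach, points), 1))
```

In this made-up example the 20% title chance contributes 200 points on its own, and the earlier rounds supply the rest.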

Here are the expected points in each draw scenario (plus the actual Beijing draw) for the top four players, the four players who received performance byes, plus a couple of others (Belinda Bencic and Caroline Wozniacki) who rated particularly highly:

Player               Seed  PerfByes  TopByes  NoByes  Actual  
Simona Halep            1       323      364     330     341  
Petra Kvitova           2       276      323     290     291  
Venus Williams                  247      216     218     279  
Belinda Bencic         11       255      249     268     254  
Garbine Muguruza        5       243      202     210     227  
Angelique Kerber       10       260      224     235     227  
Caroline Wozniacki      8       208      203     205     199  
Flavia Pennetta         3       142      177     144     195  
Agnieszka Radwanska     4       185      233     192     188  
Roberta Vinci          15       120       91      94      90

As expected, the top four seeds reap far more points when given first-round byes. It’s most noticeable for Pennetta and Radwanska, who would enjoy a boost of more than 20% in expected points if given a first-round bye. Oddly, though, the draw worked out very favorably for Flavia–Elo gave her a 95% chance of beating her first-round opponent Xinyun Han, and her draw steered her relatively clear of other dangerous players in subsequent rounds.

Similarly, the performance byes are worth a 15 to 30% advantage in expected points to the players who receive them. Vinci is the biggest winner here, as we would generally expect from the player most likely to suffer an upset without the bye.

Like Pennetta, Venus was treated very well by the way the draw turned out. The bye already gave her an approximately 15% boost compared to her expectations without a bye, and the draw tacked another 13% onto that. Both the structure of the draw and some luck on draw day made her the event’s third most likely champion, while the other scenarios would have left her in fifth.

All byes–conventional or unconventional–work to the advantage of some players and against others. However they are granted, they tend to work in favor of those who are already successful, whether that success is over the course of a year or a single week.

Performance byes are easy enough to defend: They give successful players a bit more rest between two demanding events, and from the tour’s perspective, they make it a little more likely that last week’s best players won’t pull out of this week’s tourney. And if all byes tend to make the rich a little richer, at least performance byes open the possibility of benefiting different players than usual.

The Slow but Steady Erosion of the Server’s Advantage

After a couple of weeks of data-driven skepticism, I can finally confirm a bit of tennis’s conventional wisdom. Over the course of a typical match, breaks of serve become a little easier to come by.

This result–based on tens of thousands of matches from the last few years–is similar for both men and women. After about twelve games (total, not service games for each player), a hold is roughly 2% less likely than it was in the first few games of the match. By the 25th game, a hold is approximately 5% less likely than at the beginning of the match.

To control for the vagaries of surface, opponent, and other conditions, I’ve compared each service game to the server’s hold percentage within that match. Only the closest matches are likely to go very long, so it’s important to compare the last games of those matches to games with similarly even opponents.

It seems that this effect is the result of one or both of two factors: server fatigue (which may have more of an effect on results than an equivalent amount of returner fatigue), and the returner’s increasing familiarity with the server. It would be difficult to separate these two–and with this dataset, probably impossible–so for today, let’s stick with the nature of the effect, not its causes.

The following graph shows the relative probability of a hold of serve based on how much of the match (in games) has been played:

[Graph: relative hold percentage by game number]

I’ve set the hold probability of the first game at 100%, so all other numbers are relative to that. I’ve excluded tiebreaks from these calculations, though I considered them when counting games–that is, the first game of the second set after a tiebreak is considered the 14th game, not the 13th.
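
One way to compute a curve like this is sketched below, under assumptions about the approach: each service game is compared with the server's hold rate in the same match, the results are pooled by game number, and the first game is rescaled to 100%. The function name and record layout are illustrative, and tiebreaks are assumed to have been removed from the input already.

```python
from collections import defaultdict


def relative_hold_by_game(service_games):
    """service_games: iterable of (game_number, held, match_hold_rate).

    match_hold_rate is the server's hold percentage in that same match,
    which acts as the expected result for each of his service games.
    Returns game_number -> actual/expected holds, rescaled so that the
    earliest game number equals 1.0.
    """
    held_total = defaultdict(float)
    expected_total = defaultdict(float)
    for game_number, held, match_hold_rate in service_games:
        held_total[game_number] += 1.0 if held else 0.0
        expected_total[game_number] += match_hold_rate
    ratio = {g: held_total[g] / expected_total[g] for g in expected_total}
    baseline = ratio[min(ratio)]
    return {g: ratio[g] / baseline for g in sorted(ratio)}


# Tiny made-up sample: two matches, three games each.
sample = [
    (1, True, 0.80), (2, True, 0.75), (3, True, 0.80),
    (1, True, 0.75), (2, True, 0.80), (3, False, 0.75),
]
print(relative_hold_by_game(sample))
```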

The results get a lot noisier starting around the women’s 25th game and the men’s 35th game, for the simple reason that most matches don’t get that far. For example, while the WTA calculations are based on 11,000 matches, only one-third reached the 25th game and fewer than one-tenth made it to the 31st.

The general downward trend indicates that the fatigue and/or familiarity effect dwarfs the effect of new balls. I have found that in men’s matches, the age of balls has a very small effect on hold percentage, and in women’s matches, it has no effect. In any case, the steady ebb of the server’s advantage is a stronger effect.

It is likely that some players suffer more from fatigue or familiarity than others. Due to the smaller size of the per-player samples, especially beyond the 20th game or so, I’m reluctant to draw any strong conclusions. Still, there are some intriguing numbers for the players for whom the dataset contains the most matches.

Here, I’ve calculated the hold percentage for several top players at various stages of the match, relative to their hold percentage in the first ten games. Thus, a number below 100% indicates less frequent holds, while a number above 100% means more frequent holds:

Player                 Matches  11 to 20  21 to 30  31 to 50  
Tomas Berdych              337     98.5%     98.3%    101.5%  
David Ferrer               330     97.0%     99.4%    102.4%  
Novak Djokovic             325    100.1%    101.8%    101.7%  
Roger Federer              325    100.2%     99.6%    100.4%  
Andy Murray                295     97.7%     98.7%     97.9%  
Rafael Nadal               293     99.2%    100.3%     93.7%  
Jo-Wilfried Tsonga         255    100.4%    100.9%     99.6%  
Philipp Kohlschreiber      252    101.4%     97.9%     96.7%  
John Isner                 251    100.4%    100.4%    100.3%  
                                                              
Player                 Matches  11 to 20  21 to 30  31 to 50  
Kevin Anderson             247    100.0%     98.1%     97.5%  
Richard Gasquet            246     99.1%     98.4%    105.1%  
Gilles Simon               245    100.1%    103.7%     95.0%  
Milos Raonic               238     97.1%     96.1%     96.7%  
Marin Cilic                238     95.4%     97.5%     94.5%  
Fabio Fognini              235    100.4%     99.6%     98.2%  
Kei Nishikori              233    101.8%    104.1%    107.2%  
Grigor Dimitrov            224    100.9%    100.3%     94.6%  
Andreas Seppi              221    106.4%    100.4%    103.1%  
Feliciano Lopez            221     99.2%     99.7%     98.4%  
                                                              
Total                    23326     98.1%     96.1%     95.1%

While John Isner is steady throughout the stages of the match, other big servers such as Milos Raonic and Marin Cilic are less dominant as the match progresses. The players whose hold percentage improves through the match–such as Novak Djokovic and David Ferrer–tend to be those without big serves, so we may be looking at more of an overall fatigue effect in those cases.

The lowest number in the table is Rafael Nadal’s relative hold percentage after the 30th game. Perhaps after that much time on court, his opponents finally figure out how to defend against the ad-court slider.

Here are the same calculations for top WTA players:

Player                Matches  11 to 15  16 to 20  21 to 40  
Agnieszka Radwanska       299    101.0%    104.9%     98.0%  
Sara Errani               279     97.7%     91.2%     92.7%  
Caroline Wozniacki        279    103.1%    102.3%    104.9%  
Serena Williams           266    102.8%    102.4%    104.9%  
Angelique Kerber          265    101.9%    103.0%    101.5%  
Samantha Stosur           253     99.2%    105.0%     97.6%  
Carla Suarez Navarro      252    102.2%    101.8%     93.7%  
Petra Kvitova             251     93.9%    100.4%     95.9%  
Roberta Vinci             250     94.2%     97.9%     95.4%  
Ana Ivanovic              241    100.8%    106.0%     95.2%  
Jelena Jankovic           241    102.2%    108.7%     96.4%  
                                                             
Player                Matches  11 to 15  16 to 20  21 to 40  
Maria Sharapova           236    100.1%    105.9%    104.9%  
Victoria Azarenka         228    100.6%    103.7%     97.8%  
Lucie Safarova            227    102.7%    100.5%     94.4%  
Simona Halep              224     89.2%     95.3%    101.7%  
Dominika Cibulkova        210     98.7%     89.9%     99.9%  
Alize Cornet              210     96.2%    102.8%     96.4%  
Andrea Petkovic           194    101.5%    104.2%    107.5%  
Sloane Stephens           185     97.5%     90.1%     88.7%  
Sabine Lisicki            185     97.4%     97.5%     96.6%  
Ekaterina Makarova        185     96.6%    102.8%     92.8%  
Flavia Pennetta           180    105.1%     92.9%    103.9%  
                                                             
Total                   22406     98.6%     97.2%     95.0%

Here is some confirmation that Serena Williams–at least on serve–gets better as the match progresses. Many of the other players with the strongest serve results late in matches are those known for fitness (like Caroline Wozniacki) or steeliness (Maria Sharapova).

Whether the root cause is fatigue or familiarity, most players are less effective on serve as the match progresses. With further research, I hope we’ll be able to better understand the cause and determine whether there are advantages to serving particularly well at certain stages of the match.

The Odds of Successfully Serving Out the Set

Italian translation at settesei.it

Serving for the set is hard … or so they say. Like other familiar tennis conceits, this one is ripe for confirmation bias. Every time we see a player struggle to serve out a set, we’re tempted to comment on the particular challenge he faces. If he doesn’t struggle, we ignore it or, even worse, remark on how he achieved such an unusual feat.

My findings–based on point-by-point data from tens of thousands of matches from the last few seasons–follow a familiar refrain: If there’s an effect, it’s very minor. For many players, and for some substantial subsets of matches, breaks of serve appear to be less likely at these purportedly high-pressure service games of 5-4, 5-3 and the like.

In ATP tour-level matches, holds are almost exactly as common when serving for the set as at other stages of the match. For each match in the dataset, I calculated each player’s overall hold percentage. If serving for the set were more difficult than serving in other situations, we would find that those “average” hold percentages would be higher than players’ success rates when serving for the set.
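As a rough illustration of that comparison, here is a minimal Python sketch. It assumes one record per player per match with four counts; the `PlayerMatch` class, its field names, and the choice to report the result as a ratio of actual to expected holds are all hypothetical conveniences, not a description of the code behind the numbers in this post.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PlayerMatch:
    """One player's serving record in one match (hypothetical layout)."""
    service_games: int   # all service games the player played in the match
    holds: int           # service games won
    sfs_attempts: int    # games in which the player served for a set
    sfs_holds: int       # serving-for-the-set games won


def serving_for_set_ratio(rows: Iterable[PlayerMatch]) -> float:
    """Return actual serving-for-the-set holds divided by expected holds.

    Expected holds use each player's own hold rate in that match as the
    baseline. A ratio below 1.0 means players held less often than their
    match-wide rate would predict.
    """
    expected = 0.0
    actual = 0
    for row in rows:
        if row.service_games == 0 or row.sfs_attempts == 0:
            continue  # nothing to compare in this match
        match_hold_rate = row.holds / row.service_games
        expected += match_hold_rate * row.sfs_attempts
        actual += row.sfs_holds
    if expected == 0:
        raise ValueError("no serving-for-the-set attempts with a usable baseline")
    return actual / expected


if __name__ == "__main__":
    sample = [
        PlayerMatch(service_games=10, holds=8, sfs_attempts=2, sfs_holds=1),
        PlayerMatch(service_games=12, holds=12, sfs_attempts=1, sfs_holds=1),
    ]
    print(f"actual / expected holds: {serving_for_set_ratio(sample):.3f}")
```

Using the match-wide hold rate as the baseline controls, at least roughly, for the opponent and the conditions on the day, which a career average would not.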

That isn’t the case. Considering over 20,000 “serving-for-the-set” games, players held serve only 0.7% less often than expected–a difference that shows up only once every 143 attempts. The result is the same when we limit the sample to “close” situations, where the server has a one-break advantage.
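Picking out these games from a score line is mostly bookkeeping. The sketch below is one way to flag a serving-for-the-set game and to measure the server's break lead from the game score alone. The function names are hypothetical, and the definition of "serving for the set" (at least five games and ahead in the set) is my assumption for illustration. It relies on the fact that serve alternates within a set, so the player about to serve has served half of the games played so far, rounded down.

```python
def is_serving_for_set(server_games: int, returner_games: int) -> bool:
    """True if holding this game would win the set for the server.

    Assumes sets are won at six or more games with a two-game margin,
    so the server needs at least five games and a lead.
    """
    return server_games >= 5 and server_games > returner_games


def break_lead(server_games: int, returner_games: int) -> int:
    """Net breaks in the server's favor before this game starts.

    With alternating serve, the player about to serve has served
    floor(total / 2) of the games played so far. Any games won beyond
    that count must come from a net surplus of breaks.
    """
    games_served = (server_games + returner_games) // 2
    return server_games - games_served


def is_one_break_attempt(server_games: int, returner_games: int) -> bool:
    """Serving for the set with exactly a one-break advantage."""
    return (is_serving_for_set(server_games, returner_games)
            and break_lead(server_games, returner_games) == 1)


if __name__ == "__main__":
    for score in [(5, 4), (5, 3), (5, 2), (6, 5), (5, 5)]:
        print(score, is_serving_for_set(*score), break_lead(*score))
```

Under these definitions, 5-3, 5-4, and 6-5 are the one-break cases, while 5-2 already implies a double break.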

Only a few players have demonstrated any notable success or lack thereof. Andy Murray holds about 6% more often when serving for the set than his average rate, making him one of only four players (in my pool of 99 players with 1,000 or more service games) to outperform his own average by more than 5%.

On the WTA tour, serving for the set appears to be a bit more difficult. On average, players successfully serve out a set 3.4% less often than their average success rate, a difference that would show up about once every 30 attempts. Seven of the 85 players with 1,000 or more service games in the dataset were at least 10% less successful in serving-for-the-set situations than their own standard.
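The "once every N attempts" figures are just the reciprocal of the gap. A small sketch, assuming the gap is read as a share of serving-for-the-set attempts (the helper name `one_in_n` is hypothetical):

```python
def one_in_n(gap_pct: float) -> float:
    """Convert a gap in percent into 'one extra break every N attempts'."""
    if gap_pct <= 0:
        raise ValueError("gap must be a positive percentage")
    return 100.0 / gap_pct


if __name__ == "__main__":
    for label, gap in [("ATP", 0.7), ("WTA", 3.4)]:
        print(f"{label}: a {gap}% gap is about one in {one_in_n(gap):.1f} attempts")
```

The ATP gap of 0.7% works out to roughly one attempt in 143, and the WTA gap of 3.4% to a little under one in 30.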

Maria Sharapova stands out at the other end of the spectrum, holding serve 3% more often than her average when serving for the set, and 7% more frequently than average when serving for the set with a single-break advantage. She’s one of 30 players for whom I was able to analyze at least 100 single-break opportunities, and the only one of them to exceed expectations by more than 5%.

Given the size of the sample–nearly 20,000 serving-for-the-set attempts, with almost 12,000 of them single-break opportunities–it seems likely that the tour-wide WTA effect is real, however small. Strangely, though, the overall finding is different at the lower levels of the women’s game.

For women’s ITF main draw matches, I was able to look at another 30,000 serving-for-the-set attempts, and in these, players were 2.4% more successful than their own average in the match. In close sets, where the server held a one-break edge, the server’s advantage was even greater: 3.5% better than in other games.

If anything, I would have expected players at lower levels to exhibit greater effects in line with the conventional wisdom. If it’s difficult to serve in high-pressure situations, it would make sense if lower-ranked players (who, presumably, have less experience with and/or are less adept in these situations) were not as effective. Yet the opposite appears to be true.

Lower-level averages from the men’s tour don’t shed much light, either. In main draw matches at Challengers, players hold 1.4% less often when serving for the set, and 1.8% less often with a single-break advantage. In Futures main draws, they are exactly as successful when serving for the set as they are the rest of the time, regardless of their lead. In all of the samples, there are only a handful of players whose record is 10% better or worse when serving for the set, and a small percentage who over- or underperform by even 5%.

The more specific situations I analyze, the more the evidence piles up that games and points are, for the most part, independent–that is, players are roughly as effective at one score as they are at any other, and it doesn’t matter a great deal what sequence of points or games got them there. There are still plenty of situations that haven’t yet been analyzed, but if the ones that we talk about the most don’t exhibit the strong effects that we think they do, that casts quite a bit of doubt on the likelihood that we’ll find notable effects elsewhere.

If there is any truth to claims like those about the difficulty of serving for the set, perhaps it is the case that the pressure affects both players equally. After all, if a server needs to hold at 5-4, it is equally important for the returner to seize the final break opportunity. Maybe the level of both players drops, something we might be able to determine by analyzing how these points are played.

For now, though, we can conclude that players–regardless of gender or level–serve out the set about as often as they successfully hold at 1-2, or 3-3, or any other particular score.