What If Jannik Sinner Made More First Serves?

Jim Courier thinks he should:

Among the current top 50, there’s actually a negative correlation between height and first-serve percentage–that is, taller guys make slightly fewer first serves, all else equal–but that doesn’t directly contradict what Courier said. There’s a whole lot that we could investigate in that couple of lines, but let’s stick with the question in the headline.

In the 52 weeks going into the current Miami event, Jannik Sinner made 57.3% of his first serves. That’s the lowest rate of the current top 50, and well below the average of 63%. When he makes his first serve, he wins 74.7% of points–slightly better than average–and on second-serve points, he wins 54.7%, which ranks 11th among the top 50. Altogether, he’s winning 66.2% of service points, again a little bit above top-50 average.

Courier presumably meant that Sinner’s first serve needs to be more reliable, not that he should take something off of it. In the hypothetical, then, he’ll continue to win roughly 75% of first-serve points. He’ll just have more of them.

If Sinner made 65% of his first serves instead of 57.3%, and he continued to win first and second serve points at the same rate, he’d improve his overall winning percentage on service points from 66.2% to 67.7%. That’s equivalent to increasing his hold percentage from 84.9% to 87.1%. (He’s currently holding 83.9% of the time, so he might be a bit unlucky.)
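To make the arithmetic concrete, here is a quick sketch of those two calculations in Python. It isn't necessarily the exact model behind the numbers above; it's just the standard formula that treats every service point as an independent coin flip with the same probability, and it lands within a rounding error of the figures in this paragraph.

def serve_points_won(first_in, first_won, second_won):
    # overall share of service points won, given first-serve percentage and
    # winning rates behind the first and second serve
    return first_in * first_won + (1 - first_in) * second_won

def hold_prob(p):
    # chance of holding serve if the server wins each point with probability p
    q = 1 - p
    before_deuce = p**4 * (1 + 4*q + 10*q**2)   # win the game without reaching deuce
    reach_deuce = 20 * p**3 * q**3              # arrive at 3-3 in points
    from_deuce = p**2 / (1 - 2*p*q)             # win from deuce
    return before_deuce + reach_deuce * from_deuce

current = serve_points_won(0.573, 0.747, 0.547)       # ~0.662
hypothetical = serve_points_won(0.650, 0.747, 0.547)  # ~0.677
print(round(hold_prob(current), 3), round(hold_prob(hypothetical), 3))
# ~0.848 and ~0.871, within a rounding error of the 84.9% and 87.1% above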

One and a half percentage points–how much does that really matter?

For starters, it would improve his position on the top-50 leaderboard from 24th to 11th. Right now, he’s winning service points at the same rate as Frances Tiafoe and Roberto Bautista Agut. Improved by 1.5 percentage points, he’d be in another league entirely, equal to Felix Auger-Aliassime and Taylor Fritz.

Another way of looking at it is within my framework of converting points to ranking places. As a rough rule of thumb, winning one additional point per thousand translates into an improvement of one place on the ranking table. That relationship doesn’t hold at the very top of the rankings, where players are not so tightly packed. But when I first introduced the framework in 2017, the relationship among players ranked 2nd to 10th was that–again, approximately–two points per thousand translated into one place in the rankings.

Back to Sinner. If he won 1.5 percentage points more of his service points, that’s a 0.75-point increase in total points won, since serve points make up roughly half of the total. (We’re assuming his return game is unchanged.) Call it 0.8%, or eight points per thousand. According to the top-ten version of my rule, that’s worth four spots in the computer rankings.

Sinner is currently ranked 11th on the ATP computer, and after advancing to the Miami semi-finals yesterday, he ranks 9th on the live table. He could head back to Europe as high as 6th if he wins the title. From any one of those positions, a four-place jump would be significant.

Yet the Italian might be better even than that. My Elo ratings place him 4th, behind only Novak Djokovic, Carlos Alcaraz, and Daniil Medvedev. There’s no reliable relationship between points per thousand and ranking places at the very top of the table, but Elo hints at what an elite player Sinner already is. Tack on seven or eight more points per thousand and he might not be the number one player in the world, but he’s right there in the mix.

That is, at least as long as no one else improves even faster. Sinner isn’t alone in his 66.2% rate of service points won. Alcaraz entered Miami with exactly the same number. Sinner has more room to improve his first serve percentage than anyone else at the top of the game, but his rivals will hardly stand around and watch while he does.

Erasing Love-40 Three Times In a Row

During last week’s marathon fourth-rounder at Indian Wells, Daniil Medvedev tucked an unusual feat inside his 6-7, 7-6, 7-5 defeat of Alexander Zverev. Starting with the 12th game of the first set, he recovered from a 0-40 deficit in three consecutive service games.

Voo de Mar noticed:

Peter asked me if this had ever happened before, so here we are. The short answer is: I’m not sure (at least at ATP tour level), because I don’t have the point-by-point sequence for every match. However, I have the sequence for enough matches to confirm that it’s extremely rare.

Theory first

Just falling behind 0-40 is unusual. ATP-level servers win about 65% of points, so a basic model would predict that 0-40 happens in 4.3% of service games. It’s actually more frequent than that–about 5.4%–partly because the tour does not consist of identical servers, and partly because there’s probably some streakiness involved.

Back to theory: “Erasing” a 0-40 deficit means winning three service points after losing the first three. The odds of that particular six-point sequence–again, assuming the server wins 65% of points–are 1.2%.
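Both of those theoretical numbers come straight from the independent-points assumption. Here is the two-line version in Python:

p = 0.65                      # typical ATP rate of service points won
q = 1 - p
print(round(q**3, 3))         # 0.043: lose the first three points and fall to 0-40
print(round(q**3 * p**3, 4))  # 0.0118, i.e. about 1.2%: lose three, then win three to erase the deficit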

The historical record agrees exactly. Across 18,000 tour-level matches from the 2010s, I found that the server falls to 0-40 and recovers to deuce exactly 1.2% of the time.

Three in a row is a different story entirely. If there’s a 1% probability of something occurring once, there’s a 0.0001%–literally, one in a million–chance that it will happen three times in a row. On the other hand, there are a lot of matches and a lot of service games. Using some rough assumptions for the number of games in a match and the number of matches per season, my ballpark estimate is that we should see a rarity like this about once in every 10-12 ATP seasons.
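For the curious, the “rough assumptions” look something like this. The games-per-match and matches-per-season figures below are ballpark guesses, not precise counts, so treat the output as an order-of-magnitude estimate:

p_recover = 0.012          # chance that any given service game features a 0-40 recovery
service_games = 12         # rough number of service games per player in a match (assumption)
windows = service_games - 2              # runs of three consecutive service games per player
p_match = 2 * windows * p_recover**3     # both players; ignores overlap between windows
matches_per_season = 2700                # rough count of ATP tour-level matches per year (assumption)
per_season = matches_per_season * p_match
print(round(per_season, 2), round(1 / per_season, 1))
# ~0.09 occurrences per season, or roughly one every 10-11 years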

The data

Like I said, I don’t have the point-by-point sequence for every match. But I do have it for over 18,000 ATP matches between 2011 and early 2019. (Much of that data, plus equivalent data for women’s tennis, is here.) In that dataset, there was only one instance when a player apparently erased a 0-40 deficit three times in a row: 2011 Kuala Lumpur, where Mischa Zverev managed it against Philipp Petzschner.

Except… I’m not so sure. In 2011, betting sites were just starting to collect and publish point-by-point data, and some of it was approximate. For this particular match, there is a suspicious number of streaks, a sign that the data wasn’t reported precisely. For instance, in all three of the 0-40 rescues, Zverev purportedly won the next five points in a row. It’s possible, but we have to leave a question mark next to this one.

We can, however, broaden the search. 6,800 ATP qualifying matches? No one managed three 0-40 recoveries in a row. 28,000 Challenger matches? Now we’re talking–I found five occasions when a player saved three consecutive 0-40 deficits. The most recent was at the 2016 Tallahassee Challenger, where Donald Young accomplished it in a losing effort against Frances Tiafoe. He won the first two of the games, but in the third, serving to stay in the match, he fought back to deuce only to double fault on match point.

I found another five cases out of over 33,000 Futures-level matches. The most recent, a 2017 match between Altug Celikbilek and Francesco Vilardo, was notable because Celikbilek recovered from 0-40 in the 6th, 8th, and 10th games–and in the 7th game, Vilardo did as well!

It’s important to keep in mind that servers do not win as many points at the lower levels of men’s tennis. (Streakiness might also generate more 0-40 scores.) In my 2011-2019 data, servers fell to love-40 5.4% of the time at the ATP main draw level, 5.8% in ATP qualifying, 6.4% at Challengers, and 7.7% at Futures. However, that doesn’t end up generating many more recoveries, since servers are more likely to lose those games before evening the score.

If we dump all of these results together, we get 10 occasions (or 11, if you count the Petzschner match) when a player recovered from 0-40 three times in a row, out of approximately 86,400 total matches. That rate suggests that we should see a feat like Medvedev’s once every three or four years on tour. That’s more frequent than my initial calculation, but still quite rare.

Aryna Sabalenka at One Hundred Percent

Aryna Sabalenka played her first match at Indian Wells on Friday, handily beating Evgeniya Rodina. Sabalenka won the first set 6-1, then took a 3-0 lead in the second. Commentator Mikey Perera noted that Sabalenka’s win probability had reached 100%, though he (correctly!) expressed skepticism with the number.

Win probability has steadily crept into tennis broadcasts. Often we’re shown pre-match percentages along with the change up to the current moment in the match. The silliness of a 100% mid-match win probability has a pedestrian explanation: The numbers are usually given as integers. For most fans, there’s no important difference between 55.7% and 58%, but in extreme cases, another significant digit would come in handy.

So, was the broadcast algorithm correct?

My Elo-based pre-match forecast set Sabalenka’s chances at 94.8%. To get mid-match predictions, we need more granular stats. Sabalenka has won 65.5% of serve points and 46.7% of return points this year (including the Rodina match), and if we nudge the RPW up to 47%, those components predict a 94.7% chance of a Sabalenka victory–virtually equivalent to the Elo forecast.

Plug those numbers into my win probability model with Rodina serving at 1-6, 0-3, and Sabalenka’s chances of victory are 99.7%. Round to the nearest integer, and sure enough, you get a 100% chance of victory. It might have felt that way for Rodina.
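If you want to reproduce that sort of number yourself, you don’t need my win-probability code; a brute-force simulation gets close enough. The sketch below assumes both players keep winning points at the rates quoted above (roughly 65.5% on serve and 47% on return for Sabalenka) and plays out the rest of the match from 6-1, 3-0 with Rodina serving:

import random

def play_game(p):
    # simulate one service game; p = chance the server wins each point
    s, r = 0, 0
    while True:
        if random.random() < p:
            s += 1
        else:
            r += 1
        if s >= 4 and s - r >= 2:
            return True          # server holds
        if r >= 4 and r - s >= 2:
            return False         # server is broken

def play_tiebreak(p_serve, p_return, a_serves_first):
    # first to 7, win by 2; returns True if player A wins
    a, b, point, a_serving = 0, 0, 0, a_serves_first
    while True:
        p = p_serve if a_serving else p_return
        if random.random() < p:
            a += 1
        else:
            b += 1
        point += 1
        if point % 2 == 1:       # serve changes after the first point, then every two points
            a_serving = not a_serving
        if (a >= 7 or b >= 7) and abs(a - b) >= 2:
            return a > b

def match_win_prob(p_serve, p_return, sets_a, sets_b, games_a, games_b, a_serving, trials=50000):
    # chance that player A wins a best-of-three match from the given score
    wins = 0
    for _ in range(trials):
        sa, sb, ga, gb, serving = sets_a, sets_b, games_a, games_b, a_serving
        while sa < 2 and sb < 2:
            if ga == 6 and gb == 6:
                a_wins_game = play_tiebreak(p_serve, p_return, serving)
            else:
                held = play_game(p_serve if serving else 1 - p_return)
                a_wins_game = held if serving else not held
            if a_wins_game:
                ga += 1
            else:
                gb += 1
            serving = not serving
            if (ga >= 6 or gb >= 6) and (abs(ga - gb) >= 2 or ga == 7 or gb == 7):
                if ga > gb:
                    sa += 1
                else:
                    sb += 1
                ga, gb = 0, 0
        if sa == 2:
            wins += 1
    return wins / trials

# Sabalenka up a set and 3-0, Rodina serving next:
print(match_win_prob(0.655, 0.47, 1, 0, 3, 0, False))   # ≈ 0.997, give or take simulation noise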

In fact, Sabalenka crossed the “100%” (99.5%) threshold in the previous game. She cleared 99.5% at 2-0, 15-0, slipped back under the line when she fell to deuce, then reclaimed it each of the two times she gained ad-in.

So far, I’ve used a relatively simple model to forecast the remainder of the match. (And it’s certainly sufficient for these purposes.) But if we were putting money on the outcome–especially if the first ten games of the match had gone in a less predictable direction–we’d want to do something more sophisticated. I’ve assumed that from 6-1, 3-0, Sabalenka would play the way we could have predicted before the match. In this case, that’s a sound assumption. But a better method would take into account the results of the match itself up to that point.

Through ten games, Sabalenka was playing better than the initial forecast of 66.5% on serve and 47% on return. Her success rate on serve was a bit worse, at 64.4%, but she was destroying any service advantage of Rodina’s, winning nearly 55% of those points. Had we known before the match that she would play that way, our pre-match forecast would have given Sabalenka a whopping 99.4% chance of victory.

Using that pre-match forecast, our prediction at 6-1, 3-0 would have been an overwhelming 99.97% for the favorite.

As the match progressed, then, we gained more and more information that the in-match performance–whether due to the conditions, the players’ fitness or mood on the day, the matchup, or any number of other factors–would be even more lopsided. Had we taken everything into account at 6-1, 3-0, we would have calculated some mix of 99.7% (based on pre-match numbers) and 99.97% (based on in-match performance). Determining how much weight to give each of those numbers is the tough part; suffice it to say that the correct answer is somewhere in between the two.

The broadcast algorithm jumped the gun with its 100% win probability, though only a bit. No matter how lopsided a match, anything can happen–but it probably won’t.

The Underserved First Point

Not all points are created equal. Ask around, and you’ll get a variety of opinions as to which points are most important. Break points, obviously, are key. Pundits are fond of 15-30.

Then there’s the first point of the game. It’s been conventional wisdom for a long time that the opening point holds disproportionate weight. In a previous study, I disproved that. Of course it’s valuable to move from 0-0 to 15-0, and no one likes to start a game by dropping to 0-15. But the first point doesn’t have any magical effect on the outcome of the game beyond simply adding to one or the other player’s tally.

Yet here I am, talking about the first point again. While there still isn’t any magic, the first point is going to the returner too often. With a slight change in tactics or focus, this is a rare analytical insight that pros may be able to use to win a few more service games.

Point by point

The balance between the server and returner varies a great deal depending on the point score. In men’s singles matches at the US Open between 2019 and 2021, servers won 63.6% of points in non-tiebreak games. Yet at 40-love, the server won 67.7%, and at ad-out, the server won only 59.6%.

The point scores that generated such extremes hint at what’s going on here. If a game has reached 40-love, the server is probably a good one. It’s not always the case, but if you look at all the 40-love games in a large dataset, you’ll get far more John Isner holds than Benoit Paire holds. The opposite applies to ad-out, a score that Isner rarely faces. Thus, the difference in point-by-point serve percentage isn’t (entirely) because of the point score–it’s because of the servers who get there.

Other differences are more prosaic. On average, servers win more deuce-court points than ad-court points. In the same three-year dataset, the difference was 64.2% to 62.9%. There’s no selection bias component here. The typical ATPer is simply stronger in that direction. Some players–particularly left-handers–break the mold, but most will favor the deuce side. Both Novak Djokovic and Roger Federer, for instance, win nearly two percentage points more often when serving to that court.

Unbiasing

Because scores like 40-love and ad-out aren’t randomly distributed among servers, we need to do a bit more work to figure out which scores really do favor the server. The trick here is to compare each service point to the rest of the server’s points in the same match. A point like 40-love has a ton of Isners and Opelkas in it, so we’ll end up comparing it to a lot of other Isner and Opelka points. And in fact, the average player who reaches 40-love wins 65.0% of their service points overall and 64.3% of those in the ad court, two numbers that are well above the tour-wide average.

Working through the same exercise for every point score gives us a list of “actual” serve points won, “expected” serve points won, and differences. The “actual” column tells us what really happened at that score, bias and all; “expected” tells us how often that particular set of players won service points during the entire matches in question; and the difference gives us a first look at where servers are over- or under-performing.
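In code, the comparison looks something like the sketch below. The dataframe and its column names are placeholders (one row per service point, flagged with the match, the server, the court, the point score, and whether the server won it), not the actual format of my dataset:

import pandas as pd

def actual_vs_expected(pts: pd.DataFrame) -> pd.DataFrame:
    # pts columns (hypothetical): match_id, server, court ('deuce'/'ad'), score (e.g. '40-0'), won (0/1)
    # baseline: each server's rate in that match, split by court to control for deuce/ad strength
    baseline = (pts.groupby(['match_id', 'server', 'court'])['won']
                   .mean()
                   .rename('baseline')
                   .reset_index())
    merged = pts.merge(baseline, on=['match_id', 'server', 'court'])
    out = (merged.groupby('score')
                 .agg(actual=('won', 'mean'), expected=('baseline', 'mean')))
    out['difference'] = out['actual'] - out['expected']
    return out.sort_values('difference')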

The following table shows these numbers for each point score:

Score  Actual  Expected  Difference
40-AD   59.6%     61.4%       -1.8%
0-0     63.3%     64.6%       -1.3%
15-0    62.7%     63.3%       -0.6%
40-30   61.6%     62.2%       -0.6%
15-30   62.3%     62.7%       -0.4%
30-0    64.7%     65.1%       -0.3%
40-40   62.6%     62.8%       -0.1%
0-15    63.2%     63.3%       -0.1%
40-15   64.6%     64.5%        0.0%
30-15   62.8%     62.7%        0.1%
AD-40   61.6%     61.4%        0.2%
30-30   64.0%     63.6%        0.4%
0-30    65.9%     65.2%        0.8%
15-15   64.8%     64.0%        0.8%
30-40   63.6%     62.2%        1.4%
0-40    66.1%     64.7%        1.4%
15-40   66.9%     64.5%        2.4%
40-0    67.7%     64.3%        3.4%

The scores at the top of the table are the ones where servers underperform, winning fewer points than their overall level would suggest. At the bottom of the list are those where the server seems to overperform.

Some of the results lend themselves to easy narratives. Servers really focus at 0-40 and 15-40, while returners know they have more break chances coming. 40-AD (ad-out) seems like a stressful time to serve, and the numbers back that up. Other results are a bit more baffling–shouldn’t 30-30 and 40-40 be the same, since they are logically equivalent? Why are servers performing so well at 30-40 if they ultimately struggle at 40-AD?

And to today’s topic: What about the first point? It ranks second only to 40-AD in how much the server underperforms, despite no obvious reason why it should lean one way or the other.

Second to none

When we consider a few more factors, this first-point underperformance has an even greater impact.

One useful way to measure the importance of a point is with win probability. Given any point score (or set/game/point score), combined with the likelihood that the server will win any given point, you can calculate the probability of a hold (or a match victory). If we assume that the server wins 64.2% of points, he’ll hold 81.6% of the time, so his win probability at the beginning of the game is 81.6%.

* 64.2% was the rate in non-tiebreak games at the 2021 US Open, while the overall rate for this 2019-21 dataset is a bit lower.

The next concept is volatility. A point’s volatility is determined by how much the result could swing the win probability. By winning the first point, the server’s win probability rises to 89.7%, the figure for such a server at 15-love. If he loses, it falls to 67.2%. The difference–22.5%–tells us how much is at stake in that single point.

In volatility terms, the first point isn’t particularly crucial. A 22.5% swing far outstrips, say, the 9.3% volatility at 30-love, but it pales next to the 76.3% volatility at 30-40. When the server faces break point, one swing of the racket can determine whether win probability drops to zero (because he loses the game), or bounces back north of 50% (because he gets back to deuce).
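Both the hold probabilities and the volatilities fall out of a short recursion over the point score. Here is a sketch, again assuming independent points at the 64.2% rate; it reproduces the figures above to within rounding:

def hold_from(s, r, p):
    # server's chance of holding from s points to r, winning each point with probability p
    if s >= 4 and s - r >= 2:
        return 1.0
    if r >= 4 and r - s >= 2:
        return 0.0
    if s >= 3 and s == r:                      # deuce (or any equivalent tied score)
        return p * p / (1 - 2 * p * (1 - p))
    return p * hold_from(s + 1, r, p) + (1 - p) * hold_from(s, r + 1, p)

def volatility(s, r, p):
    # swing in hold probability between winning and losing the point at score (s, r)
    return hold_from(s + 1, r, p) - hold_from(s, r + 1, p)

p = 0.642
print(round(hold_from(0, 0, p), 3))                                  # 0.816 at love-all
print(round(hold_from(1, 0, p), 3), round(hold_from(0, 1, p), 3))    # 0.897 at 15-0, 0.672 at 0-15
print(round(volatility(0, 0, p), 3), round(volatility(2, 3, p), 3))  # 0.225 at 0-0, 0.763 at 30-40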

What the first point of the game gives up in volatility, it wins back in volume. The stakes are never higher than at 40-AD, but at the US Open in the last few years, barely one-fifth of games ever get that far. By contrast, there’s a love-love kickoff in every single game.

By combining volatility and volume with the degree to which servers under- or over-perform, we can put together a top-level view of what players are gaining or losing at each point score.

Multipliers gone wild

In a tour de force of mathematical derring-do, I’m going to take these three numbers and multiply them together.

The “difference” from the previous table tells us how much better or worse players are serving at a specific point score, compared to their overall performance. If two differences are similar, the one that matters more is the one with higher volatility, right? So we multiply by volatility. And all else equal, the more often a situation occurs, the greater its impact on the end result. So we multiply by the number of occurrences in the dataset.

The final tally is volatility * occurrences * difference, cleverly dubbed “V*O*D” in the table below. The product of three percentages is tiny, so I’ve multiplied those figures by 10,000 to make the results easier to read.
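For a single score, the whole feat of derring-do is one line. Using the 0-0 row as the example:

vod = 0.225 * 1.00 * -0.013 * 10000   # volatility * occurrences * difference, scaled by 10,000
print(round(vod, 2))                  # -29.25 with these rounded inputs; the table's -29.2 uses unrounded differences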

Here are the results:

Score  Volatility  Occurrences  Difference  V*O*D
40-AD       76.3%          22%       -1.8%  -29.9
0-0         22.5%         100%       -1.3%  -29.2
15-30       44.9%          34%       -0.4%   -5.8
15-0        16.5%          50%       -0.6%   -4.9
40-30       23.8%          26%       -0.6%   -3.6
40-40       42.5%          43%       -0.1%   -2.6
0-15        33.2%          50%       -0.1%   -2.3
30-0         9.3%          27%       -0.3%   -0.9
40-15        8.5%          24%        0.0%    0.1
30-15       20.7%          34%        0.1%    0.6
AD-40       23.8%          22%        0.2%    1.1
40-0         3.0%          16%        3.4%    1.7
30-30       42.5%          32%        0.4%    5.9
0-40        31.4%          16%        1.4%    7.1
0-30        40.0%          27%        0.8%    8.2
15-15       29.4%          46%        0.8%   11.0
30-40       76.3%          25%        1.4%   26.3
15-40       49.0%          24%        2.4%   28.2

With all factors taken into account, we see that servers are giving up about as much on the first point of the game as they are when faced with nerves at 40-AD. Two point scores also stick out at the other end of the spectrum, where 30-40 puzzlingly continues to be a time when servers find their best stuff.

Exploiting the mundane

The exact V*O*D numbers are far (far!) from natural laws, but when I ran the same algorithm on data from other grand slams, the contours were nearly the same. In the 2017 and 2018 US Opens, for instance, 40-AD and 0-0 were again the standout “underperforming” points, and 0-0 was the one that topped the list.

* I took a rudimentary look at this topic very early in the blog’s history, using data from 2011. 0-0 didn’t stick out to the same degree, but I didn’t control for the deuce/ad difference, as I have today. When accounting for deuce-court strength, 0-0 performance looks relatively worse.

All of which is to say: I can’t explain why this is a thing, but it sure looks like it’s a thing. And if it’s a thing, it looks like an opportunity for savvy players and coaches.

I’m perfectly happy to accept that servers struggle to maintain their focus (and perhaps their ability to surprise) at 40-AD. More importantly, I’m sure that players and coaches are very aware of the necessary mental gymnastics so deep in a game.

On the other hand, there’s no good reason that servers should underperform at the start of every game. In fact, I’d be more ready to accept the idea that servers would have the edge. The opponent hasn’t seen a serve for a few minutes (or more), and the server’s arm is (relatively) fresh. While it’s not a recipe for domination, it sounds like a recipe for a tiny edge that the server can build on.

That’s why I believe there’s something to be exploited here. Perhaps players–or at least some of them–are taking a bit off their first-point first serves, using the opening salvo as a mini-warmup. Maybe they are more willing to hit their second-best serve, or aim to the returner’s stronger side, as a tactical move to set up more effective serves later in the game. As I’ve said, I don’t know why the numbers are turning up this underperformance, but it’s clear there’s a gap to be closed.

There’s no magic in the first point, but there’s an awful lot of value. Players who serve up their best stuff at the beginning of the game are getting an edge that their peers ought to be developing, too.

Ashleigh Barty’s Fully Baked Double Bagel

Not every double bagel is created equal. Today in Melbourne, Ashleigh Barty beat Danka Kovinic without losing a game, dropping only ten points. By contrast, a memorable Stuttgart first-rounder from 2015 saw Sabine Lisicki lose 6-0 6-0 to Zarina Diyas, requiring 88 points and well over an hour to play. Lisicki won 37.5% of total points played that day, while Kovinic snuck off with just 16.7%.

Barty’s performance was among the most dominant in recent WTA history. I have mostly complete match stats for the women’s tour going back to about 2010, and in that time frame, only two main draw double bagels have finished in fewer than 60 points:

Points  Year  Event       Round  Winner     Loser          
57      2017  Hua Hin     R32    Golubic    Wisitwarapron  
59      2019  New Haven   R32    Cepelova   Small          
60      2021  Aus Open    R128   Barty      Kovinic        
60      2019  Madrid      R16    Halep      Kuzmova        
61      2010  Estoril     R32    Garrigues  De Lattre      
62      2017  Bol         R32    Mrdeza     Thombare       
63      2013  Aus Open    R64    Sharapova  Doi            
63      2015  Bastad      R16    Barthel    Zanevska       
64      2015  Toronto     R64    Vinci      Knapp          
64      2017  Tokyo       R32    Krunic     Date           
64      2011  Luxembourg  R32    Garrigues  Kremer         
64      2012  Copenhagen  R32    Cornet     Ejdesgaard     
65      2010  Moscow      R16    Kirilenko  Bondarenko

Today’s drubbing is even a bit more impressive than it looks on that list. Barty lost only 10 points–among the matches listed above, that’s equal to Jana Cepelova, two more than Viktorija Golubic, and fewer than everyone else. Not all 60-pointers are identical: Because Kovinic forced one deuce game today, Barty had to win 50 points instead of the minimum 48. Simona Halep only needed 48 in her 2019 Madrid double bagel, meaning that she lost 12 of the 60 points played that day.

Double bagel probability

There’s a bit of luck involved in winning twelve games in a row, even for a player at the top of her game. Kovinic won 10 points today, so even if she did exactly the same thing in her next match, one can imagine her “bunching” her points differently and putting a game or two on the board. Unlikely, but possible.

For any match, we can take the winner’s rate of service points won and return points won, and then generate the probability that she wins twelve games in a row. I did this exact exercise last January during the ATP Cup when Roberto Bautista Agut handed a 6-0 6-0 loss to Aleksandre Metreveli. Metreveli lasted 97 points, or 61% longer than Kovinic. If Metreveli had continued to play at that level, his chances of losing twelve games in a row would have been a mere 14.8%.

Barty won 88.9% of her service points and 78.8% of her return points against Kovinic today. If she continued at those rates, assuming no unusual streakiness or significantly better or worse performance at certain point scores, she would hold serve 99.8% of the time and break in 97.2% of return games. (By contrast, Bautista Agut’s probabilities were “only” 98.9% and 73.6%.)

The likelihood of a 6-0 6-0 bagel is simply that of six holds and six breaks. For Barty: (99.8% ^ 6) * (97.2% ^ 6), or 83.6%. In other words, the way she was playing today, Ash would score the double bagel five out of six times.
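Here is that calculation end to end, using the usual independent-points formula for a single game:

def game_win_prob(p):
    # chance of winning a game when winning each point with probability p
    q = 1 - p
    return p**4 * (1 + 4*q + 10*q**2) + 20 * p**3 * q**3 * p**2 / (1 - 2*p*q)

hold = game_win_prob(0.889)        # Barty's service games: ~0.998
brk = game_win_prob(0.788)         # Barty's return games: ~0.973 (the 97.2% above reflects unrounded inputs)
print(round(hold**6 * brk**6, 3))  # 0.836: six holds and six breaks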

This probability is the number that really tells you how dominant a player was, even if it’s a few levels more complex than counting points won and points lost. And by this measure, only Golubic’s great day holds a place on the list ahead of Barty’s. The p(DB) column shows the probability of a double bagel.

p(DB)  Year  Event       Round  Winner          Loser           
88.7%  2017  Hua Hin     R32    Golubic         Wisitwarapron   
83.6%  2021  Aus Open    R128   Barty           Kovinic         
80.0%  2019  New Haven   R32    Cepelova        Small           
76.8%  2019  Madrid      R16    Halep           Kuzmova         
75.4%  2017  Tokyo       R32    Krunic          Date            
68.8%  2011  Luxembourg  R32    Garrigues       Kremer          
66.9%  2010  Estoril     R32    Garrigues       De Lattre       
64.9%  2017  Bastad      R32    Krejcikova      Beck            
64.1%  2017  Bol         R32    Mrdeza          Thombare        
62.0%  2010  Moscow      R16    Kirilenko       Bondarenko      
60.7%  2016  US Open     R128   Suarez Navarro  Pereira         
59.2%  2013  Aus Open    R64    Sharapova       Doi             
59.2%  2018  US Open     R128   Gavrilova       Sorribes Tormo

Gotta love the coincidence here. 13th on this list is a 2018 US Open first-rounder between Daria Gavrilova and Sara Sorribes Tormo. Both players are still going strong (except when Sorribes Tormo was up 6-0 4-0 on Aryna Sabalenka in Ostrava last October), both are in Melbourne, and they drew each other again this week. Gavrilova won again, though not quite as easily. Her reward? A second-round match on Thursday with Ashleigh Barty.

Rethinking Match Results as Probabilities

You don’t have to watch tennis for long before hearing a commentator explain that matches can be decided by the slimmest of margins. It’s common for a match winner to tally only 51% or 52% of the total points played. Dozens of times each year, players go even further, triumphing despite winning fewer than half of points. Novak Djokovic did just that in the 2019 Wimbledon final, claiming only 204 points to Roger Federer’s 218.

It’s right to look at results like Djokovic-Federer and conclude that many matches are decided by slim margins or that performance on certain points is crucial. Indeed, players occasionally win matches while winning as few as 47% of points.

Still, it’s possible to take the “slim margins” claim too far. 51% sounds like a narrow margin, as does 53%. In many endeavors, sporting and otherwise, 55% represents a near-tie, and even 60% or 65% suggests that there isn’t much to separate the two sides. Not so in tennis, especially in the serve-centered men’s game. However it sounds, 60% represents a one-sided contest, and 65% is a blowout verging on embarrassment. In 2019, only three ATP tour matches saw one player win more than 70% of total points.

Answer a different question

For several reasons, total points won is an imperfect measure of one player’s superiority, even in a single match. One flaw is that it is usually stuck in that range between 35% and 65%, incorrectly implying that all tennis matches are relatively close contests. Another drawback is that not all 55% rates (or 51%s, or 62%s) are created equal. The longer the match, the more information we gain about the players. For a specific format, like best-of-three, a longer match usually requires closely-matched players to go to tiebreaks or a third set. But if we want to compare matches across different formats (like best-of-three and best-of-five), the length of the match doesn’t necessarily tell us anything. Best-of-five matches are longer because of the rules, not because of any characteristics of the players.

The solution is to think in terms of probabilities. Given the length of a match, and the percentage of points won by each player, what is the probability that the winner was the better player?

To answer that question, we use the binomial distribution, and consider the likelihood that one player would win as many points as he did if the players were equally matched. If we flipped a fair coin 100 times, we would expect the number of heads to be around 50, but not that it will always be exactly 50. The binomial distribution tells us how often to expect any particular number of heads: 49, 50, or 51 are common, 53 is a bit less common, 55 even less so, 40 or 60 quite uncommon, and so on. For any number of heads, there’s some probability that it is entirely due to chance, and some probability that it occurs because the coin is biased.

Here’s how that relates to a tennis match. We start the match pretending that we know nothing about the players, assuming that they are equal. The number of points is analogous to the number of coin flips–the more points, the more likely the player who wins the most is really better. The number of points won by the victor corresponds to the number of heads. If the winner claims 60% of points, we can be pretty sure that he really is better, just as a tally of 60% heads in 100 or more flips would indicate that the coin is probably biased.

More than just 59%

The binomial distribution helps us convert those intuitions into probabilities. Let’s look at an example. The 2019 Roland Garros final was a fairly one-sided affair. Rafael Nadal took the title, winning 58.6% of total points played (116 of 198) over Dominic Thiem, despite dropping the second set. If Nadal and Thiem were equally matched, the probability that Nadal would win so many points is barely 1%. Thus, we can say that there is a 99% probability that Nadal was–on the day, in those conditions, and so on–the better player.

No surprises there, and there shouldn’t be. Things get more interesting when we alter the length of the match. The two other 2019 ATP finals in which one player won about 58.6% of points were both claimed by Djokovic. In Paris, he won 58.7% of points (61 of 104) against Denis Shapovalov, and in Tokyo, he accounted for 58.3% (56 of 96) in his defeat of John Millman. Because they were best-of-three instead of best-of-five, those victories took about half as long as Nadal’s, so our confidence that Djokovic was the better player–while still high!–shouldn’t be quite as close to 100%. The binomial distribution says that those likelihoods are 95% and 94%, respectively.
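The calculation itself is a one-liner with scipy: the probability that the winner was the better player is one minus the chance that a true coin-flip player would have won at least that many points by luck alone.

from scipy.stats import binom

def prob_better(points_won, points_total):
    # 1 minus the probability that a 50/50 player wins at least this many points
    return 1 - binom.sf(points_won - 1, points_total, 0.5)

print(round(prob_better(116, 198), 2))  # 0.99: Nadal d. Thiem, 2019 Roland Garros final
print(round(prob_better(61, 104), 2))   # 0.95: Djokovic d. Shapovalov, 2019 Paris final
print(round(prob_better(56, 96), 2))    # 0.94: Djokovic d. Millman, 2019 Tokyo final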

The winner of the average tour-level ATP match in 2019 won 55% of total points–the sort of number that sounds close, even as attentive fans know it really isn’t. When we convert every match result into a probability, the average likelihood that the winner was the better player is 80%. The latter number not only makes more intuitive sense–fewer results are clustered in the mid 50s, with numbers spread out from 15% to 100%–but it considers the length of the match, something that old-fashioned total-points-won ignores.

Why does this matter?

You might reasonably think that anyone who cared about quantifying match results already has these intuitions. You already know that 55% is a tidy win, 60% is an easy one, and that the length of the match means those numbers should be treated differently depending on context. Ranking points and prize money are awarded without consideration of this sort of trivia, so what’s the point of looking for an alternative?

I find this potentially valuable as a way to represent margin of victory. It seems logical that any player rating system–such as my Elo ratings–should incorporate margin of victory, because it’s tougher to execute a blowout than it is a narrow win. Put another way, someone who wins 59% of points against Thiem is probably better than someone who wins 51% of points against Thiem, and it would make sense for ratings to reflect that.

Some ratings already incorporate margin of victory, including the one introduced recently by Martin Ingram, which I discussed with him on a recent podcast. But many systems–again, including my Elo ratings–do not. Over the years, I’ve tested all sorts of potential ways to incorporate margin of victory, and have not found any way to consistently improve the predictiveness of the ratings. Maybe this is the one that will work.

Leverage and lottery matches

I’ve already hinted at one limitation to this approach, one that affects most other margin-of-victory metrics. Djokovic won only 48.3% of points in the 2019 Wimbledon final, a match he managed to win by coming up big in more important moments than Federer did. Recasting margin of victory in terms of probabilities gives us more 80% results than 55% results, but it also gives us more 25% results than 48% results. According to this approach, there is only a 24% chance that Djokovic was the better player that day. While that’s a defensible position–remember the 218 to 204 point gap–it’s also a bit uncomfortable.

Using the binomial distribution as I’ve described above, we completely ignore leverage, the notion that some points are more valuable than others. While most players aren’t consistently good or bad in high-leverage situations, many matches are decided entirely by performance in those key moments.

One solution would be to incorporate my concept of Leverage Ratio, which compares the importance of the points won by each player. I’ve further combined Leverage Ratio with Dominance Ratio, a metric closely related to total points won, into a single number I call DR+, or adjusted Dominance Ratio. It’s possible to win a match with a DR below 1.0, which means winning a lower rate of return points than your opponent did, something that often happens when total points won is below 50%. But when DR is adjusted for leverage, it’s extremely uncommon for a match winner to end up with a DR+ below 1.0. Djokovic’s DR in the Wimbledon final was 0.87, and his DR+ was 0.97, one of the very few instances in which a winner’s adjusted figure stayed below 1.0.

It would be impossible to fix the binomial distribution approach in the same way I’ve “fixed” DR. We can’t simply multiply 65%, or 80%, or whatever, by Leverage Ratio, and expect to get a sensible result. We might not even be interested in such an approach. Calculating Leverage Ratio requires access to a point-by-point log of the match–not to mention a hefty chunk of win-probability code–which makes it extremely time consuming to compute, even when the necessary data is available.

For now, leverage isn’t something we can fix. It is only something that we can be aware of, as we spot confusing margin-of-victory figures like Djokovic’s 24% from the Wimbledon final.

Rethinking, fast and slow

As with many of the metrics I devise, I don’t really expect wide adoption. If the best application of this approach is to create a component that improves Elo ratings, then that’s a useful step forward, even if it goes no further.

The broader goal is to create metrics that incorporate more of our intuitions. Just because we’ve grown accustomed to the quirks of the tennis scoring system, a universe in which 52% is close and 54% is not, doesn’t mean we can’t do better. Thinking in terms of probabilities takes more effort, but it almost always nets more insight.

Roger Federer Wasn’t Clutch, But He Was Almost Clutch Enough

Italian translation at settesei.it

The stats from the Wimbledon final told a clear story. Over five sets, Roger Federer did most things slightly better than did his opponent, Novak Djokovic. Djokovic claimed a narrow victory because he won more of the most important points, something that doesn’t show up as clearly on the statsheet.

We can add to the traditional stats and quantify that sort of clutch play. A method that goes beyond simply counting break points or thinking back to obviously key moments is to use the leverage metric to assign a value to each point, according to its importance. After every point of the match, we can calculate an updated probability that each player will emerge victorious. A point such as 5-all in a tiebreak has the potential to shift the probability a great deal; 40-15 in the first game of the match does not.

Leverage quantifies that potential. The average point in a best-of-five match has a leverage of about 4%, and the most important points are several times that. Another way of saying that a player is “clutch” is that he is winning a disproportionate number of high-leverage points, even if he underwhelms at low-leverage moments.

Leverage ratio

In my match recap at The Economist, I took that one step further. While Djokovic won fewer points than Federer did, his successes mattered more. The average leverage of Djokovic’s points won was 7.9%, compared to Federer’s 7.2%. We can represent that difference in the form of a leverage ratio (LR), by dividing 7.9% by 7.2%, for a result of 1.1. A ratio of that magnitude is not unusual. In the 700-plus men’s grand slam matches in the Match Charting Project, the average LR of the more clutch player is 1.11. Djokovic’s excellence in key moments was not particularly rare, but in a close match such as the final, it was enough to make the difference.

Recording a leverage ratio above 1.0 is no guarantee of victory. In about 30% of these 700 best-of-five matches, a player came out on top despite winning–on average–less-important points than his opponent did. Some of the instances of low-LR winners border on the comical, such as the 2008 French Open final, in which Rafael Nadal drubbed Federer despite a LR of only 0.77. In blowouts, there just isn’t that much leverage to go around, so the number of points won matters a lot more than their timing. But un-clutch performances often translate to victory even in closer matches. Andy Murray won the 2008 US Open semi-final over Nadal in four sets despite a LR of 0.80, and in a very tight Wimbledon semi-final last year, Kevin Anderson snuck past John Isner with a LR of 0.88.

You don’t need a spreadsheet to recognize that tennis matches are decided by a mix of overall and clutch performance. The numbers I’ve shown you so far don’t advance our understanding much, at least not in a rigorous way. That’s the next step.

DR, meet BLR

Regular users of Tennis Abstract player pages are familiar with Dominance Ratio (DR), a stat invented by Carl Bialik that re-casts total points won. DR is calculated by dividing a player’s rate of return points won by his rate of service points lost (his opponent’s rate of return points won), so the DR for a player who is equal on serve and return is exactly 1.0.

Winners are usually above 1.0 and losers below 1.0. In the Wimbledon final, Djokovic’s DR was 0.87, which is extremely low for a winner, though not unheard of. DR balances the effect of serve performance and return performance (unlike total points won, which can skew in one direction if there are many more serve points than return points, or vice versa) and gives us a single-number summary of overall performance.

But it doesn’t say anything about clutch, except that when a player wins with a low DR, we can infer that he outperformed in the big moments.

To get a similarly balanced view of high-leverage performance, we can adapt leverage ratio to equally weight clutch play on serve and return points. I’ll call that balanced leverage ratio (BLR), which is simply the average of LR on serve points and LR on return points. BLR usually doesn’t differ much from LR, just as we often get the same information from DR that we get from total points won. Djokovic’s Wimbledon final BLR was 1.11, compared to a LR of 1.10. But in cases where a disproportionate number of points occur on one player’s racket, BLR provides a necessary correction.

Leverage-adjusted DR

We can capture leverage-adjusted performance by simply multiplying these two numbers. For example, let’s take Stan Wawrinka’s defeat of Djokovic in the 2016 US Open final. Wawrinka’s DR was 0.90, better than Djokovic at Wimbledon this year but rarely good enough to win. But win he did, thanks to a BLR of 1.33, one of the highest recorded in a major final. The product of Wawrinka’s DR and his BLR–let’s call the result DR+–is 1.20. That number can be interpreted on the same scale as “regular” DR, where 1.2 is often a close victory if not a truly nail-biting one. DR+ combines a measure of how many points a player won with a measure of how well-timed those points were.
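In code, the chain of stats looks something like this. The per-point structure is a made-up stand-in (the winner of the point, its leverage, and who served), not the Match Charting Project’s actual format, and the toy example’s leverage values are invented purely to exercise the functions:

def leverage_ratio(points, player):
    # average leverage of the points `player` won, divided by the average leverage
    # of the points the opponent won; each point is (winner, leverage, server)
    won = [lev for winner, lev, server in points if winner == player]
    lost = [lev for winner, lev, server in points if winner != player]
    return (sum(won) / len(won)) / (sum(lost) / len(lost))

def balanced_leverage_ratio(points, player):
    # BLR: average of the leverage ratio on the player's service points and on his return points
    serve_pts = [pt for pt in points if pt[2] == player]
    return_pts = [pt for pt in points if pt[2] != player]
    return (leverage_ratio(serve_pts, player) + leverage_ratio(return_pts, player)) / 2

toy = [('A', 0.05, 'A'), ('B', 0.03, 'A'), ('A', 0.08, 'B'), ('B', 0.04, 'B')]
print(round(balanced_leverage_ratio(toy, 'A'), 2))   # 1.83 for this invented four-point "match"

# DR+ is then just the product of the two summary numbers:
dr, blr = 0.90, 1.33                                 # Wawrinka, 2016 US Open final
print(round(dr * blr, 2))                            # 1.2, the DR+ quoted above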

Out of 167 men’s slam finals in the Match Charting Project dataset, 14 of the winners emerged triumphant despite a “regular” DR below 1.0. In every case, the winner’s BLR was higher than 1.1. And in 13 of the 14 instances, the strength of the winner’s BLR was enough to “cancel out” the weakness of his DR, in the sense that his DR+ was above 1.0. Here are those matches, sorted by DR+:

Year  Major            Winner              DR   BLR   DR+  
2019  Wimbledon        Novak Djokovic    0.87  1.11  0.97  
1982  Wimbledon        Jimmy Connors     0.88  1.20  1.06  
2001  Wimbledon        Goran Ivanisevic  0.95  1.16  1.10  
2008  Wimbledon        Rafael Nadal      0.98  1.13  1.10  
2009  Australian Open  Rafael Nadal      0.99  1.13  1.12  
1981  Wimbledon        John McEnroe      0.99  1.16  1.15  
1992  Wimbledon        Andre Agassi      0.97  1.19  1.16  
1989  US Open          Boris Becker      0.96  1.22  1.18  
1988  US Open          Mats Wilander     0.98  1.21  1.18  
2015  US Open          Novak Djokovic    0.98  1.21  1.18  
2016  US Open          Stan Wawrinka     0.90  1.33  1.20  
1999  Roland Garros    Andre Agassi      0.98  1.25  1.23  
1990  Roland Garros    Andres Gomez      0.94  1.34  1.26  
1991  Australian Open  Boris Becker      0.99  1.30  1.29

167 slam finals, and Djokovic-Federer XLVIII was the first one in which the player with the lower DR+ ended up the winner. (Some of the unlisted champions had subpar leverage ratios and thus DR+ figures lower than their DRs, but none ended up below the 1.0 mark.) While Federer was weaker in the clutch–notably in tiebreaks and when he held match points–his overall performance in high-leverage situations wasn’t as awful as those few memorable moments would suggest. More often than not, a player who combined Federer’s DR of 1.14 with his BLR of 0.90 would conclude the Wimbledon fortnight dancing with the Ladies’ champion.

Surprisingly, 1-out-of-167 might understate the rarity of a winner with a DR+ below 1.0. Only one other best-of-five match in the Match Charting Project database (out of more than 700 in total) fits the bill. That’s the controversial 2019 Australian Open fourth-rounder between Kei Nishikori and Pablo Carreno Busta. Nishikori won with a 1.06 DR, but his BLR was a relatively weak 0.91, resulting in a DR+ of 0.97. Like the Wimbledon final, that Melbourne clash could have gone either way. Carreno Busta may have been unlucky with more than just the chair umpire’s judgments.

What does it all mean?

We knew that the Wimbledon final was close–now we have more numbers to show us how close it was. We knew that Djokovic played better when it mattered, and now we have more context that indicates how much better he was, which is not an unusually wide margin. Federer has won five of his slams despite title-match BLRs below 1.0, and two others with DRs below 1.14. He’s never won a slam with a DR+ of 1.03 or lower, but then again, there had never before been a major final that DR+ judged to be that close. Roger is no one’s idea of a clutch master, but he isn’t that bad. He just should’ve saved a couple of doses of second-set dominance for more important junctures later on.

If you’re anything like me, you’ll read this far and be left with many more questions. I’ve started looking at several, and hope to write more in this vein soon. Is Federer usually less clutch than average? (Yes.) Is Djokovic that much better? (Yes.) How about Nadal? (Also better.) Is Nadal really better, or do his leverage numbers just look good because important points are more likely to happen in the ad court? (No, he really is better.) Does Djokovic have Federer’s number? (Not really, unless you mean his mobile number. Then yes.) Did everything change after Djokovic hit that return? (No.)

There are many interesting related topics beyond the big three, as well. I started writing about leverage for subsets of matches a few years ago, prompted by another match–the 2016 Wimbledon Federer-Raonic semi-final–in which Roger got outplayed when it mattered. Just as we can look at average leverage for points won and lost, we can also estimate the importance of points in which a player struck an ace, hit a backhand unforced error, or chose to approach the net.

Matches are decided by a combination of overall performance and high-leverage play. Commonly-available stats do a pretty good job at the former, and fail to shine much light on the latter. The clutch part of the equation is often left to the speculation of pundits. As we build out a more complete dataset and have access to more and more point-by-point data (and thus leverage numbers for each point and match), we can close the gap, enabling us to better quantify the degree to which situational performance affects every player’s bottom line.

Did Rafael Nadal Almost Lose a Set to David Ferrer?

Italian translation at settesei.it

In David Ferrer’s final grand slam, the draw gods handed him a doozy of a first-round assignment in Rafael Nadal. Ferrer has struggled all year, and no one seriously expected him to improve on his 6-24 career record against the King of Clay. In the end, he didn’t: Ferrer was forced to retire midway through the second set with a calf injury. But before his final Flushing exit, he gave Rafa a bit of a scare.

Nadal won the first set, 6-3. The second set was a bit messier: Ferrer broke to love in the opening game, Rafa broke him back in the next, and a bit later, Ferrer broke again to take a 3-2 lead. He maintained that one-break advantage until he physically couldn’t continue. Leading 4-3 and serving the next game, he was two holds away from leveling the match.

Does that mean Nadal “almost” lost the set? People on the internet argue about these things, and while I don’t understand why, I do love a good probability question. If it overlaps with semantics (yay semantics!), that’s a bonus.

Let’s forget the word choice for now and reframe the question: Ignoring the injury, what were Ferrer’s chances of winning the set? If we assume that both players were equal, it’s a simple thing to plug into my win probability model and–ta da!–we find that from 4*-3, Ferrer had a roughly 85% chance of winning the set.

But wait: I can already hear the Rafa fans screaming at me that these two players aren’t exactly equal. In the 102 points the Spanish duo played on Monday night, Ferrer won 38% on return and Nadal won 47%. For an entire five-set match, those rates work out to a 93% chance of Rafa winning. Maybe that’s not quite high enough, but it’s in the ballpark. Using those figures, Ferrer’s chance of hanging on to win the second set drops significantly, to 57.5%. When you’re winning barely half of your service points, your odds of securing a pair of holds are worse than a coin flip. Had Ferrer won the set, it’s more likely that he would’ve needed to either break Rafa again or come through in a tiebreak.
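For anyone who wants to check the arithmetic, a bare-bones version of the set-level calculation looks like the sketch below. It isn’t my full model (the tiebreak is a separate input, and the hold probabilities are plugged in directly), but with two equal servers holding about 83% of the time, which corresponds to roughly 65% of service points, it lands right around the 85% figure above:

def set_win_prob(games_a, games_b, a_serving, hold_a, hold_b, tiebreak_a=0.5):
    # chance player A wins the set from the given game score;
    # hold_a / hold_b are each player's chances of holding serve, tiebreak_a is A's tiebreak chance
    if games_a == 7 or (games_a == 6 and games_b <= 4):
        return 1.0
    if games_b == 7 or (games_b == 6 and games_a <= 4):
        return 0.0
    if games_a == 6 and games_b == 6:
        return tiebreak_a
    p_game = hold_a if a_serving else 1 - hold_b
    return (p_game * set_win_prob(games_a + 1, games_b, not a_serving, hold_a, hold_b, tiebreak_a)
            + (1 - p_game) * set_win_prob(games_a, games_b + 1, not a_serving, hold_a, hold_b, tiebreak_a))

# Ferrer (player A) serving at 4-3, both players holding 83% of the time:
print(round(set_win_prob(4, 3, True, 0.83, 0.83), 2))   # 0.86, in line with the ~85% above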

That’s a pretty big difference between our two initial estimates. 85% sounds good enough to qualify for “almost” (though one study quantifies the meaning of “almost” at above 90%), but 57.5% does not.

That doesn’t quite settle it, though. The win probability model takes all notions of streakiness out of the equation. According to the formula, there are no patches of good or bad play, no dips in motivation, no extra energy to finish off a set, and so on. I’m not convinced any of those exist in any systematic manner, but it’s tough to settle the question either way. Therefore, if we have the ability to use data from real-life matches, we should.

And here, we can. Let’s start with Nadal. Going back to late 2011, I was able to identify 69 sets in which Rafa was returning down a break at 4-3. (There are probably more; my point-by-point dataset isn’t exhaustive, but the missing matches are mostly random, so the 69 should be representative of the last several years.) Of those 69, he came back to win 21, almost exactly 30%.

Ferrer has been more solid than Nadal’s opponents. (It helps that Ferrer only had to face Rafa once, while Nadal’s opponents had to face him every time.) I found 122 sets in which Ferrer served at 4-3, leading by a break. He went on to win the set 109 of those times, or about 89%.

The 89% figure is definitely too high for our purposes: Not only was Ferrer a better player, on average, between 2012 and today, than he is now, but he also had the benefit of facing weaker opponents than Nadal in almost all of those 122 sets. 89%–not far from the theoretical 85% we started with–is a grossly optimistic upper limit.

Even if we take the average of Nadal’s and Ferrer’s real-life results–roughly 90% conversions for Ferru and 70% for Rafa’s opponents–80% is still overshooting the mark. As we’ve established, Ferrer’s numbers refer to a stronger version of the Spaniard, while Rafa is still near the level of his last half-decade. Even 80%, then, is overstating the chances that Nadal would’ve lost a set.

That leaves us with a range between 57%, which assumes Nadal would keep winning nearly half of Ferrer’s service points, and 80%, which is based on the experience of both players over the last several years. Ultimately, any final figure comes down to what we think about Ferrer’s level right now–not as good as it was even a couple of years ago, but at the same time, good enough to come within two games of taking a set from the top-ranked player in the world.

It would take a lot more work to come up with a more precise estimate, and even then, we’d still be stuck not only trying to establish Ferrer’s current ability level, but also his ability level in that set. Just as the word “almost” refers to a range of probabilities, I’m happy to call it a day with my own range. Taking all of these calculations together, we might settle on a narrower field of, say, 65-70%, or about two in three. There’s a good chance a healthy Ferrer would have taken that set from his long-time tormentor, but it was far from a sure thing … or even, given the usual meaning of the word, an “almost” sure thing.

Measuring a Season’s Worth of Luck

In Toronto last week, Stefanos Tsitsipas was either very clutch, very lucky, or both. Against Alexander Zverev in Friday’s quarter-final, he won fewer than half of all points, claiming only 56.7% of his service points, compared to Zverev’s 61.2%. The next day, beating Kevin Anderson in the semi-final in a third-set tiebreak, he again failed to win half of total points, holding 69.9% of his service points against Anderson’s 75.5%.

Whether the Greek prospect played his best on the big points or benefited from a hefty dose of fortune, this isn’t sustainable. Running those serve- and return-points-won (SPW and RPW) numbers through my win probability model, we find that–if you take luck and clutch performance out of the mix–Tsitsipas had a 27.8% chance of beating Zverev and a 26.5% chance of beating Anderson. These two contests–perhaps the two days that have defined the youngster’s career up to this point–are the very definition of “lottery matches.” They could’ve gone either way, and over a long enough period of time, they’ll probably even out.

Or will they? Are some players more likely to come out on top in these tight matches? Are they consistently–dare I say it–clutch? Using this relatively simple approach of converting single-match SPW and RPW rates into win probabilities, we can determine which players are winning more or less often than they “should,” and whether it’s a skill that some players consistently display.

Odds in the lottery

Let’s start with some examples. When one player wins more than 55% of points, he is virtually guaranteed to win the match. Even at 53%, his chances are extremely good. Still, a lot of matches–particularly best-of-threes on fast surfaces–end up in the range between 50% and 53%, and that’s what’s most interesting from this perspective.

Here are Tsitsipas’s last 16 matches, along with his SPW and RPW rates and the implied win probability for each:

Tournament  Round  Result  Opponent     SPW    RPW  WinProb  
Toronto     F      L       Nadal      62.9%  21.1%       3%  
Toronto     SF     W       Anderson   69.9%  24.5%      27%  
Toronto     QF     W       A Zverev   56.7%  38.8%      28%  
Toronto     R16    W       Djokovic   77.2%  32.0%      85%  
Toronto     R32    W       Thiem      83.3%  30.2%      93%  
Toronto     R64    W       Dzumhur    82.8%  35.0%      98%  
Washington  SF     L       A Zverev   54.7%  25.5%       1%  
Washington  QF     W       Goffin     71.2%  32.7%      67%  
Washington  R16    W       Duckworth  80.0%  37.5%      98%  
Washington  R32    W       Donaldson  59.5%  45.5%      74%  
Wimbledon   R16    L       Isner      72.5%  18.0%      10%  
Wimbledon   R32    W       Fabbiano   64.0%  55.9%     100%  
Wimbledon   R64    W       Donaldson  70.1%  40.9%      95%  
Wimbledon   R128   W       Barrere    71.5%  39.0%      94%  
Halle       R16    L       Kudla      59.7%  28.8%       8%  
Halle       R32    W       Pouille    78.3%  42.9%      99%

More than half of the matches are at least 90% or no more than 10%. But that leaves plenty of room for luck in the remaining matches. Thanks in large part to his last two victories, the win probability numbers add up to only 9.8 wins, compared to his actual record of 12-4. All four losses were rather one-sided, but in addition to the Toronto matches against Zverev and Anderson, his wins against David Goffin in Washington and, to a lesser extent, Novak Djokovic in Toronto, were far from sure things.
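The bookkeeping behind that 9.8 is as simple as it sounds; the list below is just the WinProb column from the table:

win_probs = [0.03, 0.27, 0.28, 0.85, 0.93, 0.98, 0.01, 0.67,
             0.98, 0.74, 0.10, 1.00, 0.95, 0.94, 0.08, 0.99]
expected = sum(win_probs)
print(round(expected, 1), round(12 / expected, 2))   # 9.8 expected wins; a luck/clutch ratio of about 1.22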

In the last two months, Stefanos has indeed been quite clutch, or quite lucky.

Season-wide views

When we expand our perspective to the entire 2018 season, however, the story changes a bit. In 48 tour-level matches through last week’s play (excluding retirements), Tsitsipas has gone 29-19. The same win probability algorithm indicates that he “should” have won 27.4 matches–a difference of 1.6 matches, or about five percent, which is less than the gap we saw in his last 16. In other words, for the first two-thirds of the season, his results were either unlucky or un-clutch, if only slightly. At the very least, the aggregate season numbers are less dramatic than his recent four-event run.

For two-thirds of a season, a five percent gap between actual wins and win-probability “expected” wins isn’t that big. For players with at least 30 completed tour-level matches this season, the magnitude of the clutch/luck effect extends from a 20% bonus (for Pierre Hugues Herbert) to a 20% penalty (for Sam Querrey, which he reduced a bit by beating John Isner in Cincinnati on Monday despite winning less than 49% of total points). Here are the ten extremes at each end, of the 59 ATPers who have reached the threshold so far in 2018:

Player                 Matches  Wins  Exp Wins  Ratio  
Pierre Hugues Herbert       30    16      13.2   1.22  
Nikoloz Basilashvili        34    17      14.0   1.21  
Frances Tiafoe              39    24      20.0   1.20  
Evgeny Donskoy              30    13      10.9   1.19  
Grigor Dimitrov             34    20      17.1   1.17  
Lucas Pouille               31    16      13.7   1.17  
Gael Monfils                34    21      18.3   1.15  
Daniil Medvedev             34    18      15.8   1.14  
Marco Cecchinato            33    19      16.7   1.14  
Maximilian Marterer         32    17      15.2   1.12  
…                                                      
Leonardo Mayer              37    19      20.1   0.95  
Guido Pella                 37    20      21.2   0.95  
Marin Cilic                 38    27      28.8   0.94  
Novak Djokovic              37    27      29.3   0.92  
Marton Fucsovics            30    16      17.5   0.92  
Joao Sousa                  36    18      19.8   0.91  
Dusan Lajovic               34    17      18.7   0.91  
Fernando Verdasco           43    22      24.5   0.90  
Mischa Zverev               39    18      20.7   0.87  
Sam Querrey                 30    15      18.8   0.80

A difference of three or four wins, as many of these players display between their actual and expected win totals, is more than enough to affect their standing in the rankings. The degree to which it matters depends enormously on which matches they win or lose, as Tsitsipas’s semi-final defeat of Anderson has a much greater impact on his point total than, say, Querrey’s narrow victory over Isner does for his. But in general, the guys at the top of this list are ones who have seen unexpected ranking boosts this season, while some of the guys at the bottom have gone the other way.

The last full season

Let’s take a look at an entire season’s worth of results. In 2017, a few players–minimum 40 completed tour-level matches–managed at least a 20% luck/clutch bonus. Here are the top ten from that season; with the surprising exception of Daniil Medvedev, none of them has repeated the feat so far in 2018:

Player                 Matches  Wins  Exp Wins  Ratio  
Donald Young                43    21      16.2   1.30  
Fabio Fognini               58    35      28.5   1.23  
Jack Sock                   55    36      29.8   1.21  
Jiri Vesely                 45    22      19.3   1.14  
Daniil Medvedev             43    22      19.7   1.11  
John Isner                  57    36      32.3   1.11  
Damir Dzumhur               56    33      29.7   1.11  
Gilles Muller               48    30      27.1   1.11  
Alexander Zverev            74    53      48.1   1.10  
Juan Martin del Potro       53    37      33.6   1.10

A few of these players have had solid seasons, but posting a good luck/clutch number in 2017 is hardly a guarantee of doing so again, as the likes of Donald Young, Jack Sock, and Jiri Vesely can attest. Here is the same list, with 2018 luck/clutch ratios shown alongside last year’s figures:

Player                 2017 Ratio  2018 Ratio     
Donald Young                 1.30        0.89  *  
Fabio Fognini                1.23        1.10     
Jack Sock                    1.21        0.68  *  
Jiri Vesely                  1.14        1.08  *  
Daniil Medvedev              1.11        1.14     
John Isner                   1.11        0.96     
Damir Dzumhur                1.11        1.01     
Gilles Muller                1.11        0.84  *  
Alexander Zverev             1.10        1.06     
Juan Martin del Potro        1.10        1.07

* fewer than 30 completed tour-level matches

The average luck/clutch ratio of these ten players has fallen to a bit below 1.0.
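
That figure is just the simple average of the 2018 column above:

ratios_2018 = [0.89, 1.10, 0.68, 1.08, 1.14, 0.96, 1.01, 0.84, 1.06, 1.07]
print(sum(ratios_2018) / len(ratios_2018))   # about 0.98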

Unsustainable luck

You can probably see where this is going. I generated full-season numbers for each year from 2008 to 2017, and identified those players who appeared in the lists for adjacent pairs of seasons. If luck/clutch ratio is a skill–that is, if it’s more clutch than luck–guys who post good numbers will tend to do so the following year, and those who post lower numbers will be more likely to remain low.

Across 325 pairs of player-seasons, that’s not what happened. There is almost no relationship between one year’s luck/clutch ratio and the next. The r^2 value–a measure of how much one season’s figure explains the next–is 0.07, meaning that the year-to-year numbers are close to random.
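
For anyone who wants to replicate the check, it boils down to a correlation between paired seasons. A sketch with placeholder values; the real input is the 325 player-season pairs described above:

import numpy as np

# Each row pairs a player's ratio in year N with his ratio in year N+1.
# These three rows are placeholders, not real data.
pairs = np.array([
    [1.12, 0.97],
    [0.94, 1.05],
    [1.21, 1.02],
])

r = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
print(r ** 2)   # an r^2 near zero means last year's ratio says little about this year's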

Across sports, analysts have found plenty of similar results, and they are often quick to pronounce that “clutch doesn’t exist,” which leads to predictable rejoinders from the laity that “of course it does,” and so on. It’s boring, and I’m not particularly interested in that debate. What this specific finding shows is:

This type of luck, defined as winning more matches than implied by a player’s SPW and RPW in each match, is not sustainable.

What Tsitsipas accomplished last weekend in Toronto was “clutch” by almost any definition. What this finding demonstrates is that a few such performances–or even a season’s worth of them–don’t make it any more likely that he’ll do the same next year. Another possibility is that the players who stick at the top level of professional tennis are all clutch in this sense, so while Tsitsipas might be quite mentally strong in key moments, he’ll often run up against players with similar mental skills, and he won’t be able to consistently win these close matches.

If Stefanos is able to maintain a ranking in the top 20, which seems plausible, he’ll probably need to win more serve and return points than he has so far. Fortunately for him, he’s still almost eight years younger than his typical peer, so he has plenty of time to improve. The occasional lottery matches that tilt his way will need to be mere bonuses, not the linchpin of his strategy to reach the top.

Simona Halep and Recoveries From Match Point Down

Italian translation at settesei.it

In yesterday’s French Open quarterfinals, Elina Svitolina held a commanding lead over Simona Halep, up a set and 5-1. Depending on what numbers you plug into the formula, Svitolina’s chance of winning the match at that stage was somewhere between 97% and 99%. Halep fought back to 5-5, and in the second-set tiebreak, Svitolina earned a match point at 6-5. Halep recovered again, won the breaker, and then cruised to a 6-0 victory in the third set.
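
Whatever the exact formula, the inputs are just the current score plus some point-win rates. One quick way to get a number in that range is to simulate: assume each player wins points on serve at some fixed rate, play out the rest of the match from a set and 5-1 up a few hundred thousand times, and count. A rough Monte Carlo sketch; the serve-point rates, the identity of the server at 5-1, and the tiebreak-in-every-set shortcut are all assumptions of mine:

import random

def sim_game(p):
    """Play one service game; return True if the server holds."""
    s, r = 0, 0
    while True:
        if random.random() < p:
            s += 1
        else:
            r += 1
        if s >= 4 and s - r >= 2:
            return True
        if r >= 4 and r - s >= 2:
            return False

def sim_tiebreak(pa, pb, a_serves_first):
    """Play a 7-point tiebreak; return True if player A wins.
    pa, pb = each player's point-win probability on her own serve."""
    a = b = n = 0
    while True:
        a_serving = a_serves_first if n % 4 in (0, 3) else not a_serves_first
        if random.random() < (pa if a_serving else 1 - pb):
            a += 1
        else:
            b += 1
        n += 1
        if a >= 7 and a - b >= 2:
            return True
        if b >= 7 and b - a >= 2:
            return False

def sim_match_from(pa, pb, sets_a, sets_b, games_a, games_b, a_serving):
    """Play out a best-of-three match from the given score; True if player A wins."""
    sa, sb, ga, gb = sets_a, sets_b, games_a, games_b
    while True:
        if ga == 6 and gb == 6:
            a_wins_game = sim_tiebreak(pa, pb, a_serving)   # shortcut: tiebreak in every set
        else:
            held = sim_game(pa if a_serving else pb)
            a_wins_game = held if a_serving else not held
        ga, gb = (ga + 1, gb) if a_wins_game else (ga, gb + 1)
        a_serving = not a_serving
        if (ga >= 6 and ga - gb >= 2) or ga == 7:
            sa, ga, gb = sa + 1, 0, 0
        elif (gb >= 6 and gb - ga >= 2) or gb == 7:
            sb, ga, gb = sb + 1, 0, 0
        if sa == 2:
            return True
        if sb == 2:
            return False

# Svitolina as player A, up a set and 5-1 and serving; assumed serve-point win rates
random.seed(1)
trials = 100_000
wins = sum(sim_match_from(0.57, 0.55, 1, 0, 5, 1, True) for _ in range(trials))
print(wins / trials)   # lands in the high 90s; the exact figure depends on the assumed rates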

It’s easy to fit a narrative to that sequence of events: After losing two leads, Svitolina was dispirited, and Halep was all but guaranteed a third-set victory. Maybe. It’s impossible to test that sort of thing on the evidence of a single match, but this is hardly the first time a player has failed to convert match point and needed to start fresh in a new set.

Even without a match point saved, the player who wins the second set has a small advantage going into the decider. In the last six-plus years of women’s Slam matches, the player who won the second set went on to win 51.3% of third sets. On the other hand, if the second set was a tiebreak, the winner of the second set won the decider only 43.7% of the time. Though it sounds contradictory at first, consider what we know about such sets. The second-set winner just barely claimed her set (in the tiebreak), while usually, her opponent took the first set more decisively. Momentum helps a little, but it can’t overcome much of a difference in skill level.
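
Rates like those can be tallied straight from the scorelines. Here is a rough sketch of the count, assuming a table of completed women’s Slam matches with a score string in the usual “3-6 7-6(4) 6-0” format; the file name and column name are placeholders:

import pandas as pd

df = pd.read_csv("wta_slam_matches.csv")                              # placeholder file
df = df[~df["score"].str.contains("RET|W/O", case=False, na=True)]    # completed matches only

deciders = set2_winner_won = tb_deciders = tb_winner_won = 0
for score in df["score"]:
    sets = score.split()
    if len(sets) != 3:
        continue                                   # straight-set matches have no decider
    w2, l2 = (int(x.split("(")[0]) for x in sets[1].split("-"))
    deciders += 1
    # In a completed three-setter the match winner always takes the decider, so the
    # second-set winner also won the third set exactly when the match winner took set two.
    took_set2 = w2 > l2
    set2_winner_won += took_set2
    if {w2, l2} == {6, 7}:                         # second set went to a tiebreak
        tb_deciders += 1
        tb_winner_won += took_set2

print(set2_winner_won / deciders, tb_winner_won / tb_deciders)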

Let’s dig into the specific cases of second-set match points saved. Thanks to the data behind IBM’s Pointstream on Grand Slam websites, we have the point-by-point sequence for most Slam singles matches going back to 2011. (The missing matches are usually those on non-Hawkeye courts and a few small courts at Roland Garros.) That’s over 2,600 women’s singles matches. In just over 1,700 of them, one of the two players earned a match point in the second set. Over 97% of the time, that player converted–needing an average of 1.7 match points to do so–and avoided playing a third set.

That leaves 45 matches in which one player held a match point in the second set, failed to finish the job, and was forced to play a third set. It’s a limited sample, and it doesn’t wholeheartedly support the third-set-collapse narrative suggested above. 60% of the time–27 of the 45 matches–the player who failed to convert match point in the second set, like Svitolina did, went on to lose the third set. The third set was often lopsided: 5 of the 27 were bagels (including yesterday’s match), and the average score was 6-2. None of the third sets went beyond 6-4.

The other 18 matches–the 40% of the time in which the player with the second-set match point bounced back to win the third set–featured rather one-way deciders, as well. In those, the third-set loser managed an average of only 2.3 games, also never doing better than 6-4.

This is a small sample, so it’s unwise to conclude that this 60/40 margin is anything close to an iron law of tennis. That said, it does provide some evidence that players don’t necessarily collapse after failing to convert a straight-sets win at match point. What happened to Svitolina yesterday is far from certain to happen next time.