In a post last week, I presented some data that suggested that servers weaken a bit under the pressure of a tiebreak. It’s not a strong effect, but it’s a consistent one. A possible explanation–that all that time between points gives servers a chance to psych themselves out, yet may not affect returners the same way–would apply almost as much to games toward the business end of a set, such as at 5-5 or 5-6.
In other words, if players don’t serve as well (or they return better) when things get tight, we’d expect to see more breaks toward the end of a set–more breaks than expected at 5-5, but perhaps fewer breaks than expected at 2-2.
This also opens up a possible method for evaluating players, as Carl Bialik has suggested. If someone is losing more sets 5-7 than they are winning 7-5, it may be that they are wilting under the pressure of 5-5 more than the average player. It would make sense if the players who consistently exceed tiebreak expectations regularly outperform 7-5 expectations as well.
Within the constraints of the ATP’s Matchstats, 7-5 sets are a great way to identify these patterns. While some 6-4 sets end with a break (or a break followed by a set-sealing hold), a 6-4 set doesn’t necessarily end that way. But a 7-5 set must have reached 5-5 before one player took control.
If the hypothesis is correct that players get tighter on serve as the end of the set approaches, we would expect more 7-5 sets in the real world than simulations would imply.
To estimate the number of sets that should end 7-5, we need to take each player’s service points won from each match. With that, we can calculate the probabilities that sets will end at any given score. Repeat the process for every match over a period of time and we get a general idea of how often we should see 7-5 sets.
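For the curious, here is a rough sketch of that kind of calculation in Python. It is not the exact model behind the numbers in this post. It assumes every point on a player's serve is won independently at that player's match-long rate, and the function names (`hold_prob`, `set_score_probs`) and the sample rates of 65% and 62% are made up for illustration.

```python
def hold_prob(p):
    """Chance the server wins a game, given a point-win rate p on serve.

    Assumes every point is independent with the same probability p.
    """
    q = 1.0 - p
    # Win to love, to 15, or to 30 (1, 4, and 10 possible point orders).
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 in points (deuce), then win two points in a row before losing two.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (p**2 + q**2)
    return before_deuce + reach_deuce * win_from_deuce


def is_over(score):
    """True if the set is finished, or has reached 6-6 (tiebreak)."""
    a, b = score
    hi, lo = max(a, b), min(a, b)
    return (hi == 6 and lo <= 4) or hi == 7 or (a == 6 and b == 6)


def set_score_probs(p_a, p_b, a_serves_first=True):
    """Probability of each final set score; (6, 6) means 'went to a tiebreak'."""
    hold_a, hold_b = hold_prob(p_a), hold_prob(p_b)
    live = {(0, 0): 1.0}   # scores still in progress
    final = {}             # finished scores
    while live:
        nxt = {}
        for (a, b), prob in live.items():
            # Players alternate service games, starting with whoever serves first.
            a_serving = ((a + b) % 2 == 0) == a_serves_first
            a_wins_game = hold_a if a_serving else 1.0 - hold_b
            for score, pr in (((a + 1, b), prob * a_wins_game),
                              ((a, b + 1), prob * (1.0 - a_wins_game))):
                target = final if is_over(score) else nxt
                target[score] = target.get(score, 0.0) + pr
        live = nxt
    return final


# Hypothetical match: player A wins 65% of service points, player B 62%.
probs = set_score_probs(0.65, 0.62)
share_7_5 = probs[(7, 5)] + probs[(5, 7)]
```

Averaging `share_7_5` over every match in a season would give an expected share of 7-5 sets under these assumptions, which could then be set against the share that actually occurred.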
As it turns out, 7-5 sets should make up about 7.8% of all sets. In fact, 8.8% of sets end 7-5. Not a huge difference, but one that is fairly consistent from year to year. In every year since 1991, when this dataset begins, there have been more 7-5s than expected. It certainly adds more weight to the claim that the balance of power swings to the returner toward the end of a tight set.
(My set-prediction model doesn’t exactly replicate reality, since players win more games than their service winning percentages predict, in large part because almost all servers are better in either the deuce or ad court, and the variance between them makes it more likely that the player wins a given service game. When applying a crude adjustment for this, the crumbling-server hypothesis looks even better–the more games servers are predicted to win, the fewer predicted 7-5 sets.)
Identifying the unbreakable
This type of discussion must make you wonder: Which players are good at this stuff? If it is true that late-set pressure results in more breaks, it seems obvious that some players are more prone to that pressure, and that other players take advantage of that pressure.
In an ideal world, we’d be able to identify some great 7-5 records, point out some 5-7 records, and have some great new insights into players.
As it is … we might.
As we saw last week with tiebreak analysis, we can’t simply count up a player’s 7-5 sets and compare that total to his 5-7 set losses. Over the last three years, Andy Roddick won more than 55% of his 7-5 and 5-7 sets, but given the players he faced in those sets and their performances in those matches, he should have won 62%.
There are two ways to quantify player accomplishments in this department. The first evaluates how well a player avoids losing 5-7 when he reaches 5-5; the other compares his ability to break for 7-5 against his proneness to being broken for 5-7.
Let’s call the first stat Five-Seven AVoidance, or FSAV. For any player, we first add up the sets that reached 5-5, then count the sets that he won 7-5 or reached a tiebreak. Then we use the general method described above to estimate how many times the player should have reached 5-5, and how many of those times he should have avoided 5-7. Since the beginning of 2010, Kei Nishikori has avoided a 5-7 finish in about 92% of the sets in which he reached 5-5. My model would have expected him to avoid 5-7 only about 84% of the time. (The model expects that most players will avoid 5-7 about 82-90% of the time they reach 5-5.)
From those numbers, we discover that Nishikori lost 5-7 less than half as often as we would have expected him to. No other player comes close to that mark. In everyday language, FSAV approximates how often a player was able to hold serve at 5-5 or 5-6. Important skill, that.
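To make the arithmetic concrete, here is a small Python sketch. The exact FSAV formula isn't spelled out above, so this assumes it is the expected 5-7 rate divided by the actual 5-7 rate, which is the reading that fits the Nishikori numbers. The function name `fsav` is mine.

```python
def fsav(actual_avoid_rate, expected_avoid_rate):
    """Assumed FSAV: expected 5-7 rate divided by actual 5-7 rate.

    Both inputs are shares of 5-5 sets that did NOT end 5-7.
    Values above 1 mean the player lost 5-7 less often than expected.
    """
    actual_loss = 1.0 - actual_avoid_rate
    expected_loss = 1.0 - expected_avoid_rate
    if actual_loss <= 0:
        raise ValueError("no 5-7 losses on record; the ratio is undefined")
    return expected_loss / actual_loss


# Nishikori since 2010: avoided 5-7 about 92% of the time, expected about 84%.
nishikori = fsav(0.92, 0.84)
```

With those rounded inputs the ratio comes out at about 2.0, close to the 2.05 listed for Nishikori; the gap is presumably down to rounding in the percentages.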
The second stat is more narrowly focused on 5-5 sets that do not reach a tiebreak. Let’s call this one the Seven-Five Outperformance Rate, or SFOR, similar to the TBOR (TieBreak Outperformance Rate) I introduced last week.
Here, instead of comparing 5-7s to all 5-5 sets, we compare 5-7s to 7-5s. In other words: Is the player more likely to break for 7-5 or be broken for 5-7? As with the previous stat, after calculating the simple rate (that is, number of 7-5 sets divided by total number of 7-5 and 5-7 sets), we compare that to the results that the model would have expected the player to post.
Bizarrely enough, our three-year leader in SFOR is Ernests Gulbis, who has won about 73% of his 7-5 and 5-7 sets, compared to the 50% the model expects of him. (It’s even more impressive when compared to the 7% that I personally would have expected from him.)
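The same kind of sketch works for SFOR. Again, this is a reading of the description above rather than the exact code behind the numbers, and the function names are mine: take the simple rate of 7-5 wins among all 7-5 and 5-7 sets, then divide by the rate the model expected.

```python
def simple_rate(sets_won_7_5, sets_lost_5_7):
    """Share of a player's 7-5 and 5-7 sets that he won 7-5."""
    total = sets_won_7_5 + sets_lost_5_7
    if total == 0:
        raise ValueError("player has no 7-5 or 5-7 sets")
    return sets_won_7_5 / total


def sfor(actual_rate, expected_rate):
    """Assumed SFOR: actual 7-5 win rate divided by the model's expected rate."""
    if expected_rate <= 0:
        raise ValueError("expected rate must be positive")
    return actual_rate / expected_rate


# Gulbis: won about 73% of his 7-5 and 5-7 sets, against an expected 50%.
gulbis = sfor(0.73, 0.50)
# Roddick: won a little over 55%, against an expected 62%.
roddick = sfor(0.55, 0.62)
```

The Gulbis inputs give 1.46, and the Roddick inputs land just under 0.9, both in line with the listed values.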
As the highlighting of Gulbis suggests, these stats probably don’t yet belong in our everyday toolbox. There simply aren’t very many 7-5 sets, even if–as I established above–there are a few more than we would expect. For reference, there are almost twice as many tiebreaks as 7-5s.
And to keep Gulbis in the spotlight, it may be that winning 7-5 sets is more a function of getting to 5-5 when you shouldn’t. Perhaps many of those 7-5s racked up by the Latvian came when he should have put the set away 6-2. Once 5-5 came along, he finally decided to get serious. As Gulbis himself might tell you, it’s anybody’s guess.
Follow the jump for FSAV and SFOR for about 50 of the most active players, sorted by FSAV, and decide for yourself. The numbers cover all tour-level matches, excluding Davis Cup, since the beginning of 2010.
player                    FSAV  SFOR
Kei Nishikori             2.05  1.23
Feliciano Lopez           1.92  1.11
Ernests Gulbis            1.66  1.46
Juan Martin Del Potro     1.59  1.26
Janko Tipsarevic          1.48  1.13
Potito Starace            1.47  1.07
Sergiy Stakhovsky         1.36  1.06
Nicolas Almagro           1.35  1.24
Gael Monfils              1.34  1.26
Thomaz Bellucci           1.32  1.30
Stanislas Wawrinka        1.30  1.20
Gilles Simon              1.26  1.15
Andy Murray               1.22  1.13
Milos Raonic              1.19  1.02
Rafael Nadal              1.06  1.13
Juan Monaco               1.06  1.15
Alexandr Dolgopolov       1.05  1.17
Radek Stepanek            1.04  1.01
John Isner                1.02  1.15
Andreas Seppi             1.00  1.22
Marcos Baghdatis          1.00  1.17
Mikhail Youzhny           0.99  0.97
Jo Wilfried Tsonga        0.99  1.12
Marin Cilic               0.97  1.00
Nikolay Davydenko         0.97  1.15
Albert Montanes           0.96  1.02
Marcel Granollers         0.96  1.16
Florian Mayer             0.93  1.03
Jurgen Melzer             0.92  1.03
Jeremy Chardy             0.91  0.93
Robin Haase               0.91  0.53
Guillermo Garcia Lopez    0.91  0.79
Robin Soderling           0.89  1.15
Denis Istomin             0.89  1.04
Viktor Troicki            0.88  0.94
Pablo Andujar             0.87  0.81
Tomas Berdych             0.87  1.08
Jarkko Nieminen           0.87  1.07
Santiago Giraldo          0.84  0.98
Philipp Petzschner        0.82  1.04
Mardy Fish                0.82  0.90
Victor Hanescu            0.81  0.52
Fabio Fognini             0.80  1.00
Philipp Kohlschreiber     0.77  0.85
Andy Roddick              0.77  0.90
Fernando Verdasco         0.76  0.92
Juan Ignacio Chela        0.76  0.84
Kevin Anderson            0.73  0.84
Roger Federer             0.72  0.98
Xavier Malisse            0.71  0.93
Julien Benneteau          0.71  0.79
David Ferrer              0.70  0.92
Lukasz Kubot              0.70  0.82
Sam Querrey               0.68  0.98
Novak Djokovic            0.66  0.88
Richard Gasquet           0.55  0.69
I like the way you point out the nuances & complexities that make these numbers so weird at the moment. So many paths leading to such different possible destinations!
It reminds me of the time I was writing a blog post about the puzzle that is Andy Roddick – http://craighickmanontennis.blogspot.com/2010/04/return-on-risk.html – and went to look up a particular return of serve stat kept by the ATP. The stat was something like “break point conversion,” and the leader at the time was Evgeny Korolev. This is a guy whose highest ATP ranking was #46 early in 2010, and whose current ranking is #321 – and he was the leader in this category?! It illustrates the nature of stats – some seem consistently useful (e.g. return games won) and some seem nearly freakish (like the above-mentioned stat on converting break points).
Anyway, the kind of analysis you’re attempting here, where so many forking paths are possible, almost seems like a project that begs for online collaborative analysis – e.g. something like http://polymathprojects.org has done. The only thing is, anyone contributing would need to be a tennis buff too, not just a math whiz, so as to have at least a clue as to the right questions to ask.
It also seems like the kind of question that an intelligent computer could hack out an answer to by trial and error, but programming that kind of iterative solving routine would again probably require a team who knew the right way to pose the question.
Yep, this is a very tricky problem. It gets even trickier to go beyond tiebreaks and 5-5 sets (for instance, do servers weaken at 3-3? 4-4?), though I’ll keep trying.
And yep, break point stats can be extremely misleading, because they are so dependent on opportunity. Without even looking at current leaders, it wouldn’t be surprising if some top players, especially top returners, did poorly in break points converted, precisely because they earn so many! A game that goes to 6 deuces, regardless of whether it ends in a break or not, will generally result in horrible break-points-converted stats, but it doesn’t reflect poor returning, or else it never would have gotten to the first deuce, let alone the 6th.
I am a great fan of your posts, Jeff!
I am not totally surprised by these results, after giving it some thought. In matches where you have a decided underdog, you would not normally expect a 5-5 set. When there is a 5-5 set, the underdog is probably having a good day and/or the favorite is having a bad day. Assuming that trend continues through the end of the set, it would not be surprising to see underdogs doing better than expected (high FSAV/SFOR results) with favorites suffering the reverse results.
I think it would be interesting to look at the results, in general, for underdogs versus evenly matched players versus favorites to see if there is bias in these measures. It would also be interesting to see the same statistics for tiebreaker results, since a similar argument could be made for tiebreakers, as well.
Finally, I wonder if service order makes a difference in FSAV/SFOR? I think the conventional wisdom is that there is more pressure on the person serving second. It would be interesting to see if the data tells us anything about that.
Hi Bill, thanks for the kind words.
Related to what you are saying, what may be happening is that these are 7-5 sets in otherwise lopsided matches. The estimates of each player’s probability that a 7-5 is reached are based on each player’s performance *during the match*, so if Djokovic wins 7-5 6-0, the stat sheet will say that he dominated, thus forecasting a high likelihood that he would win that set that got to 5-5.
So when Novak wins 5-7 6-2 6-1, he may have had an 80% chance of winning the 7-5, but didn’t. A couple of those really affect these stats. And that’s how it should be: if Novak is good enough (or the opponent is bad enough) that he’s going to pull away 6-2 6-1, why is he getting broken at the end of the first set!? So perhaps these numbers point to some streakiness at 5-5.
I haven’t specifically looked at service order in tiebreak sets, but this post suggests that service order doesn’t matter in tight sets:
http://tennisabstract.com/blog/2012/08/01/serving-first-in-marathon-sets/