The Unaceables

Last night, Florian Mayer solved the John Isner serve, breaking the American three times en route to a straight-set victory.  Mayer is known as a tricky opponent, but not as a particularly good returner.  He had never played Isner before, though he beat Ivo Karlovic in Miami last year.

One element of his success is that he got his racquet on the Isner serve.  Over the last 52 weeks, Isner has amassed a 17.1% ace rate, meaning that about one in six of his serves are untouchable.  Last night, he barely managed 10%, as Mayer allowed him only six aces.

We might wonder: Is this a skill of Mayer’s that we’ve failed to notice before?  At first glance, it doesn’t appear to be.  While Mayer often holds his opponents to low ace numbers, he’s had some horrible performances in that department.  He allowed Feliciano Lopez a 20.4% ace rate in Shanghai last year and Thomaz Bellucci 15.5% in Madrid on clay, and, while playing injured, he ignominiously allowed Ivo Karlovic a 50% ace rate at last year’s Cincinnati Masters.

We can answer this question not just for Mayer, but for every regular on the ATP tour.  While some servers hit far more aces than others, ace rate is influenced by both the server and the returner.  Mayer himself is a good example.  In the last 52 weeks, he’s had eight matches in which at least one in ten serves went for an ace.  But in five other matches, he didn’t hit a single one!  Some of the variation is due to good and bad serving performances, but a substantial part can be explained by the man on the other side of the net.

As it turns out, last night was an aberration for the German.  Mayer is below average at ace prevention, allowing 8% more aces than an average player would and ranking 80th among the 139 active players whose results I analyzed.

I looked at every 2011 and 2012 match, using only those in which both players had played at least 10 matches in the last fifteen months.  After calculating each player’s ace rate as a server, I generated an “expected” number of aces for each returner, based on the servers he faced.  Simply tallying how many aces a player allowed isn’t good enough–this way, we adjust for the quality of the server.

Mayer, for instance, played 70 matches in that span against opponents who also played at least 10 matches.  (I excluded guys who played fewer than 10 because their ace rate in such a small number of matches may say more about their opponents than themselves.)  In his 4812 return points, he allowed 345 aces.  But based on the serving abilities of his opponents, he should have allowed only 321.  Those numbers will look a little better after last night, but not enough to move him up very much in the rankings.
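
The adjustment described above can be sketched in a few lines of Python.  This is only one plausible reading of the method, not the actual code behind the table: it assumes that a returner's expected aces are the sum, over his matches, of the server's overall ace rate times the return points played.  The `ReturnMatch` class and the three-match sample are hypothetical; only Mayer's totals (345 allowed, 321 expected) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ReturnMatch:
    """One match from the returner's point of view (hypothetical structure)."""
    server_ace_rate: float  # opponent's overall ace rate (aces per serve point)
    return_points: int      # points the returner played against that serve
    aces_allowed: int       # aces the opponent actually hit in this match


def expected_aces(matches):
    """Aces an average returner would allow against these servers (assumed formula)."""
    return sum(m.server_ace_rate * m.return_points for m in matches)


def ace_diff(actual, expected):
    """Relative difference between actual and expected aces allowed."""
    return actual / expected - 1.0


# Hypothetical three-match sample, for illustration only.
sample = [
    ReturnMatch(server_ace_rate=0.171, return_points=60, aces_allowed=6),
    ReturnMatch(server_ace_rate=0.060, return_points=75, aces_allowed=5),
    ReturnMatch(server_ace_rate=0.090, return_points=80, aces_allowed=9),
]
actual = sum(m.aces_allowed for m in sample)
print(f"sample: {actual} allowed vs {expected_aces(sample):.1f} expected")

# Mayer's totals quoted above: 345 aces allowed against 321 expected.
print(f"Mayer diff: {ace_diff(345, 321):+.1%}")
```

With Mayer's totals, the difference comes out to roughly 7.5%, which rounds to the 8% quoted earlier.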

By contrast, the best returners get their racquets on just about everything.  Atop the list is Gael Monfils, who allows barely half the aces that we would expect him to.  The top eight returners all reduce expected ace rates by at least a third.

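In the table below, I’ve shown these stats for the ten players who appear to be the best at avoiding aces, along with 20 other players of interest.  vAce% is the ace rate each player has allowed, expAce% is the rate we would expect given the servers he faced, and Diff is the relative difference between the two.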

Player                 Rank  Matches  vAce%  expAce%    Diff  
Gael Monfils              1       62   3.5%     6.8%    -48%  
Benoit Paire              2       23   3.8%     6.3%    -40%  
Andy Murray               3       81   4.4%     7.3%    -39%  
Stanislas Wawrinka        4       61   4.2%     7.0%    -39%  
Cedrik Marcel Stebe       5       12   3.2%     5.2%    -38%  
Viktor Troicki            6       70   4.3%     7.0%    -38%  
Gilles Simon              7       77   4.7%     7.3%    -36%  
David Ferrer              8       90   5.1%     7.8%    -35%  
Carlos Berlocq            9       53   4.7%     7.0%    -32%  
Mardy Fish               10       71   5.7%     8.3%    -31%  

Jo Wilfried Tsonga       14       89   5.7%     7.9%    -28%  
Roger Federer            20       92   6.0%     7.9%    -24%  
Novak Djokovic           22       89   6.4%     8.4%    -24%  
Kei Nishikori            32       63   5.8%     7.0%    -17%  
Rafael Nadal             34       91   7.4%     8.8%    -16%  
Nikolay Davydenko        38       60   5.8%     6.7%    -14%  
Sam Querrey              39       35   6.7%     7.8%    -14%  
Milos Raonic             40       60   6.7%     7.6%    -12%  
Kevin Anderson           53       74   7.5%     8.0%     -6%  
John Isner               59       68   7.6%     7.8%     -2%  

Radek Stepanek           73       62   8.6%     8.0%      6%  
Lukasz Kubot             74       44   8.5%     8.0%      7%  
Ivo Karlovic             78       45   7.9%     7.3%      7%  
Juan Martin Del Potro    81       84   8.8%     8.1%      9%  
Tomas Berdych            91       87   8.5%     7.6%     12%  
David Nalbandian        102       43   9.4%     7.9%     20%  
Arnaud Clement          120       17   9.3%     7.2%     29%  
Andy Roddick            130       55  11.8%     8.3%     42%  
Bernard Tomic           135       38  12.8%     8.5%     50%  
Olivier Rochus          139       36  14.7%     7.2%    103%

Before we go anointing Monfils and Benoit Paire the greatest returners in the game, it’s important to remember the serious limitations of the ace stat.  Much more important is getting the return in play.  But except for Grand Slam matches, we don’t have those numbers. In the meantime, we can use ace rate and return points won as proxies for return skills.

Who Can Stop John Isner?

Last week, John Isner beat world number one Novak Djokovic.  Earlier this year, the victim was Roger Federer.  At last year’s French Open, Rafael Nadal had to go to five sets to eliminate the big man.  Between Isner’s massive serve and the general improvement in his game, it seems that he can beat anybody.

To beat big John, you need either a strong return game or solid tiebreaker skills.  Ideally, you’d have both.  (The only alternatives are to catch him on an off-day or to play him on a slow clay court.)  Let’s take a look at how opponents have fared against the Isner serve over the course of his career.

One surprising indicator of return prowess is ace-rate-against.  We tend to think of ace rate as a function only of the server’s ability, perhaps coupled with surface speed.  But returners have plenty to say about it, too.  Simply looking at Isner’s 17 tour-level matches this year, we see a remarkable range of ace rates, from 36.6% of points against Gilles Muller in Memphis down to 5.6% against Federer in Indian Wells.  Surface plays a role, as do a variety of other factors (maybe Isner was tired after beating Djokovic in the semifinal last week), but some players are considerably better than others at getting the ball back in play.

A thorough look at that phenomenon is a subject for another day.  There’s plenty to do simply comparing performances against Isner.  As I’ve noted before, a big serve doesn’t necessarily make a player more unpredictable, though of course such a weapon might make him a better player.

63 players have faced Isner at least twice in tour-level events.  Of those, the most effective has been Lleyton Hewitt, holding Isner’s ace rate under 10% and winning almost half of Isner’s serve points.  However, the most recent of those two matches was almost two years ago.  Still, it’s not surprising to see a world-class counterpuncher atop this list–Hewitt limits aces and service holds against just about everybody.

We find more of the same near the top of the list, with Juan Ignacio Chela, Gilles Simon, and Nikolay Davydenko all in the top 10, ranked by the rate of return points won.  Height might also help in handling the physics-defying bounces of the Isner serve: both Tomas Berdych and Juan Martin Del Potro are among the top 15, though some other tall guys (Kevin Anderson and Ivo Karlovic are shown below) have generally weak return games, so the argument doesn’t seem to apply to them.

The surprise on this list is finding Nadal several spots below Djokovic, Federer, and Andy Murray.  Nadal allows about the same ace rate as Djokovic and Murray, but he doesn’t perform as well on the balls he gets back in play.  One popular theory is that because of his height, Isner is able to neutralize some of Rafa’s spin.  Whatever the reason, it’s even more unexpected to see Rafa so far down the list given that two of the three Nadal-Isner matches have taken place on clay.

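Here are some of the raw results for players who have faced Isner two or more times.  I’ve shown the 20 opponents who have won the most return points, along with ten other notable players, whose ranks (out of 63) are shown in parentheses.  SvPts and SvPtsWon refer to Isner’s service points and the share of them he won, so a lower number is better for the returner.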

Opponent                 SvPts  Matches   Ace%  SvPtsWon  
Lleyton Hewitt             124        2   8.9%     53.2%  
Tomas Berdych              300        3  11.7%     57.7%  
Thiemo De Bakker*          165        2   9.1%     60.0%  
Mikhail Youzhny            258        2  16.3%     61.6%  
Juan Ignacio Chela         269        3   6.7%     62.1%  
Novak Djokovic             191        2  14.1%     62.3%  
Andy Murray                224        2  13.8%     62.5%  
Roger Federer              243        3  11.5%     63.0%  
Gilles Simon               244        2  15.2%     63.9%  
Nikolay Davydenko          248        3  18.5%     64.1%  

David Ferrer               326        4  14.7%     64.4%  
Viktor Troicki             234        3   9.0%     64.5%  
Juan Martin Del Potro      201        3  16.4%     64.7%  
Robin Haase                233        2  15.0%     64.8%  
Rafael Nadal               336        3  13.7%     64.9%  
Richard Gasquet            221        2  19.0%     65.2%  
Marat Safin                115        2  13.9%     65.2%  
Mardy Fish                 424        4  13.4%     65.6%  
David Nalbandian           259        2  19.7%     65.6%  
Feliciano Lopez            207        2  19.3%     66.2%  

(22) Jurgen Melzer         177        2  16.4%     66.7%  
(25) Fernando Gonzalez     185        2  16.8%     68.6%  
(27) Gael Monfils          651        6  15.4%     69.1%  
(29) Andy Roddick          466        5  20.0%     69.5%  
(32) Jo Wilfried Tsonga    227        2  13.2%     70.5%  
(40) Kevin Anderson        443        6  16.9%     71.6%  
(41) Ivo Karlovic          289        3  15.9%     71.6%  
(46) Lukasz Kubot          167        2  21.0%     73.1%  
(57) Alex Bogomolov Jr     221        3  23.1%     77.4%  
(63) Andrey Golubev        127        2   9.4%     84.3%

(De Bakker gets an asterisk because one of his two matches immediately followed Isner-Mahut, and John was playing injured.)

An interesting avenue for further research is whether return quality against Isner differs much from return quality against players in general.  Sure, Isner wins more points on serve and hits more aces, but looking at the list above, it doesn’t seem to differ much from a ranking of the game’s best returners.  For all of his uniqueness, he’s simply one very big server in a game full of big servers.  As he goes deeper in more tournaments, perhaps we’ll gain a better grasp of what players need to do to stop him.

Top Four Domination

Every time the big four fills up all four spots in the semifinals, we hear the same trivia–how rarely the top four seeds all reach the semifinals; how often this particular group of four has done it, and so on.  There’s no doubt that the current big four has dominated men’s tennis in a way that has rarely been seen before.

Words like “domination” aren’t very easy to quantify, which is why commentators fall back on those few bits of trivia.  We can take a closer look to determine whether the current big four stands out as much as we think it does.

Won-loss record

Last year, the big four played 251 tour-level matches (not counting Davis Cup) against everybody else.  They won 228 of them, for a winning percentage of 90.8%.  My database goes back to 1991, and there hasn’t been a year in that time frame where the top four players did any better.

(For today’s purposes, each year’s top four are defined as the four men at the top of the year-end rankings.  All numbers exclude Davis Cup and go back to 1991.)

In fact, four of the five best W-L records have come since 2004.  2008 and 2009, when the current top four was already reigning, are ranked 3rd and 4th.  (The second best season for the top four, by this measure, was 2005, when Andy Roddick and Lleyton Hewitt complemented Roger Federer and Rafael Nadal.)

Slam performance

What really matters are the majors, right?  Last year, the big four played 82 matches against everybody else at the slams, and won 80 of them, for a jaw-dropping 97.6% winning percentage.  You might guess that it, too, is the best mark of the last 20 years, and it is.

In fact, the second and third best top-four slam performances came in 2007 and 2008–each one including Federer, Nadal, and Novak Djokovic.  (In 2007, Nikolay Davydenko was the year-end number four.)  Both of those years, the top four lost only four grand slam matches to others.

Masters performance

The majors give us a small (though important) sample; the masters series offers more tournaments with similar high-quality fields.  Largely due to Andy Murray’s dreadful March, this is where the 2011 foursome falters a bit. Their record against everybody else of 90-13 is “only” third-best of the last twenty years.

But wait–the top masters series record was in 2009, of course with the same top four.  And the second-best masters series record was in 2005, when Federer and Nadal ruled the world.

Beating the rest of the top 10

It’s no shock when the top four cruise through the early rounds of tournaments.  What makes the current top four special is the way they regularly shut everyone else out of the last rounds, defeating excellent players such as Jo-Wilfried Tsonga, Tomas Berdych, and Juan Martin Del Potro.

Last year, the top four went 34-12 (73.9%) against the rest of the year-end top 10.  That’s fourth-best of the last twenty years.  The standout season, once again, was 2005, when Federer, Nadal, Roddick, and Hewitt went 30-4 (!) against the next six guys in the rankings.  In both 2004 and 2006, the top four won exactly three-quarters of their matches against five through ten, just beating out last year’s top four.
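
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the winning percentages from the won-loss records quoted in this post.  The dictionary labels are my own shorthand; the records themselves come straight from the text.

```python
# Records quoted in this post: (wins, losses) for the top four against the field.
records = {
    "2011, all tour-level matches": (228, 23),  # 228 wins in 251 matches
    "2011, slams": (80, 2),                     # 80 wins in 82 matches
    "2011, masters": (90, 13),
    "2011, vs. year-end 5-10": (34, 12),
    "2005, vs. year-end 5-10": (30, 4),
}


def win_pct(wins, losses):
    """Winning percentage as a fraction of matches played."""
    return wins / (wins + losses)


for label, (w, l) in records.items():
    print(f"{label}: {w}-{l} ({win_pct(w, l):.1%})")
```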

To put these numbers in perspective, it is by no means a foregone conclusion that the top four beat up on the next six.  In 1991, the top four of Edberg, Courier, Becker, and Stich actually posted a losing record against guys ranked five through ten.  In both 1996 and 2000, the record was an even .500.

The bigger picture

Of course, there’s more to domination than performance in a single year.  Much of the current big four’s reputation stems from their longevity atop the rankings, and looking at single years ignores that.

But as we’ve seen, there’s no need to look at more than one season.  The big four was, in 2011, one of the most dominating quartets of the last 20 years by several measures, and according to two such measures, they were the most successful top four in recent memory.

Why? (In brief)

Here are three theories that might explain why the big four has so distanced itself from the pack:

  1. These four guys are historically good.
  2. The rest of the field these days is not that good.  Or, at least, they are overawed by the big four.
  3. Court speeds have become more uniform, meaning that top players win all year round, instead of a few specialists racking up big points for only a couple months.

The first two are possible.  Certainly, Federer and Nadal are historically good, and Djokovic’s 2011 season was astounding.  I doubt the rest of the pack is to blame–they seem plenty good to me, even if few of them are that good very much of the time.

I’m tempted by the third theory.  As recently as 2003, there was almost always one clay-court specialist in the year-end top four–Juan Carlos Ferrero, Gustavo Kuerten, Sergi Bruguera.  At the same time, guys like Pete Sampras, Pat Rafter, and Goran Ivanisevic rarely made a dent on clay.

Thus, no matter how many slams Sampras won, or how many clay titles Kuerten took, the top four just weren’t dominant year round.  The idea that the same four players would reach the quarters, or even the semis, of every slam was borderline ridiculous.  Now, it’s almost expected.

Of course, we can argue about the causes of this as well.  Are the top four successful on all surfaces because the surfaces are more uniform?  Because they are historically good?  Because the game (or its equipment) has changed in such a way as to make surface differences less meaningful? That’s a subject for another day.

The Non-Threatening Dr. Ivo

The perception in tennis is that some players are always dark horses, guys who on any given day might play well above their ranking. Often, these players have “top ten talent” coupled with mental lapses–think Gael Monfils, Marcos Baghdatis, Thomaz Bellucci, Philipp Kohlschreiber. Their rankings sag because of brainless losses (Monfils to Lukasz Kubot at Wimbledon, Baghdatis to somebody every third week), but they occasionally flash their brilliance with a surprising result.

Put it together, and you have a dark horse. There’s a special sort of dark horse upon whom everyone seems to agree: the freakishly tall ace machine. Rob Koenig sounds sensible tweeting about Roger Federer’s third round match against Ivo Karlovic: “Karlovic v Fed?? Even though Fed has a good record against him, he’s not a guy you wanna see on your side of the draw.” That’s the official line before just about every match Ivo or John Isner plays. The unstoppable serves make them capable of anything.

Or do they? A barrage of bombs starting almost ten feet in the air and bouncing over your head doesn’t sound like a fun day on the court, but does it translate into more losses for top players?

The short answer is no. If anything, Karlovic has shown himself far less likely than the average player to perform above or below his ranking. Last August, I created a metric called ‘Upset Score’ designed to measure how often a player wins against a superior opponent or loses to an inferior one. (Player ability is measured by my ranking system, which predicts match outcomes better than ATP rankings and considers surface.) The metric counts extreme upsets more heavily, so Ivo beating David Ferrer is scored as much more meaningful than defeating, say, Stanislas Wawrinka. Of the 87 players who had 40 or more ATP-level matches in the 20-month span I analyzed, Karlovic had the tenth lowest Upset Score.
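
The exact Upset Score formula isn’t given here, so the following is a purely hypothetical stand-in meant to show the general idea: score each match by how surprising the result was, given a pre-match win probability, and weight bigger surprises more heavily.  The function name, the `0.5 - p` weighting, and the inputs are all assumptions for illustration.

```python
def upset_score(matches):
    """Hypothetical upset score: average 'surprise' per match.

    Each match is (win_prob, won), where win_prob is the pre-match chance
    that the player wins. A match contributes only when the less likely
    result happened, and it contributes more the bigger the favorite was.
    """
    total = 0.0
    for win_prob, won in matches:
        p_outcome = win_prob if won else 1.0 - win_prob
        if p_outcome < 0.5:            # the less likely result happened
            total += 0.5 - p_outcome   # bigger surprise, bigger weight
    return total / len(matches)


# A player who wins as a favorite and loses as an underdog scores zero.
steady = [(0.8, True), (0.3, False), (0.7, True)]
# A player who beats big favorites and loses to weaker opponents scores high.
erratic = [(0.1, True), (0.8, False), (0.7, True)]

print(upset_score(steady), upset_score(erratic))
```

Under a scheme like this, a low score means a player’s results mostly follow the ratings, which is the sense in which Karlovic is “predictable.”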

This flies directly in the face of conventional wisdom. Looking at the current rankings, we find Ivo just below the likes of Santiago Giraldo and Olivier Rochus–neither one of whom would be viewed as a “tricky” third round opponent. Yet both have Upset Scores in the top half of active players. While there’s no doubt Karlovic was once a very dangerous opponent (as his peak ranking of 14 suggests), he has only one top ten scalp in his last twelve tries, dating back to 2009 Wimbledon. We have to go back to the first half of 2007 to find a stretch in which he was a consistent threat to top players.

Isner isn’t as predictable, but delivers fewer upsets than 60% of the guys on tour. Same story as with Ivo: more often than not, he wins and loses according to past performance. Big John has won two of his last fourteen matches against the top ten, and one of those was an ‘upset’ of Nikolay Davydenko, who by this metric is the least predictable man on the tour.

Massive servers may make for more interesting matches–against any opponent, it’s safe to say that Isner and Karlovic are more likely to deliver a tiebreak or four. But if you’re a top player deciding who you’d like to see coming up in your bracket, you probably don’t care whether you win 6-1 or 7-6(8). Whatever the score, Karlovic is best seen as a steady player on the fringes of the top 50, not some loose cannon who will knock out a top seed one day and lose to a qualifier the next.

Graduating From Challengers

The best players don’t take long before they show you how good they are.  Tennis fans are rightfully excited about guys like Bernard Tomic and Milos Raonic, youngsters who have already established themselves at ATP level–if they are this good at 18, or 21, imagine how good they will be.

I’m always looking for ways to quantify that promise.  In the past, I’ve focused on the rankings, noticing that nearly everyone who reached #1 had broken into the top 100 before their 19th birthday.  Another angle is to see how long a player lasts at the challenger level.

The best players seem to skip the challenger level altogether.  It’s a bit like baseball players and Triple-A: some prospects are ready for the big time, so they never play in the highest level of the minor leagues.  Roger Federer only played eight events in his challenger career, Nadal played 12, and Djokovic played 11–out of which he won three titles.  Andy Roddick also won three challenger titles in only six events at that level.

A player can only move so quickly if they gain entry to tour-level events and they take advantage of the opportunities.  Roddick won 20 matches as a wild card in 2001.  Djokovic reached the third round of both Wimbledon and the U.S. Open on his first try.  A few accomplishments like that, plus the points from a couple of challenger titles, and you’re ranked in the top 100, good enough to earn direct entry into most ATP events.  That’s essentially what happened to Milos Raonic after he reached the fourth round in Melbourne last year.

This suggests a new type of filter to separate the prospects from the wannabes.  If someone takes two years to consistently go deep at challenger events and fails to make an impact at the ATP level, they probably aren’t headed for the top 10.  But if someone gets into the top 50 or 60 with only a couple dozen challengers in their past, they just might be something special.

I investigated the challenger careers of everyone currently in the ATP top 100.  Eight of the ten guys who played the fewest challengers are (in order): Roddick, Federer, Juan Carlos Ferrero, Djokovic, Nadal, Gael Monfils, Andy Murray, and Juan Monaco.

The other two? Milos Raonic and Bernard Tomic, who played 16 and 18 challengers, respectively.  Other prospects in the same range are Kei Nishikori (22), Cedrik-Marcel Stebe (25), and Ryan Harrison (28).  While Stebe and Harrison may play a few more, they still haven’t reached the totals of Jo-Wilfried Tsonga (29), Richard Gasquet (32), or David Ferrer (34).  Nikolay Davydenko spent even longer (41 events) on the challenger tour before beginning his ascent to world #3.

More than half of the top 100 played at least 50 challengers, and that’s generally the half you don’t want to be in.  The most promising career trajectory for challenger vets is that of Janko Tipsarevic, who played 89 challengers (winning 10) before putting it all behind him.  Most of the men near him on the list (Tobias Kamke, 88; Andreas Beck, 90; Dudi Sela, 90) can only dream of doing so well.

With a few exceptions like Tipsarevic (and Monaco, who largely skipped the challenger tour but hasn’t become a consistent threat on tour), this is a filter with some potential.  It overlaps quite a bit with age–if you see a 20-year-old in the top 100, he probably hasn’t played nearly as many challengers as a 27-year-old who finally broke in.  Where “number of challengers” might trump age is when comparing players who–for reasons that may not be purely attributable to talent–started playing professionally at much different times.  John Isner, for example, has only played 20 challengers, but didn’t break into the top 100 until he was nearly 23.  His advanced age would have told us he had little potential while hiding the fact he spent years playing college tennis.  The length of his challenger tour career indicates that once he went pro, it wasn’t long before he was ready to play with the big boys.

Whichever metric (age or challenger experience) you prefer, it’s tough to get excited about someone like Alex Bogomolov Jr., who was 28 when he first cracked the top 100, after a career including 151 challengers.  Among the current top 100, only Michael Russell and Ricardo Mello have played more.  Another man with little promise is (I’m sad to say) Flavio Cipolla, 28 years old and #75 in the world.  The Italian has played 136 challengers and won only 51% of his matches in those events.
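
To make the filter concrete, here is a minimal sketch using only the challenger counts quoted in this post.  The cutoff of 30 events is a hypothetical threshold I picked to stand in for “a couple dozen”; it is not a number from the analysis.

```python
# Challenger events played, as quoted in this post.
challengers_played = {
    "Andy Roddick": 6, "Roger Federer": 8, "Novak Djokovic": 11,
    "Rafael Nadal": 12, "Milos Raonic": 16, "Bernard Tomic": 18,
    "John Isner": 20, "Kei Nishikori": 22, "Cedrik-Marcel Stebe": 25,
    "Ryan Harrison": 28, "Jo-Wilfried Tsonga": 29, "Richard Gasquet": 32,
    "David Ferrer": 34, "Nikolay Davydenko": 41, "Tobias Kamke": 88,
    "Janko Tipsarevic": 89, "Andreas Beck": 90, "Dudi Sela": 90,
    "Flavio Cipolla": 136, "Alex Bogomolov Jr.": 151,
}

FAST_TRACK_MAX = 30  # hypothetical cutoff, roughly "a couple dozen"


def fast_trackers(counts, cutoff=FAST_TRACK_MAX):
    """Players at or under the cutoff, fewest challengers first."""
    return sorted((n, name) for name, n in counts.items() if n <= cutoff)


for n, name in fast_trackers(challengers_played):
    print(f"{name}: {n}")
```

Moving the cutoff up or down changes who makes the list, which is a reminder that this is a rough screen rather than a ranking.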

Another lesson from these numbers is that you can watch a whole lot of challenger-level matches without seeing any real prospects.  (That isn’t to say that Kenny de Schepper versus Michael Yani isn’t entertaining.  It is.)  If future top-tenners play only a handful of challenger events, your average player in a challenger is a guy whose best hope is a peek into the top 50.  Or–if you’re lucky–Janko Tipsarevic.

Living Up to Your Seeding

Listen to the commentary during tennis tournaments and you’ll hear a lot about “living up” or “playing up” to one’s seed.  In other words, a seed implies a certain level of performance. If you’re #10, you should reach the round of 16, but it would take an upset to get to the quarterfinals.

Of course, most players aren’t that consistent.  Sometimes they beat expectations (even Igor Kunitsyn won a tournament) and sometimes they crash out early (hello, Andy Murray!).  While guys like David Ferrer seem to steer a middle course, each player’s ranking is really just a weighted average of the tournaments where they ruled the world and the events where they shouldn’t have gotten out of bed.

And the more you think about it, the more the notion of “living up to your seeding” falls apart.  In order for the top seed at a tournament to meet expectations, he has to win.  That happens considerably less than half the time.  For the second seed to go home happy, he needs to reach the final.  But with rare exceptions, someone who lost in the final every week would quickly amass enough ranking points to be #1.  So at least at the top, we shouldn’t expect that level of consistency.  Also, the whole idea sets the same expectations for the 9th seed as the 16th, the 17th seed as the 32nd.  We can do better.

I looked at the last 20 years of slam results and figured out the average result for every seed.  In that time span, the top seed has won 5.0 matches per slam–on average, then, he has lost in the semifinals.  That number has increased since the majors started seeding 32 players in 2002: In the last 10 years, the top seed has won 5.3 matches per slam, as he has generally coasted through the first two rounds.

Here’s a look at how each seed has done over the last 20 years.  After the top few guys, no one should be expected to reach the quarters–certainly not the #8 seed!

Seed       Wins  AvgExit   
1          5.0   SF        
2          4.2   QF+       
3          3.7   QF-       
4          3.4   R16+      

5          2.7   R16-      
6          2.9   R16-      
7          2.5   R32/R16   
8          2.1   R32+      

9          2.5   R32/R16   
10         2.7   R16-      
11         2.2   R32+      
12         2.6   R16-      

13         2.1   R32+      
14         2.2   R32+      
15         2.1   R32+      
16         1.6   R64/R32   

17-32      1.6   R64/R32   
UNR 92-01  0.7   R64-      
UNR 02-11  0.6   R128/R64

A more sophisticated way of looking at this is with probabilities.  Sure, the smart money is on the top seed winning five matches, but beyond knowing that he wins the tournament between 35 and 40 percent of the time, what are the odds that he reaches the final?  Crashes out early?

Here are those odds for the same sets of players:

Seed         R64    R32    R16     QF     SF      F      W  
1          97.3%  90.5%  83.8%  75.7%  62.2%  48.6%  36.5%  
2          88.5%  78.2%  70.5%  60.3%  51.3%  34.6%  24.4%  
3          93.5%  80.5%  70.1%  57.1%  36.4%  19.5%   5.2%  
4          84.4%  75.3%  64.9%  55.8%  39.0%  14.3%   7.8%  

5          84.2%  71.1%  47.4%  36.8%  15.8%   7.9%   2.6%  
6          84.2%  67.1%  56.6%  38.2%  21.1%  13.2%   7.9%  
7          81.3%  69.3%  52.0%  32.0%  16.0%   4.0%   0.0%  
8          80.3%  61.8%  47.4%  22.4%   2.6%   1.3%   0.0%  

9          86.3%  70.0%  53.8%  28.8%  13.8%   5.0%   0.0%  
10         88.2%  69.7%  52.6%  31.6%  10.5%   5.3%   2.6%  
11         93.2%  63.0%  34.2%  15.1%   4.1%   1.4%   0.0%  
12         84.8%  70.9%  51.9%  34.2%  19.0%   5.1%   2.5%  

13         79.5%  61.5%  48.7%  12.8%   7.7%   3.8%   2.6%  
14         82.7%  60.0%  42.7%  18.7%   9.3%   2.7%   0.0%  
15         81.8%  67.5%  41.6%  15.6%   7.8%   3.9%   0.0%  
16         72.7%  44.2%  28.6%   7.8%   5.2%   2.6%   1.3%  

17-32      72.5%  51.8%  19.7%   8.2%   2.2%   0.9%   0.4%  
UNR 92-01  42.6%  15.8%   5.7%   1.9%   0.6%   0.2%   0.0%  
UNR 02-11  40.1%  12.8%   4.3%   1.2%   0.4%   0.2%   0.0%

The small sample of no more than 80 slams means that these numbers don’t give us a smooth curve, but they still provide a pretty good idea.  In fact, they look awfully similar to my pre-tournament slam predictions, with the exception of the big gap between the top two seeds and the rest of the field.
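
The two tables are linked by a simple identity: every round reached (or title won) means one more match won, so the average number of wins is the sum of the probabilities across a row.  The sketch below applies that to the top four seeds, with the percentages copied from the table above.  The results land close to the wins table but not exactly on it (about 4.9 against 5.0 for the top seed, about 4.1 against 4.2 for the second seed); I can only guess that rounding or slightly different handling of some results explains the gap.

```python
# Share of slams in which each seed reached a given round, copied from the
# table above (columns R64 through W, where W means the title was won).
reach = {
    1: [0.973, 0.905, 0.838, 0.757, 0.622, 0.486, 0.365],
    2: [0.885, 0.782, 0.705, 0.603, 0.513, 0.346, 0.244],
    3: [0.935, 0.805, 0.701, 0.571, 0.364, 0.195, 0.052],
    4: [0.844, 0.753, 0.649, 0.558, 0.390, 0.143, 0.078],
}


def expected_wins(reach_probs):
    """Each round reached (or title won) corresponds to one match won,
    so the expected number of wins is the sum of the probabilities."""
    return sum(reach_probs)


for seed, probs in reach.items():
    print(f"seed {seed}: {expected_wins(probs):.2f} expected wins")
```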

What Happens When You Win an Aussie Warmup?

Italian translation at settesei.it

Because of its placement on the calendar, the Australian Open is unique.  It almost immediately follows the offseason (such as it is), so the common perception is that some players show up less ready than for the other three slams.

For this reason, the tournaments in the two weeks before the Australian Open are both important and difficult to predict.  At Chennai next week, who will be in shape? Who is mentally ready for the new season?  And once we get the results from Chennai, Doha, Auckland, Sydney, and Brisbane, what does that tell us about the Aussie Open itself?

It’s this last question that I’ll try to answer today.  If there’s ever a time that rankings don’t seem to count for quite as much, it’s January–after all, that’s when Yevgeny Kafelnikov won his hard-court slam.  It would stand to reason that the warmups are particularly predictive.  Perhaps tourneys like Doha serve as sneak previews of each player’s readiness for the big event in Melbourne.

Alas, it doesn’t look that way.  Winning a tournament in the two weeks before Melbourne doesn’t predict better performance at the Australian Open.  In fact, it more reliably forecasts a disappointing showing at the first grand slam of the year.

Since 1992 (and not counting 2007, when some of the warmups tinkered with a round-robin format), there have been 93 tournaments in the two weeks before Melbourne.  42 of those were the week before the slam, and 51 were two weeks before the slam.  For each one, I noted the winner of the event, their seeding in Melbourne, and their performance in Melbourne.  With the last two data points, we can determine whether each player performed equal to, above, or below expectations.

(Aussie Open seeding isn’t a perfect way to determine expectations, since results from two weeks before are reflected in the rankings.   But it was much easier than any alternative, and since this approach doesn’t recognize a difference between, say, the 5th seed and the 8th seed, I doubt it makes much difference.)
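
Here is a rough sketch of how a result can be compared with a seeding.  It groups seeds the usual way (1, 2, 3-4, 5-8, 9-16, 17-32), which is why the 5th and 8th seeds are treated alike.  The function names are mine, and treating an unseeded player as expected to win exactly one match is an assumption made for this example.

```python
# Rounds of a 128-player slam draw, ordered by where a player's run ends.
ROUNDS = ["R128", "R64", "R32", "R16", "QF", "SF", "F", "W"]


def expected_exit(seed):
    """Round in which a seed 'should' exit if every match follows the seeding.

    seed is an int, or None for an unseeded player. Treating unseeded players
    as expected to win one match is an assumption made for this sketch.
    """
    if seed is None:
        return "R64"
    if seed == 1:
        return "W"
    if seed == 2:
        return "F"
    if seed <= 4:
        return "SF"
    if seed <= 8:
        return "QF"
    if seed <= 16:
        return "R16"
    return "R32"


def classify(seed, actual_exit):
    """Compare an actual result with the seeding-implied one."""
    diff = ROUNDS.index(actual_exit) - ROUNDS.index(expected_exit(seed))
    if diff > 0:
        return "exceeded"
    if diff < 0:
        return "below"
    return "met"


print(classify(6, "R64"))      # a 6th seed losing in the second round
print(classify(None, "R128"))  # an unseeded player losing his opener
print(classify(9, "QF"))       # a 9th seed reaching the quarterfinals
```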

Let’s start with winners the week before Melbourne.  I didn’t expect much here, since the best players tend to take a week off before slams.  It seems, though, that a win the week before at least helps you through the first round or two.

Of the 42 champions of week-before tourneys, 12 met expectations (that is, played as their Aussie Open seeding would have predicted), 17 exceeded expectations, and 13 didn’t meet expectations (including one who withdrew from the slam).  Of the last group, only four players lost their opening round in Melbourne, and none of those players were seeded.  Several week-before winners lost in the second round; the most painful of those was 6th-seed Michael Chang’s exit in 1993.

On the flip side, Pete Sampras played Sydney and won in 1994, then went straight to Melbourne, where he made it two trophies in a row.  He is the only player in the last 20 years to have won the Australian in addition to an event the previous week.

For champions two weeks before Melbourne, the results aren’t as pleasant.  Of those 51 tournament winners, 15 met expectations at the slam, 12 exceeded them, and 24 failed to play up to their seed (again, including one who withdrew from the Open).

A whopping 14 of those 51 champions didn’t win a single match in Melbourne, including 4-seed Boris Becker in 1993, 5-seed Carlos Moya in 2005, and 9-seed Andy Murray in 2008.  Only two of the 51 players won the tournament: Petr Korda in 1998 and Roger Federer in 2006, both of whom won Doha in their respective years.
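To put the two groups side by side, here is a minimal sketch that turns the tallies above into shares of each group. The counts come straight from the text; the dictionary layout and the `shares` helper are just illustrative names.

```python
# Tallies copied from the text: how warmup champions fared in Melbourne
# relative to their Australian Open seeding.
tallies = {
    "week before": {"met": 12, "exceeded": 17, "fell short": 13},
    "two weeks before": {"met": 15, "exceeded": 12, "fell short": 24},
}

def shares(counts):
    """Convert raw counts into fractions of the group total."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

for group, counts in tallies.items():
    summary = ", ".join(f"{o} {s:.0%}" for o, s in shares(counts).items())
    print(f"{group} ({sum(counts.values())} champions): {summary}")
```

Read this way, roughly 47% of the two-weeks-before champions fell short of their seeding, against roughly 31% of the week-before champions.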

In other words, winning a warmup doesn’t say much about your form for the Open itself–in fact, next week’s winners won’t deserve much additional hype, no matter how good they look in their season debuts.

The question I haven’t answered is: What if you skip warmups altogether?  With the exception of exhibitions, that’s what Novak Djokovic is doing this year, along with several others.  Most notable from the list: Marin Cilic, who won in Chennai two years ago.  After that performance, he failed to get past the round of 16 in Melbourne.  Maybe this year, fresher legs will translate into a deeper run.

Grand Slam Forecasting for Dummies

It’s one thing to predict a winner–it’s another thing to quantify how likely a player is to become that winner.

In most tennis tournaments, it’s not hard to pick a favorite.  For most of the last year, it was Novak Djokovic, no matter the surface or who he might face.  Before that, it was Federer on hard courts, Nadal on clay courts.  While everyone likes to identify a dark horse, there’s rarely much debate at the top.

Given that agreement, though, what odds would you have placed on Novak Djokovic winning Wimbledon?  Or the French?  Or an in-form Federer winning the tour finals over an injured Djokovic and a tired Nadal?  Usually, my numbers spit out something between 20 and 30 percent–in theory, even the best player in the tournament has a better than two-thirds chance of going home a loser.

Intuitively, this is difficult to believe.  Djokovic seemed so dominant for much of the year that his slam victories felt like foregone conclusions.  Anyone who watched Novak on a good day found it impossible to imagine anyone outplaying him.  When Carl Bialik wrote a column asking whether Djokovic could keep up his dominance for the entire season, most responses were some variation of “What are you, stupid? Numbers are irrelevant when someone is so good.”

But all good things must come to an end, and a combination of injuries and good opponents proved that even Djokovic is human.

That said, Djokovic’s dominance–and Nadal’s before him, and Federer’s before him–raises questions about forecasting tennis matches.   The questions are complicated, but rest easy: today’s attempt at an answer will be simple.

Do the rules apply to the very best?

My ranking and forecasting system starts by assigning a number to every player, not unlike ATP ranking points.  To keep things simple, let’s use ranking points.  If we want to predict the outcome of, say, Mardy Fish against Feliciano Lopez, we take their point totals (2965 and 1755) and divide one player’s total by the sum of the two: 2965/(2965+1755) = 62.8%.  (It’s a little more complicated than that, but not much.)  Setting aside concerns like home court advantage and surface, that sounds about right to me.
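The simplified points-ratio calculation can be sketched in a few lines. This is only the back-of-the-envelope version described above, not the full system; the function name `win_probability` is made up for illustration.

```python
def win_probability(points_a, points_b):
    """Chance that player A beats player B under the simple points-ratio rule."""
    return points_a / (points_a + points_b)

# Point totals quoted in the text for Mardy Fish and Feliciano Lopez.
fish, lopez = 2965, 1755

print(f"Fish over Lopez: {win_probability(fish, lopez):.1%}")   # 62.8%
print(f"Lopez over Fish: {win_probability(lopez, fish):.1%}")   # 37.2%
```

One property of this rule is that a 96% favorite needs 24 times the underdog’s points, since 24/(24+1) = 0.96.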

Do the same with Djokovic and Lopez, and you get 88.6%.  Work the numbers with Djokovic and world #100 Michael Berrer, and you get 96.0%.  That’s pretty dominant, suggesting that Berrer would win only 1 in 25 matchups, but wait a minute–we’re saying Berrer’s going to beat Djokovic, ever?

And therein lies the problem.  The formulas I use to generate points and generate predictions are reasonably accurate, tested against years of ATP results.  And in the aggregate, individual match percentages pass the smell test.  But at the extremes, the numbers seem questionable.

And it is at the extremes where the exact percentages matter the most.  Consider my pre-tournament predictions for Wimbledon this year.  While Nadal was the top seed, I picked Djokovic as the favorite, giving him a 21.6% chance of winning.  But look at those first few rounds: I gave him only an 87% chance of getting past Jeremy Chardy (Jeremy Chardy!) in the first round, then only an 88% chance of beating Kevin Anderson or Ilya Marchenko, then only an 85% chance of winning against (probably) Marcos Baghdatis.

Only the last of those three numbers is plausible.  And when combined, they meant that I gave Djokovic less than a 65% chance of reaching the round of 16.  With all due respect to myself, that was almost as ridiculous then as it sounds now.

It’s those early-round numbers that result in such minute chances that the favorite will win the tournament.  Even if we give a player a 90% chance of winning all his matches, he’ll still only win the seven consecutive matches required for a grand slam 48% of the time.  Lower it to 80%, and we’re down to 21% for the tournament.  Since the odds of winning a semifinal match against the likes of Murray, Federer, or Nadal are probably much lower, it seems that early round odds should be much more favorable.
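The compounding is easy to check. This sketch assumes, as the paragraph does, the same win probability in every round and independent matches; `title_probability` is an illustrative name.

```python
def title_probability(p_match, rounds=7):
    """Chance of winning `rounds` consecutive matches at a constant per-match rate."""
    return p_match ** rounds

for p in (0.90, 0.80):
    print(f"{p:.0%} per match -> {title_probability(p):.0%} for the title")
# 90% per match -> 48% for the title
# 80% per match -> 21% for the title
```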

To summarize, one of two things is going on here.  Either (1) my numbers underestimate the likelihood that the pre-tournament favorite wins a grand slam; or (2) our intuition overestimates the likelihood that the favorite takes home the trophy.

Forecasting for dummies

One way to pick between the two is to look at the recent past.  Are pre-tournament favorites winning more or less than expected?

For now, let’s set aside the question of the likelihood that Djokovic beats Chardy or Marchenko, and look only at winning the tournament.  We’re going to make two major assumptions here: (1) it’s possible to identify the pre-tournament favorite years later, and (2) favorites are generally created equal–Djokovic towers over his competitors to the same degree that Courier, or Lendl, or Sampras, or Federer towered over his.  As usual, neither of these assumptions is strictly true, but they aren’t so hideously wrong that they’ll stop us from reaching some worthwhile conclusions.

There are three easy ways of picking the pre-tournament favorite for a grand slam: using (a) the winner of the last slam, (b) the defending champion, or (c) the top seed–almost always the world #1.  The top seed is probably best, while the defending champion might identify a player who is particularly good on the surface, and the winner of the last slam might pick out someone who is riding a hot streak.

The last 21 years (back to 1991, inclusive) give us 84 slams to work with.  Our sample is a bit smaller than that, because occasionally the winner of the last slam or the defending champion did not play, and on three occasions, the top seed pulled out before the tournament began.  Here is how the favorites did:

  • Of the 75 players who had won the previous slam, 18 (24%) won the tournament.
  • Of the 76 defending champions, 26 (34%) won the tournament.
  • Of the 81 top seeds, 29 (36%) won the tournament.  If we exclude the French (where the top seed is often #1 on the basis of hard court performance), we get a more dramatic result here–26 of 60 (43.3%) won the tournament.
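For anyone who wants to rerun the arithmetic, here is a small sketch of the success rates for each way of picking a favorite. The wins and totals are taken from the list above; the French Open figure is not stated directly and is derived here by subtraction (29 − 26 wins out of 81 − 60 appearances).

```python
# (titles won, tournaments entered as "favorite") for each definition.
favorites = {
    "previous slam winner": (18, 75),
    "defending champion": (26, 76),
    "top seed": (29, 81),
    "top seed, excluding French": (26, 60),
}

# Derived by subtraction from the two top-seed rows above.
all_w, all_n = favorites["top seed"]
ex_w, ex_n = favorites["top seed, excluding French"]
favorites["top seed, French only"] = (all_w - ex_w, all_n - ex_n)

rates = {label: won / played for label, (won, played) in favorites.items()}

for label, rate in rates.items():
    won, played = favorites[label]
    print(f"{label}: {won} of {played} ({rate:.1%})")
```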

All of these measures are higher than the 21.6% shot I gave Djokovic at Wimbledon.  And most are higher than the 27-28% chances I gave him at the French and US Open.  The 43.3% likelihood that the top seed wins a slam other than the French (thank you, Pete and Roger!) suggests that a more sophisticated measure of identifying the favorite might allow us to predict slam champions with, say, 40% accuracy.

40% is considerably higher than my models are spitting out right now, but I suspect it is much lower than many fans imagine for their favorite.  It suggests that, at the extremes, my predictions aren’t quite one-sided enough.  It might take Michael Berrer more than 25 chances before he finally catches Djokovic on a bad day.
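As a rough illustration of what a 40% title chance implies, we can invert the seven-match calculation and ask what constant per-match win rate would produce a given tournament probability. This reuses the earlier simplifying assumption of identical, independent rounds, which real draws do not satisfy; `per_match_needed` is an illustrative name.

```python
def per_match_needed(title_prob, rounds=7):
    """Constant per-match win rate that yields `title_prob` over `rounds` matches."""
    return title_prob ** (1 / rounds)

for target in (0.216, 0.40):
    print(f"{target:.1%} title chance ~ {per_match_needed(target):.1%} per match")
```

Under that assumption, moving from 21.6% to 40% means raising the average per-match figure from about 80% to nearly 88%.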

Point-by-Point Profile: Jo-Wilfried Tsonga

Continuing with our point-by-point player profiles, let’s look at Jo-Wilfried Tsonga. If anyone from outside can break into the big four, he’s got to be on the short list after his big finish to the season.

Using all of his grand slam matches from 2011, we can begin to analyze his tendencies on serve and return.

The first table shows the frequency of different outcomes in the deuce court, in the ad court, and on break point, relative to Tsonga’s average. For instance, the 0.974 in the upper left corner means that Tsonga wins 2.6% fewer points than average in the deuce court.

OUTCOME       Deuce     Ad  Break  
Point%        0.974  1.028  0.931  
                                   
Aces          1.005  0.994  0.718  
Svc Wnr       1.013  0.985  0.826  
Dbl Faults    1.082  0.909  0.758  
1st Sv In     1.017  0.981  0.955  
                                   
Server Wnr    0.884  1.127  0.986  
Server UE     1.030  0.967  1.140  
                                   
Return Wnr    1.054  0.941  1.567  
Returner Wnr  1.073  0.920  1.136  
Returner UE   0.887  1.124  1.026  
                                   
Rally Len     0.992  1.009  1.036  

Unlike all of the right-handers we’ve looked at so far, Tsonga wins more points in the ad court. He doesn’t win quite as many cheap points there, since he hits a few more aces and service winners in the deuce court. But the end result is what matters, and it seems that he sets up the point better in the ad court, as shown by his high rate of winners in the rally and by how rarely his opponents hit winners, whether on the return or later in the point.

Unfortunately for Jo-Willy, his success in the ad court doesn’t always transfer to break points. Winning 7% fewer service points than average on break points isn’t bad, but given the inherent advantage of his ad-court tendencies, it seems within reach for him to fight off a few more break points.

Next, this is how he performs on a point-by-point basis. Win% shows what percentage of points he wins at that score; Exp is how many he would be expected to win (given how he performs in each match), and Rate is the ratio of the two. A rate above 1 means he plays better on those points; below 1 is worse.

SCORE   Pts   Win%    Exp  Rate  
g0-0    302  63.6%  66.6%  0.95  
g0-15   108  64.8%  65.3%  0.99  
g0-30    38  68.4%  62.9%  1.09  
g0-40    12  75.0%  58.8%  1.27  
                                 
g15-0   188  66.5%  67.4%  0.99  
g15-15  133  60.9%  66.2%  0.92  
g15-30   78  71.8%  65.7%  1.09  
g15-40   31  54.8%  63.2%  0.87  
                                 
g30-0   125  63.2%  68.2%  0.93  
g30-15  127  66.9%  66.5%  1.01  
g30-30   98  73.5%  65.5%  1.12  
g30-40   43  58.1%  62.9%  0.92  
                                 
g40-0    79  74.7%  68.8%  1.08  
g40-15  105  66.7%  67.7%  0.98  
g40-30  107  72.0%  66.8%  1.08  
g40-40  108  65.7%  65.7%  1.00  
                                 
g40-AD   37  75.7%  64.6%  1.17  
gAD-40   71  64.8%  66.3%  0.98  

As with so many other players, there is a gap in performance between logically equivalent points. 30-40 and 40-AD should be about the same; the only difference is that returners might be a little better at 30-40, having won 60% of the points played to get there, compared with 57% to reach the first 40-AD point. But while Tsonga dominated at 40-AD (admittedly with only 37 such points to draw on), one of his weakest points was 30-40. There’s a similar gap between 30-30 (another of his best) and 40-40 (precisely average).
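The Rate column in the table above behaves like Win% divided by Exp. Here is a quick sketch that recomputes it for the four scores just discussed, using the rounded percentages from Tsonga’s serving table; because the inputs are rounded, other rows may differ from the published Rate in the last digit. The `rate` helper is an illustrative name.

```python
def rate(win_pct, exp_pct):
    """Ratio of actual to expected win percentage; above 1 means better than expected."""
    return win_pct / exp_pct

# (Win%, Exp) pairs copied from Tsonga's serving table.
tsonga_serving = {
    "30-30": (73.5, 65.5),
    "40-40": (65.7, 65.7),
    "30-40": (58.1, 62.9),
    "40-AD": (75.7, 64.6),
}

for score, (win_pct, exp_pct) in tsonga_serving.items():
    print(f"{score}: {rate(win_pct, exp_pct):.2f}")
# 30-30: 1.12
# 40-40: 1.00
# 30-40: 0.92
# 40-AD: 1.17
```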

Serving Against Tsonga

We can go through the same exercises for Tsonga’s return points. The next two tables are trickier to read. Look at them as Serving against Tsonga. Thus, the number in the upper-left corner means that when serving against him, players win 3.3% more points than average in the deuce court; he is a better returner in the ad court. That’s partly attributable to the fact that righties serve better in the deuce court, but while JW’s tendencies aren’t quite as extreme as David Ferrer’s, they are more pronounced than we would expect.

(I’ve excluded return points against lefty servers. Since lefties and righties have such different serving tendencies, limiting the sample to righty servers gives us clearer results, even as the sample shrinks a bit.)

OUTCOME       Deuce     Ad  Break  
Point%        1.033  0.963  0.896  
                                   
Aces          1.215  0.756  0.918  
Svc Wnr       1.079  0.911  0.809  
Dbl Faults    0.918  1.093  0.547  
1st Sv In     1.014  0.985  0.997  
                                   
Server Wnr    0.968  1.036  0.930  
Server UE     0.916  1.094  0.948  
                                   
Return Wnr    0.740  1.294  0.381  
Returner Wnr  0.892  1.123  1.466  
Returner UE   0.979  1.023  0.370  
                                   
Rally Len     0.971  1.033  1.136  

By just about every measure, Tsonga is a better returner in the ad court. He prevents aces and service winners at a high rate, hits plenty of winners at every stage of the point, and forces his opponent to try for more, leading to more double faults. That success follows him onto break points, where he is more conservative (very few return winners or unforced errors) but holds servers to about 10% fewer points than average.

Here’s more on Tsonga’s return game, again with numbers from the perspective of players serving against him.

SCORE   Pts   Win%    Exp  Rate  
g0-0    297  67.0%  65.1%  1.03  
g0-15    97  58.8%  64.6%  0.91  
g0-30    40  62.5%  64.4%  0.97  
g0-40    15  60.0%  64.1%  0.94  
                                 
g15-0   194  66.0%  65.4%  1.01  
g15-15  123  70.7%  64.4%  1.10  
g15-30   61  42.6%  63.6%  0.67  
g15-40   44  45.5%  63.5%  0.72  
                                 
g30-0   128  68.8%  66.0%  1.04  
g30-15  127  65.4%  65.0%  1.01  
g30-30   70  64.3%  64.2%  1.00  
g30-40   45  55.6%  63.8%  0.87  
                                 
g40-0    88  69.3%  66.6%  1.04  
g40-15  110  73.6%  65.4%  1.13  
g40-30   74  66.2%  64.5%  1.03  
g40-40   82  65.9%  63.8%  1.03  
                                 
g40-AD   28  53.6%  65.0%  0.82  
gAD-40   54  68.5%  63.2%  1.08  

Unfortunately, the sample sizes are getting a little small–Tsonga didn’t play as many grand slam matches as the big four, so it’s tough to do much analysis here. There is some evidence that he dominates more than expected once he gets ahead of the server, as seen in the rates at 15-30, 15-40, 30-40, and 40-AD. Tsonga seems to be a streaky player–anyone capable of reeling off several consecutive games against Federer on a hard court would need to be–and these numbers support that, at least in his return game.

Tsonga wraps up our point-by-point profiles. Because we only have point-by-point data for the grand slams, there just isn’t enough information to work with for players outside of the top six.

Point-by-Point Profile: David Ferrer

Continuing with our point-by-point player profiles, let’s look at David Ferrer. He is firmly on the outside of the big four, but remains a threat, especially on clay.

Using all of his grand slam matches from 2011, we can begin to analyze his tendencies on serve and return.

The first table shows the frequency of different outcomes in the deuce court, in the ad court, and on break point, relative to Ferrer’s average. For instance, the 1.012 in the upper left corner means that Ferrer wins 1.2% more points than average in the deuce court.

OUTCOME       Deuce     Ad  Break  
Point%        1.012  0.986  0.914  
                                   
Aces          1.018  0.980  0.940  
Svc Wnr       1.082  0.909  0.899  
Dbl Faults    0.993  1.008  0.256  
1st Sv In     0.991  1.010  0.983  
                                   
Server Wnr    0.945  1.061  0.855  
Server UE     0.988  1.013  1.012  
                                   
Return Wnr    0.909  1.102  0.490  
Returner Wnr  0.956  1.048  1.458  
Returner UE   0.938  1.069  0.898  
                                   
Rally Len     0.960  1.044  1.031  

Of all the players we’ve looked at so far, Ferrer has the smallest differences between serving in the deuce and ad courts. Double faults and first serve rate are almost exactly even. He also seems to have figured out how to guarantee a rally at break point, with virtually no double faults and almost as few return winners. It doesn’t translate into an impressive number of break points won, though.

Next, this is how he performs on a point-by-point basis. Win% shows what percentage of points he wins at that score; Exp is how many he would be expected to win (given how he performs in each match), and Rate is the ratio of the two. A rate above 1 means he plays better on those points; below 1 is worse.

SCORE   Pts   Win%    Exp  Rate  
g0-0    279  72.0%  68.6%  1.05  
g0-15    76  57.9%  67.7%  0.86  
g0-30    32  50.0%  66.1%  0.76  
g0-40    16  50.0%  64.0%  0.78  
                                 
g15-0   200  77.0%  69.0%  1.12  
g15-15   90  68.9%  68.6%  1.00  
g15-30   44  65.9%  66.6%  0.99  
g15-40   23  65.2%  65.9%  0.99  
                                 
g30-0   154  66.9%  69.2%  0.97  
g30-15  113  68.1%  69.2%  0.98  
g30-30   65  67.7%  67.0%  1.01  
g30-40   36  66.7%  66.3%  1.01  
                                 
g40-0   103  67.0%  69.7%  0.96  
g40-15  111  69.4%  69.3%  1.00  
g40-30   78  61.5%  67.8%  0.91  
g40-40   98  69.4%  65.2%  1.06  
                                 
g40-AD   30  60.0%  64.0%  0.94  
gAD-40   68  61.8%  65.7%  0.94  

The sample sizes are small, but it’s still distressing to see Ferrer’s performance at 0-15, 0-30, and 0-40. Anecdotally, it seems that when shorter players don’t have their serve working for them, they can get broken in a hurry. Beyond that, there aren’t a lot of strong tendencies here; I’m sure Ferrer would like to win a few more points at AD-40, but that’s about all.

Serving Against Ferrer

We can go through the same exercises for Ferrer’s return points. The next two tables are trickier to read. Look at them as Serving against Ferrer. Thus, the number in the upper-left corner means that when serving against him, players win 4.7% more points than average in the deuce court; he is a better returner in the ad court. That’s partly attributable to the fact that righties serve better in the deuce court, but Ferrer’s tendencies are considerably more pronounced.

(I’ve excluded return points against lefty servers. Since lefties and righties have such different serving tendencies, limiting the sample to righty servers gives us clearer results, even as the sample shrinks a bit.)

OUTCOME       Deuce     Ad  Break  
Point%        1.047  0.948  0.910  
                                   
Aces          0.964  1.039  0.244  
Svc Wnr       1.102  0.888  0.762  
Dbl Faults    0.799  1.221  1.172  
1st Sv In     1.040  0.956  1.004  
                                   
Server Wnr    1.017  0.982  0.802  
Server UE     0.877  1.135  1.260  
                                   
Return Wnr    1.328  0.639  0.701  
Returner Wnr  1.084  0.908  1.029  
Returner UE   1.074  0.918  0.945  
                                   
Rally Len     0.959  1.046  1.168 

These are some confusing numbers. Ferrer wins more points in the ad court, more than would be expected against right-handed servers. It appears that his opponents know he is more dangerous returning in the ad court; they go for more on the first serve, double-faulting more often and landing fewer first serves. But Ferrer hits far more winners, both on the return and later in the point, in the deuce court. It may be that Ferrer’s ad-court return is good enough to set up the point in his favor, but rarely good enough to push the point to a quick conclusion.

Also of note is Ferrer’s returning on break point. Maybe it’s just a fluke, but reducing aces to one-quarter of their usual rate is remarkable.

Here’s more on Ferrer’s return game, again with numbers from the perspective of players serving against him.

SCORE   Pts   Win%    Exp  Rate  
g0-0    273  58.6%  58.3%  1.01  
g0-15   113  59.3%  57.5%  1.03  
g0-30    46  56.5%  55.5%  1.02  
g0-40    20  65.0%  56.0%  1.16  
                                 
g15-0   158  53.8%  58.7%  0.92  
g15-15  140  63.6%  57.7%  1.10  
g15-30   77  50.6%  56.4%  0.90  
g15-40   51  54.9%  55.0%  1.00  
                                 
g30-0    85  63.5%  60.5%  1.05  
g30-15  120  61.7%  58.0%  1.06  
g30-30   85  52.9%  57.3%  0.92  
g30-40   68  51.5%  56.8%  0.91  
                                 
g40-0    54  59.3%  62.0%  0.96  
g40-15   96  64.6%  59.2%  1.09  
g40-30   79  57.0%  57.8%  0.99  
g40-40  143  58.0%  55.7%  1.04  
                                 
g40-AD   60  53.3%  54.9%  0.97  
gAD-40   83  49.4%  56.3%  0.88  

Unlike in his service game, Ferrer is more successful than expected at 40-AD and AD-40, winning more than half of return points at AD-40. He also excels at 15-30, 30-30, and 30-40, suggesting that he may be a bit streaky, returning well when he works himself into a hard-fought game.