Is the US Open Draw Truly Random?

Italian translation at settesei.it

Last week, an ESPN “Outside the Lines” article called into question the fairness of the U.S. Open main draw.  A researcher discovered that the top two seeds (both men and women) have gotten very easy first-round assignments.

This is one small step away from a direct accusation of draw-rigging by the USTA.  It’s a serious claim, and while the article’s author leans heavily on a single academic who supports the methodology used, it’s not at all clear that anything unacceptable is going on.

What they found

For some reason, the study focused on the top two seeds.  It’s not at all clear why it did so–I have no idea what the USTA’s motive would be for rigging the draw in favor of the top two seeds, regardless of their identity.  Sure, there were a few years when a Federer-Nadal final would have been particularly mouthwatering, or when American viewers craved a Serena-Venus showdown in Flushing, but why would the USTA be tweaking a draw in favor of Gustavo Kuerten?  Marat Safin?  Amelie Mauresmo? Dinara Safina?

For the moment, let’s set that major concern aside.  To quantify the difficulty of each player’s first-round opponents, the ESPN study invented a metric called “difficulty score.”  We’ll come back to “difficulty score” in a bit.

A simple look at the lists they assembled of first-round opponents does suggest that something untoward is going on.  In the last ten years of men’s draws, a top-two seed has faced a top-80 opponent only four times, and not once in the last five years.  Because a seed’s first-round opponent comes from the 96 unseeded players (roughly ranks 33 through 128), seeded players should face top-80 opponents about half the time.
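Here is a rough check of how unusual that is.  It is only a sketch: it assumes each of the 20 first-round opponents (two seeds times ten years) is an independent, uniform pick from ranks 33 through 128, which is a simplification, and the helper name `binom_cdf` is my own.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Ranks 33..80 are 48 of the 96 possible unseeded opponents.
p_top80 = 48 / 96

# Ten years x two seeds = 20 draws; at most four top-80 opponents.
p_four_or_fewer = binom_cdf(4, 20, p_top80)

# Five years x two seeds = 10 draws with no top-80 opponent at all.
p_none_in_ten = (1 - p_top80) ** 10

print(f"P(<= 4 of 20 top-80 opponents) = {p_four_or_fewer:.4f}")  # 0.0059
print(f"P(0 of 10 top-80 opponents)    = {p_none_in_ten:.5f}")    # 0.00098
```

Under those assumptions, four or fewer top-80 opponents in 20 draws is a bit under a 0.6% event, and a five-year shutout taken on its own is roughly a 0.1% event.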

If we are truly interested in the first-rounders assigned to top-two seeds, it’s clear that these players have been given an easier path than what would be statistically expected.  But it’s not yet clear that it’s anything other than good luck.

Breaking down “difficulty score”

Here’s the explanation of the metric that ESPN used:

So if a top two seed faced the 33rd-ranked player in the first round, he/she would get a difficulty score of 0.995 for that round; if he/she faced the 128th-ranked player in the first round, the score for that round would be 0.005. An average opponent (ranked around 80th or 81st), would correspond to a difficulty score near 0.500, which should be the average difficulty score over several years of draws.

I don’t understand why the ESPN study needed to switch from ordinal rankings (1 to 128) to difficulty scores between 0.005 and 0.995.  But I replicated the work using ordinal rankings instead of difficulty scores, and came up with the same results.
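ESPN doesn’t spell out the formula, but a linear map that puts each of the 96 unseeded ranks at the midpoint of an equal-width bin reproduces the endpoints quoted above.  The sketch below is a guess at the metric, not ESPN’s code; `difficulty_score` and its parameters are hypothetical names.

```python
def difficulty_score(opponent_rank, best=33, worst=128):
    """Map an unseeded opponent's rank within the draw to a 0-1 score.

    Assumed formula: each of the (worst - best + 1) ranks gets an
    equal-width bin and the score is the bin's midpoint, with the
    best-ranked opponent scoring highest.
    """
    n = worst - best + 1                       # 96 possible opponents
    return (worst - opponent_rank + 0.5) / n

for rank in (33, 80, 81, 128):
    print(rank, round(difficulty_score(rank), 3))
# 33 0.995
# 80 0.505
# 81 0.495
# 128 0.005
```

If the metric really is linear like this, averaging scores and averaging ranks carry the same information, which would explain why the two approaches agree.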

The average first round opponent for the top two seeds in each year’s men’s draw has been about the 98th-best player in the draw.  Given that seeds can draw anyone from 33 to 128, the average “should” be around 80.  With difficulty scores, ESPN says that the likelihood of the last ten years of easy draws is 0.3%.  With ordinal rankings, I found approximately the same.  The last thing the sports-analysis world needs is another superfluous metric, but at least this one doesn’t appear to be misleading.
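For anyone who wants to check the ordinal version, here is a minimal sketch of one way to do it.  It assumes the 20 opponents are independent, uniform draws from ranks 33 through 128 (in reality the two seeds in one year can’t be handed the same opponent), and the function name is my own.

```python
def tail_probability(observed_mean, n_draws=20, lo=33, hi=128):
    """P(mean of n_draws ranks >= observed_mean), each rank uniform on lo..hi.

    Simplifying assumption: the draws are independent of one another.
    """
    k = hi - lo + 1
    # dist[s] = probability that the offsets (rank - lo) so far sum to s
    dist = [1.0]
    for _ in range(n_draws):
        new = [0.0] * (len(dist) + k - 1)
        for s, p in enumerate(dist):
            if p:
                share = p / k
                for offset in range(k):
                    new[s + offset] += share
        dist = new
    threshold = (observed_mean - lo) * n_draws   # smallest qualifying offset sum
    return sum(p for s, p in enumerate(dist) if s >= threshold)

p = tail_probability(98)
print(f"P(average opponent rank >= 98 over 20 draws) = {p:.4f}")
```

This builds the exact distribution of the sum of 20 ranks by repeated convolution rather than simulating it.  Under these assumptions the result is well under one percent, in the same range as ESPN’s 0.3%.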

What about better reasons for rigging?

The core problem here is this: Why do we care  specifically about the draws for the first two seeds?  Or, why would the USTA care enough to compromise the fairness of the draw?

As ESPN highlighted, some of the first-round victims are American wild cards.  Scoville Jenkins, for instance, was fed to the wolves twice, once each against Federer and Roddick.  If we’re really fishing for an explanation, perhaps the USTA wants to put up-and-coming stars such as Jenkins, Devin Britton, and Coco Vandeweghe on a big stage, either to showcase these players, or to make otherwise pedestrian blowouts more interesting.  I suppose I’d rather watch Nadal play Jack Sock than, say, Diego Junqueira.

But that’s ex post facto reasoning of the most blatant sort.  If the USTA were going to rig the draw, wouldn’t they be more likely to do so in favor of top Americans?  Or in favor of a broader range of seeds, to better ensure marquee matchups for the second week?  Or rig second-round matchups for top players, to ensure that the big names make it to the middle weekend?

If no evidence of draw manipulation appears in any of those other scenarios, it would seem that ESPN discovered something more like the famous correlation between the S&P 500 and butter production in Bangladesh.  If your search for a newsworthy conclusion is sufficiently wide, you’re bound to find something.

The top seeds

As I’ve said, there’s no doubt that the top two seeds in the men’s draw have had an easy go of it in the last ten years, since the draws started seeding 32 players instead of 16.  The same is true of the women.

The top two in both the men’s and women’s draws faced opponents who ranked, on average, roughly 98th in the 128-player field.  The odds of this happening on either side are tiny, about 0.25%.  The chances that a single tournament would randomly produce draws so easy for the top two men and women for ten years are effectively zero.
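To put a number on “effectively zero,” here is the arithmetic, assuming the men’s and women’s draws are made independently of each other and taking the 0.25% figure at face value.

```python
p_one_side = 0.0025          # figure quoted above for either the men or the women
p_both = p_one_side ** 2     # assumes the two draws are independent
print(f"{p_both:.2e}, or about 1 in {round(1 / p_both):,}")
# 6.25e-06, or about 1 in 160,000
```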

Beyond the top two, however, any suspicions quickly disappear.  The average opponent for the top four seeded men has been ranked about 89 out of 128, meaning that #3 and #4 face opponents around #80–dead average.  The average first-round assignment for the top eight seeded men has been around 87, meaning that seeds 5-8 face average opponents in the mid-80s.  Nothing to cause a raised eyebrow there, and the numbers are almost identical on the women’s side.
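The figures for seeds 3-4 and 5-8 are backed out of the cumulative averages.  A short sketch of that arithmetic, using the rounded numbers quoted in this post (the function name is mine):

```python
def subgroup_mean(mean_small, n_small, mean_large, n_large):
    """Mean of the members of the larger group that are not in the smaller one."""
    return (mean_large * n_large - mean_small * n_small) / (n_large - n_small)

seeds_3_4 = subgroup_mean(98, 2, 89, 4)   # top-two avg 98, top-four avg 89
seeds_5_8 = subgroup_mean(89, 4, 87, 8)   # top-four avg 89, top-eight avg 87
print(seeds_3_4, seeds_5_8)               # 80.0 85.0
```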

To go one step further, there’s no evidence of manipulation in the second-round draws.  In fact, the top two women’s seeds faced particularly tough second-round opponents: there was only a 20% chance that those twenty women would be given second-round assignments as tough as the ones they got.

Before looking at the draws of U.S. players, a quick summary.  While the top two seeds were given very low-ranked opponents in the first round, the effect did not extend to the second round, or to any seeds beyond the top two.

The American draws

If the USTA were to tweak the draws, you’d expect them to do so in favor of the home players, if for no other reason than television ratings.  But they haven’t.

Let’s start with the American men.  The top two ranked American men each year have faced opponents ranked, on average, 79 of 128.  That’s a bit tougher than average.  If we expand the analysis to the top four ranked Americans, or just seeded Americans, the results stay around average.  If anyone is manipulating the draws in favor of American men, they are either doing it without regard for ATP rankings, or they aren’t doing a very good job.

More surprising is the average opponent of all American men, seeded or not.  Over the last ten years, that opponent has ranked 61.2 of 128, considerably tougher than 80, in part because unseeded men may draw seeded players in the first round.  But the average shouldn’t be that low.  In fact, there is only a 20% chance that American men would be given such a tough assignment.
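To see why unseeded players pull the baseline down, here is a sketch of the expected opponent rank for a seeded versus an unseeded player.  It assumes a standard 128-player draw with 32 seeds, in which every seed opens against an unseeded player and unseeded players are placed uniformly at random; it also ignores the small effect of a player not being able to draw himself.

```python
def expected_opponent_rank(seeded, draw_size=128, n_seeds=32):
    """Expected first-round opponent rank (1 = best player in the draw)."""
    n_unseeded = draw_size - n_seeds
    mean_seed = (1 + n_seeds) / 2                   # ranks 1..32   -> 16.5
    mean_unseeded = (n_seeds + 1 + draw_size) / 2   # ranks 33..128 -> 80.5
    if seeded:
        return mean_unseeded
    p_vs_seed = n_seeds / n_unseeded                # 32 of the 96 unseeded slots
    return p_vs_seed * mean_seed + (1 - p_vs_seed) * mean_unseeded

print(expected_opponent_rank(seeded=True))              # 80.5
print(round(expected_opponent_rank(seeded=False), 1))   # 59.2
```

The right baseline for all American men is a blend of the two, weighted by how many of them were seeded, a figure not given in this post.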

Results for the women are mostly similar.  The top two American women each year have gotten a slightly easy draw: the average opponent rank is 83 of 128.  Keep in mind, however, that this overlaps with the analysis of the top two seeded women.  Five of the 20 top-two-seeded women were Americans, and in nearly every one of those five cases, those women faced one of the weakest players in the draw.  In other words, there’s more evidence that the draw is skewed in favor of the top two seeds than the top two Americans.

As with the men, American women in general have been given tough assignments.  In fact, there is only a 16% chance that American women would face such tough first round opponents as they have.

What this means

If the USTA (or anyone else) is messing with the US Open draws, they are doing so in a nearly inscrutable way.  The only evidence of manipulation is with each year’s top two seeds, as ESPN highlighted.

The theory I mentioned above, that it might be desirable to pit top players against up-and-coming Americans, is appealing, but also not supported by the evidence.  Only five of the 20 opponents of top-two men’s seeds (and six of 20 women’s opponents) have been American, despite the fact that the U.S. contributes five or six low-ranked wild cards each year, in addition to a disproportionate number of qualifiers.

It’s an odd situation.  The first-round opponents of the top two seeds make for a plausible target of draw manipulation, if not the most obvious one.

Postscript: One more question

I mentioned earlier that I’d rather watch Nadal play Jack Sock than Diego Junqueira.  I like up-and-comers, and it’s always interesting to see whether a new opponent forces a top player to change tactics.  It makes for a more interesting match than Nadal (or any top-tenner) against a 29-year-old who has hovered for years around #100.

My question, then: If you’re Rafa Nadal, and (presumably) you want to go deep at the U.S. Open, who would you rather play?  The American wild card ranked #450, or the veteran ranked #99?  A tougher question: Sock, or a veteran who was nearly seeded, like Fabio Fognini?  I can see different players making different choices, but I don’t think it’s clear cut.

It is the draws of Jenkins, Britton, Glatch–in other words, the Jack Socks of previous years–that give us this evidence of manipulation.  On paper, the 127th-highest-ranked player in the draw looks like the 127th-best, but in practice, it’s not nearly so clear cut.  And if these wild cards really are “wild cards,” what looks like an easy draw may not be much easier than yet another dissection of Sergiy Stakhovsky or Albert Montanes.

It may be true that at some stage, the US Open draws are being manipulated for (and only for) the top two seeds in each field.  But that doesn’t tell us whether those players are gaining anything from it.  It’s far from clear that the lowest-ranked players in each draw are the easiest opponents.

US Open Qualifiers Guide

Tomorrow, 128 men will begin battling for the last 16 spots in the US Open main draw.  (Technically, given withdrawals, it’ll probably be more like 18 or 19 spots, but as of now, it’s 16.)

It’s my favorite time of year, and if you live near New York, it should be yours as well.  But unless you’re an extreme tennis nut, most of the names aren’t very familiar.

Click here for a quick “guide” to the 128 contenders, including their seed, country of origin, birthday, current ATP ranking, and current hard-court “jrank”–that is, their standing in my ranking system.  (The table is too wide to display well on this site.)

If you want to play with the guide, feel free to download it in CSV format.

Hard Court Singles Rankings: 22 August 2011

With the U.S. Open a mere seven days away (and qualifying starting tomorrow!), it’s time to update my hard-court singles rankings.  If you’re interested in some of the methodology underlying these rankings, start here.

Here’s the top 101.  For what might be the first time since I started publishing these, Delpo is knocked out of the top four.  Because my system takes into account the last two years, he could take a hit when the 2009 US Open comes off the books.  It won’t be as major a shift as in the ATP rankings, since my system already heavily discounts the 2009 Open for its age, but given how large a role those wins play in Delpo’s ranking, it will make a difference.

Also interesting to see how my system reflects the mess that is 6 through 15.  Fish, appropriately, heads the group on hard courts, while Ferrer loses several spots compared to the ATP rankings.  (Remember, these numbers are hard-court specific.)  Melzer and Almagro find themselves way out of the running.

Note also what these numbers do with some younger players — Bernard Tomic is on the cusp of cracking the top 20, and Ryan Harrison is inside the top 50.

RANK  PLAYER                  POINTS  
1     Novak Djokovic            7509  
2     Rafael Nadal              4977  
3     Roger Federer             4154  
4     Andy Murray               3911  
5     Juan Martin del Potro     3207  
6     Mardy Fish                2709  
7     Jo-Wilfried Tsonga        2654  
8     Robin Soderling           2360  
9     Tomas Berdych             2034  
10    Stanislas Wawrinka        1907  
11    Gael Monfils              1842  
12    Marin Cilic               1790  
13    David Ferrer              1601  
14    Andy Roddick              1518  
15    Gilles Simon              1507  
16    Nikolay Davydenko         1422  
17    Marcos Baghdatis          1392  
18    Richard Gasquet           1339  
19    Fernando Verdasco         1321  
20    David Nalbandian          1279  

RANK  PLAYER                  POINTS  
21    Bernard Tomic             1279  
22    Milos Raonic              1267  
23    Ernests Gulbis            1256  
24    Janko Tipsarevic          1159  
25    Viktor Troicki            1143  
26    Mikhail Youzhny           1108  
27    Florian Mayer             1093  
28    Alexander Dolgopolov      1068  
29    Philipp Kohlschreiber     1061  
30    Jurgen Melzer             1045  
31    Samuel Querrey            1044  
32    Nicolas Almagro           1023  
33    Ivan Ljubicic             1011  
34    Kei Nishikori             1005  
35    John Isner                 982  
36    Ivan Dodig                 948  
37    Michael Llodra             921  
38    Feliciano Lopez            903  
39    Radek Stepanek             896  
40    Guillermo Garcia-Lopez     854  

RANK  PLAYER                  POINTS  
41    Kevin Anderson             751  
42    Jeremy Chardy              745  
43    Juan Monaco                745  
44    Dmitry Tursunov            740  
45    Philipp Petzschner         736  
46    Ryan Harrison              736  
47    Julien Benneteau           734  
48    Marcel Granollers          720  
49    Tommy Robredo              716  
50    Adrian Mannarino           709  
51    Robin Haase                664  
52    Alex Bogomolov             662  
53    Xavier Malisse             660  
54    Thomaz Bellucci            651  
55    Lleyton Hewitt             621  
56    Sergiy Stakhovsky          613  
57    Ivo Karlovic               607  
58    Grigor Dimitrov            602  
59    Thiemo de Bakker           598  
60    Andrei Goloubev            596  

RANK  PLAYER                  POINTS  
61    Lukasz Kubot               592  
62    Olivier Rochus             586  
63    Donald Young               585  
64    Dudi Sela                  559  
65    Santiago Giraldo           554  
66    Mikhail Kukushkin          543  
67    Andreas Seppi              541  
68    Denis Istomin              541  
69    Igor Andreev               528  
70    Pablo Cuevas               521  
71    Fabio Fognini              512  
72    James Ward                 505  
73    Yen-Hsun Lu                500  
74    James Blake                488  
75    Ricardas Berankis          477  
76    Matthias Bachinger         474  
77    Albert Montanes            468  
78    Lukas Lacko                466  
79    Benjamin Becker            466  
80    Jarkko Nieminen            463  

RANK  PLAYER                  POINTS  
81    Ryan Sweeting              461  
82    Leonardo Mayer             458  
83    Somdev K. Dev Varman       454  
84    Jerzy Janowicz             444  
85    Daniel Brands              444  
86    Matt Ebden                 440  
87    Mischa Zverev              437  
88    Tobias Kamke               429  
89    Evgueni Korolev            426  
90    Blaz Kavcic                421  
91    Michael Berrer             419  
92    Daniel Gimeno              416  
93    Vladimir Ignatik           416  
94    Edouard Roger-Vasselin     412  
95    Frank Dancevic             406  
96    Alejandro Falla            401  
97    Ilia Marchenko             399  
98    Gilles Muller              396  
99    Grega Zemlja               396  
100   Simone Bolelli             387  
101   Wayne Odesnik              386