So, About Those Stale Rankings

Both the ATP and WTA have adjusted their official rankings algorithms because of the pandemic. Because many events were cancelled last year (and at least a few more are getting canned this year), and because the tours don’t want to overly penalize players for limiting their travel, they have adopted what is essentially a two-year ranking system. For today’s purposes, the details don’t really matter–the point is that the rankings are based on a longer time frame than usual.

The adjustment is good for people like Roger Federer, who missed 14 months and is still ranked #6. Same for Ashleigh Barty, who didn’t play for 11 months yet returned to action in Australia as the top seed at a major. It’s bad for young players and others who have won a lot of matches lately. Their victories still result in rankings improvements, but they’re stuck behind a lot of players who haven’t done much lately.

The tweaked algorithms reflect the dual purposes of the ranking system. On the one hand, they aim to list the best players, in order. On the other hand, they try to maintain other kinds of “fairness” and serve the purposes of the tours and certain events. The ATP and WTA computers are pretty good at properly ranking players, even if other algorithms are better. Because the pandemic has forced a bunch of adjustments, it stands to reason that the formulas aren’t as good as they usually are at that fundamental task.

Hypothesis

We can test this!

Imagine that we have a definitive list, handed down from God (or Martina Navratilova), that ranks the top 100 players according to their ability right now. No “fairness,” no catering to what tournament owners want, and no debates–this list is the final word.

The closer a ranking table matches this definitive list, the better, right? There are statistics for this kind of thing, and I’ll be using one called the Kendall rank correlation coefficient, or Kendall’s tau. (That’s the Greek letter τ, as in Τσιτσιπάς.) It compares lists of rankings, and if two lists are identical, tau = 1. If there is no correlation whatsoever, tau = 0. Higher tau, stronger relationship between the lists.

My hypothesis is that the official rankings have gotten worse, in the sense that the pandemic-related algorithm adjustments result in a list that is less closely related to that authoritative, handed-down-from-Martina list. In other words, tau has decreased.

We don’t have a definitive list, but we do have Elo. Elo ratings are designed for only one purpose, and my version of the algorithm does that job pretty well. For the most part, my Elo formula has not changed due to the pandemic*, so it serves as a constant reference point against which we can compare the official rankings.

* This isn’t quite true, because my algorithm usually has an injury/absence penalty that kicks in after a player is out of action for about two months. Because the pandemic caused all sorts of absences for all sorts of reasons, I’ve suspended that penalty until things are a bit more normal.

Tau meets the rankings

Here is the current ATP top ten, including Elo rankings:

Player       ATP  Elo  
Djokovic       1    1  
Nadal          2    2  
Medvedev       3    3  
Thiem          4    5  
Tsitsipas      5    6  
Federer        6    -  
Zverev         7    7  
Rublev         8    4  
Schwartzman    9   10  
Berrettini    10    8

I’m treating Federer as if he doesn’t have an Elo rating right now, because he hasn’t played for more than a year. If we take the ordering of the other nine players and plug them into the formula for Kendall’s tau, we get 0.778. The exact value doesn’t really tell you anything without context, but it gives you an idea of where we’re starting. While the two lists are fairly similar, with many players ranked identically, there are a couple of differences, like Elo’s higher estimate of Andrey Rublev and its swapping of Diego Schwartzman and Matteo Berrettini.
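For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation in Python. It assumes the simplest form of Kendall's tau (no tied ranks), and the function name `kendall_tau` is just an illustration, not the code behind the numbers in this post. Federer is left out, so nine players remain.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau for two equal-length rank lists, assuming no ties."""
    if len(xs) != len(ys):
        raise ValueError("rank lists must have the same length")
    concordant = discordant = 0
    # Compare every pair of players: do both lists order them the same way?
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# ATP and Elo ranks from the table above, Federer excluded.
atp = [1, 2, 3, 4, 5, 7, 8, 9, 10]
elo = [1, 2, 3, 5, 6, 7, 4, 10, 8]

tau = kendall_tau(atp, elo)
print(round(tau, 3))  # 0.778
```

Of the 36 possible pairs, only four are ordered differently by the two lists (Rublev against each of Thiem, Tsitsipas, and Zverev, plus Schwartzman against Berrettini), which gives (32 - 4) / 36.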

Let’s do the same exercise with a bigger group of players. I’ll take the top 100 players in the ATP rankings who met the modest playing time minimum to also have a current Elo rating. Plug in those lists to the formula, and we get 0.705.

This is where my hypothesis falls apart. I ran the same numbers on year-end ATP rankings and year-end Elo ratings all the way back to 1990. The average tau over those 30-plus years is about 0.68. In other words, if we accept that Elo ratings are doing their job (and they are indeed about as predictive as usual), it looks like the pandemic-adjusted official rankings are better than usual, not worse.

Here are the year-by-year tau values, with a tau value based on current rankings as the right-most data point:

And the same for the WTA, to confirm that the result isn’t just a quirk of the makeup of the men’s tour:

The 30-year average for women’s rankings is 0.723, and the current tau value is 0.764.

What about…

You might wonder if the pandemic is wreaking some hidden havoc with the data set. Remember, I said that I’m only considering players who meet the playing time minimum to have an Elo rating. For this purpose, that’s 20 matches over 52 weeks, which excludes about one-third of top-100 ranked men and closer to half of top-100 women. The above calculations still consider 100 players for year-end 2020 and today, but I had to go deeper in the rankings to find them. Thus, the definition of “top 100” shifts a bit from year-end 2019 to year-end 2020 to the present.

We can’t entirely address this problem, because the pandemic has messed with things in many dimensions. It isn’t anything close to a true natural experiment. But we can look only at “true” top-100 players, even if the length of the list is smaller than usual for current rankings. So instead of taking the top 100 qualifying players (those who meet a playing time minimum and thus have an Elo ranking), we take a smaller number of players, all of whom have top-100 rankings on the official list.
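The difference between the two selection rules is easy to lose in prose, so here is a rough sketch of both. The function names and the toy data are hypothetical. The only number taken from the post is the 20-match minimum.

```python
def top_qualifying(ranked_players, matches_played, n=100, min_matches=20):
    """First n players, in official-ranking order, who meet the match minimum.

    This can reach below the official top n when many players are inactive.
    """
    qualified = [p for p in ranked_players
                 if matches_played.get(p, 0) >= min_matches]
    return qualified[:n]

def true_top_n(ranked_players, matches_played, n=100, min_matches=20):
    """Players inside the official top n who also meet the match minimum.

    The list may be shorter than n, but nobody outside the top n sneaks in.
    """
    return [p for p in ranked_players[:n]
            if matches_played.get(p, 0) >= min_matches]

# Toy example: six players in ranking order, with matches in the last 52 weeks.
ranked = ["A", "B", "C", "D", "E", "F"]
matches = {"A": 25, "B": 3, "C": 40, "D": 22, "E": 0, "F": 31}

deep = top_qualifying(ranked, matches, n=4)
strict = true_top_n(ranked, matches, n=4)
print(deep)    # ['A', 'C', 'D', 'F']
print(strict)  # ['A', 'C', 'D']
```

The first version is what the earlier tau values use; the second is the stricter check described in the paragraph above.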

The results are the same. For men, the tau based on today’s rankings and today’s Elo ratings is 0.694 versus the historical average of 0.678. For women, it’s 0.721 versus 0.719.

Still, the rankings feel awfully stale. The key issue is one that Elo can’t help us solve. So far, we’ve been looking at players who are keeping active. But the really out-of-date names on the official lists are the ones who have stayed home. Should Federer still be #6? Heck if I know! In the past, if an elite player missed 14 months, Elo would knock him down a couple hundred points, and if that adjustment were applied to Fed now, it would push down tau. But there’s no straightforward answer for how the inactive (or mostly inactive) players should be rated.

What we’ve learned today

This is the part of the post where I’m supposed to explain why this finding makes sense and why we should have suspected it all along. I don’t think I can manage that.

A good way to think about this might be that there is a sort of tour-within-a-tour that is continuing to play regularly. Federer, Barty, and many others haven’t usually been part of it, while several dozen players are competing as often as they can. The relative rankings of that second group are pretty good.

It doesn’t seem quite fair that Clara Tauson is stuck just inside the top 100 while her Elo is already top-50, or that Rublev remains behind Federer despite an eye-popping six months of results while Roger sat at home. And for some historical considerations–say, weeks inside the top 50 for Tauson or the top 5 for Rublev–maybe it isn’t fair that they’re stuck behind peers who are choosing not to play, or who are resting on the laurels of 18-month-old wins.

But in other important ways, the absolute rankings often don’t matter. Rublev has been a top-five seed at every event he’s played since late September except for Roland Garros, the Tour Finals, and the Australian Open, despite never being ranked above #8. When the tour-within-a-tour plays, he is a top-five guy. The likes of Rublev and Tauson will continue to have the deck slightly stacked against them at the majors, but even that disadvantage will steadily erode if they continue to play at their current levels.

Believing in science as I do, I will take these findings to heart. That means I’ll continue to complain about the problems with the official rankings–but no more than I did before the pandemic.

Podcast Episode 86: A New Documentary on Guillermo Vilas and the No. 1 Ranking

Episode 86 of the Tennis Abstract Podcast features Jeff and co-host Carl Bialik, of the Thirty Love podcast, discussing the new Netflix doc Guillermo Vilas: Settling the Score.

The Argentine star was a multi-slam winner in the 1970s, yet he never reached the top of the official ATP ranking list. The film covers journalist Eduardo Puppo’s quest to prove that Vilas deserved to be #1. Over the course of the episode, we ponder the importance of the top ranking, the vagaries of the ATP ranking algorithm, how Elo rates Vilas’s peak years, and the ATP’s response to Vilas’s case for the top spot. We didn’t love the documentary, but the issues it raises are fun to debate.

Fans of the TA podcast will also want to check out Dangerous Exponents, the new Covid-19 podcast that Carl Bialik and I are doing. Episode 3 will be available later today.

Thanks for listening!

(Note: this week’s episode is about 48 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

There’s Always a Chance: Marie Bouzkova Edition

Last night in Toronto, 91st-ranked qualifier Marie Bouzkova won her quarter-final match against 4th-ranked Simona Halep. Halep retired with a leg injury after losing the first set, so there’s a caveat–even if we were prepared to read too much into a single match, we wouldn’t attribute a lot of meaning to this one. But it’s a big accomplishment for the 21-year-old Czech, who earned her second top-ten scalp of the week and will advance to her first Premier-level semi-final, against no less of an obstacle than Serena Williams.

Here’s the nutty thing: It was Bouzkova’s 62nd match of the 2019 season, her 61st against someone with a WTA ranking. She got the win against the highest-ranked foe–Halep–but just last week, she lost to 636th-ranked CoCo Vandeweghe, her lowest-ranked opponent of the year. Yeah, the caveats keep coming: Vandeweghe is coming back from injury and is surely better than a ranking outside the top 600, and the ITF Transition Tour hijinks mean that the ranking system didn’t work as usual in 2019. Some players who would normally have a very low ranking, like the Kazakh wild card who Bouzkova crushed a couple of weeks ago, don’t count.

Still. 61 matches, with a win against the highest-ranked player and a loss against the lowest.

That sent me to my database, which had plenty more surprises in store. Going back less than a decade, to 2010, I found 127 players who recorded the same oddball combination of feats in a single season, minimum 30 matches. (To be consistent with the Halep result, I included retirements if at least one set was completed.) While many of the players won’t be of wide interest–last year, one of the exemplars was Mira Antonitsch, who didn’t play anyone ranked in the top 400–63 of the 127 player-seasons involved beating a top-100 opponent, 44 included the defeat of someone in the top 50, and 25 were highlighted by a top-ten upset.

Three of them included Halep as the top-ten scalp! That makes Bouzkova the fourth player to beat Halep, not face anyone higher ranked, and also lose to her lowest-ranked opponent of the season. (Through eight months, anyway.) Halep shouldn’t feel too bad, though, as Angelique Kerber has been the extreme-ranked loser in five such cases, four of them in 2017. Ouch.
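The database query boils down to a simple test on each player-season. Here is a sketch of that test, assuming each season is stored as a list of `(opponent_rank, won)` pairs with `None` for unranked opponents. That data layout, the function name, and the choice to count all matches toward the 30-match minimum are my assumptions for illustration.

```python
def beat_highest_lost_to_lowest(season, min_matches=30):
    """True if the player beat her highest-ranked opponent of the season
    and also lost to her lowest-ranked (ranked) opponent."""
    if len(season) < min_matches:
        return False
    # Unranked opponents are ignored when looking for the extremes.
    ranked = [(rank, won) for rank, won in season if rank is not None]
    if not ranked:
        return False
    best = min(rank for rank, _ in ranked)    # numerically smallest = highest-ranked
    worst = max(rank for rank, _ in ranked)
    beat_best = any(won for rank, won in ranked if rank == best)
    lost_to_worst = any(not won for rank, won in ranked if rank == worst)
    return beat_best and lost_to_worst

# Made-up season: a win over No. 4, a loss to No. 636, one unranked opponent,
# and 30 filler matches against players ranked 100 through 129.
filler = [(100 + i, i % 2 == 0) for i in range(30)]
season = [(4, True), (636, False), (None, True)] + filler

print(beat_highest_lost_to_lowest(season))  # True
```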

Here are the 25 player-seasons between 2010 and 2018 in which a WTAer beat her highest-ranked opponent and lost to her lowest:

Year  Player       High-Ranked  Rk  Low-Ranked  Rk       
2017  Kasatkina    Kerber       1   Kanepi      418      
2018  Hsieh        Halep        1   Gasparyan   410      
2010  Jankovic     Serena       1   Diyas       268      
2010  Clijsters    Wozniacki    1   G-Vidagany  258   *  
2014  Cornet       Serena       1   Townsend    205      
2010  Yakimova     Jankovic     2   Dellacqua   980      
2017  Bouchard     Kerber       2   Duval       896   *  
2017  Vesnina      Kerber       2   Azarenka    683      
2016  Bencic       Kerber       2   Boserup     225      
2014  Rybarikova   Halep        2   Eguchi      183      
2017  Mladenovic   Kerber       2   Andreescu   167   *  
2018  Goerges      Wozniacki    3   Serena      451      
2014  Tomljanovic  Radwanska    3   A Bogdan    308      
2015  Mladenovic   Halep        3   Savchuk     262      
2017  Kerber       Pliskova     4   Stephens    934      
2014  Pavlyu'ova   Radwanska    4   Wozniak     241      
2017  Dodin        Cibulkova    5   Rybarikova  453      
2017  Bellis       Radwanska    6   Azarenka    683      
2018  Buyukakcay   Ostapenko    6   Di Sarra    555      
2017  Sakkari      Wozniacki    6   Potapova    454      
2015  L Davis      Bouchard     7   E Bogdan    527      
2015  Ostapenko    S-Navarro    9   Dushevina   1100  *  
2016  KC Chang     Vinci        10  S Murray    862      
2018  Pera         Konta        10  Hlavackova  825      
2018  Danilovic    Goerges      10  Pegula      620

* also faced one unranked player

A quick glance is all it takes to establish that Vandeweghe isn’t the first lowest-ranked player to inspire a “yeah, but” reaction. The list of purportedly weak opponents is very strong for one made up of players with an average ranking outside of the top 500. We have stars such as Victoria Azarenka (twice) and Serena as well as a helping of prospects such as Bianca Andreescu and Victoria Duval.
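As a quick check on that "outside of the top 500" figure, the 25 low-ranked opponents in the table can be averaged directly. The list below is copied from the table's last ranking column, in table order.

```python
# Rankings of the lowest-ranked opponents, from the table above.
low_ranks = [
    418, 410, 268, 258, 205,
    980, 896, 683, 225, 183,
    167, 451, 308, 262, 934,
    241, 453, 683, 555, 454,
    527, 1100, 862, 825, 620,
]

average = sum(low_ranks) / len(low_ranks)
print(len(low_ranks), round(average, 2))  # 25 518.72
```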

Consider this as today’s reminder of the limitations of the WTA computer rankings. They tell us who has won a lot of matches in the last 52 weeks, not necessarily who is playing well right now. These cases include many of the most extreme mismatches between official ranking and on-the-day ability. I don’t think it says anything meaningful about a player to show up on this list–though Kerber’s many appearances (as both player and scalp!) are a good summary of her disappointing 2017 campaign.

Bouzkova will remain on the list for at least a couple more days: Serena is currently ranked 10th and both of the other semi-finalists are ranked lower, so Halep will remain her “toughest” opponent. Despite the Czech’s breakout week, it would be understandable if she found herself overawed to face a 23-time slam champion across the net. But one thing is certain: Bouzkova couldn’t care less about the number next to the name.

Picking Favorites With Better Davis Cup Rankings

Yesterday, the ITF announced the seedings for the first new-look Davis Cup Finals, to be held in Madrid this November. The 18-country field was completed by the 12 home-and-away ties contested last weekend. Those 12 winners will join France, Croatia, Spain, and USA (last year’s semi-finalists) along with the two wild cards, recent champions Argentina and Great Britain.

The six nations who skipped the qualifying round will make up five of the top six seeds. (Spain is 7th, while Belgium, who had to qualify, is 4th.) The preliminary round of the November event will feature six round-robin groups of three, each consisting of one top-six seed, a second country ranked 7-12, and a third ranked 13-18. Seeding really matters, as a top position (deserved or not!) guarantees that a side will avoid dangerous opponents like last year’s finalists France and Croatia. Even the difference between 12 and 13 could prove decisive, as a 7-through-12 spot ensures that a nation will steer clear of the always-strong Spaniards, who are seeded 7th.
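To make the group structure concrete, here is a small sketch that splits the 18 seeds into three pots and deals one team from each pot into six groups. The seed order comes from the ITF seedings discussed in this post; the uniformly random dealing and the function name `draw_groups` are assumptions on my part, not the ITF's actual draw procedure.

```python
import random

# Countries in seed order, 1 through 18.
SEEDS = [
    "FRA", "CRO", "ARG", "BEL", "GBR", "USA",   # seeds 1-6
    "ESP", "SRB", "AUS", "ITA", "GER", "KAZ",   # seeds 7-12
    "CAN", "JPN", "COL", "NED", "RUS", "CHI",   # seeds 13-18
]

def draw_groups(seeds, rng):
    """Deal one team from each pot of six into each of six groups."""
    pots = [seeds[0:6], seeds[6:12], seeds[12:18]]
    shuffled = [rng.sample(pot, len(pot)) for pot in pots]
    return [list(group) for group in zip(*shuffled)]

groups = draw_groups(SEEDS, random.Random(2019))
for group in groups:
    print(group)
```

Whatever the random seed, Spain always lands with exactly one top-six seed, which is why one of those six is guaranteed an unpleasant group.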

The seeds are based on the Davis Cup’s ranking system, which relies entirely on previous Davis Cup results. While the formula is long-winded, the concept is simple: A country gets more points for advancing further each season, and recent years are worth the most. The last four years of competition are taken into consideration. It’s not how I would do it, but the results aren’t bad. Four or five of the top six seeds will field strong sides, and one of the exceptions–Great Britain–would have done so had Andy Murray’s hip cooperated. Spain is obviously misranked, but given the limitations of the Davis Cup ranking system, it’s understandable, as the 2011 champions spent 2015 and 2016 languishing outside the World Group.

We can do better

The Davis Cup rankings have several flaws. First, they rely heavily on a lot of old results. If we’re interested in how teams will compete in November, it doesn’t matter how well a side fared three or four years ago, especially if some of their best players are no longer in the mix. Second, they don’t reflect the change in format. Until last year, doubles represented one rubber in a best-of-five-match tie. A good doubles pair helped, but it wasn’t particularly necessary. Now, there are only two singles matches alongside the doubles rubber. The quality of a nation’s doubles team is more important than it used to be.

Let’s see what happens to the rankings when we generate a more forward-looking rating system. Using singles and doubles Elo, I’m going to make a few assumptions:

  • Each country’s top two singles players have a 75% chance of participating (due to the possibility of injury, fatigue, or indifference), and if either one doesn’t take part, the country’s third-best player will replace him.
  • Same idea for doubles, but the top two doubles players have an 85% chance of showing up, to be replaced by the third-best doubles player if necessary.
  • The three matches are equally important. (This isn’t technically true–the third match is likely to be necessary less than half the time, though when it does decide the tie, it is twice as important as the other two matches.)
  • Andy Murray won’t play.

Those assumptions allow us to combine the singles and doubles Elo ratings of the best players of each nation. The result is a weighted rating for each side, one that has a lot of bones to pick with the official Davis Cup rankings.
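One way those assumptions could be turned into a single number is sketched below. This is not the exact formula behind the table that follows: it averages Elo points rather than win probabilities, rates a doubles team as the mean of its two players, and ignores the case where both of a country's top two players are missing. The ratings in the example are invented.

```python
def expected_slot(rating, backup, p_play):
    """Expected rating of a lineup slot when the first choice shows up with
    probability p_play and the backup fills in otherwise."""
    return p_play * rating + (1 - p_play) * backup

def team_rating(singles, doubles, p_singles=0.75, p_doubles=0.85):
    """Weighted rating for a side, treating the three rubbers as equally important.

    singles, doubles: Elo ratings of at least three players each.
    """
    s1, s2, s3 = sorted(singles, reverse=True)[:3]
    d1, d2, d3 = sorted(doubles, reverse=True)[:3]
    singles_1 = expected_slot(s1, s3, p_singles)
    singles_2 = expected_slot(s2, s3, p_singles)
    # Assumption: a doubles pair is rated as the mean of its two players.
    doubles_pair = (expected_slot(d1, d3, p_doubles)
                    + expected_slot(d2, d3, p_doubles)) / 2
    return (singles_1 + singles_2 + doubles_pair) / 3

# Invented ratings for a hypothetical country.
rating = team_rating(singles=[2100, 1900, 1800], doubles=[1850, 1800, 1700])
print(round(rating, 1))  # 1902.1
```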

Forward-looking rankings

The following table shows the 18 countries at the Davis Cup finals along with the 12 losing qualifiers. For each team, I’ve listed their Davis Cup ranking and their finals seed (if applicable). Alongside those, I’ve shown each nation’s weighted Elo rank and rating (Elo) and their hard-court Elo rank and rating (sElo). The table is sorted by hard-court Elo:

Country  DC Rank  Seed  Elo Rank   Elo  sElo Rank  sElo  
ESP            7     7         1  1936          1  1891  
CRO            2     2         2  1898          2  1849  
FRA            1     1         3  1880          3  1845  
USA            6     6         4  1876          4  1835  
RUS           21    17         7  1855          5  1827  
AUS            9     9         5  1857          6  1820  
SRB            8     8         8  1849          7  1808  
GER           11    11         6  1855          8  1799  
AUT           16              10  1800          9  1766  
ARG            3     3         9  1803         10  1755  
GBR            5     5        11  1796         11  1750  
SUI           24              14  1763         12  1749  
ITA           10    10        12  1780         13  1745  
CAN           14    13        13  1777         14  1744  
JPN           17    14        15  1735         15  1719  
BEL            4     4        17  1688         16  1673  
CZE           13              16  1712         17  1661  
NED           19    16        18  1685         18  1643  
BRA           28              20  1659         19  1638  
IND           20              21  1652         20  1621  
SVK           29              22  1645         21  1617  
CHI           22    18        19  1682         22  1609  
KAZ           12    12        26  1582         23  1574  
COL           18    15        24  1597         24  1551  
SWE           15              27  1570         25  1542  
BIH           27              28  1552         26  1540  
POR           26              23  1610         27  1535  
HUN           23              25  1583         28  1533  
UZB           25              29  1491         29  1489  
CHN           30              30  1468         30  1465

Spain is the comfortable favorite, regardless of whether we look at overall Elo or hard-court Elo. When the draw is conducted, we’ll see which top-six seed is unlucky enough to end up with the Spaniards in their group, and whether the hosts will remain the favorite.

The biggest mismatch between the Davis Cup rankings and my Elo-based approach is in our assessment of the Russian squad. Daniil Medvedev is up to sixth in my singles Elo ratings, with Karen Khachanov at 10th. Those ratings might be a little aggressive, but as it stands, Russia is the only country with two top-ten Elo singles players. Spain is close, with Rafael Nadal ranked 2nd and Roberto Bautista Agut 11th, and the hosts have the additional advantage of a deep reservoir of doubles talent from which to choose.

In the opposite direction, my rankings do not forecast good things for the Belgians. David Goffin has fallen out of the Elo top 20, and there are no superstar doubles players to pick up the slack. In a just world, Spain and Belgium will land in the same round-robin group–preferably one without the Russians as well.

Madrid or Maldives

The results I’ve shown assume that every top singles player has the same chance of participating. That’s certainly not the case, with high-profile stars like Alexander Zverev telling the press that they’ll be spending the week on holiday in the Maldives. Some teams are heavily dependent on one singles player who could make or break their chances with a decision or an injury.

As it stands, Germany is 8th in the surface-weighted Elo. If we take Zverev entirely out of the mix, they drop to a tie for 14th with Japan. It’s something the German side would prefer to avoid, but it’s not catastrophic, partly because the Germans were never among the favorites, and partly because Zverev could play only one singles rubber per tie and the doubles replacements are competent.

Even more reliant on a single player is the Serbian side, which qualified last weekend without the help of their most dangerous threat, Novak Djokovic. With Djokovic, the Serbs rank 7th–a case where my surface Elo ratings almost agree with the official rankings. But without the 15-time major winner, the Serbs fall down to a tie with Belgium in 16th place. While the Serbs are unlikely to take home the trophy regardless, Novak would make a huge difference.

The draw will take place next Thursday. We’ll check back then to see which sides have the best forecasts, nine months out from the showdown in Madrid.

The Unique Late-Career Surge of Mihaela Buzarnescu

The newest member of the WTA top 32 got there the hard way. Mihaela Buzarnescu, who achieved her latest career-high ranking with a run to the final of last week’s Prague event, where she lost a three-setter to Petra Kvitova, made her professional debut 14 years ago. Despite a dose of junior success, including a junior doubles title at the 2006 US Open, she didn’t crack the top 100 until last October.

This isn’t how tennis career trajectories are supposed to work. Yes, the game is getting older and stars are extending their careers, but Buzarnescu’s year-long winning spree, in which she has climbed from outside the top 400 to inside the top 40, began after her 29th birthday. The closer we look at what the Romanian has achieved, and the age at which she’s doing so, the more unusual it appears.

The oldest top 100 debuts

Since the beginning of the 1987 season, 630 women have debuted in the top 100. Their average age, on the Monday they reached the ranking threshold, is just under 20 years and 6 months. Only 29 of the 630–less than five percent–broke into the top 100 after their 26th birthday.

Only 14 players did so after turning 27:

Player                 Debut  Age (Y)  Age (D)  Peak Rank  
Tzipi Obziler       20070219       33      306         75  
A. Villagran Reami  19880801       31      359         99  
Mihaela Buzarnescu  20171016       29      165         32  
Julie Ditty         20071105       28      305         89  
Eva Bes Ostariz     20010716       28      183         90  
Mashona Washington  20040719       28       49         50  
Maureen Drake       19990201       27      317         47  
Tatjana Maria       20150406       27      241         46  
Hana Sromova        20051107       27      211         87  
Laura Siegemund     20150914       27      193         27  
Flora Perfetti      19960708       27      160         54  
Louise Allen        19890227       27       51         83  
Kristina Barrois    20081020       27       20         57  
Iryna Bremond       20111017       27       11         93
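The two age columns are just the gap between a birthdate and the debut Monday, split into whole years and leftover days. Here is a sketch of that calculation. Buzarnescu's birthdate (May 4, 1988) is not given in this post, so treat it as an outside assumption. The helper name is mine, and it does not handle February 29 birthdays.

```python
from datetime import date

def age_years_days(born, on):
    """Age on a given date as (whole years, days since last birthday).

    Simplification: assumes the birthday is not February 29.
    """
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    years = on.year - born.year - (0 if had_birthday else 1)
    last_birthday = born.replace(year=born.year + years)
    return years, (on - last_birthday).days

born = date(1988, 5, 4)      # assumed birthdate, not from the table
debut = date(2017, 10, 16)   # top-100 debut, from the table
print(age_years_days(born, debut))  # (29, 165)
```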

Buzarnescu doesn’t quite top this list, but she is certainly a more consequential force on tour than either of the women who debuted at a more advanced age. Tzipi Obziler fought her way through the lower levels of the game for just as long as Buzarnescu did, though she never cracked the top 70. Adriana Villagran Reami played a limited schedule; she may have had the skills to play top-100 tennis long before the ranking table made it official, but she was never a tour regular.

The most comparable player to Buzarnescu is Laura Siegemund, who reached a double-digit ranking a few years ago, and has since climbed as high as No. 27. Of the oldest top-100 debutants, though, very few have continued to ascend the rankings as far as Buzarnescu and Siegemund have.

Here are the oldest top-100 debuts of players who went on to crack the top 32:

Player                      Debut  Age (Y)  Age (D)  Peak  
Mihaela Buzarnescu       20171016       29      165    32  
Laura Siegemund          20150914       27      193    27  
Sybille Bammer           20050822       25      117    19  
Shinobu Asagoe           20000710       24       12    21  
Manon Bollegraf          19880215       23      310    29  
Johanna Konta            20140623       23       37     4  
Anne Kremer              19981019       23        2    18  
Lesia Tsurenko           20120528       22      364    29  
Kveta Peschke            19980420       22      286    26  
Petra Cetkovska          20071022       22      256    25  
Tathiana Garbin          20000214       22      229    22  
Li Na                    20041004       22      221     2  
Mara Santangelo          20040202       22      219    27  
Ginger Helgeson Nielsen  19910325       22      192    29  
Casey Dellacqua          20070806       22      176    26

Here’s an indication of just how young women’s tennis is: The 9th-oldest top-100 debutant on this list achieved her feat before her 23rd birthday. Put another way: Of the 107 women to break into the top 100 after their 23rd birthday, only eight went on to a ranking of No. 32 or better. By comparison, about one-third of all top-100 players peak at a ranking in the top 32. In this category, Buzarnescu is charting entirely new territory.

Making up for lost time

The last six months or so have been a whirlwind for the Romanian, as she has gone from a fringe tour player that no one had ever heard of, to a solid tour regular that … well, most fans still don’t know much about. Many players need some time to adjust to the higher level of competition and spend months, even years, stagnating in the rankings. Buzarnescu, on the other hand, has barely stopped to take a breath.

It took 203 days from her top-100 debut last October to her latest career-high at No. 32 on Monday. Siegemund, by comparison, needed 315 days; Sybille Bammer took 574 days; Roberta Vinci, who eventually cracked the top ten, required 2,520 days, or nearly seven years. The average player who reaches the top 32 needs two and a half years between her first appearance in the top 100 and clearing the higher bar.
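The ascent figures are plain date differences. A short sketch, using the debut date in the same YYYYMMDD form as the tables and assuming that "Monday" means May 7, 2018 (that date is my inference, not stated in the post):

```python
from datetime import datetime

def days_between(start_yyyymmdd, end_yyyymmdd):
    """Number of days between two dates written as YYYYMMDD strings."""
    start = datetime.strptime(start_yyyymmdd, "%Y%m%d").date()
    end = datetime.strptime(end_yyyymmdd, "%Y%m%d").date()
    return (end - start).days

# Buzarnescu: top-100 debut to first ranking at No. 32 (end date assumed).
ascent = days_between("20171016", "20180507")
print(ascent)  # 203

# Vinci's 2,520 days, expressed in years.
print(round(2520 / 365.25, 1))  # 6.9
```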

Buzarnescu’s climb doesn’t fit the mold of older debuts. Her climb has more in common with those of teenage sensations. Again since 1987, here are the 20 quickest ascents:

Player              Age (Y)  Age (D)  Peak  Ascent Days  
Jennifer Capriati        14       11     1            0  
Anke Huber               15      266     4           49  
Agnes Szavay             18      164    13           77  
Lindsay Davenport        16      238     1          112  
Naoko Sawamatsu          17       31    14          119  
Clarisa Fernandez        20      265    26          133  
Maria Sharapova          16       58     1          133  
Serena Williams          16       52     1          133  
Miriam Oremans           20      145    25          140  
Venus Williams           16      301     1          147  
Sofia Arvidsson          21      223    29          154  
Leila Meskhi             19      308    12          168  
Tatiana Golovin          16       22    12          175  
Eugenie Bouchard         19       42     5          189  
Martina Hingis           14       31     1          189  
Ana Ivanovic             16      361     1          196  
Conchita Martinez        16      107     2          196  
Mihaela Buzarnescu       29      165    32          203  
Darya Kasatkina          18      137    11          203  
Ashleigh Barty           20      316    16          210

The player Buzarnescu knocked out of the top 20: Kim Clijsters. Buzarnescu is the only woman on the list to have cracked the top 100 after her 22nd birthday, yet here she is, climbing from No. 101 to No. 32 in less time than 92% of her peers.

Common sense suggests that Buzarnescu can climb only so much higher: Most players don’t set new career highs in their 30s, especially those who have such a short track record of tour-level success. On the other hand, she has adapted quickly, recording her first top ten win, over Jelena Ostapenko, in February and taking a set from Kvitova in Saturday’s final.

What’s more, she’ll reap the benefits of seeds at many events, probably including Roland Garros and Wimbledon. Having proven that she can defeat top 50 players–she holds a 6-7 career record against them–her new status as a top-32 player means she’ll get plenty of opportunities to rack up points against a less-daunting brand of competition. After more than a decade of fighting steeply uphill battles, she has finally–improbably–earned a place among the game’s elite. Now all she has to do is keep winning.

Feast, Famine, and Sloane Stephens

Italian translation at settesei.it

Last week, Sloane Stephens reeled off an impressive series of victories, defeating Garbine Muguruza, Angelique Kerber, Victoria Azarenka, and Jelena Ostapenko to secure the title at the WTA Premier Mandatory event in Miami.  The trophy isn’t quite as life-changing as the one she claimed at the US Open last September, but it’s a close second, and the competition she faced along the way was every bit as good.

The Miami title comes with 1,000 WTA ranking points, and by adding those to her previous tally, Stephens moved into the top ten, reaching a career high No. 9 on Monday. With two high-profile championships to her name, not to mention semifinal showings last summer in Toronto and Cincinnati, there’s little doubt she deserves it. Elo isn’t quite convinced, but its more sophisticated algorithm (and its disregard for the magnitude of the US Open and Miami titles) puts her within spitting distance of the top ten as well.

What makes Stephens’s rise to the top ten so remarkable is her efficiency in converting wins to ranking points. Since her return from injury at Wimbledon last year, she has played only 38 matches, winning 24 of them. She has suffered six first-round losses, plus two more defeats at last year’s Zhuhai Elite Trophy round-robin and another pair in the Fed Cup final against Belarus. All told, in the last nine months, she has won matches at only six different events. Her unusual record illustrates some of the quirks in the ranking system, and how a player who peaks at the right times can exploit them.

24 wins is almost never enough for a spot in the vaunted top ten. From 1990 to 2017, a player finished a season with a top-ten ranking while winning fewer than 30 matches only seven times. Only two of those involved fewer wins than Sloane’s 24: Monica Seles’s 1993 and 1995, the timespans leading up to her tragic on-court stabbing and following her eventual comeback. Here are the top-ten seasons with the fewest victories, including the last 52 weeks of a few players currently near the top of the WTA table:

Year  Player              YE Rk   W   L  W-L %  
1995  Monica Seles*           1  11   1    92%  
1993  Monica Seles            8  17   2    89%  
2018  Sloane Stephens**       9  24  14    63%  
2010  Serena Williams         4  25   4    86%  
1993  Jennifer Capriati       9  28  10    74%  
2015  Flavia Pennetta         8  28  20    58%  
2000  Mary Pierce             7  29  11    73%  
2004  Jennifer Capriati      10  29  12    71%  
1993  Mary Joe Fernandez      7  31  12    72%  
1995  Iva Majoli              9  31  13    70%  
2018  Venus Williams**        8  31  14    69%  
1995  Mary Joe Fernandez      8  31  15    67%  
2015  Lucie Safarova          9  32  21    60%  
2008  Maria Sharapova         9  33   6    85%  
1998  Steffi Graf             9  33   9    79%  
2018  Petra Kvitova**        10  33  14    70%

* ranking frozen after her assault

** rankings as of April 2, 2018; wins and losses based on previous 52 weeks

What almost all of these seasons have in common is exceptional performances at grand slams. Sloane won the US Open; Seles won the 1993 Australian; Serena Williams won a pair of majors in 2010; Flavia Pennetta capped an otherwise anonymous 2015 campaign with a title in New York. The slams are where the rankings points are.

Even within this group of slam successes, Sloane stands out. Of the 16 players on that list, only two–Pennetta and Lucie Safarova–won matches at a lower rate than Stephens has since her comeback. In other words, most women who are this efficient with their victories don’t lose quite so early or often at lesser events.

That 63% won-loss record is even more extreme than the above list makes it look. Of the nearly 300 year-end top-tenners since 1990, only eight finished the season with a lower win rate. Here’s that list, expanded to the top 11 to include another noteworthy recent season:

Year  Player              YE Rk   W   L  W-L %  
2014  Dominika Cibulkova     10  33  24    58%  
2000  Nathalie Tauziat       10  36  26    58%  
2015  Flavia Pennetta         8  28  20    58%  
1999  Nathalie Tauziat        7  37  25    60%  
2007  Marion Bartoli         10  47  31    60%  
2015  Lucie Safarova          9  32  21    60%  
2000  Anna Kournikova         8  47  29    62%  
2010  Jelena Jankovic         8  38  23    62%  
2018  Sloane Stephens*        9  24  14    63%  
2004  Elena Dementieva        6  40  23    63%  
2016  Garbine Muguruza        7  35  20    64%

* ranking as of April 2, 2018; wins and losses based on previous 52 weeks

There’s not much overlap between these lists; the first group generally missed some time, then made up for it by scoring big at slams, while the second group slogged through a long season and leveled up with a strong finish or two at a major. The typical player with a 63% winning percentage doesn’t end up in the top ten: She wraps up the season, on average, in the mid-twenties. At least that’s better than the average 24-win season: Those result in year-end finishes near No. 40.

Stephens has always been a big-match player: She made an early splash at the 2013 Australian Open, reaching the semifinals and upsetting Serena as a 19-year-old, and her overall career record at majors (66%) is nearly ten percentage points higher than her record at other tour events (57%). For all that, she will probably not conclude 2018 with such an extreme set of won-loss numbers. To do so, she’d probably need to win a major to replace her 2017 US Open points while losing early at most other events. Recovered from injury, Stephens may maintain her feast-or-famine ways to some degree, but it’s unlikely she’ll continue to display such extreme peaks and valleys.

Measuring the Impact of Wimbledon’s Seeding Formula

Italian translation at settesei.it

Unlike every other tournament on the tennis calendar, Wimbledon uses its own formula to determine seedings. The grass court Grand Slam grants seeds to the top 32 players in each tour’s rankings, and then re-orders them based on its own algorithm, which rewards players for their performance on grass over the last two seasons.

This year, the Wimbledon seeding formula has more impact on the men’s draw than usual. Seven-time champion Roger Federer is one of the best grass court players of all time, and though he dominated hard courts in the first half of 2017, he still sits outside the top four in the ATP rankings after missing the second half of 2016. Thanks to Wimbledon’s re-ordering of the seeds, Federer will switch places with ATP No. 3 Stan Wawrinka and take his place in the draw as the third seed.

Even with Wawrinka’s futility on grass and the shakiness of Andy Murray and Novak Djokovic, getting inside the top four has its benefits. If everyone lives up to their seed in the first four rounds (they won’t, but bear with me), the No. 5 seed will face a path to the title that requires beating three top-four players. Whichever top-four guy has No. 5 in his quarter would confront the same challenge, but the other three would have an easier time of it. Before players are placed in the draw, top-four seeds have a 75% chance of that easier path.

Let’s attach some numbers to these speculations. I’m interested in the draw implications of three different seeding methods: ATP rankings (as every other tournament uses), the Wimbledon method, and weighted grass-court Elo. As I described last week, weighted surface-specific Elo–averaging surface-specific Elo with overall Elo–is more predictive than ATP rankings, pure surface Elo, or overall Elo. What’s more, weighted grass-court Elo–let’s call it gElo–is about as predictive as its peers for hard and clay courts, even though we have less grass-court data to go on. In a tennis world populated only by analysts, seedings would be determined by something a lot more like gElo and a lot less like the ATP computer.
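The blending step can be sketched in a few lines. This is only an illustration: the post says the two ratings are averaged, so the equal 50/50 weight is an assumption, as are the function names, the sample ratings, and the use of the standard Elo logistic curve on a 400-point scale.

```python
def weighted_surface_elo(surface_elo, overall_elo, surface_weight=0.5):
    """Blend a surface-specific Elo rating with an overall Elo rating.

    surface_weight=0.5 (a plain average) is an assumption, not a
    documented parameter of the ratings discussed in the post.
    """
    return surface_weight * surface_elo + (1.0 - surface_weight) * overall_elo


def win_probability(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, not taken from the tables in this post.
player_a = weighted_surface_elo(surface_elo=2100.0, overall_elo=2200.0)
player_b = weighted_surface_elo(surface_elo=2000.0, overall_elo=1900.0)
print(player_a, player_b, round(win_probability(player_a, player_b), 3))
```

A different weight would simply shift how much a strong grass record can offset a weaker overall rating.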

Since gElo ratings provide the best forecasts, we’ll use them to determine the effects of the different seeding formulas. Here is the current gElo top sixteen, through Halle and Queen’s Club:

1   Novak Djokovic         2296.5  
2   Andy Murray            2247.6  
3   Roger Federer          2246.8  
4   Rafael Nadal           2101.4  
5   Juan Martin Del Potro  2037.5  
6   Kei Nishikori          2035.9  
7   Milos Raonic           2029.4  
8   Jo Wilfried Tsonga     2020.2  
9   Alexander Zverev       2010.2  
10  Marin Cilic            1997.7  
11  Nick Kyrgios           1967.7  
12  Tomas Berdych          1967.0  
13  Gilles Muller          1958.2  
14  Richard Gasquet        1953.4  
15  Stanislas Wawrinka     1952.8  
16  Feliciano Lopez        1945.3

We might quibble with some of these positions–the algorithm knows nothing about whatever is plaguing Djokovic, for one thing–but in general, gElo does a better job of reflecting surface-specific ability level than other systems.

The forecasts

Next, we build a hypothetical 128-player draw and run a whole bunch of simulations. I’ve used the top 128 in the ATP rankings, except for known withdrawals such as David Goffin and Pablo Carreno Busta, which doesn’t differ much from the list of guys who will ultimately make up the field. Then, for each seeding method, we randomly generate a hundred thousand draws, simulate those brackets, and tally up the winners.
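For readers who want to try this kind of experiment, here is a rough sketch of the simulate-and-tally loop. It is not the code behind the numbers in this post: the seed-placement rule (seeds 1 and 2 in opposite halves, 3-4 in the remaining quarters, and so on), the 16-player field, the ratings, and every function name are assumptions made for illustration, and match outcomes are drawn from the standard Elo win probability.

```python
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def make_draw(seed_order, num_seeds, rng):
    """Build a random bracket. seed_order lists players, top seed first.

    Assumes len(seed_order) and num_seeds are powers of two. Each tier of
    seeds (1, 2, 3-4, 5-8, ...) is shuffled into the sections of the draw
    that do not yet hold a seed.
    """
    size = len(seed_order)
    draw = [None] * size
    tier_start, tier_end = 0, 1
    while tier_start < num_seeds:
        section_size = size // tier_end
        empty = [s for s in range(0, size, section_size)
                 if all(slot is None for slot in draw[s:s + section_size])]
        tier = seed_order[tier_start:tier_end]
        rng.shuffle(tier)
        for slot, player in zip(empty, tier):
            draw[slot] = player
        tier_start, tier_end = tier_end, tier_end * 2
    # Unseeded players fill the remaining slots at random.
    rest = seed_order[num_seeds:]
    rng.shuffle(rest)
    rest_iter = iter(rest)
    return [p if p is not None else next(rest_iter) for p in draw]


def play_tournament(draw, ratings, rng):
    """Play adjacent slots against each other, round by round."""
    field = draw
    while len(field) > 1:
        field = [a if rng.random() < win_probability(ratings[a], ratings[b]) else b
                 for a, b in zip(field[::2], field[1::2])]
    return field[0]


def title_odds(seed_order, ratings, num_seeds, trials, seed=0):
    """Share of simulated tournaments won by each player."""
    rng = random.Random(seed)
    wins = Counter(
        play_tournament(make_draw(seed_order, num_seeds, rng), ratings, rng)
        for _ in range(trials)
    )
    return {player: wins[player] / trials for player in seed_order}


# Hypothetical 16-player field; ratings fall by 40 points per player.
ratings = {f"P{i}": 2200 - 40 * i for i in range(16)}
by_rating = sorted(ratings, key=ratings.get, reverse=True)
odds = title_odds(by_rating, ratings, num_seeds=4, trials=2000)
print({player: odds[player] for player in by_rating[:3]})
```

Comparing seeding methods then amounts to calling `title_odds` with the same ratings but a different `seed_order`, which is the comparison the tables below report for the real field.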

Here are the ATP top ten, along with their chances of winning Wimbledon using the three different seeding methods:

Player              ATP     W%  Wimb     W%  gElo     W%  
Andy Murray           1  23.6%     1  24.3%     2  24.1%  
Rafael Nadal          2   6.1%     4   5.7%     4   5.5%  
Stanislas Wawrinka    3   0.8%     5   0.5%    15   0.4%  
Novak Djokovic        4  34.1%     2  35.4%     1  34.8%  
Roger Federer         5  21.1%     3  22.4%     3  22.4%  
Marin Cilic           6   1.3%     7   1.0%    10   1.0%  
Milos Raonic          7   2.0%     6   1.6%     7   1.7%  
Dominic Thiem         8   0.4%     8   0.3%    17   0.2%  
Kei Nishikori         9   1.9%     9   1.7%     6   1.9%  
Jo Wilfried Tsonga   10   1.6%    12   1.4%     8   1.5%

Again, gElo is probably too optimistic on Djokovic–at least the betting market thinks so–but the point here is the differences between systems. Federer gets a slight bump for entering the top four, and Wawrinka–who gElo really doesn’t like–loses a big chunk of his modest title hopes by falling out of the top four.

The seeding effect is a lot more dramatic if we look at semifinal odds instead of championship odds:

Player              ATP    SF%  Wimb    SF%  gElo    SF%  
Andy Murray           1  58.6%     1  64.1%     2  63.0%  
Rafael Nadal          2  34.4%     4  39.2%     4  38.1%  
Stanislas Wawrinka    3  13.2%     5   7.7%    15   6.1%  
Novak Djokovic        4  66.1%     2  71.1%     1  70.0%  
Roger Federer         5  49.6%     3  64.0%     3  63.2%  
Marin Cilic           6  13.6%     7  11.1%    10  10.3%  
Milos Raonic          7  17.3%     6  14.0%     7  15.2%  
Dominic Thiem         8   7.1%     8   5.4%    17   3.8%  
Kei Nishikori         9  15.5%     9  14.5%     6  15.7%  
Jo Wilfried Tsonga   10  14.0%    12  13.1%     8  14.0%

There’s a lot more movement here for the top players among the different seeding methods. Not only do Federer’s semifinal chances leap from 50% to 64% when he moves inside the top four, even Djokovic and Murray see a benefit because Federer is no longer a possible quarterfinal opponent. Once again, we see the biggest negative effect on Wawrinka: A top-four seed would’ve protected a player who just isn’t likely to get that far on grass.

Surprisingly, the traditional big four are almost the only players out of all 32 seeds to benefit from the Wimbledon algorithm. By removing the chance that Federer would be in, say, Murray’s quarter, the Wimbledon seedings make it a lot less likely that there will be a surprise semifinalist. Tomas Berdych’s semifinal chances improve modestly, from 8.0% to 8.4%, with his Wimbledon seed of No. 11 instead of his ATP ranking of No. 13, but the other 27 seeds have lower chances of reaching the semis than they would have if Wimbledon stopped meddling and used the official rankings.

That’s the unexpected side effect of getting rankings and seedings right: It reduces the chances of deep runs from unexpected sources. It’s similar to the impact of Grand Slams using 32 seeds instead of 16: By protecting the best (and next best, in the case of seeds 17 through 32) from each other, tournaments require that unseeded players work that much harder. Wimbledon’s algorithm took away some serious upset potential when it removed Wawrinka from the top four, but it made it more likely that we’ll see some blockbuster semifinals between the world’s best grass court players.

The Steadily Less Predictable WTA

Italian translation at settesei.it

Update: The numbers in this post summarizing the effectiveness of sElo are much too high–a bug in my code led to calculating effectiveness with post-match ratings instead of pre-match ratings. The parts of the post that don’t have to do with sElo are unaffected and–I hope–remain of interest.

One of the talking points throughout the 2017 WTA season has been the unpredictability of the field. With the absence of Serena Williams, Victoria Azarenka, and until recently, Petra Kvitova and Maria Sharapova, there is a dearth of consistently dominant players. Many of the top remaining players have been unsteady as well, due to some combination of injury (Simona Halep), extreme surface preferences (Johanna Konta), and good old-fashioned regression to the mean (Angelique Kerber).

No top seed has yet won a title at the Premier level or above so far this year. Last week, Stephanie Kovalchik went into more detail, quantifying how seeds have failed to meet expectations and suggesting that the official WTA ranking system–the algorithm that determines which players get those seeds–has failed.

There are plenty of problems with the WTA ranking system, especially if you expect it to have predictive value–that is, if you want it to properly reflect the performance level of players right now. Kovalchik is correct that the rankings have done a particularly poor job this year identifying the best players. However, there’s something else going on: According to much more accurate algorithms, the WTA is more chaotic than it has been for decades.

Picking winners

Let’s start with a really basic measurement: picking winners. Through Rome, there had been more than 1100 completed WTA matches. The higher-ranked player won 62.4% of those. Since 1990, the ranking system has picked the winner of 67.9% of matches, and topped 70% during several years in the 1990s. It never fell below 66% until 2014, and this year’s 62.4% is the worst in the 28-year time frame under consideration.
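The measurement itself is simple to reproduce. A minimal sketch, assuming each match is stored as a pair of pre-match rankings (winner first); the function name and the sample data are made up for illustration:

```python
def pick_rate(matches):
    """Share of matches won by the better-ranked player.

    matches: iterable of (winner_rank, loser_rank), where a lower number
    means a better ranking. Matches between equally ranked players are
    skipped because the ranking makes no pick.
    """
    decided = [(w, l) for w, l in matches if w != l]
    correct = sum(1 for w, l in decided if w < l)
    return correct / len(decided)


# Hypothetical results: the better-ranked player wins three of five.
sample = [(1, 20), (15, 3), (7, 50), (30, 12), (2, 9)]
print(f"{pick_rate(sample):.1%}")  # 60.0%
```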

Elo does a little better. It rates players by the quality of their opponents, meaning that draw luck is taken out of the equation, and does a better job of estimating the ability level of players like Serena and Sharapova, who for various reasons have missed long stretches of time. Since 1990, Elo has picked the winner of 68.6% of matches, falling to an all-time low of 63.1% so far in 2017.

For a big improvement, we need surface-specific Elo (sElo). An effective surface-based system isn’t as complicated as I expected it to be. By generating separate rankings for each surface (using only matches on that surface), sElo has correctly predicted the winner of 76.2% of matches since 1990, almost cracking 80% back in 1992. Even sElo is baffled by 2017, falling to its lowest point, 71.0%.

(sElo for all three major surfaces is now shown on the Tennis Abstract Elo ratings report.)

This graph shows how effectively the three algorithms picked winners. It’s clear that sElo is far better, and the graph also shows that some external factor is driving the predictability of results, affecting the accuracy of all three systems to a similar degree:

Brier scores

We see a similar effect if we use a more sophisticated method to rate the WTA ranking system against Elo and sElo. The Brier score of a collection of predictions measures not only how accurate they are, but also how well calibrated they are–that is, a player forecast to win a matchup 90% of the time really does win nine out of ten, not six out of ten, and vice versa. Brier scores average the square of the difference between each prediction and its corresponding result. Because it uses the square, very bad predictions (for instance, that a player has a 95% chance of winning a match she ended up losing) far outweigh more pedestrian ones (like a player with a 95% chance going on to win).
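As a small illustration of the definition above, the following sketch computes a Brier score from a list of forecasts. The function name and the input format (probability assigned to a player, then 1 if she won and 0 if she lost) are assumptions chosen for the example.

```python
def brier_score(forecasts):
    """Mean squared difference between forecast probabilities and results.

    forecasts: iterable of (probability, outcome) pairs, where outcome is
    1 if the forecast player won and 0 if she lost.
    """
    forecasts = list(forecasts)
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# A 95% favorite who loses costs about 0.9025; one who wins costs about 0.0025.
print(brier_score([(0.95, 0)]))
print(brier_score([(0.95, 1)]))
# Calling every match a coin flip scores exactly 0.25.
print(brier_score([(0.5, 1), (0.5, 0)]))
```

The coin-flip value of 0.25 is a useful reference point: a system has to beat it to show any forecasting skill at all.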

In 2017 so far, the official WTA ranking system has a Brier score of .237, compared to Elo of .226 and sElo of .187. Lower is better, since we want a system that minimizes the difference between predictions and actual outcomes. All three numbers are the highest of any season since 1990. The corresponding averages over that time span are .207 (WTA), .202 (Elo), and .164 (sElo).

As with the simpler method of counting correct predictions, we see that Elo is a bit better than the official ranking, and both of the surface-agnostic methods are crushed by sElo, even though the surface-specific method uses considerably less data. (For instance, the clay-specific Elo ignores hard and grass court results entirely.) And just like the results of picking winners, we see that the differences in Brier scores of the three methods are fairly consistent, meaning that some other factor is causing the year-to-year differences:

The takeaway

The WTA ranking system has plenty of issues, but its unusually bad performance this year isn’t due to any quirk in the algorithm. Elo and sElo are structured completely differently–the only thing they have in common with the official system is that they use WTA match results–and they show the same trends in both of the above metrics.

One factor affecting the last two years of forecasting accuracy is the absence of players like Serena, Sharapova, and Azarenka. If those three played full schedules and won at their usual clip, there would be quite a few more correct predictions for all three systems, and perhaps there would be fewer big upsets from the players who have tried to replace them at the top of the game.

But that isn’t the whole story. A bunch of no-brainer predictions don’t affect Brier score very much, and the presence of heavily-favored players also makes it more likely that massively surprising results occur, such as Serena’s loss to Madison Brengle, or Sharapova’s ouster at the hands of Eugenie Bouchard. Many unexpected results are completely independent of the top ten, like Marketa Vondrousova’s recent title in Biel.

While some of the year-to-year differences in the graphs above are simply noise, the last several years look much more like a meaningful trend. It could be that we are seeing a large-scale changing of the guard, with young players (and their low rankings) regularly upsetting established stars, while the biggest names in the sport are spending more time on the sidelines. Upsets may also be somewhat contagious: When one 19-year-old aspirant sees a peer beating top-tenners, she may be more confident that she can do the same.

Whatever influences have given us the WTA’s current state of unpredictability, we can see that it’s not just a mirage created by a flawed ranking system. Upsets are more common now than at any other point in recent memory, whichever algorithm you use to pick your favorites.

Playing Even Better Than Number One

Italian translation at settesei.it

Last night in Miami, Venus Williams beat newly re-minted WTA No. 1 Angelique Kerber. Venus, of course, has plenty of experience clashing with the very best in women’s tennis, with 15 Grand Slam finals and three spells at the No. 1 ranking herself.

Last night’s quarterfinal was Venus’s 37th match against a WTA No. 1 and her 15th win. Kerber became the sixth different top-ranked player to lose at the hands of the elder Williams sister.

All of these numbers are very impressive, especially when you consider that, taken as a whole, WTA No. 1s have won just over 88% of their nearly 2,300 matches since the modern ranking system was instituted. However, Venus doesn’t hold the record in any of these categories.

Records against No. 1s are a somewhat odd classification, since the best players tend to reach the top spot themselves. For example, Martina Hingis played only 11 matches against top-ranked opponents, barely one-fifth as many as the leader in that category. On the other hand, injuries and other layoffs have meant that many all-time greats have found themselves lower in the rankings for long stretches. That is particularly true of Venus and Serena Williams.

With her 37 matches played against No. 1s, Venus is approaching the top of the list, but it will take a superhuman effort to catch Arantxa Sanchez Vicario, at 51:

Rank  Player                   Matches vs No. 1
1     Arantxa Sanchez Vicario                51
2     Gabriela Sabatini                      38
3     Venus Williams                         37
4     Lindsay Davenport                      34
5     Conchita Martinez                      33
6     Helena Sukova                          31
7     Serena Williams                        28
8     Svetlana Kuznetsova                    27
-     Jana Novotna                           27
10    Amelie Mauresmo                        25
11    Maria Sharapova                        23

Wins against No. 1s is a more achievable goal. Martina Navratilova holds the current record at 18*, followed by Serena at 16, and then Lindsay Davenport and Venus at 15:

Rank  Player               Wins  Losses
1     Martina Navratilova    18*      
2     Serena Williams        16      12
3     Lindsay Davenport      15      19
-     Venus Williams         15      22
5     Steffi Graf            11       8
6     Gabriela Sabatini      10      28
7     Amelie Mauresmo         8      17
8     Svetlana Kuznetsova     7      20
-     Maria Sharapova         7      16
-     Mary Pierce             7      15
-     Justine Henin           7       9

*My database does not have rankings throughout Navratilova’s entire career, but other sources credit her with 18 wins.

Win percentage against top-ranked opponents is a bit trickier, as it depends where you set the minimum number of matches. I’ve drawn the line at five. That’s rather low, but I wanted to include Alize Cornet and Elina Svitolina, active players who have each won three of their six matches against No. 1s. By this standard, Venus ranks eighth, though equally reasonable thresholds of 8 or 10 matches would move her up two or three places:

Rank  Player             Wins  Losses   Win%
1     Steffi Graf          11       8  57.9%
2     Serena Williams      16      12  57.1%
3     Petra Kvitova         5       4  55.6%
4     Elina Svitolina       3       3  50.0%
-     Alize Cornet          3       3  50.0%
6     Lindsay Davenport    15      19  44.1%
7     Justine Henin         7       9  43.8%
8     Venus Williams       15      22  40.5%
9     Vera Zvonareva        4       7  36.4%
-     Dinara Safina         4       7  36.4%

Remember that the average player wins fewer than 12% of matches against No. 1s!

Finally, Venus’s defeat of Kerber gave her a win against her sixth different No. 1, moving her into second place in that department. As is so often the case, she trails only her sister, who has beaten seven. Oddly enough, there is very little overlap between Serena’s and Venus’s lists: Their only common victims are Hingis and Davenport. The full list:

Rank  Player               No. 1s defeated
1     Serena Williams                    7
2     Venus Williams                     6
3     Steffi Graf                        5
-     Kim Clijsters                      5
-     Amelie Mauresmo                    5
-     Maria Sharapova                    5
7     Petra Kvitova                      4
-     Lindsay Davenport                  4
-     Justine Henin                      4
-     Svetlana Kuznetsova                4

If Karolina Pliskova–who now stands within 1500 points of No. 1 and could further close the gap in Miami–reaches the top spot, Venus may get a chance to beat a 7th top player. Of course, Serena could get that chance, as well.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s natural to ask which of these models performs better and, even more interesting, how they fare against other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the US Open 2016: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. To infer forecasts from the ATP ranking, we use a specific formula (see footnote 1), and for Pinnacle, which is one of the biggest tennis bookmakers, we calculate the implied probabilities based on the provided odds, minus the overround (see footnote 2).
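Both conversions are short enough to sketch. The ranking-points formula and the exponent 0.85 come from the footnotes; the way the overround is removed here (rescaling the two inverse odds so they sum to one) is an assumption, since the post does not say which method was used, and the function names and sample inputs are hypothetical.

```python
def rank_points_forecast(points_a, points_b, exponent=0.85):
    """P(a) = a^e / (a^e + b^e), with e = 0.85 for ATP men's singles."""
    a = points_a ** exponent
    b = points_b ** exponent
    return a / (a + b)


def implied_probabilities(odds_a, odds_b):
    """Turn decimal odds into probabilities that sum to 1.

    Proportional rescaling is one common way to strip the overround;
    it is assumed here, not taken from the post.
    """
    raw_a = 1.0 / odds_a
    raw_b = 1.0 / odds_b
    total = raw_a + raw_b  # exceeds 1 by the bookmaker's margin
    return raw_a / total, raw_b / total


# Hypothetical inputs: 4000 vs 1000 ranking points, decimal odds 1.50 vs 2.60.
print(round(rank_points_forecast(4000, 1000), 3))
print(tuple(round(p, 3) for p in implied_probabilities(1.50, 2.60)))
```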

Next, we simply compare forecasts with reality for each model, asking: if player A was predicted to be the winner ($latex P(a) > 0.5$), did he really win the match? When we do that for each match and each model (ignoring retirements and walkovers), we come up with the following results.

Model     % correct
Pinnacle     76.92%
538          75.21%
TA           74.36%
ATP          72.65%
Riles        70.09%

What we see here is the percentage of each model’s predictions that were actually right. The betting model (based on the odds of Pinnacle) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.

However, just looking at the percentage of correctly called matches does not tell the whole story. There are more granular metrics for investigating the performance of a prediction model. Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, we want 70% forecasts to be true in exactly 70% of the cases. Resolution measures how much the forecasts differ from the overall average. The rationale here is that simply forecasting the expected average value every time leads to a reasonably well-calibrated set of predictions, but it is not as useful as a method that manages the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) forecasts are, the better.

In the following table we categorize the set of predictions into bins of different probabilities and show what percentage of the predictions in each bin were correct. This also enables us to calculate Calibration and Resolution measures for each model.

Model    50-59%  60-69%  70-79%  80-89%  90-100% Cal  Res   Brier
538      53%     61%     85%     80%     91%     .003 .082  .171
TA       56%     75%     78%     74%     90%     .003 .072  .182
Riles    56%     86%     81%     63%     67%     .017 .056  .211
ATP      50%     73%     77%     84%     100%    .003 .068  .185
Pinnacle 52%     91%     71%     77%     95%     .015 .093  .172

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that for the Riles model only 67% of the 90-100% forecasts were correct, can be explained by small sample size (only three in that case). However, two cases with larger samples caught my attention: both the Riles and Pinnacle models seem to be strongly underconfident (statistically significant) with their 60-69% predictions. In other words, these probabilities should have been higher because, in reality, these forecasts were true 86% and 91% of the time (see footnote 3). For the betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal some value, as punters would say. For the Riles model, this might be a starting point for tweaking the model.

In the last three columns Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better) are shown. The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single score for measuring the accuracy of predictions. The models of FiveThirtyEight and Pinnacle (for the used subset of data) perform essentially equally well. Then there is a slight gap until the model of Tennis Abstract and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution, hence ranking fifth in this analysis.
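One common way to obtain these three quantities is the standard decomposition Brier = Calibration - Resolution + Uncertainty, computed over probability bins. The sketch below follows that textbook recipe; whether it matches the exact procedure behind the table is an assumption, and the function name, the ten equal-width bins, and the sample forecasts are all hypothetical. The table’s columns are at least consistent with this identity if the uncertainty term is close to .25 (for FiveThirtyEight, .003 - .082 + .250 = .171).

```python
def brier_decomposition(forecasts, bins=10):
    """Return (calibration, resolution, uncertainty) for binary forecasts.

    forecasts: iterable of (probability, outcome) pairs, outcome 1 or 0.
    Forecasts are grouped into equal-width probability bins. The identity
    brier = calibration - resolution + uncertainty is exact only when all
    forecasts inside a bin share the same probability.
    """
    forecasts = list(forecasts)
    n = len(forecasts)
    base_rate = sum(outcome for _, outcome in forecasts) / n
    grouped = {}
    for p, outcome in forecasts:
        index = min(int(p * bins), bins - 1)  # p = 1.0 joins the top bin
        grouped.setdefault(index, []).append((p, outcome))
    calibration = resolution = 0.0
    for members in grouped.values():
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed = sum(outcome for _, outcome in members) / len(members)
        calibration += len(members) * (mean_forecast - observed) ** 2
        resolution += len(members) * (observed - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return calibration / n, resolution / n, uncertainty


# Hypothetical forecasts: five at 80% (four correct), five at 60% (three correct).
sample = [(0.8, 1)] * 4 + [(0.8, 0)] + [(0.6, 1)] * 3 + [(0.6, 0)] * 2
cal, res, unc = brier_decomposition(sample)
print(round(cal, 3), round(res, 3), round(unc, 3))
```

In this toy sample the forecasts are perfectly calibrated, so the whole Brier score comes from uncertainty minus a small resolution term.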

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If a forecast line is above the black line, the forecasts are underconfident; if it is below, they are overconfident. Given that we only investigated one tournament and therefore had to work with a low sample size (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job of preventing overestimations, even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared with each other in a meaningful way. Moreover, I hope I could show some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail by, for example, comparing how well the models do for different kinds of players (e.g., based on ranking), different surfaces, etc. This is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. $latex P(a) = a^e / (a^e + b^e) $ where $latex a $ are player A’s ranking points, $latex b $ are player B’s ranking points, and $latex e $ is a constant. We use $latex e = 0.85 $ for ATP men’s singles.

2. The betting market in itself is not really a model, that is, the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making it a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turned out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more US citizens are likely to be confident about the US player Jared Donaldson and hence place a bet on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.