Italian translation at settesei.it
By winning the US Open last weekend and increasing his career total to ten Grand Slams, Novak Djokovic has pushed himself even further into conversations about the greatest of all time. At the very least, his 2015 season is shaping up to be one of the best in tennis history.
A recent FiveThirtyEight article introduced Elo ratings into the debate, showing that Djokovic’s career peak–achieved earlier this year at the French Open–is the highest of anyone’s, just above 2007 Roger Federer and 1980 Bjorn Borg. In implementing my own Elo ratings, I’ve discovered just how close those peaks are.
Here are my results for the top 15 peaks of all time [1]:
Player                   Year   Elo
Novak Djokovic           2015   2525
Roger Federer            2007   2524
Bjorn Borg               1980   2519
John McEnroe             1985   2496
Rafael Nadal             2013   2489
Ivan Lendl               1986   2458
Andy Murray              2009   2388
Jimmy Connors            1979   2384
Boris Becker             1990   2383
Pete Sampras             1994   2376
Andre Agassi             1995   2355
Mats Wilander            1984   2355
Juan Martin del Potro    2009   2352
Stefan Edberg            1988   2346
Guillermo Vilas          1978   2325
A one-point gap is effectively nothing: It means that peak Djokovic would have a 50.1% chance of beating peak Federer. The 35-point gap separating Novak from peak Rafael Nadal is considerably more meaningful, implying that the better player has a 55% chance of winning.
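The percentages here come from the standard Elo expected-score curve, in which only the gap between two ratings matters. A minimal Python sketch, assuming the conventional 400-point logistic scale (the post doesn't state its constants, but this scale reproduces the figures quoted):

```python
def win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that player A beats player B on the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Peak ratings from the table above
peaks = {"Djokovic 2015": 2525, "Federer 2007": 2524, "Nadal 2013": 2489}

print(f"{win_probability(peaks['Djokovic 2015'], peaks['Federer 2007']):.1%}")  # 50.1%
print(f"{win_probability(peaks['Djokovic 2015'], peaks['Nadal 2013']):.1%}")    # 55.2%
```

Because the published ratings are rounded, gaps computed from the table can differ by a point from the underlying values.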
Surface-specific Elo
If we limit our scope to hard-court matches, Djokovic is still a very strong contender, but Fed’s 2007 peak is clearly the best of all time:
Player            Year   Hard-Court Elo
Roger Federer     2007   2453
Novak Djokovic    2014   2418
Ivan Lendl        1989   2370
Pete Sampras      1997   2356
Rafael Nadal      2014   2342
John McEnroe      1986   2332
Andy Murray       2009   2330
Andre Agassi      1995   2326
Stefan Edberg     1987   2285
Lleyton Hewitt    2002   2262
Ivan Lendl and Pete Sampras make much better showings on this list than on the overall ranking. Still, they are far behind Fed and Novak–the roughly 100-point difference between peak Fed and peak Pete is equivalent to a 64% probability that the higher-rated player would win.
On clay, I’ll give you three guesses who tops the list–and your first two guesses don’t count. It isn’t even close:
Player             Year   Clay-Court Elo
Rafael Nadal       2009   2550
Bjorn Borg         1982   2475
Novak Djokovic     2015   2421
Ivan Lendl         1988   2408
Mats Wilander      1984   2386
Roger Federer      2009   2343
Jose Luis Clerc    1981   2318
Guillermo Vilas    1982   2316
Thomas Muster      1996   2313
Jimmy Connors      1980   2307
Borg was great, but Nadal is in another league entirely. Though Djokovic has pushed Nadal out of many greatest-of-all-time debates–at least for the time being–there’s little doubt that Rafa is the greatest clay court player of all time, and likely the most dominant player in tennis history on any single surface.
Djokovic is well back of both Nadal and Borg, but in his favor, he’s the only player ranked in the top three for both major surfaces.
The survivor
As the second graph in the 538 article shows, Federer stands out as the greatest player of all time at his age. Most players have retired long before their 34th birthday, and even those who stick around aren’t usually contesting Grand Slam finals. In fact, Federer’s Elo rating of 2393 after his US Open semifinal win against Stanislas Wawrinka last week would rank as the sixth-highest peak of all time, behind Lendl and just ahead of Andy Murray.
Here are the top ten Elo peaks for players aged 34 and older:
Player             Age    Elo (age 34+)
Roger Federer      34.1   2393
Jimmy Connors      34.1   2234
Andre Agassi       35.3   2207
Rod Laver          36.6   2207
Ken Rosewall       37.4   2195
Tommy Haas         35.3   2111
Arthur Ashe        35.7   2107
Ivan Lendl         34.1   2054
Andres Gimeno      35.0   2035
Mark Cox           34.0   2014
The 160-point gap between Federer and Jimmy Connors implies that 34-year-old Fed would win about 70% of the time against 34-year-old Connors. No one has ever sustained this level of play–or anything close to it–for this long.
At the risk of belaboring the point, similar arguments can be made for 33-year-old Fed, all the way to 30-year-old Fed. At almost any stage in the last four years, Federer has been better than any player in history at that age [2]. Djokovic has matched many of Roger’s career accomplishments so far, especially on clay, but it would be truly remarkable if he maintained a similar level of play through the end of the decade.
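Both the all-time list and the age-34+ list are peak queries over each player's rating history. A minimal sketch, assuming a hypothetical list of (player, age, elo) snapshots; the post doesn't describe how its own rating history is stored, and the sample values below are toy numbers:

```python
from typing import Iterable, Optional


def peak_ratings(
    history: Iterable[tuple[str, float, float]],
    min_age: Optional[float] = None,
) -> list[tuple[str, float, float]]:
    """Return (player, age, elo) at each player's highest rating, best first.

    `history` is a hypothetical sequence of (player, age, elo) snapshots.
    If `min_age` is set, snapshots from younger ages are ignored.
    """
    best: dict[str, tuple[float, float]] = {}
    for player, age, elo in history:
        if min_age is not None and age < min_age:
            continue
        if player not in best or elo > best[player][1]:
            best[player] = (age, elo)
    rows = [(player, age, elo) for player, (age, elo) in best.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy snapshots, not real ratings
history = [("A", 30.0, 2450.0), ("A", 34.1, 2300.0), ("B", 29.0, 2400.0), ("B", 34.5, 2200.0)]
print(peak_ratings(history))              # all-time peaks
print(peak_ratings(history, min_age=34))  # peaks at age 34 or older
```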
Current Elo ratings
While it’s not really germane to today’s subject, I’ve got the numbers, so let’s take a look at the current ATP Elo ratings. Since Elo is new to most tennis fans, I’ve included columns to indicate each player’s chances of beating Djokovic and of beating the current #10, Milos Raonic, based on their rating. As a general rule, a 100-point gap translates to a 64% chance of winning for the favorite, a 200-point gap implies 76%, and a 500-point gap is equivalent to 95%.
Rank  Player                   Elo    Vs #1  Vs #10
1     Novak Djokovic           2511   -      91%
2     Roger Federer            2386   33%    84%
3     Andy Murray              2332   26%    79%
4     Kei Nishikori            2256   19%    71%
5     Rafael Nadal             2256   19%    71%
6     Stan Wawrinka            2186   13%    62%
7     David Ferrer             2159   12%    58%
8     Tomas Berdych            2148   11%    56%
9     Richard Gasquet          2128   10%    54%
10    Milos Raonic             2103   9%     -
11    Gael Monfils             2084   8%     47%
12    Jo-Wilfried Tsonga       2083   8%     47%
13    Marin Cilic              2081   8%     47%
14    Kevin Anderson           2074   7%     46%
15    John Isner               2035   6%     40%
16    David Goffin             2027   6%     39%
17    Grigor Dimitrov          2021   6%     38%
18    Gilles Simon             2005   5%     36%
19    Jack Sock                1994   5%     35%
20    Roberto Bautista Agut    1986   5%     34%
21    Philipp Kohlschreiber    1982   5%     33%
22    Tommy Robredo            1963   4%     31%
23    Feliciano Lopez          1955   4%     30%
24    Nick Kyrgios             1951   4%     29%
25    Ivo Karlovic             1949   4%     29%
26    Jeremy Chardy            1940   4%     28%
27    Alexandr Dolgopolov      1940   4%     28%
28    Bernard Tomic            1936   4%     28%
29    Fernando Verdasco        1932   3%     27%
30    Fabio Fognini            1925   3%     26%
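The "Vs #1" and "Vs #10" columns follow from the Elo column alone. A minimal sketch, assuming the same 400-point logistic curve and whole-percent rounding; only four rows of the table are used here:

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def matchup_columns(ratings: dict[str, float], top: str, tenth: str) -> dict[str, tuple[str, str]]:
    """Format each player's chances against the #1 and #10 players as whole percentages."""
    rows = {}
    for player, elo in ratings.items():
        vs_top = "-" if player == top else f"{win_probability(elo, ratings[top]):.0%}"
        vs_tenth = "-" if player == tenth else f"{win_probability(elo, ratings[tenth]):.0%}"
        rows[player] = (vs_top, vs_tenth)
    return rows


ratings = {"Novak Djokovic": 2511, "Roger Federer": 2386, "Kei Nishikori": 2256, "Milos Raonic": 2103}
for player, cols in matchup_columns(ratings, "Novak Djokovic", "Milos Raonic").items():
    print(player, *cols)
```

For these four rows the sketch reproduces the table's figures, for example 33% and 84% for Federer.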
Notes:
[1] These numbers don’t precisely agree with 538’s, or with either of two other recent sets of ratings. Some of the discrepancy seems to be due to including or excluding retirements and withdrawals. Both 538 and I exclude them, but when I included retirements (though not withdrawals), Federer and Djokovic swapped places at the top of the list.
[2] 538’s graph shows Lendl ahead at age 30 and Connors with a slight edge briefly around age 32.
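The choice described in the first note comes down to a filter applied before any ratings are computed. A minimal sketch, assuming score strings that mark retirements with "RET" and walkovers with "W/O"; that format is an assumption about the data, so check it against whatever match files you use:

```python
def keep_match(score: str, include_retirements: bool = False) -> bool:
    """Decide whether a result should feed the Elo calculation.

    Assumes (hypothetically) that walkovers contain "W/O" and
    retirements contain "RET" somewhere in the score string.
    """
    if "W/O" in score:
        return False  # walkover/withdrawal: no match was played, always excluded
    if "RET" in score:
        return include_retirements  # excluded by default, as in the post
    return True


print(keep_match("6-4 6-3"))                                # True
print(keep_match("6-4 3-1 RET"))                            # False
print(keep_match("6-4 3-1 RET", include_retirements=True))  # True
```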
All this comparing players across eras is a lot of fun and all, but I don’t think you can stress enough how meaningless these comparisons really are.
There really is no way to adjust the ratings to take account of the changing overall level of skill in the game, and there are all sorts of other issues too. The ratings only really mean something in comparison to other players playing at the same time.
There’s always some completely indefensible fiddle in order to keep the top guys’ elo ratings at roughly the same level across eras, and then comparisons are obviously ridiculous.
Perhaps you’d like to elaborate on the ‘completely indefensible fiddle’ being used here. The source code for one of the other recent ratings (which I’ve adapted almost exactly) is here:
https://github.com/sleepomeno/tennis_atp/blob/master/examples/elo.R
I will have a look at that, but the problem with elo comparisons across eras is simply that, not only can you add a constant offset to everyone’s ratings and not change anything, you can also add any function of time to the ratings and not change anything.
So right off the bat elo just cannot tell if average standards are changing over time. Since standards almost certainly are changing you ought to add something, but I couldn’t tell you what it should be.
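The offset argument is easy to check: Elo predictions depend only on rating differences, so adding the same amount to every rating changes neither the predicted probabilities nor the size of any update. A minimal sketch with an illustrative fixed K of 32 (an assumption, not the K used in the post):

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """One zero-sum Elo update with a fixed, illustrative K-factor."""
    delta = k * (1.0 - win_probability(winner, loser))
    return winner + delta, loser - delta


# Any constant works; the same holds for an offset f(t) added to every
# rating that is active at time t, since each match compares two players
# at the same t.
offset = 137.0
a, b = 2400.0, 2250.0

print(win_probability(a, b) == win_probability(a + offset, b + offset))  # True
new_a, new_b = update(a, b)
shifted_a, shifted_b = update(a + offset, b + offset)
print(shifted_a - new_a, shifted_b - new_b)  # both equal the offset, up to float rounding
```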
What you’re really comparing is some level of dominance across eras, but even that concept has problems. If you have changing numbers of players/matches in your model, then the spread of elo ratings also changes. You have to decide who the ‘average’ player is, or pick some sort of reference player, and again I couldn’t tell you how to do that in an unbiased way.
There are a number of arbitrary-seeming choices that have been made in there: a K-factor that gets smaller the more matches a player has played; K-factors that are also player-specific, so the rating changes are no longer zero-sum; players starting with a 1500 rating, which is somewhere near the average (at least initially)…
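The first two choices listed above (a K-factor that shrinks with experience and differs by player) can be sketched in a few lines. The decay formula and its constants are illustrative assumptions, not copied from elo.R; the point is that when the winner's and loser's K differ, the two rating changes no longer cancel:

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def k_factor(matches_played: int) -> float:
    """Hypothetical K that shrinks as a player's match count grows.

    The constants 250, 5 and 0.4 are illustrative assumptions, not values taken from elo.R.
    """
    return 250.0 / (matches_played + 5) ** 0.4


def update(winner: float, loser: float, winner_matches: int, loser_matches: int) -> tuple[float, float]:
    """Elo update in which each player uses their own K, so the changes need not cancel."""
    surprise = 1.0 - win_probability(winner, loser)
    return (
        winner + k_factor(winner_matches) * surprise,
        loser - k_factor(loser_matches) * surprise,
    )


# A veteran (300 matches) beats a newcomer (0 matches) who starts at 1500.
new_veteran, new_newcomer = update(1900.0, 1500.0, winner_matches=300, loser_matches=0)
print(new_veteran - 1900.0, 1500.0 - new_newcomer)  # newcomer loses far more than veteran gains
```

With these toy constants the newcomer drops several times as many points as the veteran gains, so the total pool of rating points shrinks.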
Also, it only uses ATP and Grand Slam results (no qualifiers or Challengers), which I believe skews the results in odd ways for players on the edge of being ATP tour regulars: their average opponent in the included matches is better than they are, because you don’t often see the matches they win.
I did a quick analysis of how year-end ratings change over time, and the average is rising. A few partially cancelling effects cause that: players retiring and taking their points with them push the average down, but fringe players who lose a couple of matches and are never seen again push the average up (especially given that they start at 1500 and immediately drop a ton of points due to their big K-factors).
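The two opposing effects can be seen in a toy pool. A minimal sketch with made-up ratings and a fixed, illustrative K of 32 (both assumptions); it only tracks the regulars who stay in the pool:

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def newcomer_loses_to(ratings: dict[str, float], regular: str, k: float = 32.0) -> dict[str, float]:
    """A 1500-rated newcomer loses once to `regular` and is never seen again."""
    updated = dict(ratings)
    updated[regular] += k * (1.0 - win_probability(updated[regular], 1500.0))
    return updated  # the newcomer (and their lowered rating) is not in the pool


def retire(ratings: dict[str, float], player: str) -> dict[str, float]:
    """A player leaves the pool, taking their rating with them."""
    return {name: r for name, r in ratings.items() if name != player}


def average(ratings: dict[str, float]) -> float:
    return sum(ratings.values()) / len(ratings)


pool = {"A": 1700.0, "B": 1600.0, "C": 1500.0}  # toy ratings
after_newcomers = newcomer_loses_to(newcomer_loses_to(pool, "C"), "B")
after_retirement = retire(after_newcomers, "A")
print(average(pool), average(after_newcomers), average(after_retirement))
```

The average rises when newcomers feed points to the regulars and then disappear, and falls when the highest-rated player retires.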
Unless you find a satisfactory way to adjust for these things you can’t compare across eras.
I know you weren’t crazy about my idea of scaling average ELO to zero, but my other point (using 500 in the exponential denominator) would allow you to eliminate the last two columns in the chart altogether and anyone could calculate any player’s approximate percentage against any other player just by dividing the ELO difference by 10 and adding to 50.
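The suggested shortcut can be checked numerically. A minimal sketch comparing the exact logistic curve on a 500-point scale (the commenter's proposal, rather than the 400-point scale that reproduces the post's percentages) with "divide the difference by 10 and add to 50":

```python
def win_probability(diff: float, scale: float) -> float:
    """Elo expected score for a rating gap `diff` on a logistic curve with the given scale."""
    return 1.0 / (1.0 + 10 ** (-diff / scale))


def rule_of_thumb(diff: float) -> float:
    """The commenter's shortcut: percentage = 50 + difference / 10."""
    return (50.0 + diff / 10.0) / 100.0


for diff in (50, 100, 200, 300, 400, 500):
    exact = win_probability(diff, scale=500.0)
    print(f"{diff:>3}  exact {exact:.1%}  rule {rule_of_thumb(diff):.1%}")
```

On these numbers the shortcut stays within about two percentage points of the exact curve for gaps up to roughly 300, then overshoots (it reaches 100% at a 500-point gap, where the curve gives about 91%).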
I’ll stop now. 🙂
I hear you. Since my goal is to turn this into something with predictive value, there’s a lot of work to be done: surface adjustments, handling players who have missed a lot of time, and handling newbies, as you mentioned on Twitter. I don’t know what the end result will look like, but I expect it’ll involve some rescaling, especially to handle players at lower levels.
Sorry, I should be clear that the last “Jeff” comment is from me and not Jeff Sackmann.
Very nice analysis, and thanks for sharing the code.
Is jrank based on Elo-like ratings (http://tennisabstract.com/jrank/atp.html)? Or are you planning to replace it with something like Elo?
Also, for jrank, I thought that if two players have x and y points, the winning probability of the former is x/(x+y), so basically the ratio matters (intuition coming from logistic regression).
For Elo, it seems the difference matters. So do they (closely) match each other after taking a logarithm (and a change of origin and scale)?
jrank is somewhat like elo, in that it gives players points based on what they beat (and lost to), instead of when they did so. It’s a lot more complicated though, and some of that complexity is probably unnecessary. I’ve yet to test Elo vs Jrank for predictiveness (to see how similar they are, or to see which is more predictive), and before I do that, there are many things to do that might improve Elo, particularly with surface adjustments.
Yep, you’re right about ratio vs difference.
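For the idealized models the answer is exact: a ratio model x/(x+y) is the Elo curve after setting x = 10^(R/400), i.e. a logarithm plus a change of scale (any constant can be added to R). This says nothing about how jrank actually computes its points, which the reply describes as more complicated. A minimal sketch; the function names are hypothetical:

```python
import math


def jrank_style_probability(x: float, y: float) -> float:
    """Ratio model from the question: P(first player wins) = x / (x + y)."""
    return x / (x + y)


def elo_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def strength_from_elo(rating: float) -> float:
    """Map an Elo rating onto the ratio scale: x = 10 ** (rating / 400)."""
    return 10 ** (rating / 400.0)


def elo_from_strength(x: float) -> float:
    """Inverse map: a base-10 logarithm rescaled by 400."""
    return 400.0 * math.log10(x)


# The two formulas agree: x / (x + y) = 1 / (1 + y / x) = 1 / (1 + 10 ** ((Rb - Ra) / 400))
print(elo_probability(2386, 2103))
print(jrank_style_probability(strength_from_elo(2386), strength_from_elo(2103)))
```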
Hey, do you have a surface-specific ELO for:
1. Benjamin Becker on hard courts
2. Daniel Gimeno Traver on clay
3. Radek Stepanek on grass?
Kinda wondering how close to 1500 they will be, because they have (over the last couple of years) stats that closely resemble the average stats on those surfaces.
Becker hard: 1677
DGT clay: 1715
Haven’t run grass #s, don’t know how meaningful they would be given how few matches there are.
Using only tour-level data, 1500 is essentially replacement level, since there are so many players entering the league for the first time (or first few). Becker is 9-2 against players outside the top 100 since the beginning of last year, and some of those guys are even above 1500.
Looking at who’s at 1500, it looks like the 150-200 range in the rankings, with some tweaks for surface. (Pospisil is 1350 on clay.) In the overall ratings, you have guys like Moriya, Millot, Gombos, Daniel, and Lindell around 1500.
Hi Jeff,
Nadal’s hard-court max ELO should be in 2013 (not 2014), right?
Hi Jeff,
To my mind, ELO-type rankings are not very good for historical comparisons.
See for example TrueSkill Through Time
http://research.microsoft.com/apps/pubs/default.aspx?id=74417
or Whole-History Rating
http://www.remi-coulom.fr/WHR/
I don’t doubt the theoretical improvement of TrueSkill or WHR, but in practice, is an improvement of roughly .005 worth the additional computation time and complexity?
Also, I probably wouldn’t say ELO is “not very good” if the basis for that statement is that TrueSkill and WHR are only half a percentage point better.
My point was not to say that “ELO is not very good” but that “ELO is not very good for historical comparisons”.
By that I mean that if you try to answer the question “Who is the best player in the last 50 years?”, ELO may not be the best tool.
ELO is obviously a good tool to make fast estimations of win probabilities of players in the same time era.
But I think that the ELO rating of Connors in 1980 can’t be compared to the ELO rating of Djokovic in 2015, simply because 1 ELO point in 1980 doesn’t have the same value as 1 ELO point today.
For Chess, several alternatives to ELO have been proposed that may improve historical comparisons:
* EDO: http://www.edochess.ca/
* Glicko: http://www.glicko.net/research/glicko.pdf
* WHR
* TrueSkill / TrueSkill Through Time, etc.
I like your blog very much; it’s a fantastic place for a fan of tennis stats like me.
My comment was not intended to doubt the quality of your analysis, it was just a comment 🙂
To clarify, the reply to your original comment came from me (Jeff M), not Jeff Sackmann, who runs Tennis Abstract. Sorry for the confusion. I should start using a different handle.
Anyway, my point is that I don’t see ELO as significantly worse than the other systems you cited. In the WHR paper, it looks like none are more than a half percent better, but all seem more complicated to generate. Not that I’m a defender of ELO, but it might be fairer to say “none of the systems are very good for historical comparisons.”
Hello Jeff,
As an ELO newbie, I was curious about a few things without going into the actual computation; I’d be grateful if you could clarify.
1. Does the ELO rating change depend on the tournament, i.e., whether it is a Slam, a Masters, a 500, or a 250? Or, if player A beats player B, is the net gain for A exactly the same irrespective of the tournament?
2. Does the number of sets matter, i.e., whether it’s best-of-5 or best-of-3?
3. Does the score matter: a straight-sets win vs a deciding-set win? Do games matter, i.e., does 6-1, 6-1 score more than 6-4, 7-5?
4. Finally, why do you use the term “peak ELO”, which suggests a value at a point in time, as opposed to a total or average ELO over a period? Is average ELO indeed a better way of measuring career success?
Thanks
1. No, tourney level doesn’t matter.
2. In this calculation, no. Bialik and Morris tested giving 5-setters more weight, but they say it didn’t make the results more predictive.
3. In this version, no. They also tested some variations of this, and apparently it didn’t help.
4. At any given time, a player’s Elo reflects their results over a period of time. After a win, it goes up; after a loss, it goes down, and the amount depends on the quality of the opponent. Peak vs career is ultimately a religious debate: you could argue that the best player is the one who sustained the highest level over a single year, or three years, or five years, or ten years, or any number of other permutations. None of them are final.
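Answer 4 describes the core update, and answers 1 to 3 describe what it leaves out. A minimal sketch with a fixed, illustrative K of 32 (an assumption; the code discussed earlier in the thread uses a K that varies by player):

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Rate one match from the two pre-match ratings and the result alone.

    Tournament level, best-of-3 vs best-of-5, and the scoreline are ignored.
    """
    surprise = 1.0 - win_probability(winner, loser)  # large when an underdog wins
    return winner + k * surprise, loser - k * surprise


# Beating a stronger opponent earns more than beating a weaker one.
gain_vs_stronger = elo_update(2000.0, 2200.0)[0] - 2000.0
gain_vs_weaker = elo_update(2000.0, 1800.0)[0] - 2000.0
print(round(gain_vs_stronger, 1), round(gain_vs_weaker, 1))  # 24.3 7.7
```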
Hi Jeff,
Thanks for putting this article together – amazing work! I have a few questions about the Elo computation that I was hoping you could help me with!
1) How did you compute initial Elo scores? I imagine that you must begin with data from a specific year, say 1968. I also know that you must have a 1500-point average. Do you then allocate points based on ATP rank, and what formula, if any, do you use? From experience, I know that higher-ranked players (ranked 1, 2, 3) have a bigger points spread than lower-ranked players (ranked 1200, 1300). Do you account for this in any way when allocating initial points, or are the points normally distributed when allocated to players?
2) Do you reset Elo ratings at the beginning of each year, or do you keep the points going?
3) How do you deal with players who are injured or take a year or more off the court? Do you delete the data for these players? How do you compute their Elo when they come back?
4) How do you see the Elo ratings being applied to doubles players and tournaments? (As a side note, do you have any databases on doubles matches?)
Thanks so much!
Everyone starts in 1968 with 1500 points. No, not reset each year. I did something very similar to Elo for doubles for a Tennis Magazine article last year; I might revisit it soon. Nope, no doubles data.
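That setup (one continuous run from 1968, everyone starting at 1500, no yearly reset) amounts to a single replay loop. A minimal sketch, assuming hypothetical (date, winner, loser) tuples with ISO date strings and a fixed, illustrative K of 32:

```python
from collections import defaultdict


def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_ratings(matches: list[tuple[str, str, str]], k: float = 32.0) -> dict[str, float]:
    """Replay hypothetical (date, winner, loser) results in date order.

    Each player starts at 1500 the first time they appear, and ratings
    carry straight across year boundaries with no reset.
    """
    ratings: dict[str, float] = defaultdict(lambda: 1500.0)
    for _date, winner, loser in sorted(matches, key=lambda m: m[0]):  # ISO dates sort chronologically
        surprise = 1.0 - win_probability(ratings[winner], ratings[loser])
        ratings[winner] += k * surprise
        ratings[loser] -= k * surprise
    return dict(ratings)


matches = [("1969-04-01", "B", "A"), ("1968-04-01", "A", "B")]
print(run_ratings(matches))
```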