One of the most popular posts on this blog has been this one, which quantified the speed of every ATP tournament’s surface. At the very least, it’s time to provide some updated numbers. Beyond that, we can improve on the methodology and say more about how much we can learn from the numbers.
I was prompted to improve the methodology when I ran an update this week to see how fast the courts are at the O2 Arena in London. The algorithm, which compares the number of aces (or service points won, or first service points won) to the number we’d expect from those players based on their season average, told me that London is much slower than average–almost 20% below average, on par with Roland Garros and the pre-blue clay Madrid Masters.
Counterintuitive conclusions are fun, but that’s just wrong.
Here’s the problem: Service stats aren’t only affected by servers. Sure, when Milos Raonic is serving, there will be more aces than when Mikhail Youzhny is serving. But how many aces Raonic hits is also influenced by the returning skills of the man on the other side of the net. It’s clear why the algorithm got London so wrong: The eight or nine best players in the world got to where they are (in part, anyway) by getting more balls back. No matter how fast the court, Mardy Fish wasn’t going to hit as many aces past Jo-Wilfried Tsonga or Rafael Nadal in London as he did against Bernard Tomic in Shanghai or Tokyo.
I’ll be more succinct. The goal is to compare the number of aces on a particular surface to the number of aces we’d expect on a neutral surface. The number of expected aces depends on more than just the man serving; it also depends on the man receiving.
(In my article last year, I used three different stats to measure surface speed: ace rate, first serve winning percentage, and overall winning percentage on serve. They track each other fairly closely, so there’s not a lot of additional value gained by using more than one. From here on out, I’m measuring surface speed only by relative ace rate.)
Incorporating more data
To factor in the additional variable, we need each player’s ace rate for the season along with his ace-against rate (how often opponents ace him when he is returning). With those two numbers, together with the overall ATP average, we can apply the odds ratio method to get a better idea of each match’s expected aces.
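For readers who want to see the arithmetic, here is a minimal Python sketch of the odds ratio method in its standard form: convert each rate to odds, multiply the server's odds by the returner's odds, divide by the league's odds, and convert back to a rate. This is not the code behind the numbers in this post; the function name and the sample rates are made up for illustration.

```python
def expected_ace_rate(server_rate, returner_rate, league_rate):
    """Expected ace rate for one server against one returner.

    server_rate   -- server's season ace rate (aces per service point)
    returner_rate -- returner's season ace-against rate
    league_rate   -- tour-wide average ace rate
    All three are probabilities strictly between 0 and 1.
    """
    def odds(p):
        return p / (1.0 - p)

    combined = odds(server_rate) * odds(returner_rate) / odds(league_rate)
    return combined / (1.0 + combined)


# Hypothetical rates, chosen only for illustration:
# a big server (15% aces) facing a strong returner (aced on 5% of points)
# in a league where 8% of service points are aces.
rate = expected_ace_rate(0.15, 0.05, 0.08)
print(f"expected ace rate: {rate:.3f}")
```

One useful property of this form: against a returner who is exactly league average, the expected rate is simply the server's own season rate, so the adjustment only kicks in when the returner is better or worse than average.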
For each server in each match, we compare his actual aces to his expected aces, and then take the average of all of those ratios. The tournament-wide average gives us an estimate of how fast the courts played at that event.
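The tournament-level step can be sketched the same way. The Python below assumes one record per server per match, holding his actual aces, his service points, his season ace rate, and his opponent's ace-against rate; those field names and the sample records are hypothetical, and the expected-rate helper is the standard odds ratio formula rather than my production code.

```python
from statistics import mean


def expected_ace_rate(server_rate, returner_rate, league_rate):
    """Standard odds ratio combination of server, returner, and league rates."""
    def odds(p):
        return p / (1.0 - p)

    combined = odds(server_rate) * odds(returner_rate) / odds(league_rate)
    return combined / (1.0 + combined)


def relative_ace_rate(server_matches, league_rate):
    """Average of actual/expected ace ratios across all server-matches."""
    ratios = []
    for m in server_matches:
        exp_rate = expected_ace_rate(m["server_rate"], m["returner_rate"], league_rate)
        expected_aces = exp_rate * m["service_points"]
        ratios.append(m["aces"] / expected_aces)
    return mean(ratios)


# Hypothetical records for a two-match "tournament" (four server-matches).
sample = [
    {"aces": 9, "service_points": 80, "server_rate": 0.10, "returner_rate": 0.08},
    {"aces": 4, "service_points": 75, "server_rate": 0.06, "returner_rate": 0.09},
    {"aces": 12, "service_points": 90, "server_rate": 0.14, "returner_rate": 0.07},
    {"aces": 5, "service_points": 70, "server_rate": 0.07, "returner_rate": 0.08},
]
speed = relative_ace_rate(sample, league_rate=0.08)
print(f"relative ace rate: {speed:.2f}")  # above 1 means more aces than expected
```

A result of 1.0 corresponds to a neutral surface; 0.97 would be the "3% lower" kind of figure quoted for an individual event.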
The improved algorithm still insists that aces were 3% lower than on a neutral surface at the 2011 Tour Finals, but counters that with the conclusion that aces were 18% and 8% more than on a neutral surface in 2009 and 2010, respectively. A weighted average of those three seasons (more on that in a bit) estimates that the O2 Arena gives us 4% more aces than a neutral surface.
The variance from year to year–in some cases, like that of London, suggesting that a surface is faster than average one year, slower than average the next–is a bit worrisome. At the very least, we can’t simply take a one-year calculation for a single tournament and treat it as the final word, especially when the event only includes 15 matches.
Multi-year averages and (extremely mild) projections
If we want to know exactly what happened in one edition of a tournament, the single-year number is instructive. Perhaps the weather, or the lighting, was very bad or very good, causing an unusually high or low number of aces. Just because a tournament’s number for 2012 doesn’t match its numbers for any of the previous three years doesn’t mean it’s wrong.
However, the many effects that produce this year-to-year variance do warn us that last year’s number will not accurately predict this year’s number.
The year-to-year correlation of relative ace rate (as I’ve described it above) is not very strong (r = .35). One way to modestly improve it is to use a three-year weighted average. A 3/2/1 weighted average of 2011, 2010, and 2009 numbers gives us a better forecast of how the surface will play in the following year (r = .5).
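As a worked example of the 3/2/1 weighting, here is a short Python sketch using the London figures quoted earlier (3% below neutral in 2011, 8% above in 2010, 18% above in 2009), expressed as ratios to a neutral surface. The function name is made up, and the handling of a missing year (skip it and renormalize the remaining weights) is my assumption for the sketch, not something this post specifies.

```python
def weighted_relative_rate(rates_recent_first, weights=(3, 2, 1)):
    """Weighted average of yearly relative ace rates, most recent year first.

    A year given as None is skipped and the remaining weights are
    renormalized (an assumption made for this sketch).
    """
    pairs = [(r, w) for r, w in zip(rates_recent_first, weights) if r is not None]
    if not pairs:
        raise ValueError("need at least one year of data")
    total_weight = sum(w for _, w in pairs)
    return sum(r * w for r, w in pairs) / total_weight


# London, from the figures above: 2011, 2010, 2009.
london = weighted_relative_rate([0.97, 1.08, 1.18])
print(round(london, 2))  # 1.04, i.e. about 4% more aces than neutral
```

The arithmetic is (3 × 0.97 + 2 × 1.08 + 1 × 1.18) / 6 = 6.25 / 6, which rounds to the 4% figure given for the O2 Arena.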
Another way of looking at these more reliable forecasts is that they get closer to isolating the effect of the surface. As I noted in last year’s article, the weather effects of Hurricane Irene dampened the ace rate at last year’s US Open. By my new algorithm, the ace rate last year was 7% lower than a neutral surface, while this year it was 5% higher than a neutral surface. The three-year weighted average would have been able to look past Irene; using data from 2009-11, it estimated that courts in Flushing were exactly neutral. That not only turned out to be a better projection for 2012 than the -7% of 2011, it also probably better described the influence of the court surface, as separate from the weather conditions.
Below the jump, find the complete list of all tour-level events that have been played in 2011 and/or 2012. The first four numerical columns show the relative ace rate for each year from 2009 to 2012. For instance, in Costa Do Sauipe this year, there were a staggering 61% more aces than expected. The final two columns show the weighted averages for 2011 and 2012. Each event’s “2012 Wgt” is my best estimate of the current state of the surface and how it will play next year.
I’ve also created a prettier, sortable version of the same table.