Italian translation at settesei.it
In my post last week, I outlined what the error stats of the future may look like. A wide range of advanced stats across different sports, from baseball to ice hockey–and increasingly in tennis–follow the same general algorithm:
- Classify events (shots, opportunities, whatever) into categories;
- Establish expected levels of performance–often league-average–for each category;
- Compare players (or specific games or tournaments) to those expected levels.
The first step is, by far, the most complex. Classification depends in large part on available data. In baseball, for example, the earliest fielding metrics of this type had little more to work with than the number of balls in play. Now, batted balls can be categorized by exact location, launch angle, speed off the bat, and more. Having more data doesn’t necessarily make the task any simpler, as there are so many potential classification methods one could use.
The same will be true in tennis, eventually, when Hawkeye data (or something similar) is publicly available. For now, those of us relying on public datasets still have plenty to work with, particularly the 1.6 million shots logged as part of the Match Charting Project.*
*The Match Charting Project is a crowd-sourced effort to track professional matches. Please help us improve tennis analytics by contributing to this one-of-a-kind dataset. Click here to find out how to get started.
The shot-coding method I adopted for the Match Charting Project makes step one of the algorithm relatively straightforward. MCP data classifies shots in two primary ways: type (forehand, backhand, backhand slice, forehand volley, etc.) and direction (down the middle, or to the right or left corner). While this approach omits many details (depth, speed, spin, etc.), it’s about as much data as we can expect a human coder to track in real-time.
For example, we could use the MCP data to find the ATP tour-average rate of unforced errors when a player tries to hit a cross-court forehand, then compare everyone on tour to that benchmark. Tour average is 10%, Novak Djokovic‘s unforced error rate is 7%, and John Isner‘s is 17%. Of course, that isn’t the whole picture when comparing the effectiveness of cross-court forehands: While the average ATPer hits 7% of his cross-court forehands for winners, Djokovic’s rate is only 6% compared to Isner’s 16%.
However, it’s necessary to take a wider perspective. Instead of shots, I believe it will be more valuable to investigate shot opportunities. That is, instead of asking what happens when a player is in position to hit a specific shot, we should be figuring out what happens when the player is presented with a chance to hit a shot in a certain part of the court.
This is particularly important if we want to get beyond the misleading distinction between forced and unforced errors. (As well as the line between errors and an opponent’s winners, which lie on the same continuum–winners are simply shots that were too good to allow a player to make a forced error.) In the Isner/Djokovic example above, our denominator was “forehands in a certain part of the court that the player had a reasonable chance of putting back in play”–that is, successful forehands plus forehand unforced errors. We aren’t comparing apples to apples here: Given the exact same opportunities, Djokovic is going to reach more balls, perhaps making unforced errors where we would call Isner’s mistakes forced errors.
Outcomes of opportunities
Let me clarify exactly what I mean by shot opportunities. They are defined by what a player’s opponent does, regardless of how the player himself manages to respond–or if he manages to get a racket on the ball at all. For instance, assuming a matchup between right-handers, here is a cross-court forehand:
Player A, at the top of the diagram, is hitting the shot, presenting player B with a shot opportunity. Here is one way of classifying the outcomes that could ensue, together with the abbreviations I’ll use for each in the charts below:
- player B fails to reach the ball, resulting in a winner for player A (vs W)
- player B reaches the ball, but commits a forced error (FE)
- player B commits an unforced error (UFE)
- player B puts the ball back in play, but goes on to lose the point (ip-L)
- player B puts the ball back in play, presents player A with a “makeable” shot, and goes on to win the point (ip-W)
- player B causes player A to commit a forced error (ind FE)
- player B hits a winner (W)
As always, for any given denominator, we could devise different categories, perhaps combining forced and unforced errors into one, or further classifying the “in play” categories to identify whether the player is setting himself up to quickly end the point. We might also look at different categories altogether, like shot selection.
In any case, the categories above give us a good general idea of how players respond to different opportunities, and how those opportunities differ from each other. The following chart shows–to adopt the language of the example above–player B’s outcomes based on player A’s shots, categorized only by shot type:
The outcomes are stacked from worst to best. At the bottom is the percentage of opponent winners (vs W)–opportunities where the player we’re interested in didn’t even make contact with the ball. At the top is the percentage of winners (W) that our player hit in response to the opportunity. As we’d expect, forehands present the most difficult opportunities: 5.7% of them go for winners and another 4.6% result in forced errors. Players are able to convert those opportunities into points won only 42.3% of the time, compared to 46.3% when facing a backhand, 52.5% when facing a backhand slice (or chip), and 56.3% when facing a forehand slice.
The above chart is based on about 374,000 shots: All the baseline opportunities that arose (that is, excluding serves, which need to be treated separately) in over 1,000 logged matches between two righties. Of course, there are plenty of important variables to further distinguish those shots, beyond simply categorizing by shot type. Here are the outcomes of shot opportunities at various stages of the rally when the player’s opponent hits a forehand:
The leftmost column can be seen as the results of “opportunities to hit a third shot”–that is, outcomes when the serve return is a forehand. Once again, the numbers are in line with what we would expect: The best time to hit a winner off a forehand is on the third shot–the “serve-plus-one” tactic. We can see that in another way in the next column, representing opportunities to hit a fourth shot. If your opponent hits a forehand in play for his serve-plus-one shot, there’s a 10% chance you won’t even be able to get a racket on it. The average player’s chances of winning the point from that position are only 38.4%.
Beyond the 3rd and 4th shot, I’ve divided opportunities into those faced by the server (5th shot, 7th shot, and so on) and those faced by the returner (6th, 8th, etc.). As you can see, by the 5th shot, there isn’t much of a difference, at least not when facing a forehand.
Let’s look at one more chart: Outcomes of opportunities when the opponent hits a forehand in various directions. (Again, we’re only looking at righty-righty matchups.)
There’s very little difference between the two corners, and it’s clear that it’s more difficult to make good of a shot opportunity in either corner than it is from the middle. It’s interesting to note here that, when faced with a forehand that lands in play–regardless of where it is aimed–the average player has less than a 50% chance of winning the point. This is a confusing instance of selection bias that crops up occasionally in tennis analytics: Because a significant percentage of shots are errors, the player who just placed a shot in the court has a temporary advantage.
Next steps
If you’re wondering what the point of all of this is, I understand. (And I appreciate you getting this far despite your reservations.) Until we drill down to much more specific situations–and maybe even then–these tour averages are no more than curiosities. It doesn’t exactly turn the analytics world upside down to show that forehands are more effective than backhand slices, or that hitting to the corners is more effective than hitting down the middle.
These averages are ultimately only tools to better quantify the accomplishments of specific players. As I continue to explore this type of algorithm, combined with the growing Match Charting Project dataset, we’ll learn a lot more about the characteristics of the world’s best players, and what makes some so much more effective than others.