Italian translation at settesei.it
If you watched the US Open or visited its website at any point in the last two weeks, you surely noticed the involvement of IBM. Logos and banner ads were everywhere, and even usually reliable news sites made a point of telling us about the company’s cutting-edge analytics.
Particularly difficult to miss were the IBM “Keys to the Match,” three indicators per player per match. The name and nature of the “keys” strongly imply some kind of predictive power: IBM refers to its tennis offerings as “predictive analytics” and endlessly trumpets its database of 41 million data points.
Yet, as Carl Bialik wrote for the Wall Street Journal, these analytics aren’t so predictive.
It’s common to find that the losing player met more “keys” than the winner did, as was the case in the Djokovic–Wawrinka semifinal. Even when the winner captured more keys, some of these indicators sound particularly irrelevant, such as “average less than 6.5 points per game serving,” the one key that Rafael Nadal failed to meet in yesterday’s victory.
According to one IBM rep, their team is looking for “unusual” statistics, and in that they succeeded. But tennis is a simple game, and unless you drill down to components and do insightful work that no one has ever done in tennis analytics, there are only a few stats that matter. In their quest for the unusual, IBM’s team missed out on the predictive.
IBM vs generic
IBM offered keys for 86 of the 127 men’s matches at the US Open this year. In 20 of those matches, the loser met as many or more of the keys as the winner did. On average, the winner of each match met 1.13 more IBM keys than the loser did.
This is IBM’s best performance of the year so far. At Wimbledon, winners averaged 1.02 more keys than losers, and in 24 matches, the loser met as many or more keys as the winner. At Roland Garros, the numbers were 0.98 and 21, and at the Australian Open, the numbers were 1.08 and 21.
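Both summary numbers used in this comparison are simple to compute: the average margin by which winners out-keyed losers, and the count of matches in which the loser met as many or more keys. Below is a minimal sketch. It assumes each match is recorded as a hypothetical `MatchKeys` pair of key counts. The sample data in the usage note is made up and is not IBM's.

```python
from dataclasses import dataclass


@dataclass
class MatchKeys:
    # Hypothetical record: number of keys each player met in one match.
    winner_keys: int
    loser_keys: int


def summarize_keys(matches: list[MatchKeys]) -> tuple[float, int]:
    """Return (average winner-minus-loser key margin,
    number of matches where the loser met as many or more keys)."""
    if not matches:
        raise ValueError("need at least one match")
    margins = [m.winner_keys - m.loser_keys for m in matches]
    avg_margin = sum(margins) / len(margins)
    # A margin of zero or less means the loser matched or beat the winner.
    loser_at_least_as_many = sum(1 for d in margins if d <= 0)
    return avg_margin, loser_at_least_as_many
```

For example, `summarize_keys([MatchKeys(3, 1), MatchKeys(2, 2), MatchKeys(1, 2), MatchKeys(3, 0)])` returns `(1.0, 2)`. The margins of 2, 0, -1 and 3 average to 1.0, and the loser matched or beat the winner twice.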
Without some kind of reference point, it’s tough to know how good or bad these numbers are. As Carl noted: “Maybe tennis is so difficult to analyze that these keys do better than anyone else could without IBM’s reams of data and complex computer models.”
It’s not that difficult. In fact, IBM’s millions of data points and scores of “unusual” statistics are complicating what could be very simple.
I tested some basic stats to discover whether there were more straightforward indicators that might outperform IBM’s. (Carl calls them “Sackmann Keys”; I’m going to call them “generic keys.”) It is remarkable just how easy it was to create a set of generic keys that matched, or even slightly outperformed, IBM’s numbers.
Unsurprisingly, two of the most effective stats are winning percentage on first serves and winning percentage on second serves. As I’ll discuss in future posts, these stats (and others) show surprising discontinuities. That is to say, there is a clear level at which another percentage point or two makes a huge difference in a player’s chances of winning a match. These measurements are tailor-made for keys.
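The post does not describe exactly how its cutoffs were chosen. One plausible approach works from historical player-match lines. It scans candidate cutoffs and keeps the one where clearing the cutoff agrees most often with winning. This is offered as an assumption, not as the author's method. A discontinuity of the kind described above would show up as a sharp jump in that agreement rate. `best_threshold` and its `(stat_value, won)` input format are hypothetical.

```python
def best_threshold(samples: list[tuple[float, bool]]) -> float:
    """Hypothetical helper: pick the cutoff c that maximizes how often
    the rule 'stat > c' agrees with the match result.

    samples: (stat_value, player_won) pairs from past matches.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Candidate cutoffs: every observed value, plus -inf ("everyone clears it").
    candidates = [float("-inf")] + sorted({value for value, _ in samples})
    best_cut, best_hits = candidates[0], -1
    for cut in candidates:
        # Count samples where "cleared the cutoff" matches "won the match".
        hits = sum((value > cut) == won for value, won in samples)
        if hits > best_hits:  # strict: ties keep the lower cutoff
            best_cut, best_hits = cut, hits
    return best_cut
```

For example, take the made-up samples `[(0.70, False), (0.72, False), (0.73, True), (0.76, True), (0.80, True)]`. The function returns `0.72`, because every sample above 0.72 is a win and every sample at or below it is a loss. A real version would want thousands of matches and a held-out set to avoid overfitting the cutoff.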
For a third key, I tried first-serve percentage. It doesn’t have nearly the same predictive power as the other two statistics, but it has the benefit of no clear correlation with them. You can have a high first-serve percentage but a low rate of first-serve or second-serve points won, and vice versa. And contrary to some received wisdom, there does not seem to be some high level of first-serve percentage where more first serves is a bad thing. It’s not linear, but the more first serves you put in the box, the better your odds of winning.
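A simple accounting identity shows why first-serve percentage carries information of its own. Write f for first-serve percentage and w_1 for the share of first-serve points won. Write w_2 for the share of second-serve points won, counted over all second-serve points so that double faults count as losses. The share S of all service points won is then a weighted average of the two win rates. It rises with f whenever w_1 exceeds w_2, as it typically does. The numbers in the second line are illustrative: 62% of first serves in, 74% of first-serve points won and 52% of second-serve points won.

```latex
\begin{aligned}
S &= f\,w_1 + (1 - f)\,w_2, \qquad \frac{\partial S}{\partial f} = w_1 - w_2,\\
S &= 0.62 \times 0.74 + 0.38 \times 0.52 = 0.4588 + 0.1976 = 0.6564.
\end{aligned}
```

This is only bookkeeping with w_1 and w_2 held fixed. It does not model how a player's odds of winning the match respond to f, which is where the nonlinearity comes in.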
Put it all together, and we have three generic keys:
- Winning percentage on first-serve points better than 74%
- Winning percentage on second-serve points better than 52%
- First-serve percentage better than 62%
These numbers are based on the last few years of ATP results on every surface except for clay. For simplicity’s sake, I grouped together grass, hard, and indoor hard, even though separating those surfaces might yield slightly more predictive indicators.
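Checking the three generic keys takes only a few comparisons. Below is a minimal sketch. It assumes each player's match line has already been converted to fractions between 0 and 1, and it reads “better than” as strictly greater. The function name is hypothetical. The thresholds are the hard-court and grass values listed above.

```python
# Generic key thresholds from the list above (non-clay surfaces).
FIRST_SERVE_WON_MIN = 0.74   # share of first-serve points won
SECOND_SERVE_WON_MIN = 0.52  # share of second-serve points won
FIRST_SERVE_IN_MIN = 0.62    # first-serve percentage


def generic_keys_met(first_in: float, first_won: float, second_won: float) -> int:
    """Hypothetical helper: count how many of the three generic keys
    one player met in a match. "Better than" is treated as strictly greater."""
    keys = (
        first_won > FIRST_SERVE_WON_MIN,
        second_won > SECOND_SERVE_WON_MIN,
        first_in > FIRST_SERVE_IN_MIN,
    )
    return sum(keys)  # True counts as 1, False as 0
```

For example, `generic_keys_met(first_in=0.65, first_won=0.78, second_won=0.50)` returns 2, because the player clears the first-serve keys but falls short on second serves. Calling it for both players and subtracting gives the per-match margin that is compared with IBM’s below.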
For those 86 men’s matches at the Open this year with IBM keys, the generic keys did a little bit better. Using my indicators (the same three for every player), the loser met as many or more keys 16 times (compared to IBM’s 20), and the winner averaged 1.15 more keys than the loser (compared to IBM’s 1.13). Results for the other slams (with slightly different thresholds for the clay at Roland Garros) were similar.
A smarter planet
It’s no accident that the simplest, most generic possible approach to keys provided better results than IBM’s focus on the complex and unusual. It also helps that the generic keys are grounded in domain-specific knowledge (however rudimentary), while many of the IBM keys, such as average first serve speeds below a given number of miles per hour, or set lengths measured in minutes, reek of domain ignorance.
Indeed, comments from IBM’s reps suggest that marketing is more important than accuracy. In Carl’s post, a rep was quoted as saying, “It’s not predictive,” despite the large and brightly-colored announcements to the contrary plastered all over the IBM-powered US Open site. “Engagement” keeps coming up, even though engaging (and unusual) numbers may have nothing to do with match outcomes, and much of the fan engagement I’ve seen is negative.
Then again, maybe the old saw is correct: It’s all good publicity as long as they spell your name right. And it’s not hard to spell “IBM.”
Better keys, more insight
Amid such a marketing effort, it’s easy to lose sight of the fact that the idea of match keys is a good one. Commentators often talk about hitting certain targets, like 70% of first serves in. Yet to my knowledge, no one has done the research.
With my generic keys as a first step, this path could get a lot more interesting. While these single numbers are good guides to performance on hard courts, several extensions spring to mind.
Mainly, these numbers could be improved by making player-specific adjustments. Winning 74% of first-serve points is adequate for an average returner, but what about a poor returner like John Isner? His average first-serve winning percentage this year is nearly 79%, suggesting that he needs to come closer to that number to beat most players. For other players, perhaps a higher rate of first serves in is crucial for victory. For still others, the thresholds may vary especially dramatically by surface.
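One way to make a key player-specific is to pull the generic threshold part of the way toward the player’s own season average. This sketch rests on openly hypothetical assumptions. The blend weight is an invented tuning parameter; the post neither proposes nor estimates it. Suppose the weight is 0.5 and 0.79 stands in for Isner’s “nearly 79%”. Then his first-serve key would rise from 74% to 76.5%.

```python
GENERIC_FIRST_SERVE_WON = 0.74  # generic key from this post


def adjusted_threshold(player_avg: float,
                       generic: float = GENERIC_FIRST_SERVE_WON,
                       weight: float = 0.5) -> float:
    """Hypothetical player-specific key: a linear blend of the generic
    threshold and the player's own season average.

    weight = 0 keeps the generic key; weight = 1 uses the player's average.
    The default of 0.5 is an arbitrary illustration, not a fitted value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1.0 - weight) * generic + weight * player_avg
```

A fitted version would estimate the weight from past results, and might use separate weights per surface or per key.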
In future posts, I’ll delve into more detail regarding these generic keys and investigate ways in which they might be improved. Outperforming IBM is gratifying, but if our goal is really a “smarter planet,” there is a lot more research to pursue.
Sorry this comment is absolutely off-topic… but I’m just so excited! Have you heard the news? Schwartzmann beat Djokovic 6-1 6-2!!!! I told you the boy would come good.
Returning to the knitting, at one point in the second set last night I noticed that Nadal’s first serve percentage (for the set) had dipped well below 50%. It was 43% or something. Before I even looked at his winning percentages on first and second serves, I knew what I would see. Sure enough: about 50% on first serve, 88% on second serve. When his second serve is doing better, he just makes sure he gets more of them. Like some things in subatomic physics, this makes perfect mathematical sense although it’s hard to relate it logically to anything in our own world.
Along similar lines, I have a theory about Djokovic I mean to test — that he shrinks the gap between first and second serves more than anyone else in the game. He neutralizes the first serve so well — all he really has to do is put the return in the backcourt, and then be Novak for the rest of the rally. Aside from the occasional attack on a second-serve return, that’s the same thing he does against second serves, too.
I’m cherrypicking–there are plenty of examples to the contrary–but in the Youzhny match, Misha’s 1st and 2nd serve WPs were almost identical.
Maybe that’s why he gets so frustrated when he comes up against a serve monster on good form – like Isner – who beats him by never letting him master the serve. “But I’m entitled to a rally on every point!” (pouts, stamps foot).
Rather than splitting it into first and second serves, why not just consider percentage of points won on serve in total? Then you could add a key for percentage of points won on return. I’d imagine that would do pretty well.
Even better just consider percentage of points won overall!
Seriously, though, the more closely your keys correlate with the outcome, the less they actually tell you. Without an explicit model of what is really going on in a particular tennis matchup, I don’t think any groundbreaking insights will come from studying these sorts of statistics.