Italian translation at settesei.it
If you watched the US Open or visited its website at any point in the last two weeks, you surely noticed the involvement of IBM. Logos and banner ads were everywhere, and even usually-reliable news sites made a point of telling us about the company’s cutting-edge analytics.
Particularly difficult to miss were the IBM “Keys to the Match,” three indicators per player per match. The name and nature of the “keys” strongly imply some kind of predictive power: IBM refers to its tennis offerings as “predictive analytics” and endlessly trumpets its database of 41 million data points.
Yet, as Carl Bialik wrote for the Wall Street Journal, these analytics aren’t so predictive.
It’s common to find that the losing player met more “keys” than the winner did, as was the case in the Djokovic–Wawrinka semifinal. Even when the winner captured more keys, some of these indicators sound particularly irrelevant, such as “average less than 6.5 points per game serving,” the one key that Rafael Nadal failed to meet in yesterday’s victory.
According to one IBM rep, their team is looking for “unusual” statistics, and in that they succeeded. But tennis is a simple game, and unless you drill down to components and do insightful work that no one has ever done in tennis analytics, there are only a few stats that matter. In their quest for the unusual, IBM’s team missed out on the predictive.
IBM vs generic
IBM offered keys for 86 of the 127 men’s matches at the US Open this year. In 20 of those matches, the loser met as many or more of the keys as the winner did. On average, the winner of each match met 1.13 more IBM keys than the loser did.
This is IBM’s best performance of the year so far. At Wimbledon, winners averaged 1.02 more keys than losers, and in 24 matches, the loser met as many or more keys as the winner. At Roland Garros, the numbers were 0.98 and 21, and at the Australian Open, the numbers were 1.08 and 21.
Without some kind of reference point, it’s tough to know how good or bad these numbers are. As Carl noted: “Maybe tennis is so difficult to analyze that these keys do better than anyone else could without IBM’s reams of data and complex computer models.”
It’s not that difficult. In fact, IBM’s millions of data points and scores of “unusual” statistics are complicating what could be very simple.
I tested some basic stats to discover whether there were more straightforward indicators that might outperform IBM’s. (Carl calls them “Sackmann Keys”; I’m going to call them “generic keys.”) It is remarkable just how easy it was to create a set of generic keys that matched, or even slightly outperformed, IBM’s numbers.
Unsurprisingly, two of the most effective stats are winning percentage on first serves, and winning percentage on second serves. As I’ll discuss in future posts, these stats–and others–show surprising discontinuities. That is to say, there is a clear level at which another percentage point or two makes a huge difference in a player’s chances of winning a match. These measurements are tailor-made for keys.
For a third key, I tried first-serve percentage. It doesn’t have nearly the same predictive power as the other two statistics, but it has the benefit of no clear correlation with them. You can have a high first-serve percentage but a low rate of first-serve or second-serve points won, and vice versa. And contrary to some received wisdom, there does not seem to be some high level of first-serve percentage above which more first serves become a bad thing. The relationship isn’t linear, but the more first serves you put in the box, the better your odds of winning.
Put it all together, and we have three generic keys:
- Winning percentage on first-serve points better than 74%
- Winning percentage on second-serve points better than 52%
- First-serve percentage better than 62%
These numbers are based on the last few years of ATP results on every surface except for clay. For simplicity’s sake, I grouped together grass, hard, and indoor hard, even though separating those surfaces might yield slightly more predictive indicators.
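For readers who want to try this at home, here is a minimal Python sketch of how the three generic keys could be checked against a player's stat line. The thresholds come from the list above; the dictionary field names, the function name, and the sample stat line are made up for illustration, and I'm treating "better than" as strictly greater than.

```python
# Thresholds from the three generic keys (non-clay surfaces).
# The field names are hypothetical labels chosen for this sketch.
GENERIC_THRESHOLDS = {
    "first_serve_points_won": 0.74,
    "second_serve_points_won": 0.52,
    "first_serve_in": 0.62,
}


def generic_keys_met(stats, thresholds=GENERIC_THRESHOLDS):
    """Count how many generic keys a player met in a single match.

    `stats` maps each field name to a rate between 0 and 1.
    A key is met only when the rate is strictly above its threshold.
    """
    return sum(1 for name, cutoff in thresholds.items() if stats[name] > cutoff)


# Hypothetical stat line, not taken from any real match.
example_stats = {
    "first_serve_points_won": 0.78,   # above 74%: key met
    "second_serve_points_won": 0.50,  # below 52%: key missed
    "first_serve_in": 0.65,           # above 62%: key met
}
keys = generic_keys_met(example_stats)
```

Because the same three thresholds apply to every player, the only per-match input is the stat line itself.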
For those 86 men’s matches at the Open this year with IBM keys, the generic keys did a little bit better. Using my indicators–the same three for every player–the loser met as many or more keys 16 times (compared to IBM’s 20) and the winner averaged 1.15 more keys (compared to IBM’s 1.13) than the loser. Results for other slams (with slightly different thresholds for the different surface at Roland Garros) netted similar numbers.
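The two yardsticks used in this comparison (how often the loser met as many or more keys than the winner, and the winner's average margin in keys) are simple to compute once you have key counts for both players in each match. A rough sketch follows; the function name and the sample list are hypothetical, and the sample is not real US Open data.

```python
def evaluate_keys(matches):
    """Summarize how well a set of keys separates winners from losers.

    `matches` is a list of (winner_keys_met, loser_keys_met) pairs.
    Returns a tuple:
      - the number of matches where the loser met as many or more keys
      - the average of (winner keys - loser keys) across all matches
    """
    if not matches:
        raise ValueError("need at least one match")
    loser_ties_or_leads = sum(1 for won, lost in matches if lost >= won)
    average_margin = sum(won - lost for won, lost in matches) / len(matches)
    return loser_ties_or_leads, average_margin


# Hypothetical key counts for four matches, for illustration only.
sample = [(3, 1), (2, 2), (1, 2), (3, 0)]
summary = evaluate_keys(sample)
```

A lower first number and a higher second number both indicate keys that track match outcomes more closely.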
A smarter planet
It’s no accident that the simplest, most generic possible approach to keys provided better results than IBM’s focus on the complex and unusual. It also helps that the generic keys are grounded in domain-specific knowledge (however rudimentary), while many of the IBM keys, such as average first serve speeds below a given number of miles per hour, or set lengths measured in minutes, reek of domain ignorance.
Indeed, comments from IBM’s reps suggest that marketing is more important than accuracy. In Carl’s post, a rep was quoted as saying, “It’s not predictive,” despite the large and brightly-colored announcements to the contrary plastered all over the IBM-powered US Open site. “Engagement” keeps coming up, even though engaging (and unusual) numbers may have nothing to do with match outcomes, and much of the fan engagement I’ve seen is negative.
Then again, maybe the old saw is correct: It’s all good publicity as long as they spell your name right. And it’s not hard to spell “IBM.”
Better keys, more insight
Amid such a marketing effort, it’s easy to lose sight of the fact that the idea of match keys is a good one. Commentators often talk about hitting certain targets, like 70% of first serves in. Yet to my knowledge, no one had done the research.
With my generic keys as a first step, this path could get a lot more interesting. While these single numbers are good guides to performance on hard courts, several extensions spring to mind.
Mainly, these numbers could be improved by making player-specific adjustments. Winning 74% of first-serve points is adequate for an average returner, but what about a poor returner like John Isner? His average first-serve winning percentage this year is nearly 79%, suggesting that he needs to come closer to that number to beat most players. For other players, perhaps a higher rate of first serves in is crucial for victory. Or perhaps their thresholds vary dramatically by surface.
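To make the idea of a player-specific adjustment concrete, here is one purely hypothetical way it might be sketched: pull the generic threshold partway toward the player's own season average. The blending approach and the `weight` parameter are my assumptions for illustration, not a tested method, and the right amount of adjustment is exactly the kind of thing that needs research.

```python
def player_adjusted_threshold(generic, player_average, weight=0.5):
    """Blend a generic threshold with a player's own average rate.

    `weight` is a hypothetical tuning knob: 0 keeps the generic
    threshold unchanged, 1 uses the player's average outright.
    """
    if not 0 <= weight <= 1:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * generic + weight * player_average


# Generic first-serve-points-won threshold (74%) and Isner's rough
# season average (nearly 79%), both taken from the text above.
isner_threshold = player_adjusted_threshold(0.74, 0.79)
```

With the default weight, the Isner example lands about halfway between the generic 74% and his 79% average.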
In future posts, I’ll delve into more detail regarding these generic keys and investigate ways in which they might be improved. Outperforming IBM is gratifying, but if our goal is really a “smarter planet,” there is a lot more research to pursue.