Italian translation at settesei.it
I’m sure you’ve heard about the trend. First, statistics overhauled baseball, and teams in every major sport now employ quants to search out that extra edge. Tennis has lagged behind the others, but with the help of big data, we’re on the cusp of a whole new era.
That’s the story, anyway. Yesterday brought us another example.
What happened in baseball is, quite simply, never going to happen in tennis.
To oversimplify a bit, the “Moneyball revolution” refers to front offices using analytics to identify underrated and underpriced players. To a lesser extent, it refers to deploying those players in a smarter way–say, rearranging the batting order or attempting fewer stolen bases.
In tennis, there are no front offices. Players aren’t paid salaries by teams. And there are no managers to decide how best to use their players.
In short: There are no organizations with both the incentives and the resources to analyze data.
Of course, when people get breathless about all the raw data floating around in tennis, that isn’t what they’re talking about. (No one really thinks Hawkeye data is going to revolutionize, say, the World Team Tennis draft.) Instead, they are implying that the data can be analyzed in a way that is actionable for players.
That’s an admirable objective. In theory, Kevin Anderson’s coach could look at all the data from all the matches between Anderson and Tomas Berdych, identify which tactics worked and which didn’t, and make recommendations accordingly. Of course, Kevin’s coach is already watching all those matches, taking notes, reviewing video, and presumably making recommendations, so if big data is going to change the game, it needs to somehow offer coaches demonstrably better insights.
With all the cameras pointed at tennis’s show courts, that’s certainly possible. The closest analogue in baseball is the pitch f/x system, which tracks the speed, location, and movement of every pitch. Some pitchers have been able to use pitch f/x data to analyze and improve upon their own performance. The same could eventually happen in tennis. But there are systemic reasons why it hasn’t yet, and those root causes are unlikely to disappear anytime soon.
What needs to change
Hawkeye cameras are trained on a lot of courts and can collect an enormous amount of data. That’s how broadcasts are able to bring you stats like average net clearance and meters run. Those cameras also help generate graphics such as the ones showing where all of a player’s serves landed.
After a match is over, with no calls left to be overturned and no broadcast needs likely to arise, what happens to the data? For all practical purposes, it gets stashed in the attic and forgotten. (Here’s a more thorough explanation.) Contrast that with Major League Baseball, which makes all pitch f/x data available immediately (to the public, for free) and archives it indefinitely.
If tennis is to see any meaningful analytical breakthroughs, Hawkeye data needs to be aggregated in a single database. Results from one match are sometimes interesting (hey look, Andy’s net clearance is 15% greater than Roger’s!), but if we’re always looking at one match, or one tournament, at a time, we’ll never learn which of these Hawkeye-derived statistics matter, or how much.
IBM, the collector of much of this information, may already maintain some version of that database. If so, the results are jaw-droppingly uninspiring. On broadcasts, we get the same old stats and graphics. When IBM has ventured into predicting match outcomes, their “millions of data points” have been outperformed by my much simpler model.
IBM is the one organization in the sport with the resources to do the kind of analysis that will transform tennis. But they have no incentive to do so. To IBM (and now SAP, in the women’s game), tennis is a public relations opportunity, one that allows them to brand tournament websites and on-screen graphics with their logo. (Not to mention those suspiciously pro-IBM trend pieces linked to above.)
Players might eventually benefit from data-based insights, but only a tiny fraction of them could afford to hire even a single analyst. (Hi Simona! Text me anytime!)
Once again, we have to turn to baseball for a precedent. Even in that immense sport, with its billion-dollar franchises, it was amateurs–outsiders–who did the work that brought about the analytics revolution. Even now, with teams aggressively hiring promising talent from outside the game, many of the most profitable insights still come from independent researchers. If MLB made its data as inaccessible as tennis does, that trend would’ve ground to a halt long ago.
Nice as it is to dream about a better world of tennis data, we’re unlikely to see it anytime soon. Tennis doesn’t have a commissioner, so there’s no one to appoint a data czar, let alone anyone who could convince the alphabet soup of the ATP, WTA, ITF, IBM, SAP, and Hawkeye to aggregate their data in any meaningful way.
Until that happens, and until the data is publicly available, there will be no analytics revolution in tennis. We’ll continue to get what we have now: the occasional Hawkeye stat, free of context, illustrating the same sort of analysis we’ve been hearing for decades.