The End Goal of Tennis Analytics

I was about a thousand words into a meandering first draft when I realized that the ultimate goal of tennis analytics could be described simply:

The ability to quantify the impact of each individual shot, probably using camera-based player- and ball-tracking.

The purpose of every shot is to increase the odds of winning the point. (There are exceptions; you might hit a suboptimal shot to make a later shot harder to read or otherwise mislead your opponent.) Serves offer clear-cut examples. Carlos Alcaraz wins about 66% of his service points. If he hits an ace, that one shot increases his chances of winning the point from 66% to 100%, a swing of 34 percentage points. A missed first serve is worth -11 percentage points, as his chance of winning the point drops from 66% to 55%.

Shots that don’t end the point have a more modest effect than an ace, winner, or error. If you respond to a neutral forehand down the middle with a slightly more powerful forehand back down the middle, you might be upping your odds from 50% to 55%. If you run down a strong drop shot and just barely chip the ball back into play, your chances of winning the point might increase from 3% to 5%.

Point being: Each shot has some impact on the likely result of the point. If someone has, say, an above-average backhand, that will show up in these hypothetical numbers. Not every one of his backhands will move his single-point win probability in the right direction, but when we put them all together, we would be able to say that in a given match, his backhand was worth 1 point above average, or 2.5 points, or whatever else the sum of the individual impacts worked out to.
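In code, the bookkeeping half of this is trivial; the hard part is the win-probability model itself. Below is a minimal Python sketch of the aggregation step, assuming per-shot before-and-after probabilities already exist. The shot log and its numbers are hypothetical.

```python
from collections import defaultdict

def shot_impact(p_before, p_after):
    """A shot's value: the change in the hitter's chance of winning the point."""
    return p_after - p_before

# Hypothetical per-shot log: (shot type, win prob before, win prob after)
shots = [
    ("backhand", 0.50, 0.55),   # a solid rally ball
    ("backhand", 0.55, 0.48),   # a weak reply
    ("serve",    0.66, 1.00),   # an ace: +0.34, as in the Alcaraz example
]

totals = defaultdict(float)
for shot_type, before, after in shots:
    totals[shot_type] += shot_impact(before, after)

# Summed over a match, totals["backhand"] is the backhand's net value in points
print(dict(totals))
```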

Shot-by-shot stats like these probably require camera-based ball and player tracking. We can come up with rough estimates using Match Charting Project data. (I’ve tried; results are mixed.) To get anything close to accurate measurements of win probability when a point is in progress, though, we need to know where each player is positioned as well as the progress of the ball.

Of course, the “end goal” of analytics differs depending on your own aims. If you are a player or coach, you want to know how to get better, or what tactics to use against your next opponent. These individual shot-impact stats wouldn’t identify mechanical flaws, but they would make it possible to isolate each individual type of shot at a very granular level–for instance, running backhands against left-handed forehands hit harder than 80 miles per hour. In terms of tactics, the benefits should be clear. The more detailed your understanding of an opponent’s strengths and weaknesses, the better your ability to tailor a game plan.

If you are a bettor, you are primarily concerned with predicting the future. A key component of that is to separate luck from skill. That’s the purpose of every sports stat with the word “expected” in it, from baseball’s xwOBA to soccer’s xG. Tennis doesn’t have much in the way of “x” stats because we generally don’t have access to underlying data that would allow an estimate of how many points each player “should” have won. Done correctly, “expected service points won” (call it xSPW) would be a better predictor of future results than the actual SPW we work with now.
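To make the idea concrete, here is a toy xSPW calculation, assuming a model that assigns each service point a probability of being won based on its underlying characteristics. The numbers are placeholders, not real estimates.

```python
# Hypothetical model estimates, one per service point, plus actual outcomes
point_probs = [0.72, 0.61, 0.55, 0.80, 0.66]
actual_won  = [1, 0, 1, 1, 0]

xspw = sum(point_probs) / len(point_probs)  # what the player "should" have won
spw  = sum(actual_won) / len(actual_won)    # what actually happened

# A persistent gap between the two is a hint that luck, not skill, is at work
print(f"xSPW: {xspw:.1%}  SPW: {spw:.1%}")
```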

Finally, if you’re a fan like me just hoping to better understand the game, these numbers would be a gold mine. Impressed by a “steal” when Andy Murray runs down a lob and hits a winner in return? How much better would it be to know exactly what his chances were of winning the point–and how they compared to his career bests? The next time Novak Djokovic and Daniil Medvedev slug out a 20-stroke rally, wouldn’t it be fascinating to know exactly who had the edge at each stage, and which shots shifted the momentum?

Now imagine those numbers for every steal, every momentum shift, every rally. We would learn so much about each player’s skills and tendencies, far beyond the few examples I’ve given so far.

The possibilities are endless. Having these numbers, especially if they became available in real time, would transform the way we talk about the game. Every time a baseball player hits a home run, we immediately find out the exit velocity and launch angle–measurements that tell you just how well it was hit. The more we can talk about the details of fundamental skills athletes are asked to execute, the better we understand just how well or poorly they are playing. Top-level results like set scores and match wins are lagging indicators, not leading ones.

I don’t know how, when, or even if the tennis-loving public will get stats like these. But I get excited just thinking about it.


These Press Conferences Don’t Matter

Naomi Osaka has said that she isn’t going to talk to reporters at this year’s French Open. She implied that press conferences are damaging to mental health, and she accepts that she’ll be fined, as is standard for players who skip post-match interviews. If this is the first you’re hearing of the news and want a more in-depth treatment, here’s the New York Times article.

As my headline makes clear, these press conferences don’t really matter. Others do, and in a minute, I’ll explain why.

It’s often forgotten, but sports leagues and teams have a symbiotic relationship with the media. That’s why there are press conferences, as well as press boxes with workstations and free food. Not only is media coverage free publicity, it’s usually better publicity than the kind you can pay for. Sure, the US Open plasters advertising all over the NYC subway in August, but none of that compares to the publicity boost of daily tennis coverage in the New York Times or highlights shown on the evening news.

The biggest tournaments–such as Wimbledon, the US Open, and Roland Garros–still furnish those amenities, and they continue to make players available to the press. Inane and repetitive as those interview sessions sometimes are, they provide content that fills airtime and newspaper columns.

But the biggest events need not kowtow to the press. The majors are inherently newsworthy, and they almost always sell every ticket. If the French Open declared that no players would give press conferences during the upcoming fortnight, L’Equipe would still cover it. The slams have reached the status of a blue-chip corporation or a noxious politician–journalists might not want to cover them, but it’s part of the job.

Who’s bigger than what?

Many of the negative reactions to Osaka’s announcement center on the idea that she isn’t bigger than the sport–or if she is, that she shouldn’t act like it. After all, living legends such as Serena Williams and the men’s Big Three have all given hundreds of press conferences.

For better or worse, Osaka–like a handful of other players–is bigger than the sport. But more importantly, the majors are bigger than the sport.

In a very long-term sense, maybe Osaka’s position will end up mattering. Maybe it will set a precedent that other players will follow; maybe the WTA will cave and not issue any fines; maybe journalists will ask even fewer tough questions (if that’s possible). But as far as the 2021 French Open is concerned, whether one star player answers media queries is irrelevant.

The same cannot be said about virtually every other event on the calendar. With the possible exception of Indian Wells and a couple of marquee tour stops in Europe, tournaments aren’t entrenched in the public consciousness, and they scuffle anew for sponsors, spectators, and press coverage every year. If Osaka were the headliner in, say, San Jose this summer, it would be a huge blow if she refused to talk to the press.

I suspect that Osaka knows this and will act accordingly. I could be wrong: perhaps her French Open decision is a trial balloon, and if the backlash is minor, she’ll never do a tournament press conference again. But more likely, she realizes that the stakes aren’t that high, and media outlets will manage just fine for the next two weeks without her. Even though she’s the highest-paid female athlete in the world, the tournament itself is a bigger star than she is.

Rethinking the Mental Game

Italian translation at settesei.it

Everyone seems to agree that a huge part of tennis is mental. It’s less clear exactly what that means. Pundits and fans often say that certain players are mentally strong or mentally weak, attributes that help explain the gap when there’s a mismatch between talent and results.

Here are three more adjectives you’ll hear in ‘mental game’ discussions: clutch, streaky, consistent. I’ve frequently railed against commentators’ overuse of these terms. For instance, hitting an ace while facing break point is ‘clutch,’ in the sense that the player executed well in a key moment. But that doesn’t mean the player himself can be described as clutch. Just because he sometimes performs well under pressure doesn’t mean he does so any more than the average player. Same goes for ‘streaky’–humans tend to overgeneralize from small samples, so if you see a player hit three down-the-line backhand winners in a row, you’ll probably think it’s a hot streak, even though such a sequence will occasionally arise by luck alone.

Some players probably are more or less clutch, more or less streaky, or more or less consistent than their peers, even beyond what can be explained by chance. At the same time, no tour pro is so much more or less clutch that their high-leverage performance explains a substantial part of their success or failure on tour. Most players win about as many tiebreaks as you’d expect based on their non-tiebreak records and convert about as many break points as you’d predict based on their overall return stats. Nothing magical happens in these most-commonly cited pressure situations, and no player becomes either superhuman or completely hopeless.
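The tiebreak half of that claim is easy to formalize. Here is a sketch that computes the probability of winning a seven-point tiebreak from nothing but each player's serve-point win rate, under the i.i.d. assumption that every point is independent; comparing its output to actual tiebreak records is one way to look for clutch effects.

```python
from functools import lru_cache

def tb_win_prob(p_serve_a, p_serve_b, first_server="A"):
    """P(player A wins a 7-point tiebreak), treating every point as i.i.d."""
    @lru_cache(maxsize=None)
    def prob(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            # From 6-6, each pair of points includes one serve apiece
            sweep_a = p_serve_a * (1 - p_serve_b)
            sweep_b = (1 - p_serve_a) * p_serve_b
            return sweep_a / (sweep_a + sweep_b)
        i = a + b  # points already played; serves rotate 1, 2, 2, 2, ...
        a_serving = (i % 4 in (0, 3)) == (first_server == "A")
        p = p_serve_a if a_serving else 1 - p_serve_b
        return p * prob(a + 1, b) + (1 - p) * prob(a, b + 1)
    return prob(0, 0)

# Two players who each win 65% of service points split tiebreaks evenly
print(round(tb_win_prob(0.65, 0.65), 3))  # 0.5
```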

If you’re reading my blog, you’ve probably heard most of this before, either from me or from innumerable other sports analysts. I’m not taking the extreme position that there is no clutch (or streakiness or consistency), but I am pointing out that these effects are small–so small that we are unlikely to notice them just by watching matches, and sometimes so tiny that even analysts find it difficult to differentiate them from pure randomness.

Still, we’re left with the unanimous–and appealing!–belief that tennis is a mental game. In trying to explain various simplified models, I’ll often say something like, “this is what it would look like if players were robots.” Even though some of those models are rather accurate, I think we can all agree that players aren’t robots, Milos Raonic notwithstanding.

Completely mental

An extreme version of the ‘mental game’ position is one I’ve heard attributed to James Blake, that the difference between #1 and #100 is all mental. (I’m guessing that’s an oversimplification of what Blake thinks, but I’ve heard similar opinions often enough that the general idea is worth considering.) That’s a bit hard to stomach–does anybody think that Radu Albot (the current No. 99) is as talented as Rafael Nadal? But once we backtrack a little bit from the most extreme position, we can see its appeal. At the moment, both Bernard Tomic and Ernests Gulbis are ranked between 80 and 100. Can you say with confidence that those guys aren’t as talented as top-tenners Kevin Anderson or Marin Cilic? Yet Tomic often excels in pressure situations, and Cilic is the one known to crumble.

The problem with Tomic, Gulbis, and so many other underachievers in the history of the sport isn’t that they fall apart when the stakes are high. We can all remember matches–or sets, or other long stretches of play–in which a player seems uninterested, unmotivated, or just low-energy for no apparent reason. Even accounting for selection bias, I think the underachievers are more likely to deliver these inexplicably mediocre performances. (Can you imagine Nadal appearing unmotivated? Or Maria Sharapova?) In a very broad sense, I could be talking about streakiness or consistency here, but I don’t think it’s what people usually mean by those two terms. It operates at a larger scale–an entire set of mediocrity instead of, say, three double faults in a single game–and it offers us a new way of thinking about the mental aspect of tennis.

Focus

Let’s call this new variable focus. There are millions of potential distractions, internal and external, that stand in the way of peak performance. The more a player is able to ignore, disregard, or somehow overcome those distractions, the more focused she is.

Imagine that every player has her own maximum sustainable ability level, and on a scale of 1 to 10, that’s a 10. (I’m saying ‘sustainable’ to make it clear that we’re not talking about ninja Radwanska behind-the-back drop-volley stuff, but the best level that a player can keep up. Nadal’s 10 is different from Albot’s 10.) A rating of 1, at the bottom of the scale, is something we rarely see from the pros–imagine Guillermo Coria or Elena Dementieva getting the serve yips. The more focused the player, the more often she performs at a 10–and even when she can’t sustain that level, she stays closer to it more of the time.

This idea of ‘focus’ sounds a lot like the old notion of ‘consistency’, and maybe it’s what people really mean when they call a player consistent. But there are several reasons why I think it’s important to move away from ‘consistency.’ The first one is pedantic: ‘consistent’ isn’t necessarily good. If you tell a player to be consistent and she hits nothing but unforced errors on her forehand, she has followed your directions by being consistently bad. More seriously, ‘consistency’ is often conflated with ‘low-risk’, which is a strategy, not a positive or negative trait. A player like Petra Kvitova will never be consistent–her signature level of aggression will always result in plenty of errors, sometimes ugly ones, and occasionally in ill-timed bunches. Even an optimized strategy for a highly-focused Kvitova will appear to be inconsistent.

If you’re the type of person who thinks a lot about tennis, you probably see the limitations in my definition of consistency. I agree: The concept I’ve knocked down is a bit of a strawman. If I could do a better job of concisely defining what tennis people talk about when they talk about consistency, I would–again, part of the problem is that the term is overloaded. Even if you mean ‘focus’ when you’re saying ‘consistency,’ I think it’s valuable to use a separate term with less baggage.

Chess

Is ‘focus’ any better than the other mental-game concepts I’ve knocked down? We can objectively measure clutch effects, but it’s a lot harder to look at the data from a match or an entire season and quantify a player’s level of focus.

Nonetheless, I strongly suspect that at the elite level, focus varies more than, say, micro-level streakiness. Put another way: The difference in focus among top players has the potential to explain much of their difference in performance.

I started to think about the importance of focus–again, the ability to sustain a peak or near-peak level for long periods of time–while following last month’s World Chess Championship between Magnus Carlsen and Fabiano Caruana. (I wrote about the chess match here.) Chess is very different from tennis, of course. But because it doesn’t rely on physical strength, speed, or agility at all, it has a much stronger claim to the ‘mental game’ moniker than tennis does. While flashes of brilliance have their place in chess, classical games require sustained concentration at a level that few of us can even fathom. One blunder against an elite player, and you might as well give up and get some extra rest before the next game.

A common stereotype of a chess grandmaster is an old man, whose decades of knowledge and savvy help him brush aside younger upstarts. Yet Carlsen and Caruana, the two best chess players in the world, are in their mid-20s. The current top 30 includes only four men born before 1980; twelve of the top 30 were born in the 1990s, two of them since 1998. The age distribution in elite chess is awfully similar to that of elite tennis.

The aging curve in tennis lends itself to easy explanations: Players can start reaching the top when they hit physical maturity in their late teens, they continue to improve throughout their 20s as they gain experience and enjoy the benefits of physical youth, and then physical deterioration creeps in, beginning to have an effect in the late 20s or early 30s and increasing in severity over time. There’s obviously some truth in that. No matter how important the mental aspect of tennis, it’s hard to compete once you’ve lost a step, and even harder with chronic back or knee pain.

Yet the chess analogy persists: If tennis were largely mental, with much of the variation among elites explained by focus, we would expect the two sports’ aging curves to look about the same–and they do. As modern science has improved training, nutrition, and injury recovery–thus reducing the effect of physical deterioration–tennis’s aging curve has developed a flatter plateau in the late 20s and 30s. In other words, as physical risks are mitigated, the elite career trajectory of tennis looks even more like that of chess.

Thinking ahead

For now, this is just a theory. Maybe you agree with me that it’s a very appealing one, but it remains untested, and it’s possibly very difficult to test at all.

If sustained focus is such a key factor in elite tennis performance, how would we even identify it? The most direct way would be to avoid the tennis court altogether and devise experiments so that we could measure the concentration of top players. I doubt we could convince the ATP top 100 to join us in the lab for a fun day of testing. There is some long-term potential, though, as national federations could do just that with their rising stars. Some might be doing so already; some professional baseball and American football teams administer cognitive tests to potential signees as well.

Unfortunately, we can’t make the best tennis players in the world our guinea pigs. If we looked instead at match-level results, we could try to measure focus using a similar approach to what I’ve done before in the name of quantifying consistency (oops!). My earlier algorithm attempted to measure the predictability of a player’s results–that is, is the 11th best player usually losing to the top ten and beating everyone else, or are his results less predictable? That’s not what we’re interested in here, because by that definition, ‘consistency’ isn’t necessarily good.

We could work along similar lines, though. Given a year or more of results, we could estimate a player’s peak level, perhaps by taking the average of his five best results. (His absolute best result might be the product of an injured opponent, an untimely rain delay, or something else unusual.) That would indicate the level that marks a ’10’ on his personal scale of 1 to 10. Then, compare his other results to that peak. If most of his results are close to that level–like the ‘consistent’ player who loses to the top ten and beats everyone else–he appears to be focused, at least from one match to the next. If he has a lot of bad losses by comparison, he is failing to sustain a level we know he’s capable of.
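A crude version of that calculation might look like the sketch below, which assumes each match has been reduced to a single performance number (dominance ratio, percentage of points won, or similar). The metric, the top-five peak, and the "near peak" cutoff are all assumptions for illustration.

```python
def focus_profile(performances, peak_n=5, cutoff=0.85):
    """Estimate a player's peak from his best results, then ask how
    often he sustains a level near it."""
    best = sorted(performances, reverse=True)[:peak_n]
    peak = sum(best) / len(best)                     # his personal "10"
    near = sum(1 for x in performances if x >= cutoff * peak)
    return peak, near / len(performances)

# Hypothetical season of match-level dominance ratios
season = [1.31, 0.88, 1.12, 0.95, 1.24, 0.67, 1.05, 1.18, 0.74, 1.22]
peak, share = focus_profile(season)
print(f"peak: {peak:.2f}, matches near peak: {share:.0%}")
```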

That sort of approach isn’t entirely satisfying, as is often the case when working with match-level stats. Perhaps with shot-level or camera-based data, we could do even better. Using a similar approach to the above–define a peak, compare other performances to that peak–we could look at serve speed or effectiveness, putting returns in play, converting opportunities at net, and so on. It would be complicated, in part because opponent quality and surface speed always have the potential to impact those numbers, but I think it’s worth pursuing.

If I’m right about this–that tennis isn’t just a mental game, it’s a game heavily influenced by sustained concentration–the long-term impact is on player development. Academies and coaches already spend plenty of time off court, talking tactics and utilizing insights from psychology. This would be a further step in that direction.

The mental side of tennis–and sports in general–remains a huge mess of unknowns. As the next generation of elite players tries to develop small technical and tactical improvements in order to find an edge, perhaps the mental side is the next frontier, one that would finally enable a new generation to sweep away the old.

A Preface to All GOAT Arguments

Italian translation at settesei.it

Earlier this week, The Economist published my piece about Rafael Nadal’s and Roger Federer’s grand slam counts. I made the case that, because Nadal’s paths to major titles had been more difficult (the 2017 US Open notwithstanding), his 16 slams are worth more–barely!–than Federer’s.

Inevitably, some readers reduced my conclusion to something like, “stats prove that Nadal is the greatest ever.” Whoa there, kiddos. It may be true that Nadal is better than Federer, and we could probably make a solid argument based on the stats. But a rating of 18.8 to 18.7, based on 35 tournaments, can’t quite carry that burden.

There are two major steps in settling any “greatest ever” debate (tennis or otherwise). The first is definitional. What do we mean by “greatest?” How much more important are slams than non-slams? What about longevity? Rankings? Accomplishments across different surfaces? How much weight do we give a player’s peak? How much does the level of competition matter? What about head-to-head records? I could go on and on. Only when we decide what “greatest” means can we even attempt to make an argument for one player over another.

The second step–answering the questions posed by the first–is more work-intensive, but much less open to debate. If we decide that the greatest male tennis player of all time is the one who achieved the highest Elo rating at his peak, we can do the math. (It’s Novak Djokovic.) If you pick out ten questions that are plausible proxies for “who’s the greatest?” you won’t always get the same answer. Longevity-focused variations tend to give you Federer. (Or Jimmy Connors.) Questions based solely on peak-level accomplishments will net Djokovic (or maybe Bjorn Borg). Much of the territory in between is owned by Nadal, unless you consider the amateur era, in which case Rod Laver takes a bite out of Rafa’s share.
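For the curious, the peak-Elo computation really is just arithmetic. Here is a bare-bones sketch; real tennis Elo systems tune the K-factor and handle new players more carefully, so treat this as an illustration of the method rather than a replication of the numbers behind these claims.

```python
def peak_elos(matches, start=1500, k=32):
    """Run basic Elo over chronological (winner, loser) results and
    record each player's highest rating along the way."""
    ratings, peaks = {}, {}
    for winner, loser in matches:
        rw, rl = ratings.get(winner, start), ratings.get(loser, start)
        exp_w = 1 / (1 + 10 ** ((rl - rw) / 400))  # pre-match P(winner wins)
        ratings[winner] = rw + k * (1 - exp_w)
        ratings[loser] = rl - k * (1 - exp_w)
        peaks[winner] = max(peaks.get(winner, start), ratings[winner])
        peaks[loser] = max(peaks.get(loser, start), ratings[loser])
    return peaks

print(peak_elos([("Djokovic", "Nadal"), ("Nadal", "Federer"), ("Djokovic", "Federer")]))
```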

Of course, many fans skip straight to the third step–basking in the reflected glory of their hero–and work backwards. With a firm belief that their favorite player is the GOAT, they decide that the most relevant questions are the ones that crown their man. This approach fuels plenty of online debates, but it’s not quite at my desired level of rigor.

When the big three have all retired, someone could probably write an entire book laying out all the ways we might determine “greatest” and working out who, by the various definitions, comes out on top. Most of what we’re doing now is simply contributing sections of chapters to that eventual project. Now or then, one blog post will never be enough to settle a debate of this magnitude.

In the meantime, we can aim to shed more light on the comparisons we’re already making. Grand slam titles aren’t everything, but they are important, and “19 is more than 16” is a key weapon in the arsenal of Federer partisans. Establishing that this particular 19 isn’t really any better than that particular 16 doesn’t end the debate any more than “19 is more than 16” ever did. But I hope that it made us a little more knowledgeable about the sport and the feats of its greatest competitors.

At the one-article, 1,000-word scale, we can achieve a lot of interesting things. But for an issue this wide-ranging, we can’t hope to settle it in one fell swoop. The answers are hard to find, and choosing the right question is even more difficult.


Little Data, Big Potential

This is a guest post by Carl Bialik.

I had more data on my last 30 minutes of playing tennis than I’d gotten in my first 10 years of playing tennis — and it just made me want so much more.

Ben Rothenberg and I had just played four supertiebreakers, after 10 minutes of warmup and before a forehand drill. And for most of that time — all but a brief break while PlaySight staff showed the WTA’s Micky Lawler the system — 10 PlaySight cameras were recording our every move and every shot: speed, spin, trajectory and whether it landed in or out. Immediately after every point, we could walk over to the kiosk right next to the net to watch video replays and get our stats. The tennis sure didn’t look professional-grade, but the stats did: spin rate, net clearance, winners, unforced errors, net points won.

Later that night, we could go online and watch and laugh with friends and family. If you’re as good as Ben and I are, laugh you will: As bad as we knew the tennis was by glancing over to Dominic Thiem and Jordan Thompson on the next practice court, it was so much worse when viewed on video, from the kind of camera angle that usually yields footage of uberfit tennis-playing pros, not uberslow tennis-writing bros.

https://www.youtube.com/watch?v=xJ7AUcNVPoM

This wasn’t the first time I’d seen video evidence of my take on tennis, an affront to aesthetes everywhere. Though my first decade and a half of awkward swings and shoddy footwork went thankfully unrecorded, in the last five years I’d started to quantify my tennis self. First there was the time my friend Alex, a techie, mounted a camera on a smartphone during our match in a London park. Then in Paris a few years later, I roped him into joining me for a test of Mojjo, a PlaySight competitor that used just one camera — enough to record video later published online, with our consent and to our shame. Last year, Tennis Abstract proprietor Jeff Sackmann and I demo-ed a PlaySight court with Gordon Uehling, founder of the company.

With PlaySight and Mojjo still only in a handful of courts available to civilians, that probably puts me — and Alex, Jeff and Ben — in the top 5 or 10 percent of players at our level for access to advanced data on our games. (Jeff may object to being included in this playing level, but our USPS Tennis Abstract Head2Head suggests he belongs.) So as a member of the upper echelon of stats-aware casual players, what’s left once I’m done geeking out on the video replays and RPM stats? What actionable information is there about how I should change my game?

Little data, modest lessons

After reviewing my footage and data, I’m still looking for answers. Just a little bit of tennis data isn’t much more useful than none.

Take the serve, the most common shot in tennis. In any one set, I might hit a few dozen. But what can I learn from them? Half are to the deuce court, and half are to the ad court. And almost half of the ones that land in are second serves. Even with my limited repertoire, some are flat while others have slice. Some are out wide, some down the T and some to the body — usually, for me, a euphemism for “I missed my target.”

PlaySight groundstroke report

If I hit only five slice first serves out wide to the deuce court, three went in, one was unreturned and I won one of the two ensuing rallies, what the hell does that mean? It doesn’t tell me a whole lot about what would’ve happened if I’d gotten a chance to try that serve once more that day against Ben — let alone what would happen the next time we played, when he had his own racquet, when we weren’t hitting alongside pros and in front of confused fans, with different balls on a different surface, without the desert sun above us, at a different time of day when we were in different frames of mind. And the data says even less about how that serve would have done against a different opponent.

That’s the serve, a shot I’ll hit at least once on about half of points in any match. The story’s even tougher for rarer shots, like a backhand drop half volley or a forehand crosscourt defensive lob, shots so rare they might come up once or twice every 10 matches.

More eyes on the court

It’s cool to know that my spinniest forehand had 1,010 rpm (I hit pretty flat compared to Jack Sock’s 3,337 rpm), but the real value I see is in the kind of data collected on that London court: the video. PlaySight doesn’t yet know enough about me to know that my footwork was sloppier than usual on that forehand, but I do, and it’s a good reminder to get moving quickly and take small steps. And if I were focusing on the ball and my own feet, I might have missed that Ben leans to his backhand side instead of truly split-stepping, but if I catch him on video I can use that tendency to attack his forehand side next time.

PlaySight video with shot stats

Video is especially useful for players who are most focused on technique. As you might have gathered, I’m not, but I can still get a tactical edge from studying patterns that PlaySight doesn’t yet identify.

Where PlaySight and its ilk could really drive breakthroughs is by combining all of the data at its disposal. The company’s software knows about only one of the thousands of hours I’ve spent playing tennis in the last five years. But it has tens of thousands of hours of tennis in its database. Even a player as idiosyncratic as me should have a doppelganger or two in a data set that big. And some of them must’ve faced an opponent like Ben. Then there are partial doppelgangers: women who serve like me even though all of our other shots are different; or juniors whose backhands resemble mine (and hopefully are being coached into a new one).  Start grouping those videos together — I’m thinking of machine learning, clustering and classifying — and you can start building a sample of some meaningful size. PlaySight is already thinking this way, looking to add features that can tell a player, say, “Your backhand percentage in matches is 11 percent below other PlaySight users of a similar age/ability,” according to Jeff Angus, marketing manager for the company, who ran the demo for Ben and me.
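To make the doppelganger idea concrete, here is a sketch of the clustering step. The style features and numbers are invented; real inputs would be PlaySight-style tracking aggregates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented feature vectors: serve mph, forehand rpm, net points/match, slice %
players = np.array([
    [92, 1010, 2.1, 0.40],
    [118, 2900, 3.5, 0.10],
    [95, 1100, 2.4, 0.35],
    [121, 3200, 6.0, 0.05],
])

X = StandardScaler().fit_transform(players)  # put features on a common scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # players sharing a label are stylistic neighbors
```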

The hardware side of PlaySight is tricky. It needs to install cameras and kiosks, weatherproof them when the court is outdoors, and protect them from human error and carelessness. It’s in a handful of clubs, and the number probably won’t expand much: The company is focusing more on the college game. Even when Alex and I, two players at the very center of PlaySight’s target audience among casual players, happened to book a PlaySight court recently in San Francisco, we decided it wasn’t worth the few minutes it would have taken at the kiosk to register — or, in my case, remember my password. The cameras stood watch, but the footage was forever lost.

Bigger data, big questions

I’m more excited by PlaySight’s software side. I probably will never play enough points on PlaySight courts for the company to tell me how to play better or smarter — unless I pay to install the system at my home courts. But if it gets cheaper and easier to collect decent video of my own matches — really a matter of a decent mount and protector for a smartphone and enough storage space — why couldn’t I upload my video to the company? And why couldn’t it find video of enough Bizarro Carls and Bizarro Carl opponents around the world to make a decent guess about where I should be hitting forehands?

There are bigger, deeper tennis mysteries waiting to be solved. As memorably argued by John McPhee in Levels of the Game, tennis isn’t so much one sport as dozens of different ones, each a different level of play united only by common rules and equipment. And a match between two players even from adjacent levels in his hierarchy typically is a rout. Yet tactically my matches aren’t so different from the ones I see on TV, or even from the practice set played by Thiem and Thompson a few feet from us. Hit to the backhand, disguise your shots, attack short balls and approach the net, hit drop shots if your opponent is playing too far back. And always, make your first serve and get your returns in.

So can a tactic from one level of the game transfer even to one much lower? I’m no Radwanska and Ben is no Cibulkova, but could our class of play share enough similarity — mathematically, is Carl : Ben :: Aga : Pome — that what works for the pros works for me? If so, then medium-sized data on my style is just a subset of big data from analogous styles at every level of the game, and I might even find out if that backhand drop half volley is a good idea. (Probably not.)

PlaySight was the prompt, but it’s not the company’s job to build product features only I care about. It doesn’t have to be PlaySight. Maybe it’s Mojjo, maybe Cizr. Or maybe it’s some college student who likes tennis and is looking for a machine-learning class. Hawk-Eye, the higher-tech, higher-priced, older competitor to PlaySight, has been slow to share its data with researchers and journalists. If PlaySight has figured out that most coaches value the video and don’t care much for stats, why not release the raw footage and stats, anonymized, to researchers who might get cracking on the tennis classification question or any of a dozen other tennis analysis questions I’ve never thought to ask? (Here’s a list of some that Jeff and I have brainstormed, and here are his six big ones.) I hear all the time from people who like tennis and data and want to marry the two, not for money but to practice, to learn, to discover, and to share their findings. And other than what Jeff’s made available on GitHub, there’s not much data to share. (Just the other week, an MIT grad asked for tennis data to start analyzing.)

Sharing data with outside researchers “isn’t currently in the road map for our product team, but that could change,” Angus said, if sharing data can help the company make its data “actionable” for users to improve their games.

Maybe there aren’t enough rec players who’d want the data with enough cash to make such ventures worthwhile. But college teams could use every edge. Rising juniors have the most plastic games and the biggest upside. And where a few inches can change a pro career, surely some of the top women and men could also benefit from PlaySight-driven insights.

Yet even the multimillionaire ruling class of the sport is subject to the same limitations driven by the fractured nature of the sport: Each event has its own data and own systems. Even at Indian Wells, where Hawk-Eye exists on every match court, just two practice courts have PlaySight; the company was hoping to install four more for this year’s tournament and is still aiming to install them soon. Realistically, unless pros pay to install PlaySight on their own practice courts and play lots of practice matches there, few will get enough data to be actionable. But if PlaySight, Hawk-Eye or a rival can make sense of all the collective video out there, maybe the most tactical players can turn smarts and stats into competitive advantages on par with big serves and wicked topspin forehands.

PlaySight has already done lots of cool stuff with its tennis data, but the real analytics breakthroughs in the sport are ahead of us.

Carl Bialik has written about tennis for fivethirtyeight.com and The Wall Street Journal. He lives and plays tennis in New York City and has a Tennis Abstract page.

The Five Big Questions in Tennis Analytics

Italian translation at settesei.it

The fledgling field of tennis analytics can seem rather chaotic, with scores of mini-studies that don’t fit together in any obvious way. Some seem important but unfinished while others are entertaining but trivial.

Let me try to impose some structure on this project by classifying research topics into what I’ll call the Five Big Questions, each of which is really just an umbrella for hundreds more (like these). As we’ll see, there are really six categories, not five, which just goes to show: analytics is about more than just counting.

1. What’s the long-term forecast?

Beyond the realm of the next few tournaments, what does the evidence tell us about the future? This question encompasses everything from seasons to entire careers. What are the odds that Roger Federer reclaims the No. 1 ranking? How many Grand Slams will Nick Kyrgios win? How soon will Catherine Bellis crack the top ten?

The most important questions in this category are the hardest ones to answer: Given the limited data we have on junior players, what can we predict–and with what level of confidence–about their futures? These are questions that national federations would love to answer, but they are far from the only stakeholders. Everyone from sponsors to tournaments to the players’ families themselves has an interest in picking future stars. Further, the better we can answer these questions, the more prepared we can be for the natural follow-ups. What can we (as families, coaches, federations, etc.) do to improve the odds that a player succeeds?

2. Who will win the next match?

The second question is also concerned with forecasting, and it is the subject that has received–by far–the most analytical attention. Not only is it fun and engaging to try to pick winners, there’s an enormous global industry with billions of dollars at stake trying to make more accurate forecasts.

As an analyst, I’m not terribly interested in picking winners for the sake of picking winners. More valuable is the quest to identify all of the factors that influence match outcomes, like the role of fatigue, or a player’s preference for certain conditions, or the specifics of a given matchup. Player rating systems fall into this category, and it’s important to remember they are only a tool for forecasting, not an end in themselves.

As a meta-question in this category, one might ask how accurate a set of forecasts could possibly become. Or, posed differently, how big of a role does chance play in match outcomes?

3. When and why does the i.i.d. model break down?

A lot of sports analysis depends on the assumption that events are “independently and identically distributed”–i.e., that factors like streakiness, momentum, and clutch are either nonexistent or impossible to measure. In tennis terms, the i.i.d. model might assume that a player converts break points at the same rate that she wins all ad-court points, or that a player holds serve while serving for the set just as often as he holds serve in general.

The conventional wisdom strongly disagrees, but it is rarely consistent. (“It’s hard to serve for the set” but “this player is particularly good when leading.”) This boils down to yet another set of forecasting questions. We might know that a player wins 65% of service points, but what are her chances of winning this point, given the context?

I suspect that thorough analysis will reveal plenty of small discrepancies between reality and the i.i.d. model, especially at the level of individual players. More than with the first two topics, the limited sample sizes for many specific contexts mean we must always be careful to distinguish actual effects from noise and look for long-term trends.
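One concrete starting point is to test whether a player's break-point conversion rate is consistent with her overall return-point win rate. Here is a sketch with invented counts, using a simple binomial test (and assuming scipy is available):

```python
from scipy.stats import binomtest

return_pts_won, return_pts = 820, 2000   # overall return points: 41%
bp_converted, bp_chances = 95, 280       # break points: roughly 34%

baseline = return_pts_won / return_pts
result = binomtest(bp_converted, bp_chances, baseline)
print(f"expected {baseline:.1%}, converted {bp_converted/bp_chances:.1%}, "
      f"p-value {result.pvalue:.3f}")
```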

4. How good is that shot?

As more tennis data becomes available in a variety of formats, the focus of tennis analytics will become more granular. The Match Charting Project offers more than 3,000 matches’ worth of shot-by-shot logs. Even without the details of each shot–like court position, speed, and spin–we can start measuring the effectiveness of specific players’ shots, such as Federer’s backhand.

With more granular data on every shot, analysts will be able to be even more precise. Eventually we may know the effect of adding five miles per hour to your average forehand speed, or the value of hitting a shot from just inside the baseline instead of just behind. Some academics–notably Stephanie Kovalchik–have begun digging into this sort of data, and the future of this subfield will depend a great deal on whether these datasets ever become available to the public.

5. How effective is that tactic?

Analyzing a single shot has its limits. Aside from the serve, every shot in tennis has a context–and even serves usually form part of the backdrop for other shots. Many of the most basic tactical questions have yet to be quantified, such as the success rate of approaching to the backhand instead of the forehand.

As with the previous topic, the questions about tactics get a lot more interesting–and immensely more complicated–as soon as Hawkeye-type data is available. With enough location, speed, and spin data, we’ll be able to measure the positions from which approach shots are most successful, and the type (and direction) that is most effective from each position. We could quantify the costs and benefits of running around a forehand: How good does the forehand have to be to counteract the weaker court position that results?

We can scrape the surface of this subject with the Match Charting Project, but ultimately, this territory belongs to those with camera tracking data.

6. What is the ideal structure of the sport?

Like I said, there are really just five questions. Forecasting careers, matches, and points, and quantifying shots and tactics encompass, for me, the entire range of “tennis analytics.”

However, there are plenty of tennis-related questions that we might assign to the larger field of “business of sports.” How should prize money be distributed? What is the best way to structure the tour to balance the interests of veterans and newcomers? Are there too many top-level tournaments, or too few? What the hell should we do with Davis Cup, anyway?

Many of these issues are–for now–philosophical questions that boil down to preferences and gut instincts. Controlled experiments will always be difficult if only because of the time frames involved: If we change the Davis Cup format and it loses popularity, is it causation or just correlation? We can’t replicate the experiment. But despite the challenges, these are major questions, and analysts may be able to offer valuable insights.

Now … let’s get to work.

The Continuum of Errors

Italian translation at settesei.it

When is an error unforced? If you envision designing an algorithm to answer that question, it quickly becomes unmanageable. You’d need to take into account player position; shot velocity, angle, and spin; surface speed; and perhaps more. Many errors are obviously forced or unforced, but plenty fall into an ambiguous middle ground.

Most of the unforced error counts we see these days–via broadcasts or in post-match recaps–are counted by hand. A scorer is given some guidance, and he or she tallies each kind of error. If the human-scoring algorithm is boiled down to a single rule, it’s something like: “Would a typical pro be expected to make that shot?” Some scorers limit the number of unforced errors by always counting serve returns, or net shots, or attempted passing shots, as forced.

Of course, any attempt to sort missed shots into only two buckets is a gross oversimplification. I don’t think this is a radical viewpoint. Many tennis commentators acknowledge this when they explain that a player’s unforced error count “doesn’t tell the whole story,” or something to that effect. In the past, I’ve written about the limitations of the frequently-cited winner-to-unforced error ratio, and the similarity between unforced errors and the rightly-maligned fielding errors stat in baseball.

Imagine for a moment that we have better data to work with–say, Hawkeye data that isn’t locked in silos–and we can sketch out an improved way of looking at errors.

First, instead of classifying only errors, it’s more instructive to sort potential shots into three categories: shots returned in play, errors (which we can further distinguish later on), and opponent winners. In other words: Did you make it, did you miss it, or did you fail to even get a racket on it? One man’s forced error is another man’s ball put back in play*, so we need to consider the full range of possible outcomes from each potential shot.

*especially if the first man is Bernard Tomic and the other man is Andy Murray.

The key to gaining insight from tennis statistics is increasing the amount of context available–for instance, taking a player’s stats from today and comparing them to the typical performance of a tour player, or contrasting them with how he or she played in the last similar matchup. Errors are no different.

Here’s a basic example. In the sixth game of Angelique Kerber‘s match in Sydney this week against Darya Kasatkina, she hit a down-the-line forehand:

Kerber hits a down-the-line forehand

Thanks to the Match Charting Project, we have data for about 350 of Kerber’s down-the-line forehands, so we know it goes for a winner 25% of the time, and her opponent hits a forced error another 9% of the time. Say that a further 11% turn into unforced errors, and we have a profile for what usually happens when Kerber goes down the line: 25% winners, 20% errors, 55% put back in play. We might dig even deeper and establish that the 55% put back in play consists of 30% that ultimately resulted in Kerber winning the point against 25% that she eventually lost.
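Building a profile like that from charting data is mostly counting. Here is a sketch, with invented records standing in for Match Charting Project shot logs:

```python
from collections import Counter

# Each record: (shot code, what happened next); both are hypothetical
shots = [
    ("fh_dtl", "winner"), ("fh_dtl", "in_play"), ("fh_dtl", "forced_err"),
    ("fh_dtl", "in_play"), ("fh_dtl", "unforced_err"), ("fh_dtl", "winner"),
]

profile = Counter(outcome for shot, outcome in shots if shot == "fh_dtl")
total = sum(profile.values())
for outcome, n in profile.most_common():
    print(f"{outcome}: {n/total:.0%}")
```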

In this case, Kasatkina was able to get a racket on the ball, but missed the shot, resulting in what most scorers would agree was a forced error:

Kasatkina lunges for the return

This single instance–Kasatkina hitting a forced error against a very effective type of offensive shot–doesn’t tell us anything on its own. Imagine, though, that we tracked several players in 100 attempts each to reply to a Kerber down-the-line forehand. We might discover that Kasatkina lets 35 of 100 go for winners, or that Simona Halep lets only 15 go for winners and gets 70 back in play, or that Anastasia Pavlyuchenkova hits an error on 30 of the 100 attempts.

My point is this: With more granular data, we can put errors in a real-life context. Instead of making a judgment about the difficulty of a certain shot (or relying on a scorer to do so), it’s feasible to let an algorithm do the work on 100 shots, telling us whether a player is getting to more balls than the average player, or making more errors than she usually does.

The continuum, and the future

In the example outlined above, there are a lot of important details that I didn’t mention. In comparing Kasatkina’s error to a few hundred other down-the-line Kerber forehands, we don’t know whether the shot was harder than usual, whether it was placed more accurately in the corner, whether Kasatkina was in better position than Kerber’s typical opponent on that type of shot, or the speed of the surface. Over the course of 100 down-the-line forehands, those factors would probably even out. But in Tuesday’s match, Kerber hit only 18 of them. While a typical best-of-three match will give us a few hundred shots to work with, this level of analysis can only tell us so much about specific shots.

The ideal error-classifying algorithm of the future would do much better. It would take all of the variables I’ve mentioned (and more, undoubtedly) and, for any shot, calculate the likelihood of different outcomes. At the moment of the first image above, when the ball has just come off of Kerber’s racket, with Kasatkina on the wrong half of the baseline, we might estimate that there is a 35% chance of a winner, a 25% chance of an error, and a 40% chance that ball is returned in play. Depending on the type of analysis we’re doing, we could calculate those numbers for the average WTA player, or for Kasatkina herself.

Those estimates would allow us, in effect, to “rate” errors. In this example, the algorithm gives Kasatkina only a 40% chance of getting the ball back in play. By contrast, an average rallying shot probably has a 90% chance of ending up back in play. Instead of placing errors in buckets of “forced” and “unforced,” we could draw lines wherever we wish, perhaps separating potential shots into quintiles. We would be able to quantify whether, for instance, Andy Murray gets more of the most unreturnable shots back in play than Novak Djokovic does. Even if we have an intuition about that already, we can’t even begin to prove it until we’ve established precisely what that “unreturnable” quintile (or quartile, or whatever) consists of.
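Once a model attaches a back-in-play probability to every potential shot, drawing those lines is straightforward. Here is a sketch with placeholder numbers, using pandas to do the quintile work:

```python
import pandas as pd

df = pd.DataFrame({
    "player": ["Murray", "Murray", "Djokovic", "Djokovic", "Murray", "Djokovic"],
    "p_in_play": [0.02, 0.40, 0.05, 0.38, 0.91, 0.88],  # model estimates
    "made_it": [0, 1, 0, 0, 1, 1],                      # actually returned?
})

# Cut shots into difficulty quintiles by predicted return probability
df["difficulty"] = pd.qcut(df["p_in_play"], 5, labels=False, duplicates="drop")
print(df.groupby(["difficulty", "player"])["made_it"].mean())
```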

This sort of analysis would be engaging even for those fans who never look at aggregate stats. Imagine if a broadcaster could point to a specific shot and say that Murray had only a 2% chance of putting it back in play. In topsy-turvy rallies, this approach could generate a win probability graph for a single point, an image that could encapsulate just how hard a player worked to come back from the brink.

Fortunately, the technology to accomplish this is already here. Researchers with access to subsets of Hawkeye data have begun drilling down to the factors that influence things like shot selection. PlaySight’s “SmartCourts” classify errors into forced and unforced in close to real time, suggesting that there is something much more sophisticated running in the background, even if its AI occasionally makes clunky mistakes. Another possible route is applying existing machine learning algorithms to large quantities of match video, letting the algorithms work out for themselves which factors best predict winners, errors, and other shot outcomes.

Someday, tennis fans will look back on the early 21st century and marvel at just how little we knew about the sport back then.

All the Answers

Italian translation at settesei.it

At the end of Turing’s Cathedral, George Dyson suggests that while computers aren’t always able to usefully respond to our questions, they are able to generate a stunning, unprecedented array of answers–even if the corresponding questions have never been asked.

Think of a search engine: It has indexed every possible word and phrase, in many cases still waiting for the first user to search for it.

Tennis Abstract is no different. Using the menus on the left-hand side of Roger Federer’s page–even ignoring the filters for head-to-heads, tournaments, countries, matchstats, and custom settings like those for date and rank–you can run five trillion different queries. That’s twelve zeroes–and that’s just Federer. Judging by my traffic numbers, it will be a bit longer before all of those have been tried.

Every filter is there for a reason–an attempt to answer some meaningful question about a player. But the vast majority of those five trillion queries settle debates that no one in their right mind would ever have, like Roger’s 2010 hard-court Masters record when winning a set 6-1 against a player outside the top 10. (He was 2-0.)

The danger in having all these answers is that it can be tempting to pretend we were asking the questions–or worse, that we were asking the questions and suspected all along that the answers would turn out this way.

The Hawkeye data on tennis broadcasts is a great example. When a graphic shows us the trajectory of several serves, or the path of the ball over every shot of a rally, we’re looking at an enormous amount of raw data, more than most of us could comprehend if it weren’t presented against the familiar backdrop of a tennis court. Given all those answers, our first instinct is too often to seek evidence for something we were already pretty sure about–that Jack Sock’s topspin is doing the damage, or Rafael Nadal’s second serve is attackable.

It’s tough to argue with those kinds of claims, especially when a high-tech graphic appears to serve as confirmation. But while those graphics (or those results of long-tail Tennis Abstract queries) are “answers,” they address only narrow questions, rarely proving the points we pretend they do.

These narrow answers are merely jumping-off points for meaningful questions. Instead of looking at a breakdown of Novak Djokovic’s backhands over the course of a match and declaring, “I knew it, his down-the-line backhand is the best in the game,” we should realize we’re looking at a small sample, devoid of context, and take the opportunity to ask, “Is his down-the-line backhand always this good?” or “How does his down-the-line backhand compare to others?” Or even, “How much does a down-the-line backhand increase a player’s odds of winning a rally?”

Unfortunately, the discussion usually stops before a meaningful question is ever asked. Even without publicly released Hawkeye data, we’re beginning to assemble what we need to research many of these questions.

As much as we love to complain about the dearth of tennis analytics, too many people draw conclusions from the pseudo-answers of fancy graphics. With more data available to us than ever before, it is a shame to mistake narrow, facile answers for broad, meaningful ones.

The Pervasive Role of Luck in Tennis

Italian translation at settesei.it

No matter what the scale, from a single point to a season-long ranking–even to a career–luck plays a huge role in tennis. Sometimes good luck and bad luck cancel each other out, as is the case when two players benefit from net cord winners in the same match. But sometimes luck spawns more of the same, giving fortunate players opportunities that, in turn, make them more fortunate still.

Usually, we refer to luck only in passing, as one possible explanation for an isolated phenomenon. It’s important that we examine the many forms of luck in conjunction with each other, to get a better sense of just how much of a factor luck can be.

Single points

Usually, we’re comfortable saying that the results of individual points are based on skill. Occasionally, though, something happens to give the point to an undeserving player. The most obvious examples are points heavily influenced by a net cord or a bad bounce off an uneven surface, but there are others.

Officiating gets in the way, too. A bad call that the chair umpire doesn’t overturn can hand a point to the wrong player. Even if the chair umpire (or Hawkeye) does overrule a bad call, it can result in the point being replayed–even if one player was completely in control of the point.

We can go a bit further into the territory of “lucky shots,” including successful mishits, or even highlight-reel tweeners that a player could never replicate. While the line between truly lucky shots and successful low-percentage shots is an ambiguous one, we should remember that in the most extreme cases, skill isn’t the only thing determining the outcome of the point.

Lucky matches

More than 5% of matches on the ATP tour this year have been won by a player who failed to win more than half of points played. Another 25% were won by a player who failed to win more than 53% of points–a range that doesn’t guarantee victory.
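Claims like these can be checked against publicly available match stats, such as the match results I’ve posted on GitHub. Here is a sketch, assuming the usual column layout (w_svpt, w_1stWon, and so on) and a placeholder filename:

```python
import pandas as pd

df = pd.read_csv("atp_matches_2017.csv").dropna(subset=["w_svpt", "l_svpt"])

# Winner's points = serve points won + return points won
winner_pts = df.w_1stWon + df.w_2ndWon + (df.l_svpt - df.l_1stWon - df.l_2ndWon)
share = winner_pts / (df.w_svpt + df.l_svpt)

print(f"won with under 50% of points: {(share < 0.50).mean():.1%}")
print(f"won with 50% to 53%:          {share.between(0.50, 0.53).mean():.1%}")
```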

Depending on what you think about clutch and momentum in tennis, you might not view some–or even any–of those outcomes as lucky. If a player converts all five of his break point opportunities and wins a match despite winning only 49% of total points, perhaps he deserved it more. The same goes for strong performance in tiebreaks, another cluster of high-leverage points that can swing a match away from the player who won more points.

But when the margins are so small that executing at just one or two key moments can flip the result–especially when we know that points are themselves influenced by luck–we have to view at least some of these tight matches as having lucky outcomes. We don’t have to decide which is which, we simply need to acknowledge that some matches aren’t won by the better player, even if we use the very loose definition of “better player that day.”

Longer-term luck

Perhaps the most obvious manifestation of luck in tennis is in the draw each week. An unseeded player might start his tournament with an unwinnable match against a top seed or with a cakewalk against a low-ranked wild card. Even seeded players can be affected by fortune, depending on which unseeded players they draw, along with which fellow seeds they will face at which stages of the draw.

Another form of long-term luck–which is itself affected by draw luck–is what we might call “clustering.” A player who goes 20-20 on a season by winning all of his first-round matches and losing all of his second-round matches will not fare nearly as well in terms of rankings or prize money as someone who goes 20-20 by winning only 10 first-round matches, but reaching the third round every time he does.

Again, this may not be entirely luck–such a player would quickly be labeled “streaky”–but combined with draw luck, he might simply be facing players he can beat in clusters, instead of getting easy first-rounders and difficult second-rounders.

The Matthew effect

All of these forms of tennis-playing fortune are in some way related. The sociologist Robert Merton coined the term “Matthew effect“–alternatively known as the principle of cumulative advantage–to refer to situations where one entity with a very small advantage will, by the very nature of a system, end up with a much larger advantage.

The Matthew effect applies to a wide range of phenomena, and I think it’s instructive here. Consider the case of two players separated by only a few points in the rankings, a margin that could have come about by pure luck: for instance, when one player won a match by walkover. One of these players gets the 32nd seed at the Australian Open; the other is unseeded.

These two players, who are virtually indistinguishable, remember, face very different challenges. One is guaranteed two matches against unseeded opponents, while the other will almost certainly face a seed before the third round, perhaps even a high seed in the first. The unseeded player might get lucky, either in his draw or in his matches, cancelling out the effect of the seeding, but it’s more likely that the seeded player will walk away from the tournament with more points, solidifying a higher ranking that he didn’t earn in the first place.
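
A toy Monte Carlo makes the asymmetry visible. The win probabilities by opponent type below are invented, and the three-round draws are a simplification, but the structure (two unseeded opponents for one player, an early seed for the other) mirrors the scenario above.

```python
import random

# Invented win probabilities against each opponent type.
P_WIN = {"unseeded": 0.60, "seed": 0.35}

def wins_before_losing(draw):
    """Play through a list of opponent types; count wins until a loss."""
    wins = 0
    for opponent in draw:
        if random.random() < P_WIN[opponent]:
            wins += 1
        else:
            break
    return wins

TRIALS = 100_000
seeded = sum(wins_before_losing(["unseeded", "unseeded", "seed"])
             for _ in range(TRIALS)) / TRIALS
unseeded = sum(wins_before_losing(["unseeded", "seed", "seed"])
               for _ in range(TRIALS)) / TRIALS
print(f"avg wins, seeded: {seeded:.2f}; unseeded: {unseeded:.2f}")
```

With these made-up numbers, two equally skilled players average roughly 1.1 and 0.9 wins per event, a gap produced entirely by the seeding.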

Making and breaking careers

The Matthew effect can have an impact on an even broader scale. Today’s tennis pros have been training and competing from a young age, and most of them have gotten quite a bit of help along the way, whether it’s the right coach, support from a national federation, or well-timed wild cards.

It’s tough to quantify things like the effect of a good or bad coach at age 15, but wild cards are a more easily understood example of the phenomenon. The unlucky unseeded player I discussed above at least got to enter the tournament. But when a Grand Slam-hosting federation decides which promising prospect gets a wild card, it’s all or nothing: one player gets a huge opportunity (cash and ranking points, even if he loses in the first round!) while the other gets nothing.

This, in a nutshell, is why people like me spend so much time on our hobby horses ranting about wild cards. It isn’t the single tournament entry that’s the problem; it’s the cascading opportunities it can generate. Sure, sometimes it comes to nothing (Ryan Harrison’s career is starting to look that way), but even in those cases, we never hear about the players who didn’t get the wild cards, the ones who never had the chance to benefit from the cumulative advantage of a small leg up.

Why all this luck matters

If you’re an avid tennis fan, most of this isn’t news to you. Sure, players catch good and bad breaks, they get good and bad draws, and they’ve faced uneven challenges along the way.

By discussing all of these types of fortune in one place, I hope to emphasize just how big a part luck plays in our estimate of each player at any given time. It’s no accident that mid-range players bounce around the rankings so much. Some of them are truly streaky, and injuries play a part, but much of the variance can be explained by these varying forms of luck. The #30 player in the rankings is probably better than the #50 player, but it’s no guarantee. It doesn’t take much misfortune, especially when bad luck starts to breed more opportunities for bad luck, to tumble down the list.

Even if many of the outcomes I’ve chalked up to luck are truly skill-based and, say, break point conversions are a matter of someone playing better that day, the evidence generally shows that major rises and falls in things like tiebreak winning percentage and break point conversion rate are temporary: they don’t persist from year to year. That may not be properly classed as luck, but if we’re projecting the rankings a year from now, it might as well be.
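
If you wanted to test that persistence claim yourself, the basic recipe is a year-over-year correlation. Here’s a minimal sketch with made-up numbers in a hypothetical data structure:

```python
from statistics import correlation  # Python 3.10+

# Made-up tiebreak win percentages: {player: (year N, year N+1)}.
tb_pct = {
    "A": (0.62, 0.51), "B": (0.44, 0.49), "C": (0.55, 0.47),
    "D": (0.39, 0.52), "E": (0.58, 0.50),
}
year_n = [pcts[0] for pcts in tb_pct.values()]
year_n1 = [pcts[1] for pcts in tb_pct.values()]

# A correlation near zero means this year's number tells you little about
# next year's, i.e., it behaves like luck for projection purposes.
print(f"year-over-year r = {correlation(year_n, year_n1):.2f}")
```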

While match results, tournament outcomes, and the weekly rankings are written in stone, the way that players get there is not nearly so clear. We’d do well to accept that uncertainty.

Toward Atomic Statistics

Italian translation at settesei.it

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.”  And why would he be?  Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and 2nd-serve points won.

In other words, statistics that look better when you’re winning points. How’s that for cutting-edge insight: you get better results when you win more points. If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent that statistics can tell us about a particular player’s performance, we need to look at numbers that each player controls as much as possible. Ace counts, though they are affected by returners to a limited extent, are one of the few commonly tracked stats that directly reflect an aspect of a player’s performance. You can have a big serving day without many aces and a mediocre serving day with plenty of them, but for the most part, lots of aces means you’re serving well. Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat. That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin close to an even keel. It provides fodder for discussion, but it doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

But what about every other shot?  What about specific strategies?

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.
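
For illustration, here’s a minimal sketch of how return depth could be computed, assuming bounce coordinates from a tracking system. The data format is invented; the court geometry (service line 21 feet from the net, baseline 39 feet) is standard.

```python
# Court geometry in feet, measured from the net.
SERVICE_LINE = 21.0
BASELINE = 39.0
BACK_QUARTER = BASELINE * 0.75  # deeper than 3/4 of the way to the baseline

def return_depth(bounce_depths):
    """Share of returns landing past the service line / in the back quarter."""
    n = len(bounce_depths)
    past_service = sum(d > SERVICE_LINE for d in bounce_depths) / n
    deep = sum(d > BACK_QUARTER for d in bounce_depths) / n
    return past_service, deep

# Made-up bounce depths (feet from the net) for one player's returns.
depths = [25.3, 18.0, 33.1, 30.7, 12.4, 36.0]
past, deep = return_depth(depths)
print(f"past service line: {past:.0%}; backmost quarter: {deep:.0%}")
```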

Here are more atomic statistics with the same type of potential (a counting sketch for two of them follows the list):

  • Percentage of service returns chipped or sliced.
  • Percentage of backhands chipped or sliced.
  • Errors into the net (on serves and other shots), as opposed to errors long or wide.
  • Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
  • Net approaches (frequency and success rate).
  • Drop shot success rate (off of each wing).
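
A minimal counting sketch for two of the stats above, using an invented per-shot record format (not the Match Charting Project’s actual encoding):

```python
# Invented per-shot records; each dict describes one shot in a match.
shots = [
    {"stroke": "backhand", "spin": "slice", "shot": "rally"},
    {"stroke": "backhand", "spin": "topspin", "shot": "rally"},
    {"stroke": "backhand", "spin": "slice", "shot": "drop", "won": True},
    {"stroke": "forehand", "spin": "topspin", "shot": "drop", "won": False},
]

# Percentage of backhands chipped or sliced.
backhands = [s for s in shots if s["stroke"] == "backhand"]
bh_slice = sum(s["spin"] == "slice" for s in backhands) / len(backhands)

# Drop shot success rate (point won when attempting one).
drops = [s for s in shots if s["shot"] == "drop"]
drop_success = sum(s["won"] for s in drops) / len(drops)

print(f"backhands sliced: {bh_slice:.0%}; drop shot success: {drop_success:.0%}")
```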

Two commonly counted statistics, winners and unforced errors, have many characteristics in common with these atomic stats, but they are insufficiently specific. Sure, knowing a player’s winner/UFE ratio for a match is some indication of how well he or she played, but what’s the takeaway? Federer needs to be less sloppy? He needs to hit more winners? Once again, it’s easy to see why players aren’t clamoring to hear these numbers. No baseball pitcher benefits from learning that he should give up fewer runs, nor a hockey goaltender that he needs to allow fewer goals.

Glimmers of hope

With full access to Hawkeye data, this sort of analysis (and much, much more) is within reach. Even if Hawkeye material remains mostly inaccessible, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.