The Untapped Potential of Umpire Scorecard Data

There’s a lot more that can be done with tennis data. Everyone knows this. Even the ATP and WTA tours–along with their rather prominent partners–know this.

Both tours are sitting on a mountain of information that they’ve barely exploited: umpire scorecard data. It’s not cutting edge–there are no cameras, no courtside loggers counting unforced errors and winners. It’s just a log of every point, along with first or second serve, aces, and double faults. Despite those limits, there are many untapped advantages.

First: There’s an umpire scorecard for every match. Not every match on a TV court, not every match on a Hawkeye court, not every main draw match. Every match. If an ATP, WTA, or ITF umpire is officiating the match, to the best of my knowledge, there is a scorecard–when you see a chair umpire tap on a screen, this is what they’re recording. That means data on thousands of matches and players every year, from Novak Djokovic to Djordje Djokovic.

It’s tough to overstate how valuable that is. The main drawback of most tennis stats is a lack of context. For instance, when Hawkeye puts a graphic on your TV screen, it’s often based on data from a single match or the present tournament. IBM’s much-publicized analytics are based on Grand Slam matches only. Umpire scorecards have no such problem.

Second, there’s a ton of information lurking in this low-tech tracking system. The basics of first and second serves, aces, and double faults may not sound like much, but as we’ll see below, they open the door to a huge array of stats. ATP and WTA “Match Stats” are compiled from these scorecards, but they only scratch the surface.

How to do more with scorecards

In a minute, I’ll make specific suggestions for additional totals and rates that the tours could compile from the data they already have. Before that, let me explain why simply expanding the contents of “Match Stats” should be Plan B.

More and more journalism is data-based, and more and more avid fans are, to some extent or other, analyzing tennis for publication. In other words, there is a rapidly growing base of analysts who don’t need data pre-packaged for them. Every match is different, and the numbers needed to illustrate any match report are different as well. For broader analysis, like comparing players over the course of a season, the need for customized data is more important still.

So: Release the point-by-point data from the scorecards.

Another benefit of the simplicity of umpire scorecard data is that more analysts can easily manage it. No organization could foresee everything that might be interesting about a match, so why even try? Not every journalist will want to dig into a point-by-point spreadsheet to see how often Julien Benneteau missed his first serve of a game, or how Rafael Nadal responded every time he fell behind 0-30. But some will do just that. When they do, their work benefits, their readers have more ways to engage with the next match they watch, and the sport ultimately wins.
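To give a sense of how little code such a question takes, here is a minimal sketch of the "how did a server respond from 0-30" query. The record layout (`Point`), the helper names, and the players "A" and "B" are all my own assumptions for illustration, not the tours' actual scorecard format, and the sketch handles regular games only (no tiebreaks).

```python
from dataclasses import dataclass

@dataclass
class Point:
    # Hypothetical record layout; the real scorecard format is not public.
    server: str            # player serving this point
    winner: str            # player who won this point
    first_in: bool = True  # did the first serve land in?
    ace: bool = False
    double_fault: bool = False

def split_games(points):
    """Group a flat point log into completed games (regular games only)."""
    games, current, s, r = [], [], 0, 0
    for p in points:
        current.append(p)
        if p.winner == p.server:
            s += 1
        else:
            r += 1
        # A game ends once a player has 4+ points and leads by 2.
        if max(s, r) >= 4 and abs(s - r) >= 2:
            games.append(current)
            current, s, r = [], 0, 0
    return games

def record_from_score(points, server, server_pts, returner_pts):
    """Count `server`'s service games that reached a score, and how many were held."""
    reached = held = 0
    for game in split_games(points):
        if game[0].server != server:
            continue
        s = r = 0
        for p in game:
            # (s, r) is the score in points before this point is played.
            if (s, r) == (server_pts, returner_pts):
                reached += 1
                held += game[-1].winner == server
                break
            if p.winner == server:
                s += 1
            else:
                r += 1
    return reached, held

def make_game(server, returner, outcome):
    """Build one game from a string: 'S' = server wins the point, 'R' = returner."""
    return [Point(server, server if ch == "S" else returner) for ch in outcome]

# Made-up points for two made-up players, purely to exercise the functions.
log = (make_game("A", "B", "RRSSSS")   # A falls behind 0-30, then holds
       + make_game("B", "A", "SSSS")
       + make_game("A", "B", "RRRSR")  # A falls behind 0-30 and is broken
       + make_game("B", "A", "SSSS")
       + make_game("A", "B", "SSSS"))  # A never trails

reached, held = record_from_score(log, "A", 0, 2)  # 0-30 is (0, 2) in points
print(f"A held {held} of {reached} service games from 0-30")
```

The same function answers the question for any other score by changing the last two arguments, which is the appeal of raw data over pre-packaged totals.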

A not-so-brief wish list

I have a sneaking feeling that no one’s going to release point-by-point data for every ATP or WTA match. I hope that’s not the case, but if it is, the tours should still consider vastly expanding the stats they compile for each match–including past matches for as far back as their databases go.

  1. Deuce/ad comparisons. Some players serve much more effectively in one than the other. For all deuce-court service points, I would like: (a) total points, (b) aces, (c) double faults, and (d) first serves in. Same for ad-court service points.
  2. Break point stats. Same as the above: for each server when facing break point, (a) aces, (b) double faults, and (c) first serves in.
  3. Break point games. In how many games did each player earn a break point?
  4. Stats for other important point scores. Break points are key, but other scenarios are important as well. If I have to pick only a few, let’s start with 0-30, 15-30, deuce, and ad-in (including 40-30). For all service points at each of those scores, I’d like (a) total points, (b) aces, (c) double faults, and (d) first serves in.
  5. Set points and match points: Same as above. Fans love match point stats.
  6. The game sequence–at what points did breaks of serve occur? This would allow us to answer many oft-posed questions: Do players hold serve more early in sets? Do breaks of serve more frequently follow breaks than holds? (And if so, how much more often?) Are players more likely to drop serve immediately after winning a tight set?
  7. Set-by-set breakdowns of all stats that are currently kept, plus all of the above. The live scoring app separates stats by set, but there is no official archive with set-by-set breakdowns. This is particularly key for journalists attempting to tell the story of a match, when a small change in approach can turn the tide.
  8. Tiebreak breakdowns. Tiebreaks–especially long ones–have a life of their own, and analysts should be able to see all of the same stats for each tiebreak as for each set as a whole. For example, it would be interesting to see if a player’s ace or double fault rates (or even his or her first-serve percentage) changed between the first twelve games of a set and the breaker.
  9. A list of the score when each double fault occurred. (Aces would be nice, too.) Especially in men’s tennis, DFs are quite rare, and they often loom large in match narratives.
  10. Longest streaks for each player: consecutive aces, consecutive double faults, consecutive points won on serve, consecutive points won overall, and the score at the beginning and end of each of those streaks.
  11. For doubles matches, a separation of all of the above service stats by server. For the Samuel Groth/Leander Paes partnership, aggregate serve stats (as they are presented now) aren’t going to tell you anything useful about either player’s performance at the line.

To reiterate, all of this stuff is in the scorecards. Most of the above are no more difficult to compile than the Match Stats that the tours already publish.
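As a rough illustration of that claim, the sketch below tallies items 1 through 3 (deuce/ad splits, break points faced, and break-point games) in a single pass over a point log. Everything here is an assumption on my part: the `Point` layout, the function names, and the made-up players "A" and "B". It covers regular games with ad scoring only, so tiebreaks and no-ad doubles formats would need extra handling.

```python
from collections import Counter
from typing import NamedTuple

class Point(NamedTuple):
    # Hypothetical record layout, not the tours' actual scorecard format.
    server: str
    winner: str
    first_in: bool = True
    ace: bool = False
    double_fault: bool = False

def serve_splits(points, server):
    """Tally deuce/ad-court serve stats and break-point games for one server."""
    stats = {"deuce": Counter(), "ad": Counter()}
    bp_faced = bp_games = 0
    s = r = 0                 # points won in the current game by server / returner
    bp_this_game = False
    for p in points:
        if p.server == server:
            # An even number of points played means a deuce-court serve.
            court = "deuce" if (s + r) % 2 == 0 else "ad"
            tally = stats[court]
            tally["points"] += 1
            tally["aces"] += p.ace
            tally["double_faults"] += p.double_fault
            tally["first_in"] += p.first_in
            # Break point: the returner is one point away from winning the game.
            if r >= 3 and r > s:
                bp_faced += 1
                if not bp_this_game:
                    bp_games += 1
                    bp_this_game = True
        if p.winner == p.server:
            s += 1
        else:
            r += 1
        # A game ends once a player has 4+ points and leads by 2.
        if max(s, r) >= 4 and abs(s - r) >= 2:
            s = r = 0
            bp_this_game = False
    return stats, bp_faced, bp_games

def make_game(server, returner, outcome):
    """Build one game from a string: 'S' = server wins the point, 'R' = returner."""
    return [Point(server, server if ch == "S" else returner) for ch in outcome]

# Made-up points, purely to exercise the function.
log = (make_game("A", "B", "RRRSSSRSSS")  # A saves four break points and holds
       + make_game("B", "A", "SSSS")
       + make_game("A", "B", "SSSS"))     # A holds without facing a break point

splits, bp_faced, bp_games = serve_splits(log, "A")
print(splits["deuce"]["points"], splits["ad"]["points"])
print(f"break points faced: {bp_faced}, in {bp_games} game(s)")
```

In this toy log the returner went 0 for 4 on break points, but all four came in a single game, which is exactly the distinction the "break point games" count is meant to expose.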

If the tours added everything on my list, that would be one big step out of the dark ages for tennis. Certainly, tennis writers would be able to file more intelligent stories and fans would have a much better way to experience the performances of their favorite players.

If the tours published current and archived raw point-by-point data, tennis would go one better: it would become an example for many other individual sports to follow. We would see a boom in fan engagement as every follower of the sport would have the opportunity to learn much more about tennis and relive matches–whether last week or late last century–in detail.

We’re not talking about a multi-million dollar infrastructure investment. To achieve all this, the tours need only do a little bit more with what they already have.

 

5 thoughts on “The Untapped Potential of Umpire Scorecard Data”

  1. Just last week I was re-reading Juan José’s post on the Changeover from April 2013 in which first, he did a “key point” analysis of the Djokovic-Nadal Monte Carlo final; and second, he enthused about Victoria Chiesa’s post that same month titled “Fed Cup Posts Official Scorecards, Tennis Nerds Everywhere Rejoice (aka just me).” I spent a little time reading both posts & also looking at software such as ProTracker that might make it easier for someone like me to do scoring. Regretfully I’ve concluded I don’t have the time, but even so it is an interesting branch of nerdery.

    It’s easier for me to imagine tours & events not making scorecards available than the opposite – e.g. not wanting to have to approach the other parties whose approval might be needed, or wanting to keep control just in case, etc. I wonder if what might break the ice, or start a trickle in the dam (insert cliche) would be if even just 1 or 2 events started doing it – perhaps because someone inside that event was fan-friendly enough, and strong-willed enough, to go ahead & say “yes, sure” if asked nicely. For example, Larry Ellison is perched at the top of a ladder with 8 million rungs on it, admittedly; but his organization is surely independent-minded enough and goodness knows he built a fortune on selling ways to crunch data. So if an event like that were to say “yes” then other events might grudgingly follow. Or not.

    Goodness, since it’s all digital now anyway, wouldn’t it be cool if an event didn’t just make the card available after the event, but live, via a data feed of some sort?

    1. Yeah, it does seem that someone like Ellison could be a great help here. OTOH, I’m not sure that events themselves have the right to release the data from scorecards–I think the tours own it. An event could collect and release identical data (much like TV networks log certain stats), but that would be fairly cumbersome.

      Where one broad-minded tourney could really break the ice would be with Hawkeye data — events own that, and a few of them have handed the data to researchers on a case-by-case basis.

  2. I want to clarify a couple of things from feedback I’ve received (not Wholesight’s comment above):

    1. Other scorekeeping methods (apps, the Match Charting Project, etc) are great, but they’re not what I’m talking about here. There are plenty of ways to collect this data, but it would be enormously difficult to get a crowdsourced project (or even a funded, commercial project) to collect this for every tour-level match, let alone every tour, qualifying, Challenger, ITF, and Futures match. That’s what makes the scorecard database so valuable.

    1a. Also, any non-tour-affiliated project can only cover matches that happen in the future or are available on video. The ATP, on the other hand, has scorecards for all tour-level matches back to at least 1991. (I don’t know about the WTA.)

    2. I’m not saying that this is *all* the data that should be available. In a better world, Hawkeye cameras would be on every court and there’d be ball-tracking data for every shot of every match. Or, much more cheaply, people (tour-affiliated or not) would use my Match Charting Project spreadsheet for 100 times more matches.

    What I am saying is that (a) this data already exists, (b) it exists for an enormous number of matches, and (c) it would be relatively easy for the organizations in question to release it in raw form. It would be great to have more, but it would be pointless to ask for it.

  3. I really like your point about “break point games.” I have definitely noticed matches where people say that “X has not been effective on break points”, but it turns out that almost every game in which X had a break point led to a break. “5 out of 17” sounds bad, but if those 17 break points happened in 7 games, then it’s also “5 out of 7,” which is much better.

    1. The issue of “break-point games” was exemplified in the first set of last night’s Cincinnati final between Federer and Ferrer. Both players had only one game with break points: Federer won his, Ferrer didn’t. But the break-point stats themselves look more extreme: 1 of 1 for RF; 0 for 4 for DF. (I wasn’t able to watch the rest of the match to see how this played out further, unfortunately.)
