Raw Data From The Match Charting Project

In the last year and a half, dozens of contributors and I have amassed detailed shot-by-shot records of nearly 700 professional matches. You can see the full list here, or a menu sorted by player here.

I call this The Match Charting Project, and I hope you’ll consider contributing as well. Using a straightforward text notation system, we record shot type, shot direction, return depth, error types, and more. The more matches we chart, the more interesting the results. The project made up part of my presentation at the Sloan Sports Analytics Conference last month, which included some very preliminary findings on player tendencies.

Now, you can dig into the raw data yourself. I’ve posted all of the user-submitted match charts in one place, in a standardized format for anyone who wants to mess around with it.

Enjoy!

3 thoughts on “Raw Data From The Match Charting Project”

  1. This is a great project. It’s just a shame that you bought into the whole W/FE/UE paradigm. Surely there’s no fundamental difference between a winner and a forced error, as far as either the players’ intent or the eventual outcome is concerned. Worse, there are clearly two distinct types of unforced errors: making an error while rallying is one thing, but going big for the lines and just missing is quite another.

    What’s more, when you’re trying to read the story of what the players were attempting in a match, looking only at something like the winner/unforced error ratio can be entirely misleading. Let’s say a player goes for a drop shot but gives it too much air, and the opponent easily tracks it down and bunts the ball away. That goes down as a winner for the opponent, when really it was an aggressive unforced error by the player who hit the drop shot.

    What I’m saying is that, when analysing strategies and player matchups, it would be incredibly useful to know something of the intent of the two players on each point, not just how many times they hit their backhand cross court or whatever.

    1. This project is MUCH more than “just looking at something like the winner/UFE ratio.”

      It would be nearly impossible to standardize any attempt to consistently “know something of the intent of the two players.” But to take your drop shot example, that’s coded as a drop shot followed by a winner, so you can infer what you say you’re interested in from the data.

      1. OK, for that example you can roughly pick it out from the data you have, but then you’d also have to look for signals for all sorts of other scenarios, which would get very messy and would never be particularly accurate anyway. Consider a simpler example: a player hits a forehand down the line into the net. For the narrative of the match, and for analysing the strategies the players are using, it matters whether that shot would have been a winning play or was just a passive rallying shot. There’s no way you can pick that out from your data.

        Sure, you couldn’t really do it for every shot in a rally, but you certainly could do it for the point-ending sequence of shots. I’m not suggesting getting into the minds of the players, simply recording which of three importantly different ways of winning a point actually occurred: the opponent made a passive error, the opponent made an aggressive error, or the player won the point with aggression.

        You have a huge amount of data here, and I can imagine lots of interesting things you could do with it, but it is all coloured by the W/FE/UE classification.
