Tennis needs better stats. Now you can help.
Since the US Open, I’ve been developing a system to chart matches. With a bit of practice, anyone can use this system to note the type and direction of every shot in a match: serve direction, return direction and depth, shot patterns, error types, error directions, and more. A single charted match generates an enormous amount of data.
The true potential of match charting lies in the bigger picture. So far, we have nearly 50 matches in the books–mostly from ATP events this fall. Even with this relatively small subset of matches, I’ve been able to do some interesting research, such as analyzing how quickly Novak Djokovic can neutralize a server’s advantage, and evaluating the wisdom of the drop shot.
The more matches, the more players, the more surfaces, the better. Want to join the fun?
I hope you do, and the off-season is a great time to start. It will take you a couple of matches to get comfortable with the system, so charting recorded matches, with the ability to rewind and watch points multiple times, is the best way to get started. There are hundreds, if not thousands, on YouTube, with plenty more available through other sources such as ESPN3 and TennisTV.
I’ve created an interactive spreadsheet to make the process as easy as possible. Download it here. The fields highlighted in yellow are yours. The first several rows are for general information about the match. As you chart each point, the spreadsheet will automatically update the score and create an additional row for the next point.
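To make the automatic score-keeping a little more concrete, here's a minimal Python sketch of the game-level logic. It assumes each point is recorded simply as "S" (server won) or "R" (returner won). The function name and that encoding are hypothetical, and this is not how the spreadsheet's formulas are actually written.

```python
# Hypothetical sketch: score a single standard game from a string of point
# winners ("S" = server, "R" = returner). Not the spreadsheet's own logic.

POINT_NAMES = ["0", "15", "30", "40"]


def game_score(points: str) -> str:
    """Return the score, server first, after the given points.

    Any points listed after the game has ended are ignored.
    """
    server, returner = 0, 0
    for p in points:
        if p == "S":
            server += 1
        elif p == "R":
            returner += 1
        else:
            raise ValueError(f"unknown point winner: {p!r}")
        # A game ends once a player has at least 4 points and leads by 2.
        if max(server, returner) >= 4 and abs(server - returner) >= 2:
            return "game server" if server > returner else "game returner"
    # From 40-40 onward, the score is deuce or advantage.
    if server >= 3 and returner >= 3:
        if server == returner:
            return "deuce"
        return "ad server" if server > returner else "ad returner"
    return f"{POINT_NAMES[server]}-{POINT_NAMES[returner]}"
```

For example, `game_score("RSR")` gives "15-30", and `game_score("SSSRRRS")` gives "ad server".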
Once you download and open the spreadsheet, click over to the “Instructions” tab. There, you’ll find detailed instructions on the process. It will take some time to understand all the details of how the system works, and then it will take you a match or two to get the hang of entering all that data. Pretty soon, you’ll find that you’re comfortably charting points in real time.
In the next week or two, I’ll try to put together some additional training material. However, if you’d like to get started right away, there’s nothing stopping you. Once you finish charting a match, send the completed spreadsheet back to me (my email address is in the spreadsheet), and I’ll run it through my program to generate detailed stats for that match.
In addition to the interactive spreadsheet itself, you may find it helpful to see a couple of completed charted matches, perhaps following along while watching the matches:
- Serena Williams vs Jelena Jankovic, Charleston final: completed spreadsheet | YouTube video
- John Isner vs Rafael Nadal, Cincinnati final: completed spreadsheet | YouTube video
(lefties are almost as tricky to chart as they are to play–I recommend charting a few righty-righty matches before trying to do one with a left-hander)
(Sorry, those two YouTube videos have been removed due to copyright claims. You can still download the completed spreadsheets. At some point, I’ll try to find charted matches with YouTube videos that are unlikely to be taken down, and post those here instead.)
What I love about this project is that we don’t need thousands of matches for it all to be worthwhile. (Though I won’t complain when we accumulate thousands of matches!) Every charted match we can add to the database contributes to our understanding of those two players and professional tennis as a whole.
I sincerely hope you’ll contribute.
Update: I’ve posted a few updates, tips, and tools here.
Jeff –
This is great – are you interested only in pro tennis charting now?
The reason I ask is that college coaches may be interested in this, and I have a definite in to explore that with some thought leaders I’m working with in the ITA. I’ll be going to their annual convention/trade show in Naples, FL, next month.
In terms of a future business for you, the college market may be much bigger than the pros. It would get less visibility initially than your pro charting would if that gained traction and were published and talked about on air, but the odds of that seem slim given IBM’s grip on data with the ATP and (?) the WTA.
The majors might be able to cut their own deals – I’m not sure.
I’ll forward this to Dave Fish at Harvard to see what he thinks. He’s still sorry he didn’t get over to Longwood to meet you two summers ago.
Happy Holiday,
Rick
Hi Rick,
I’m definitely interested in people charting however many and whichever matches they’d like to, including at the college level. (Though the spreadsheet will take a bit of tweaking to accommodate set and match formats other than standard tiebreak sets.) It would be great if a college program were sufficiently interested to track all of their players’ matches.
For the time being, anyway, I’m not looking at this as a potential business–I’d like to gather as much data as possible, period.
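As a hypothetical sketch of what "a bit of tweaking" for other set formats might look like: if the set-end rule takes the number of games as a parameter, a different format becomes a one-argument change. This assumes a tiebreak is played at games_to_win-all; the function name is mine, and the 4-game value in the example is purely illustrative, not a claim about any particular college format.

```python
from typing import Optional


def set_winner(games_a: int, games_b: int, games_to_win: int = 6) -> Optional[str]:
    """Return "A" or "B" if the set is over, else None.

    Assumes a tiebreak at games_to_win-all, so the set can also end
    games_to_win + 1 to games_to_win (7-6 in a standard set).
    """
    for leader, lead, trail in (("A", games_a, games_b), ("B", games_b, games_a)):
        # Normal finish: reached the target with a two-game margin.
        if lead >= games_to_win and lead - trail >= 2:
            return leader
        # Tiebreak finish.
        if lead == games_to_win + 1 and trail == games_to_win:
            return leader
    return None
```

With the default, `set_winner(7, 6)` returns "A"; passing `games_to_win=4` makes 5-4 a finished set instead.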
Hi Jeff, I just downloaded it and will try to see if I can do this. I am not a match person, but this is a good effort. Karen
Jeff, this is a great idea. Is there a way of letting other contributors know in advance what match you’re going to chart, just so two people don’t spend hours on the same project unnecessarily?
That’s a good idea, and I’ll implement it soon. For now, there have been so few matches charted that it seems unlikely, and it would actually be useful to have a few matches duplicated, so I can get a sense of how much certain metrics (like unforced errors) vary between two charters.
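For duplicated matches, the comparison could start as simply as the sketch below. It assumes each charter's file reduces to one shot-sequence string per point, in the same order, and that a trailing "@" marks an unforced error. The helper names and sample codes are hypothetical.

```python
from __future__ import annotations


def unforced_errors(points: list[str]) -> int:
    """Count points whose shot sequence ends with the (assumed) unforced-error code '@'."""
    return sum(1 for code in points if code.endswith("@"))


def disagreements(chart_a: list[str], chart_b: list[str]) -> int:
    """Count points where two charters disagree on whether the point ended in an unforced error."""
    if len(chart_a) != len(chart_b):
        raise ValueError("both charts must cover the same points")
    return sum(1 for a, b in zip(chart_a, chart_b) if a.endswith("@") != b.endswith("@"))
```

Comparing per point, rather than only the match totals, avoids two opposite disagreements cancelling each other out.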
I’ll try.
Minor point here but when charting two right-handed players, I assume f3 would be the correct code for an inside out forehand as well as down the line?
Yes. The number refers to a constant part of the court — if you hit a forehand to that part of the court (3 = the backhand side of a RHer), it’s always the same number, regardless of where it is hit from.
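A minimal sketch of that fixed-zone idea. Only "3 = the backhand side of a RHer" comes from the answer above; the labels for 1 and 2 are my assumption of the symmetric layout, and the function names are hypothetical. The second helper shows why lefties are tricky: the zone number stays the same, but the wing it reaches flips.

```python
DIRECTION_ZONES = {
    "1": "forehand side of a right-hander",  # assumed
    "2": "middle of the court",  # assumed
    "3": "backhand side of a right-hander",  # from the answer above
}


def target_zone(shot: str) -> str:
    """Look up the zone for a code like 'f3' (shot letter, then direction digit)."""
    return DIRECTION_ZONES[shot[1]]


def receiver_wing(zone: str, receiver_left_handed: bool) -> str:
    """Which wing a ball hit to `zone` reaches, for a given receiver."""
    if zone not in DIRECTION_ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    if zone == "2":
        return "middle"
    righty_wing = "forehand" if zone == "1" else "backhand"
    if not receiver_left_handed:
        return righty_wing
    # A left-hander's wings are mirrored relative to a right-hander's.
    return "backhand" if righty_wing == "forehand" else "forehand"
```

So an inside-out forehand and a down-the-line forehand between two right-handers both come out as `target_zone("f3")`.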
Hi Jeff
Do you know how to switch efficiently between an Excel spreadsheet and a YouTube window? That would save a lot of time. Otherwise I need to pause the video after every single shot to move to the spreadsheet and code the shot.
I shrink both windows so that both can be on the screen at the same time. Usually that means shrinking the Excel sheet down to just a couple of rows so that it fits on top of the other window. Here’s a screenshot:
http://tennisabstract.com/charting/chartview.png
well-thought-out
If player A hits a dropshot and his opponent responds with a forehand winner struck near the net, do I need to use “-” to indicate that the forehand was hit close to the net? It’s kind of obvious, but that’s what the instructions seem (?) to suggest. Quite confusing for me.
No, you don’t need to add the “-” for shots following dropshots. That’s a good question — I should clarify the instructions accordingly.
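The rule can be sketched as a tiny check. The drop-shot letters used here ("u" and "y") are an assumption for illustration, as is the function name.

```python
# Assumed drop-shot letters, for illustration only.
DROP_SHOT_CODES = {"u", "y"}


def needs_net_marker(previous_shot: str, hit_near_net: bool) -> bool:
    """Decide whether a shot should carry the '-' (near the net) marker.

    Per the answer above, a shot that answers a drop shot doesn't need it,
    since being near the net is already implied.
    """
    return hit_near_net and previous_shot not in DROP_SHOT_CODES
```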
Are unreturnable serves and return forced errors coded in the same way?
Or is it like this: “6#” = unreturnable serve and “6f#” = forced return error?
Yes, that’s right.
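Here's a hypothetical way to tell the two cases apart programmatically. The pattern assumes serve directions are the digits 4 to 6 and that a return is coded with "f" or "b"; only "6#" and "6f#" are confirmed above, and the function name is mine.

```python
import re
from typing import Optional

# Assumed shape: serve direction digit, optional return-shot letter, then '#'.
_SHORT_POINT = re.compile(r"([4-6])([fb]?)#")


def classify_short_point(code: str) -> Optional[str]:
    """Label a serve-or-return forced-error point, or return None for anything else."""
    m = _SHORT_POINT.fullmatch(code)
    if m is None:
        return None
    _serve_direction, return_shot = m.groups()
    # A return-shot letter before '#' means the returner got a racquet on it.
    if return_shot:
        return "forced return error"
    return "unreturnable serve"
```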
Is a shank during a rally coded as a shank, i.e. f!@? If so, should direction be omitted?
Yep, that’s right. Part of the reason there is a shank code at all is that sometimes direction isn’t clear, like if a player shanks a shot and it goes straight up into the air.
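A sketch of a single-shot tokenizer that treats the direction digit as optional, so "f!@" parses cleanly. The regular expression, the placement of "!" after any direction digit, and the function name are assumptions, not the official grammar.

```python
import re

# Assumed grammar: shot letter, optional direction 1-3, optional shank '!',
# optional outcome ('#', '*', or '@').
_SHOT = re.compile(r"([a-z])([1-3]?)(!?)([#*@]?)")


def parse_shot(code: str) -> dict:
    """Split one shot code into its parts; missing parts become None or False."""
    m = _SHOT.fullmatch(code)
    if m is None:
        raise ValueError(f"unrecognized shot code: {code!r}")
    shot, direction, shank, outcome = m.groups()
    return {
        "shot": shot,
        "direction": direction or None,  # omitted for shanks like 'f!@'
        "shank": shank == "!",
        "outcome": outcome or None,
    }
```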
As this is the unofficial project home page, here’s an update from January 8th — a dozen people have charted a total of 134 matches, covering 115 total players. There are 21 Federer matches, 15 Delpo matches, and 14 each of Nadal, Djokovic, and Azarenka. And there are nine Simona Halep matches. (What can I say, I like Simona Halep.)
Hi, are you still doing this? I’d like to try to get involved.
Yep!
Here’s a list of recent (and less recent) matches added to the database, including several from the French Open:
http://tennisabstract.com/charting/
I have a nascent project for visualizing tennis stats that now supports most MCP matches from the github repository (383 worked “out of the box” – others have irregularities in the .CSV structure). I originally built the app for .ptf files generated by ProTracker Tennis, so there are some features not yet enabled for MCP .csv files. I welcome feedback and collaboration.
Instructions for use: http://ca.mrallen.com:3000/
Example MCP matches:
http://ca.mrallen.com:3000/?file=2014RolandGarros_RN_AM.csv
http://ca.mrallen.com:3000/?file=2015_Monte_Carlo-GD_GM.csv
http://ca.mrallen.com:3000/?file=2014RolandGarros_ND-RN.csv
http://ca.mrallen.com:3000/?file=2015MonteCarlo_TB_RBA.csv
http://ca.mrallen.com:3000/?file=2014RolandGarros_DF_RN.csv
One of the features of most interest is probably Statistics compared across matches – only available if you load several match files at once and then press the XM button. If you don’t have MCP .csv files handy, you can download each of the files from the above links by replacing ‘?file=’ with ‘data/’, like this:
http://ca.mrallen.com:3000/data/2014RolandGarros_RN_AM.csv
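If you have several of those links, the substitution is easy to script. This is a minimal sketch with a hypothetical function name; it only rewrites the URL and doesn't download anything.

```python
def data_url(viewer_url: str) -> str:
    """Turn a '?file=' viewer link into the matching 'data/' download link."""
    if "?file=" not in viewer_url:
        raise ValueError("expected a '?file=' link")
    return viewer_url.replace("?file=", "data/", 1)
```

Each resulting URL could then be fetched with, for instance, Python's `urllib.request.urlretrieve`.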
Most of the app is undocumented. Click on the players’ names in different contexts to get other graphs (in Momentum, for instance, you get a graph of Server Advantage vs. rally length: http://tennisabstract.com/blog/2011/08/17/how-long-does-the-servers-advantage-last/). Click on Total Points Won, Breakpoints Converted, and First Serve % for other views. Click everywhere.
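For anyone curious about the calculation behind that kind of graph, here's a minimal sketch: group points by rally length and compute the share the server won. The input format (a rally length paired with a server-won flag) and the function name are assumptions; this is neither the app's code nor the exact method of the linked post.

```python
from collections import defaultdict


def server_win_rate_by_length(points):
    """Map each rally length to the fraction of those points the server won.

    `points` is an iterable of (rally_length, server_won) pairs.
    """
    won = defaultdict(int)
    total = defaultdict(int)
    for length, server_won in points:
        total[length] += 1
        won[length] += int(server_won)
    return {length: won[length] / total[length] for length in sorted(total)}
```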
For an example of Shots displayed on the Court, here is a ProTracker Tennis file from an Under-10 Tournament:
http://ca.mrallen.com:3000/?file=Walker%20Allen_Vito%20Ivanisevic_30-May-2015.ptf
I created detailed match statistics in the early ’80s, along with a program capable of running “what if” simulations to see how improvements would change the score. Tennis magazine published my results for Wimbledon and the US Open from ’82 to ’84. I’ll send an electronic copy to anyone interested.