Just over a year ago, I launched the Match Charting Project, a collaborative effort to track every shot of as many professional matches as possible. Many of you have contributed, and a few of you have given more time to the project than I could have ever hoped. Thank you.
To make the MCP possible, I devised a relatively simple notation system, tracking every type of shot and its direction, along with an Excel document to make recording each point easier. Earlier this year, I beefed up the stats generated for each match, showing not only hundreds of rates and totals for each player, but also player and tour averages for comparison.
The project has recently passed a number of milestones, and even more are coming soon. The database now includes at least one match for every player in the ATP and WTA top 100. There’s depth as well as breadth: 18 players (10 men and 8 women) are represented with at least 10 matches each.
The WTA portion of the database just passed 200 total matches, and by the end of the year, the combined total will cross the 500-match mark. Earlier this year, I hesitated to pursue too much research using this dataset because it was too small and biased toward a few players of interest, but those reservations can increasingly be put to bed.
Frequently on this site, I have reason to vent my frustration with the state of data collection in tennis, and an excellent recent article illustrates how, in many ways, the state of the art is no more advanced than it was thirty years ago. If the professional tours won’t even release all the data they have, let alone lead the way in improving the state of analytics in the game, it’s up to us–the fans–to do better.
The Match Charting Project is one way to do that. Every additional match added to the database increases our knowledge of a specific matchup, of a pair of players, of surface tendencies, and of the sport as a whole. We’ll probably never be able to chart every tour-level match, but as the first (almost) 500 matches have shown, the database doesn’t have to be complete to be extremely valuable.
If you’ve already contributed, thank you. If you’re interested in contributing, start here.
Good stuff, Jeff! About 10-11 years ago, I went a completely different route in match charting, focused more on risk, probability, court position and direction. It was presented at several meetings and is still online, but I think people have difficulty comprehending it. I found it far more interesting, since I could actually find whose backhand was weak behind the baseline and who hit forehand inside-out winners at the highest rate. I termed the concept Payoffs. Howard Brody felt risk analysis was important, so the concept of payoffs was my answer to risk analysis. In fact, that’s what football, baseball, and basketball do: fourth-down conversions and lefty pitcher vs. righty hitter splits are all risk analysis. Those statistics are far more valuable to players, and only recently have IBM (at the majors) and, to a much lesser extent, Playsight been able to produce them. It would be nice to build an even more sophisticated method of analytics with the data IBM currently has.