The Match Charting Project hits 1,000!

In less than two years since I first introduced the Match Charting Project and asked for the help of volunteer contributors, we’ve reached a major milestone: 1,000 matches!

I can hardly tell you how excited I am about this. When the concept behind the project was first suggested to me in 2012, I hesitated to act, in part because I didn’t think I could convince enough other people of the project’s merits to build a dataset of this size. I’ve been proven hugely wrong. Even at the beginning of 2015, I figured we’d be lucky to hit the four-figure barrier by the end of the calendar year. Instead, we’ve added matches at a faster pace than ever.

 

Thanks to MCP contributors, the tennis research community now has access to a standardized dataset of 144,000 points and 580,000 shots. Nothing like this has ever existed in a form that is available to anyone who wants to pursue their own research projects.

I want to take this opportunity to thank all of the 50+ MCP contributors. Special mention is owed to Lowell, who with 141 matches is our most prolific charter and who is a big reason why the WTA is even more extensively represented in the database than is the ATP. I’d also like to single out Edo, who started contributing less than three months ago and has already added 43 matches to the tally, including many Grand Slam finals.

The first 1,000 is, I hope, just a beginning. Please consider contributing to the project–download the spreadsheet and read more about how it works here.

To keep up with the project, you can always find the full list of charted matches here, or a list organized by player here. I plan to post a bit more about the Match Charting Project next week here at Heavy Topspin, as well.

Match Charting Project: More Matches, More Data, New Spreadsheet

The Match Charting Project keeps growing, and starting today, even more of the data is available for anyone who wants it. Several new contributors have helped us pass the 750-match milestone, having added an average of two matches per day since I first published the raw data.

New spreadsheet

The Match Charting spreadsheet now does a lot more. As you chart each point, the document updates stats for the match–both total and set-by-set. You’ll find the same stats you see on television (aces, double faults, winners, unforced errors, etc.) along with some that are a little less common, like winning percentage in different lengths of rallies, and most consecutive points won.

In other words, as you chart the match, you’ll have access to many of the same stats that commentators do. Here’s what it looks like:

[Screenshot: match stats generated by the spreadsheet]
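Two of the less common stats mentioned above, winning percentage by rally length and most consecutive points won, are easy to derive from point-by-point records. Here’s a rough Python sketch of the idea. The record layout, the rally-length cutoffs, and the sample points are all assumptions made for illustration; this is not how the spreadsheet itself is built.

```python
from collections import defaultdict

# Hypothetical point records: (rally_length_in_shots, point_winner).
# The spreadsheet's real internal layout is not shown in this post.
points = [
    (1, "A"), (4, "B"), (7, "A"), (2, "A"),
    (9, "B"), (3, "A"), (12, "A"), (5, "B"),
]


def rally_bucket(length):
    """Group rallies into short, medium, and long (cutoffs are arbitrary)."""
    if length <= 3:
        return "1-3"
    if length <= 6:
        return "4-6"
    return "7+"


def win_pct_by_rally_length(points, player):
    """Return {bucket: share of points in that bucket won by player}."""
    won = defaultdict(int)
    total = defaultdict(int)
    for length, winner in points:
        bucket = rally_bucket(length)
        total[bucket] += 1
        if winner == player:
            won[bucket] += 1
    return {b: won[b] / total[b] for b in total}


def longest_streak(points, player):
    """Return the most consecutive points won by player."""
    best = current = 0
    for _, winner in points:
        current = current + 1 if winner == player else 0
        best = max(best, current)
    return best


print(win_pct_by_rally_length(points, "A"))
print(longest_streak(points, "A"))
```

Recomputing from the full list of points each time is wasteful, but for a single match of a few hundred points it keeps the logic easy to check.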

If you’ve hesitated to try charting because you couldn’t see what was in it for you, I hope this changes the calculation a bit.

Click here to download the MatchChart template.

New data

About a month ago, I published the point-by-point data from all charted matches.  In raw form, it’s a bit daunting, and it’s more than what’s necessary for many interesting research projects.

Today, I added 15 different aggregate stats files for men, and another 15 for women. These contain the data that is shown in each charted match report. For instance, if you find it interesting that Simona Halep hit 14% of her backhands down the line in the Indian Wells final, you can take a look in the ShotDirection stats file and compare that number with the results from Halep’s other charted matches, or all matches in the database as a whole.
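As a sketch of that kind of comparison, here’s some Python that sets one match’s down-the-line backhand rate against the same player’s other matches and against every row in the file. The column names (`match_id`, `player`, `backhands`, `backhands_dtl`) and the sample rows are hypothetical; check the actual stats file for its real layout before adapting this.

```python
import csv
import io

# Made-up rows in a made-up layout; the real ShotDirection file's
# columns and values may differ.
SAMPLE = """match_id,player,backhands,backhands_dtl
m1,Player X,100,14
m2,Player X,80,16
m3,Player X,120,30
m4,Player Y,90,9
"""


def dtl_rate(rows):
    """Share of backhands hit down the line across the given rows."""
    total = sum(int(r["backhands"]) for r in rows)
    dtl = sum(int(r["backhands_dtl"]) for r in rows)
    return dtl / total if total else None


def compare_match(rows, player, match_id):
    """Return (rate in one match, player's rate elsewhere, rate for all rows)."""
    mine = [r for r in rows if r["player"] == player]
    this_match = [r for r in mine if r["match_id"] == match_id]
    others = [r for r in mine if r["match_id"] != match_id]
    return dtl_rate(this_match), dtl_rate(others), dtl_rate(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
one, rest, overall = compare_match(rows, "Player X", "m1")
print(f"match: {one:.1%}  other matches: {rest:.1%}  database: {overall:.1%}")
```

The rates are pooled across matches (total down-the-line backhands over total backhands) rather than averaged match by match, so longer matches count for more. Either choice is defensible; just be consistent.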

You can find these files (along with the updated raw data for 760+ matches) by clicking here.

Chart some matches

If you haven’t already, now is a great time to start charting professional matches and contributing to the project. An enormous number of matches are televised and streamed, and as the database of charted matches grows, there’s more and more useful context to all the data we’re generating.

You can start by jumping into the ‘Instructions’ tab of the new MatchChart spreadsheet, or for other tips, you can start with my blog post introducing the project.

Raw Data From The Match Charting Project

In the last year and a half, dozens of contributors and I have amassed detailed shot-by-shot records of nearly 700 professional matches. You can see the full list here, or a menu sorted by player here.

I refer to this as The Match Charting Project, and I hope you’ll consider contributing as well. Using a straightforward text notation system, we record shot type, shot direction,  return depth, error types, and more. The more matches, the more interesting the results. The project made up part of my presentation at the Sloan Sports Analytics Conference last month, which included some very preliminary findings on player tendencies.

Now, you can dig into the raw data yourself. I’ve posted all of the user-submitted match charts in one place, in a standardized format for anyone who wants to mess around with it.

Enjoy!

 

The Match Charting Project: One Year On

Just over a year ago, I launched the Match Charting Project, a collaborative effort to track every shot of as many professional matches as possible. Many of you have contributed, and a few of you have given more time to the project than I could have ever hoped. Thank you.

To make the MCP possible, I devised a relatively simple notation system, tracking every type of shot and its direction, along with an Excel document to make recording each point easier. Earlier this year, I beefed up the stats generated for each match, showing not only hundreds of rates and totals for each player, but also player and tour averages for comparison.

The project has recently passed a number of milestones, and even more are coming soon. The database now includes at least one match for every player in the ATP and WTA top 100. There’s depth as well as breadth: 18 players (10 men and 8 women) are represented with at least 10 matches each.

The WTA portion of the database just passed 200 total matches, and by the end of the year, the combined total will cross the 500-match mark. Earlier this year, I hesitated to pursue too much research using this dataset because it was too small and biased toward a few players of interest, but those reservations can increasingly be put to bed.

Frequently on this site, I have reason to vent my frustration with the state of data collection in tennis, and an excellent recent article illustrates how, in many ways, the state of the art is no more advanced than it was thirty years ago. If the professional tours won’t even release all the data they have, let alone lead the way in improving the state of analytics in the game, it’s up to us–the fans–to do better.

The Match Charting Project is one way to do that. Every additional match added to the database increases our knowledge of a specific matchup, of a pair of players, of surface tendencies, and of the sport as a whole. We’ll probably never be able to chart every tour-level match, but as the first (almost) 500 matches have shown, the database doesn’t have to be complete to be extremely valuable.

If you’ve already contributed, thank you. If you’re interested in contributing, start here.

The Almost Neutral Let Cord

Italian translation at settesei.it

Once I started charting matches–carefully watching and notating every shot–I thought I noticed a trend after “let” serves. It seemed that players missed far more first serves than usual after a let, and when players landed a post-let first serve, their offering was weaker than usual.

Now that we have nearly 500 pro matches in the Match Charting Project database, including at least 200 each from both the ATP and the WTA, there’s plenty of data with which to test the hypothesis.

To my surprise, there’s no such trend. If anything, players–men in particular–are more likely to make a first serve after a let cord. When they do, they are at least as likely to win the point as in non-let points, suggesting that the serve is no weaker than usual.

Let’s start with the ATP numbers. In over 1,100 points in the charting database, the server began with a let. He eventually landed a first serve 62.8% of the time, compared to 62.0% of the time on non-let points. When he made the first serve, he won 73.3% of points that began with a let serve, compared to only 70.6% of first-serve points when there was no let.

More first serves in, and more success on first serves. The latter finding, with its difference of 2.7 percentage points, is particularly striking.
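For anyone who wants to run the same comparison on their own data, here’s a rough Python sketch. The record layout (a let flag, a first-serve-in flag, and a server-won flag for each point) is an assumption for illustration, not the format of the charting database, and the sample points are invented.

```python
# Hypothetical per-point records: (began_with_let, first_serve_in, server_won).
points = [
    (True, True, True),
    (True, True, False),
    (True, False, True),
    (False, True, True),
    (False, True, True),
    (False, True, False),
    (False, False, False),
    (False, False, True),
]


def first_serve_rates(points, after_let):
    """Return (first-serve-in rate, first-serve points won rate) for one group.

    Either value is None when the group has no points to divide by.
    """
    group = [p for p in points if p[0] == after_let]
    made = [p for p in group if p[1]]
    in_rate = len(made) / len(group) if group else None
    won_rate = sum(1 for p in made if p[2]) / len(made) if made else None
    return in_rate, won_rate


let_in, let_won = first_serve_rates(points, after_let=True)
base_in, base_won = first_serve_rates(points, after_let=False)
print(f"after a let: {let_in:.1%} in, {let_won:.1%} won")
print(f"no let:      {base_in:.1%} in, {base_won:.1%} won")
```

Note that the second rate uses only points where the first serve landed, which is what makes it a first-serve winning percentage rather than an overall one.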

Of the trends I had expected to see, only one is borne out by the data. Since a net cord let is only millimeters away from a fault into the net, it seems logical that net faults would be more common immediately after a let than otherwise. That is the case: 15.7% of men’s first serves result in faults into the net, but after a let,  that figure jumps to 17.0%.

When we turn to WTA matches with available data, we find that the post-let effect is even stronger. In non-let points, first serves go in at a 62.8% rate. After a first-serve let, players record a 65.3% first-serve percentage. Given that first-serve percentages are usually concentrated in a relatively small range, a difference of 2.5 percentage points is quite significant.

The WTA data tells a different story than the ATP numbers do when we look at the end result of those first serves. On non-let points, WTA players win first-serve points at a 62.8% rate, while after a first-serve let, they win these points at only a 61.8% clip. It may be that some women approach post-let first serves a bit more conservatively, and they pay the price by winning fewer of those points.

WTA players also appear to miss a few more post-let first serves into the net, though the difference is not as striking as it is for men. On non-let points, net faults make up 16.2% of the total, and after first-serve lets, net faults account for 16.7% of first serves. Of all the numbers presented here, this one is most likely to be no more than random noise.

It turns out that let serves don’t have much to tell us about the next serve or its outcome–and that’s not much of a surprise. What I didn’t expect was that, after a let serve, professionals are a bit more likely than usual to find success with their next offering.

If you like watching tennis and think this kind of research is worth reading, please consider lending a hand with the Match Charting Project. There’s no other group effort of its kind, and the more matches in the database, the more valuable the analysis.

Projected Matchups on TennisAbstract.com Tourney Pages

I’ve been tinkering around with the tournament pages on Tennis Abstract (for example, this week’s WTA event in Strasbourg), and I want to share the latest improvement with you.

If you are unfamiliar with TA’s tournament pages, it may take a moment to adjust to the method of presentation. But I’ve found that it’s a much more efficient way of presenting a lot more data than a traditional draw diagram–without the hassle of loading a PDF and zooming in and out.

In the left-hand column, you’ll find all upcoming matches, along with the career head-to-head record for each one. Click on the player links to go to their TA player page, or on the H2H record to see a list of H2H matches. Further down, you’ll find all results from the event (including qualifying rounds), most recent first. Take a close look at the “d.” in the middle of each completed match, and you’ll find that some of them are links. Click on those links to get the career H2H results for that pair of players.

In the right-hand column is a tournament forecast. The default view shows each player’s chances of reaching each round of the tournament. ATP forecasts are based on tournament simulations, which use jrank player ratings. WTA forecasts are based on official WTA rankings.
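To give a sense of what a simulation-based forecast involves, here’s a minimal Python sketch that plays out a small single-elimination draw many times and counts how often each player reaches each round. The logistic `win_prob` function, the ratings, and the four-player draw are all stand-ins invented for this example. They are not the jrank formula or TA’s actual code.

```python
import random


def win_prob(rating_a, rating_b):
    """Chance that A beats B. A logistic curve on the rating gap is an
    assumption for illustration, not the jrank formula."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def simulate_once(draw, ratings, rng):
    """Play out one bracket; return {player: number of matches won}."""
    wins = {p: 0 for p in draw}
    alive = list(draw)
    while len(alive) > 1:
        next_round = []
        # Adjacent entries in the bracket order play each other.
        for a, b in zip(alive[::2], alive[1::2]):
            winner = a if rng.random() < win_prob(ratings[a], ratings[b]) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins


def forecast(draw, ratings, trials=10000, seed=0):
    """Estimate each player's chance of advancing through each round.

    Entry k of a player's list is the share of trials in which the player
    won at least k+1 matches, so the last entry is the title chance.
    The draw size must be a power of two.
    """
    rng = random.Random(seed)
    rounds = len(draw).bit_length() - 1
    reached = {p: [0] * rounds for p in draw}
    for _ in range(trials):
        for player, n in simulate_once(draw, ratings, rng).items():
            for k in range(n):
                reached[player][k] += 1
    return {p: [c / trials for c in counts] for p, counts in reached.items()}


# Made-up four-player draw, listed in bracket order.
ratings = {"A": 2000, "B": 1800, "C": 1900, "D": 1700}
draw = ["A", "B", "C", "D"]
table = forecast(draw, ratings)
for player, chances in table.items():
    print(player, [f"{c:.1%}" for c in chances])
```

Swapping in a different rating system only means replacing `win_prob`; the bracket logic stays the same.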

You’ll find today’s new addition here:

[Screenshot: the tournament forecast table]

 

You can click on the links in the top row, “Archived,” to see what the forecast looked like at earlier stages of the tournament.

New today: click on the links in the “Probable matchups” row to see the most likely development of the tournament, including H2H records for likely later-round matches:

[Screenshot: the probable later-round matchups view]

I imagine that this will be particularly helpful at the beginning of the week for tournaments with larger draws, when you want to get a quick glance at, for instance, quarterfinal or semifinal pairings worth looking forward to.

You can always click “Current” in the top row to return to the real-time forecast.

More TennisAbstract news:

Draws and forecasts are available for French Open qualifying:

I’ll add main draw forecasts as soon as those draws are set, as well. You can find links to those on the front page of TennisAbstract.com. They’ll be updated hourly throughout the tournament.

If you’ve been wondering about some weird numbers on the ATP stats leaderboard, it’s because 2014 matches weren’t included. (Yes, I know it’s May. Ugh.) If you haven’t checked out that page, I hope you will. There are dozens of stats and hundreds of ways to filter results and generate rankings for the last two-and-a-half seasons. For instance, here are the leaders in 2014 return points won on clay.

Finally, we’ve hit a cool milestone with the Match Charting Project. Thanks to the hard work of Deb Decker, there are 50 Rafael Nadal matches in the database, including nearly every match from this year. You’ll also find at least one match for each of 90 players in the current ATP top 100 and 43 of the current WTA top 50. I hope you’ll consider contributing to this growing resource.

A Quarterfinal on Federer’s Racquet

The Roger Federer-Andy Murray head-to-head is a bit of a baffling one. In twenty career meetings–18 of them on hard courts–Murray has won 11, including four of the last five.

Yet for a superficially tight one-on-one record, Fed and Murray haven’t played many tight matches against each other, especially lately. When they went five sets in last year’s Australian Open semifinal, it was the first time they had gone the distance in ten matches. The outcome of a match between them is up for grabs, but whoever wins it tends to do so by a handy margin.

Even that five-set semifinal last year wasn’t as close as it looked. Murray won 54.0% of total points and racked up a Dominance Ratio (DR) of 1.32, meaning that he won far more return points than Roger did. Five-setters are usually much closer to 50% and 1.0, respectively. While Murray won far more points, Federer displayed his historically great tiebreak skill to keep himself in the match.

DR is a convenient measure of the closeness of a match, where 1.0 is a dead heat. Only two Fed-Murray matches–both before 2009–fell in the range between 0.85 and 1.15. By contrast, Novak Djokovic and Rafael Nadal have played seven matches (including two Grand Slam finals) in that range, and Djokovic and Murray have played five.
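For reference, here’s a short Python sketch of the calculation, assuming DR is defined as a player’s rate of return points won divided by his opponent’s rate of return points won. The point totals are made up so that the result comes out to 1.32; they are not the actual numbers from any match discussed here.

```python
def dominance_ratio(return_won, return_played, opp_return_won, opp_return_played):
    """Player's share of return points won divided by the opponent's share."""
    own_rate = return_won / return_played
    opp_rate = opp_return_won / opp_return_played
    return own_rate / opp_rate


def is_close(dr, low=0.85, high=1.15):
    """True when the DR falls inside the 0.85-1.15 band used above."""
    return low <= dr <= high


# Made-up totals: 66 of 150 return points won vs. 50 of 150 for the opponent.
dr = dominance_ratio(66, 150, 50, 150)
print(round(dr, 2), is_close(dr))
```

Because DR is a ratio of rates rather than of raw counts, it isn’t distorted when one player serves many more points than the other.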

Tactical nonsense

To traffic in conventional wisdom for a moment, Federer is the most aggressive of the Big Four, while Murray is the most passive. To the extent Andy is likely to hurt Roger, it has more to do with his ability to force Fed into trying to do too much, particularly on the backhand side. If Federer plays patiently and picks his spots, he can crush Murray. If he plays too passively or hits bunches of unforced errors, it can be a rough day at the office.

However, there may not be much Murray can do to determine which Roger shows up.  Simply forcing Fed to hit backhands certainly isn’t enough. The Match Charting Project has amassed shot-by-shot data, including the number of groundstrokes hit from either side, for 23 Federer matches so far. Nadal is particularly good at directing the ball to Federer’s backhand, forcing Roger to hit 56% to 58% of groundstrokes from the backhand side in both a win (last year’s World Tour Finals) and a bad loss (the 2011 Tour Finals).

Taking the average of these 23 matches (most of which are Federer wins, as the Match Charting Project seems to have drawn lots of Fed fans), Roger hits 52.5% of his groundstrokes from the forehand side. This reflects the balance of two factors: Federer wanting to hit his forehand, and opponents trying to keep the ball away from it.

Surprisingly, hitting lots of balls to Fed’s backhand side seems to have few benefits. There is no meaningful correlation between DR and the percentage of groundstrokes Fed hit on the backhand side.
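If you want to check a relationship like this yourself, a plain Pearson correlation is one reasonable way to do it (I’m offering it as an illustration, not as the exact method behind the statement above). The values in this Python sketch are invented, one pair per match, and are not taken from the charting database.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration only, one pair per match:
# share of groundstrokes hit as backhands, and the match DR.
backhand_share = [0.40, 0.45, 0.50, 0.55, 0.60]
dr = [1.10, 0.95, 1.30, 0.90, 1.15]

r = pearson(backhand_share, dr)
print(f"r = {r:.3f}")
```

A result near zero, as with these sample values, indicates no linear relationship. With only a couple dozen matches, even a moderate coefficient could easily be noise.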

Based on the limited data available, it appears that Murray has tried a variety of tactics.

In the two Fed-Murray matches for which we have shot-by-shot data–the 2010 Australian Open final and the 2012 Dubai final–Murray took opposite approaches to the problem. In the Melbourne final, he managed to direct 57% of balls to Fed’s backhand, which is as good as anyone but Nadal has managed. In the Dubai match, Roger hit 64% of his groundstrokes from the forehand side, the second-highest rate of any of the 23 Federer matches in the database.

In both cases, Murray lost. To take another example, Juan Martin del Potro has beaten Fed while letting him hit 57% forehands and lost to him while forcing him to hit 57% backhands.

The database–limited in matches and biased as it is toward Fed’s victories–probably can’t take us any farther. But from here, we can speculate that Federer has it in his power to win or lose regardless of the tactics thrown his way. Murray, like Nadal, has always forced him to hit one extra ball. The sort of aggression that takes a player far out of position to hit, for instance, an inside-out forehand can backfire against such a talented defensive player.

In four matches at the Australian Open so far, Federer has offered us plenty of glimpses of his glory days. Murray will likely prove to be his biggest test of the tournament, but Fed’s fate still hangs on his own racquet.

Should WTA Players Approach the Net More?

Italian translation at settesei.it

21st-century women’s tennis is a baseline game. Some players are better able to identify opportunities to approach the net than others, and some can handle themselves quite well when they get there. But if a fan from a few decades ago were dropped off at the 2014 Australian Open, she would be shocked by the rarity of net points and the clumsiness of many players when they move forward.

Since almost all television commentators were excellent players in a more net-centric era, a frequent refrain during almost any broadcast is that players should rush the net more often. “Frequent” might be understating it–in a fit of pique, I was driven to say this:

Regardless of repetition, it’s worth further investigation. It’s certainly true that a skilled netwoman could win more points by moving forward. But when pros don’t emphasize that part of their game and they gain little match experience approaching the net, do they have the skills necessary to take advantage of such an opportunity?

Enter some numbers

At this point, you might be tempted to look at the oft-collected “Net Points” stat. Resist the urge. In a baseline-oriented match, net points can have little to do with net approaches. Attempting to return a drop shot is considered a net point. Putting away a weak service return is considered a net point. In many WTA matches, more than half of “net points” do not involve an approach. The player was induced to come to the net for some reason.

Making matters worse, that non-approach segment of net points has little to do with net approaches. Given a weak, floating return, any competent player should be able to whack it for a swinging volley winner. At the other end of the spectrum, chasing down a drop shot relies on a different set of skills than picking a moment to hit an approach shot and then confidently placing a volley or two.

Fortunately, the Match Charting Project gives us some more detailed, approach-specific data.

Twenty matches in the charting database are from the first month of the 2014 WTA season, most of them from the first week in Melbourne. This data differentiates between “net approaches” and “net points.” In one of the more aggressive performances in the database, Angelique Kerber, in her loss to Tsvetana Pironkova in Sydney, won 15 of 19 net points. Of her ten net approaches, she won all ten.

(For any match report in the charting database–here’s the Kerber-Pironkova match–click one of the two “Net Points” links to see those stats. There is a different table for each player.)

Kerber’s total of ten net approaches is tied for the most in any WTA match charted this year. Last night, Garbine Muguruza also tallied ten net approaches, though she did so in a longer match.

In these twenty matches, only 27 of 40 players made even one traditional net approach. Including those who made zero, the average is just over three net approaches per match. The 27 who approached the net at least once averaged 4.7 per match.

Clearly, a lot of opportunities for offense are going unclaimed.

How they’re doing

Of the 126 net approaches we’ve tracked, the approaching player has won 84–exactly two-thirds. While that isn’t an overwhelming endorsement–many approach shots are hit in response to a weak groundstroke that already puts the opponent at a disadvantage–it certainly doesn’t count as evidence against the practice.
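The figures quoted so far all follow from four counts: 126 approaches, 84 of them won, 40 players (two in each of twenty matches), and 27 players with at least one approach. A few lines of Python confirm the arithmetic:

```python
from fractions import Fraction

approaches = 126    # net approaches tracked
won = 84            # approaches won by the approaching player
players = 40        # two players in each of twenty matches
approachers = 27    # players with at least one approach

# Exact fraction of approaches won.
win_rate = Fraction(won, approaches)

# Average approaches per player per match, with and without the
# players who never came forward.
avg_all = approaches / players
avg_approachers = approaches / approachers

print(win_rate)
print(round(avg_all, 2))
print(round(avg_approachers, 1))
```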

In half of all net approaches, the netrusher either hits an outright winner at the net or induces a forced error with a net shot.  Only 12% of the time does the opponent hit a passing shot winner. In another 5% of these points, the opponent induces a forced error with a passing shot. In 12% of net approach points, the player who moved forward hits an unforced error at the net.

Of the 27 players in the database who approached the net at least once, only six won fewer than half of those points (three of whom came forward only once), and three more won exactly half of their net approach points.

The women in this sample who seize the most opportunities to rush the net have been particularly successful, as well. Seven of the eight who moved forward the most won more than half of their approach points.  This allows us to tentatively conclude that all the other players–the ones who picked only a few spots to approach the net during their matches–could have seized more opportunities. There may be a limit in the modern game to how much netrushing is wise, but the observed maximum of ten points per match doesn’t seem to be it.

Inevitable unknowns

Whether we look at Kerber and her 10/10 net-approach performance in Sydney or Sloane Stephens and her 1/1 tally yesterday against Elina Svitolina, it’s impossible to know the results of the next approach shot–or the next five.  We can compare single-match results and see that it’s possible for a WTA player to have a perfect record on her ten net approaches, but we can’t perform lab experiments in which Sloane plays Svitolina again and comes forward ten times instead of one.

For all the success that players enjoy when they do move forward, there are plenty of reasons not to. As I said at the outset, today’s players don’t practice net skills nearly as much as baseline skills, and they certainly don’t get much in-match practice. If someone isn’t comfortable approaching the net at a certain time, is it really a good idea for her to do so?

In the abstract, both intuition and statistical analysis support the position that WTA players could move forward more. When they do approach the net, they are often successful, putting away volley winners and rarely getting passed. But I suspect this implies a long-term strategy more than the sort of thing a coach should emphasize during a changeover.

When commentators suggest that a player should move forward, what I think they really mean is this: “If this player were more comfortable with her transition game, this would be a great opportunity to take advantage of that.” Or: “Players should work harder on their approach shots on the practice court so that they’re ready for opportunities like this one.” Or simply: “Martina would have won that point ten shots ago.”

There seems to be opportunity waiting for more, well, opportunistic young players. But it isn’t one that can be generated simply by a sudden coaching change or a harangue from John McEnroe. Only when a player emerges with the baseline game to contend with the best pros and a transition/net game that exceeds most of those on the tour today will we find out just how much opportunity today’s players have wasted.

Match Charting Project: Update, Tutorial, Tracking, Tools

Since I announced the Match Charting Project last week, the response has been tremendous.  More than one thousand of you read the post, more than one hundred people downloaded the match charting spreadsheet, and several people have already charted matches, helping build what is already a very useful resource.

We’re nearing 100 charted matches.  Here’s the full list.  A couple of notable recent additions are this year’s Wimbledon men’s final (thanks Verity!), and the 2009 French Open match in which Soderling upset Nadal (thanks Amy!).

New spreadsheet version

I’ve added functionality to note serve-and-volley points, using the plus sign (“+”) after the serve notation.  (I’ve added a bit more detail in the instructions sheet to help explain it.)  It’s optional, but it would be very useful information to have, and if you want to track serve-and-volley attempts this way, you’ll need the newest version of the spreadsheet.  Download it by clicking on the link.
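Once points carry that marker, counting serve-and-volley attempts is straightforward. Here’s a rough Python sketch, assuming purely for illustration that each point is a text string whose serve notation is a single leading character. The sample strings are invented and shouldn’t be read as valid notation; see the instructions sheet for the real rules.

```python
def is_serve_and_volley(point, serve_len=1):
    """True if a plus sign directly follows the serve notation.

    serve_len is the assumed length of the serve code; one character is a
    guess for illustration, not a documented rule.
    """
    return len(point) > serve_len and point[serve_len] == "+"


def serve_and_volley_rate(points):
    """Share of charted points flagged as serve-and-volley attempts."""
    if not points:
        return 0.0
    return sum(is_serve_and_volley(p) for p in points) / len(points)


# Invented point strings, for illustration only.
sample = ["4+v1*", "6f2b3n#", "5+", "4"]
print(serve_and_volley_rate(sample))
```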

Match charting tutorial

To give you an idea of what match charting is all about, I recorded my screen while charting the first few games of a match.  While it’s not the most captivating entertainment, it demonstrates how I set up my screen, and it may help you make sense out of the notation system we’re using.

Tracking

I maintain two versions of the list of charted matches–by date, or by player. If you’d like to chart a match that isn’t on those lists and is more than a couple of weeks old, you can be almost certain that no one else is working on it. But if you’d like to do a current match, or you just want to make sure, email me to check before you begin. Once you’ve completed your first match, I’ll invite you to a Google doc where charters “claim” matches to avoid duplication.

Charting tools

Here are some tips and tricks that might help you chart a little more effectively.

I find it more convenient to watch video files that are stored on my hard drive–that way, I can work without an internet connection, or survive a weak wireless connection.  You can download YouTube videos using KeepVid, and you can download videos from many other sites with Jaksta.

Once you’ve downloaded a video file, I highly recommend using mplayer to view it.  The killer feature here is that it allows you to speed up or slow down playback.  When you’re starting out, you might want to go as slow as 50% or 60%.  As you get better, you can speed up.  Another great mplayer feature for charting purposes is the ability to skip forward or backward ten seconds or one minute.  It’s a very effective way to rewind and watch a point again, if you missed it.  You can also quickly skip through changeovers, or even through long delays between points, if you’re charting that sort of player.

Finally, if you’re watching videos in fullscreen, you might want to try the 4t Tray Minimizer.  It allows you to pin any program on top, so for instance, if you want to watch TennisTV in fullscreen but keep the spreadsheet on top, it makes that possible.

If you have any questions or suggestions, please email me or leave them in the comments.  Thanks for all your interest so far!

The Match Charting Project

Tennis needs better stats.  Now you can help.

Since the US Open, I’ve been developing a system to chart matches.  With a bit of practice, anyone can use this system to note the type and direction of every shot in a match–serve direction, return direction and depth, shot patterns, error types, error directions, and more.  A single charted match generates an enormous amount of data.

The true potential of match charting lies in the bigger picture.  So far, we have nearly 50 matches in the books–mostly from ATP events this fall.  Even with this relatively small subset of matches, I’ve been able to do some interesting research, such as analyzing how quickly Novak Djokovic can neutralize a server’s advantage, and evaluating the wisdom of the drop shot.

The more matches, the more players, the more surfaces, the better.  Want to join the fun?

I hope you do, and the off-season is a great time to start.  It will take you a couple of matches to get comfortable with the system, so charting recorded matches, with the ability to rewind and watch points multiple times, is the best way to get started.  There are hundreds, if not thousands, on YouTube, with plenty more available through other sources such as ESPN3 and TennisTV.

I’ve created an interactive spreadsheet to make the process as easy as possible. Download it here.  The fields highlighted in yellow are yours.  The first several rows are for general information about the match.  As you chart each point, the spreadsheet will automatically update the score and create an additional row for the next point.
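The score-keeping part is simple enough to illustrate. Here’s a minimal Python sketch of how the score within a standard game can be derived from the sequence of point winners. It’s an illustration of the idea only: it isn’t the spreadsheet’s formulas, and it ignores tiebreaks, games, and sets.

```python
POINT_NAMES = ["0", "15", "30", "40"]


def game_score(server_points, returner_points):
    """Describe the score within a standard (non-tiebreak) game, server first."""
    s, r = server_points, returner_points
    # A game ends once someone has at least four points and a two-point lead.
    if (s >= 4 or r >= 4) and abs(s - r) >= 2:
        return "game server" if s > r else "game returner"
    # From 40-40 onward, the score is deuce or advantage.
    if s >= 3 and r >= 3:
        if s == r:
            return "40-40"
        return "AD-40" if s > r else "40-AD"
    return f"{POINT_NAMES[s]}-{POINT_NAMES[r]}"


def chart_game(winners):
    """Return the score after each point; winners is a string of 'S' and 'R'."""
    s = r = 0
    scores = []
    for w in winners:
        if w == "S":
            s += 1
        else:
            r += 1
        scores.append(game_score(s, r))
    return scores


print(chart_game("SSRSS"))
```

Keeping only the two point counts and deriving the displayed score from them avoids having to special-case every transition between deuce and advantage.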

Once you download and open the spreadsheet, click over to the “Instructions” tab.  There, you’ll find detailed instructions on the process.  It will take some time to understand all the details of how the system works, and then it will take you a match or two to get the hang of entering all that data.  Pretty soon, you’ll find that you’re comfortably charting points in real time.

In the next week or two, I’ll try to put together some additional training material.  However, if you’d like to get started right away, there’s nothing stopping you.  Once you finish charting a match, send the completed spreadsheet back to me (my email address is in the spreadsheet), and I’ll run it through my program to generate detailed stats for that match.

In addition to the interactive spreadsheet itself, you may find it helpful to see a couple of completed charted matches, perhaps following along while watching the matches:

(Sorry, those two YouTube videos have been removed due to copyright claims. You can still download the completed spreadsheets. At some point, I’ll try to find charted matches with YouTube videos that are unlikely to be taken down, and post those here instead.)

What I love about this project is that we don’t need thousands of matches for it all to be worthwhile.  (Though I won’t complain when we accumulate thousands of matches!)  Every charted match we can add to the database contributes to our understanding of those two players and professional tennis as a whole.

I sincerely hope you’ll contribute.

Update: I’ve posted a few updates, tips, and tools here.