Aryna Sabalenka played her first match at Indian Wells on Friday, handily beating Evgeniya Rodina. Sabalenka won the first set 6-1, then took a 3-0 lead in the second. Commentator Mikey Perera noted that Sabalenka’s win probability had reached 100%, though he (correctly!) expressed skepticism about the number.
Win probability has steadily crept into tennis broadcasts. Often we’re shown pre-match percentages along with the change up to the current moment in the match. The silliness of a 100% mid-match win probability has a pedestrian explanation: The numbers are usually given as integers. For most fans, there’s no important difference between 57% and 58%, but in extreme cases, another significant digit would come in handy.
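To see how integer display produces a "100%", here is a tiny sketch. The helper name `show` and the sample probabilities are my own, chosen only for illustration; I have no idea how any broadcaster actually formats its numbers.

```python
def show(prob, digits=0):
    """Format a win probability (0 to 1) as a percentage with the given decimals."""
    return f"{prob:.{digits}%}"


# Hypothetical values: two mid-range numbers and one extreme one.
for prob in (0.57, 0.58, 0.997):
    print(show(prob), "vs", show(prob, digits=1))
```

In the middle of the range the extra digit adds little, but near the edges it is the difference between "certain" and "almost certain."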
So, was the broadcast algorithm correct?
My Elo-based pre-match forecast set Sabalenka’s chances at 94.8%. To get mid-match predictions, we need more granular stats. Sabalenka has won 65.5% of serve points and 46.7% of return points this year (including the Rodina match), and if we nudge the RPW up to 47%, those components predict a 94.7% chance of a Sabalenka victory–virtually equivalent to the Elo forecast.
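The first step in turning point-level rates into a match forecast is the game level. Here is a minimal sketch, assuming every point on a given player's serve is won independently with the same probability (a standard simplification, and not necessarily every detail of my own model). The function name `hold_prob` is just for illustration.

```python
def hold_prob(p):
    """Chance the server wins a game, winning each point with probability p."""
    q = 1.0 - p
    # Win to love, 15, or 30: four points won with 0, 1, or 2 points lost first.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (3 points each), then win two points in a row before the opponent does.
    via_deuce = 20 * p**3 * q**3 * (p**2 / (p**2 + q**2))
    return before_deuce + via_deuce


spw, rpw = 0.655, 0.47  # serve and return points won, from the text
hold = hold_prob(spw)
# The opponent wins 1 - rpw of her own serve points, so a break is her failure to hold.
brk = 1.0 - hold_prob(1.0 - rpw)
print(f"hold: {hold:.1%}, break: {brk:.1%}")
```

The hold and break rates then feed a set-level and match-level calculation of the same flavor.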
Plug those numbers into my win probability model with Rodina serving at 1-6, 0-3, and Sabalenka’s chances of victory are 99.7%. Round to the nearest integer, and sure enough, you get a 100% chance of victory. It might have felt that way for Rodina.
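For the curious, a model of this kind can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the code behind the numbers above: points are independent with fixed serve and return rates, matches are best-of-three with a tiebreak at 6-6, and any fresh set is treated as starting with player A serving. All function and parameter names are mine.

```python
from functools import lru_cache


def hold_prob(p):
    """Chance the server wins a game, winning each point with probability p."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * p**2 / (p**2 + q**2)


def tiebreak_prob(spw, rpw):
    """Chance player A wins a first-to-7 tiebreak in which A serves the first point."""
    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 6 and b == 6:
            # From 6-6, points come in pairs with one on each player's serve.
            return spw * rpw / (spw * rpw + (1 - spw) * (1 - rpw))
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        # Serving order is A, B, B, A, A, B, B, ...
        a_serves = ((a + b + 1) // 2) % 2 == 0
        p = spw if a_serves else rpw
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)
    return win(0, 0)


def set_prob(spw, rpw, ga=0, gb=0, a_serving=True):
    """Chance A wins the set from ga-gb in games, at the start of a game."""
    hold = hold_prob(spw)
    brk = 1.0 - hold_prob(1.0 - rpw)

    @lru_cache(maxsize=None)
    def win(a, b, a_srv):
        if a == 6 and b == 6:
            if a_srv:
                return tiebreak_prob(spw, rpw)
            # Mirror image: compute the tiebreak from B's side, then flip it.
            return 1.0 - tiebreak_prob(1.0 - rpw, 1.0 - spw)
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        p = hold if a_srv else brk
        return p * win(a + 1, b, not a_srv) + (1 - p) * win(a, b + 1, not a_srv)
    return win(ga, gb, a_serving)


def match_prob(spw, rpw, sets_a=0, sets_b=0, ga=0, gb=0, a_serving=True):
    """Chance A wins a best-of-three match from the given score."""
    s_now = set_prob(spw, rpw, ga, gb, a_serving)
    s_new = set_prob(spw, rpw)  # assumption: A serves first in any later set

    def from_sets(a, b):
        if a == 2:
            return 1.0
        if b == 2:
            return 0.0
        return s_new * from_sets(a + 1, b) + (1 - s_new) * from_sets(a, b + 1)

    return s_now * from_sets(sets_a + 1, sets_b) + (1 - s_now) * from_sets(sets_a, sets_b + 1)


pre_match = match_prob(0.655, 0.47)
# Up a set, leading 3-0 in the second, opponent to serve.
at_6_1_3_0 = match_prob(0.655, 0.47, sets_a=1, ga=3, a_serving=False)
print(f"pre-match: {pre_match:.1%}, at 6-1, 3-0: {at_6_1_3_0:.2%}")
```

Printing the mid-match figure with two decimals is the whole point: it keeps a near-certain result from being displayed as a certain one.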
In fact, Sabalenka crossed the “100%” (99.5%) threshold in the previous game. She cleared 99.5% at 2-0, 15-0, slipped back under the line when she fell to deuce, then reclaimed it each of the two times she gained ad-in.
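Those point-by-point swings come from the game-level piece of the calculation. Here is a small sketch of just that piece, again assuming independent points with a fixed serve rate; the function name and the 0-to-4 point encoding (0, 15, 30, 40, advantage) are my own conventions. Turning these game probabilities into a match-level number requires the set and match layers on top.

```python
def game_prob_from(p, server_pts=0, returner_pts=0):
    """Chance the server wins the game from a point score.

    Points are counted 0, 1, 2, 3 for 0, 15, 30, 40; ad-in is (4, 3), ad-out is (3, 4).
    """
    q = 1.0 - p
    # From deuce, the server must win two straight points before the returner does.
    deuce = p * p / (p * p + q * q)

    def win(a, b):
        if a >= 4 and a - b >= 2:
            return 1.0
        if b >= 4 and b - a >= 2:
            return 0.0
        if a == b and a >= 3:
            return deuce
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win(server_pts, returner_pts)


spw = 0.655  # serve points won, from the text
for label, score in [("15-0", (1, 0)), ("deuce", (3, 3)), ("ad-in", (4, 3))]:
    print(label, f"{game_prob_from(spw, *score):.1%}")
```

Dropping back to deuce costs the server a meaningful chunk of her chance of holding, which is enough to push a match-level number back and forth across a threshold like 99.5%.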
So far, I’ve used a relatively simple model to forecast the remainder of the match. (And it’s certainly sufficient for these purposes.) But if we were putting money on the outcome–especially if the first ten games of the match had gone in a less predictable direction–we’d want to do something more sophisticated. I’ve assumed that from 6-1, 3-0, Sabalenka would play the way we could have predicted before the match. In this case, that’s a sound assumption. But a better method would take into account the results of the match itself up to that point.
Through ten games, Sabalenka was playing better than the initial forecast of 65.5% on serve and 47% on return. Her success rate on serve was a bit worse, at 64.4%, but she was destroying any service advantage of Rodina’s, winning nearly 55% of those points. Had we known before the match that she would play that way, our pre-match forecast would have given Sabalenka a whopping 99.4% chance of victory.
Using that pre-match forecast, our prediction at 6-1, 3-0 would have been an overwhelming 99.97% for the favorite.
As the match progressed, then, we gained more and more information that the in-match performance–whether due to the conditions, the players’ fitness or mood on the day, the matchup, or any number of other factors–would be even more lopsided. Had we taken everything into account at 6-1, 3-0, we would have calculated some mix of 99.7% (based on pre-match numbers) and 99.97% (based on in-match performance). The degree to which we should weight each of those numbers is the tough part. Determining the correct weights is a complicated question; suffice it to say that the correct answer is somewhere in between the two.
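One simple way to picture the "somewhere in between" is a straight weighted average of the two probabilities. This is only an illustration: the linear form, the function name `blend`, and the sample weights are my assumptions, and nothing here says what the right weight is.

```python
def blend(p_prematch, p_inmatch, weight_inmatch):
    """Weighted average of two win probabilities; weight_inmatch must be in [0, 1]."""
    if not 0.0 <= weight_inmatch <= 1.0:
        raise ValueError("weight_inmatch must be between 0 and 1")
    return weight_inmatch * p_inmatch + (1.0 - weight_inmatch) * p_prematch


# 99.7% from pre-match numbers, 99.97% from in-match performance (both from the text).
for w in (0.0, 0.25, 0.5, 0.75, 1.0):  # hypothetical weights
    print(f"weight {w:.2f}: {blend(0.997, 0.9997, w):.3%}")
```

Whatever weight you pick, the result stays between the two inputs, so it rounds to 100% on an integer display either way.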
The broadcast algorithm jumped the gun with its 100% win probability, though only a bit. No matter how lopsided a match, anything can happen–but it probably won’t.