Check out the simulation tool for this project here!
With this simulation tool you can simulate a given ATP or WTA match with the following parameters:
- Number of simulations (1, 10, 100, 500, 1000, 5000, 10000).
- Best of 3 or 5 sets.
- Surface (Hard, Clay, Grass - though some players have insufficient sample sizes).
- Scoring (whether the matches are played with ad scoring or no-ad scoring).
- Player 1 and Player 2 - select the players to simulate matches with.
Why have the proclaimed "Big 3" been so dominant?
For Novak Djokovic, Rafael Nadal, and Roger Federer, their main calling card is their prowess in Grand Slams. While they have all found a lot of success on the ATP tour, all with
- In the 2023 U.S. Open (currently his last Grand Slam), Novak Djokovic would have lost in the third round to Laslo Djere - $(4-6, 4-6)$.
- In the 2011 French Open, Rafael Nadal would have lost in the first round to John Isner - $(6-4, 6-7, 6-7)$.
- In the 2009 French Open, Roger Federer would have lost in the fourth round to Tommy Haas - $(6-7, 5-7)$.
However, they would all go on to win these matches in five sets. There are many reasons for this:
- It is much harder to sustain a level to win three out of five sets.
- A great player is more likely to positively regress to the mean.
- A worse player is more likely to negatively regress to the mean.
While there are many other variables to consider in any match (especially a best of five set match), such as timing, fitness, mentality, feeling, and ultimately skill, a large reason for the Big 3's success at the Grand Slam level is that they played 5 sets instead of 3. The Law of Large Numbers says that if you take samples of larger and larger size from any population, then the sample mean tends to get closer and closer to the true population mean. In tennis terms, the more sets a match contains, the more likely its result is to reflect the players' underlying ability.
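To make the best of 3 versus best of 5 effect concrete, here is a minimal sketch. It assumes each set is won independently with a fixed probability `p`, which is a simplification introduced only for this illustration. The function names `best_of_three` and `best_of_five` are hypothetical and not part of the simulation tool.

```python
def best_of_three(p: float) -> float:
    """Match-win probability when each set is won independently with probability p."""
    q = 1 - p
    # Win 2-0, or win 2-1 (the lost set can be the first or the second).
    return p**2 + 2 * p**2 * q


def best_of_five(p: float) -> float:
    """Same assumption as above, but the match is first to three sets."""
    q = 1 - p
    # Win 3-0, 3-1 (3 orderings), or 3-2 (6 orderings).
    return p**3 + 3 * p**3 * q + 6 * p**3 * q**2


for p in (0.55, 0.60, 0.70):
    print(f"p={p:.2f}  best of 3: {best_of_three(p):.3f}  best of 5: {best_of_five(p):.3f}")
```

Under this assumption, any player with `p` above 0.5 is more likely to win a best of 5 match than a best of 3 match, and the gap is widest for moderately favoured players.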
How can we measure the effects of different scoring systems in professional tennis? How does deuce-ad and set scoring benefit certain players?
We can solve for the probability of a server and a returner winning a deuce game by modeling the game as a stochastic process and applying a technique called First-Step Analysis.
A Markov Chain is a stochastic model that describes a sequence of possible events in which the probability of each event is based on only the previous state within the chain.
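First-Step Analysis is a technique for Markov chains that conditions on the outcome of the first transition. This lets us compute the probability of eventually reaching a given state (here, the server winning the game) from each starting state.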
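For this Markov chain we have the following state space: Deuce, Advantage Server, Advantage Returner, and the two end states in which the server or the returner wins the game.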
To calculate the probability of a server or a returner winning a deuce game we can apply First-Step Analysis with the function:
$$f(i) = P[Server\ Wins\ Deuce\ Game\ |\ X(0) = i]$$
where $i$ is the state the chain starts in.
In order to calculate the probability of a single point the following assumptions will be made:
- Points are i.i.d. (independent and identically distributed), so no point has any effect on any other point
- The past does not matter (how the game got to deuce)
- Player performance is independent of pressure
- The possible outcomes for winning an individual point are:
$$\mathbb{S} = \{Server,\ Returner\}$$
$$P[Server\ Wins\ a\ Point] = x \in [0, 1]$$
$$P[Returner\ Wins\ a\ Point] = 1 - P[Server\ Wins\ a\ Point]$$
$$\sum_{s \in \mathbb{S}} P[s\ Wins\ a\ Point] = 1$$
In the tree diagram we can visualize this Markov Chain.
Here, x represents the likelihood of the server winning a point and 1 - x represents the likelihood of the returner winning a point.
As previously stated, First-Step Analysis conditions on the outcome of the first transition in a Markov chain.
With this we can solve for the probability of a player winning a game by applying First-Step Analysis and f(x).
We are interested in calculating the probability of the server winning.
We can calculate the likelihood of transitioning between states of the Markov chain.
We can create a transition matrix with:
in that order.
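A minimal sketch of such a transition matrix in Python follows. It assumes the states are ordered Deuce, Advantage Server, Advantage Returner, Server Wins, Returner Wins. That ordering and the helper names `transition_matrix` and `step` are illustrative assumptions, not taken from the project. Repeatedly stepping the chain from Deuce shows the probability mass settling into the two end states.

```python
def transition_matrix(x: float) -> list[list[float]]:
    """One-point transition probabilities, where x is the server's point-win probability.

    Assumed row/column order:
    Deuce, Advantage Server, Advantage Returner, Server Wins, Returner Wins.
    """
    return [
        [0.0, x, 1 - x, 0.0, 0.0],    # Deuce
        [1 - x, 0.0, 0.0, x, 0.0],    # Advantage Server
        [x, 0.0, 0.0, 0.0, 1 - x],    # Advantage Returner
        [0.0, 0.0, 0.0, 1.0, 0.0],    # Server Wins (absorbing)
        [0.0, 0.0, 0.0, 0.0, 1.0],    # Returner Wins (absorbing)
    ]


def step(dist: list[float], matrix: list[list[float]]) -> list[float]:
    """Advance a row vector of state probabilities by one point."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


P = transition_matrix(0.6)
dist = [1.0, 0.0, 0.0, 0.0, 0.0]  # the game starts at Deuce
for _ in range(200):
    dist = step(dist, P)

print(f"P[Server wins from Deuce] is approximately {dist[3]:.4f}")
```

Iterating the matrix is only one way to read off the long-run behaviour. First-Step Analysis reaches the same quantity by solving a small linear system instead.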
More importantly, we can calculate the probability of both players winning a game for any point win probability with the following system of equations:
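As a sketch under the assumptions stated earlier (i.i.d. points, with $x$ the server's point-win probability), conditioning on the first point played from each state gives a system of the following form. The state names are written out in full here for clarity.

```latex
$$
\begin{aligned}
f(Deuce) &= x\, f(Advantage\ Server) + (1 - x)\, f(Advantage\ Returner) \\
f(Advantage\ Server) &= x + (1 - x)\, f(Deuce) \\
f(Advantage\ Returner) &= x\, f(Deuce)
\end{aligned}
$$
```

Substituting the last two equations into the first gives $f(Deuce) = x^2 + 2x(1 - x)\, f(Deuce)$, and therefore $f(Deuce) = \dfrac{x^2}{x^2 + (1 - x)^2}$.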
If you are interested in applying this system of equations, follow this link! (Assumes that
- Assume that "x" represents "f(Deuce)"
- Assume that "y" represents "f(Advantage Server)"
- Assume that "z" represents "f(Advantage Returner)"
- "a" represents a "start" state that will calculate:
$$f(start) = P[Server\ Wins\ Deuce\ Game\ |\ X(0) = start]$$
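The first-step equations for a deuce game can also be evaluated directly. The sketch below assumes i.i.d. points with a server point-win probability `x`, and uses the closed form $f(Deuce) = x^2 / (x^2 + (1 - x)^2)$ that results from solving those equations by substitution. The function name `deuce_game_probabilities` is hypothetical.

```python
def deuce_game_probabilities(x: float) -> dict[str, float]:
    """Server's chance of winning the game from each non-terminal state.

    x is the server's point-win probability, assumed constant across points.
    """
    deuce = x**2 / (x**2 + (1 - x) ** 2)
    return {
        "Deuce": deuce,
        "Advantage Server": x + (1 - x) * deuce,  # win now, or fall back to Deuce
        "Advantage Returner": x * deuce,          # must first get back to Deuce
    }


for x in (0.50, 0.55, 0.60, 0.65):
    f = deuce_game_probabilities(x)
    # With a single deciding point at deuce (no-ad), the server's chance is just x.
    print(f"x={x:.2f}  deuce-ad: {f['Deuce']:.3f}  one deuce point: {x:.3f}")
```

Under these assumptions, deuce-ad scoring gives the stronger player at deuce a larger edge than a single deciding point would whenever `x` differs from 0.5.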
With the power of computing and player performance data from Jeff Sackmann, we can create a basic simulation for tennis matches accounting for 3 or 5 set matches, and incorporate a one point deuce,
The simulation I have created takes into account:
- Both players on the court.
- The surface.
- The number of sets played.
- The type of scoring system (deuce-ad versus one deuce point).
- The probability a serve is made.
- The probability the server wins the point.
- The probability the returner wins the point.
- After a serve is made, each point is "simulated" with a while loop that iterates until the server or returner probability returns True (wins the point).
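The point loop described above could be sketched as follows. This is not the project's actual code. The function names, the parameter names, and the handling of missed serves (two misses give the point to the returner) are assumptions made for illustration.

```python
import random


def simulate_point(p_serve_in: float, p_server: float, p_returner: float,
                   rng: random.Random) -> bool:
    """Return True if the server wins the point."""
    if p_server <= 0 and p_returner <= 0:
        raise ValueError("at least one win probability must be positive")
    # Assumption: two missed serves (a double fault) give the point to the returner.
    if rng.random() >= p_serve_in and rng.random() >= p_serve_in:
        return False
    # Keep drawing until the server's or the returner's draw comes up True.
    while True:
        if rng.random() < p_server:
            return True
        if rng.random() < p_returner:
            return False


def simulate_game(p_serve_in: float, p_server: float, p_returner: float,
                  no_ad: bool, rng: random.Random) -> bool:
    """Return True if the server wins the game."""
    server, returner = 0, 0
    while True:
        if simulate_point(p_serve_in, p_server, p_returner, rng):
            server += 1
        else:
            returner += 1
        # No-ad: first to four points, so 3-3 is decided by a single point.
        # Deuce-ad: at least four points and a two-point lead.
        lead_needed = 1 if no_ad else 2
        if server >= 4 and server - returner >= lead_needed:
            return True
        if returner >= 4 and returner - server >= lead_needed:
            return False


rng = random.Random(0)  # fixed seed so repeated runs match
n = 10_000
for no_ad in (False, True):
    wins = sum(simulate_game(0.95, 0.62, 0.38, no_ad, rng) for _ in range(n))
    print(f"no_ad={no_ad}: server held in {wins / n:.1%} of games")
```

The probabilities passed in the usage lines are made-up values, not figures from the player data.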
Additionally, this simulation assumes:
- Each point is i.i.d. (independent and identically distributed).
- Player ability is independent of pressure.
- Player distribution is approximately normal if they have $\geq 1000$ service points.
- All matches played are based on the players' true serve/return ability.
- Player opponents do not affect player ability.
- Surface effects are independent of the tournament they are played at (e.g. the difference between U.S. Open hard court and Australian Open hard court is treated as negligible).
- Player ability is independent of different factors such as mental and physical state on each point.
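One possible reading of the sample-size assumption above is the usual normal approximation for a proportion. The sketch below shows how wide the uncertainty on a serve-point win rate would be at a given number of service points. The function name `serve_rate_interval` and the counts in the usage line are made up for illustration.

```python
import math


def serve_rate_interval(points_won: int, points_served: int,
                        z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for a player's serve-point win rate."""
    p_hat = points_won / points_served
    # Standard error of a proportion under the i.i.d. assumption.
    se = math.sqrt(p_hat * (1 - p_hat) / points_served)
    return p_hat - z * se, p_hat + z * se


low, high = serve_rate_interval(650, 1000)  # hypothetical counts
print(f"serve-point win rate between {low:.3f} and {high:.3f}")
```

With fewer service points the interval widens, which is one reason a minimum sample size is a sensible requirement.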
All player performance data is sourced from Jeff Sackmann's GitHub.