Analyzing Tennis Scoring

Check out the simulation tool for this project here!

With this simulation tool you can simulate a given ATP or WTA match with the following parameters:

Number of simulations (1, 10, 100, 500, 1000, 5000, 10000).
Best of 3 or 5 sets.
Surface (hard, clay, or grass; some players have insufficient sample sizes on some surfaces).
Scoring (ad scoring or no-ad scoring).
Player 1 and Player 2 (the two players to simulate matches between).

Abstract

Why have the proclaimed "Big 3" been so dominant?

Novak Djokovic, Rafael Nadal, and Roger Federer are defined above all by their prowess in Grand Slams. All three have found a lot of success across the ATP tour, each with $>90$ ATP titles, but their main claim to dominance is their Grand Slam wins. How could it not be? On the biggest stage, against the best players in the world, they have won again and again. They dominate first of all because they are the greatest players of their generation. In addition to that, however, the best-of-five scoring system has greatly benefited all three of them.

In the 2023 U.S. Open (his most recent Grand Slam title at the time of writing), Novak Djokovic would have lost in the third round to Laslo Djere under best-of-three scoring, after dropping the first two sets $(4-6, 4-6)$.
In the 2011 French Open, Rafael Nadal would have lost in the first round to John Isner under best-of-three scoring, after trailing two sets to one $(6-4, 6-7, 6-7)$.
In the 2009 French Open, Roger Federer would have lost in the fourth round to Tommy Haas under best-of-three scoring, after dropping the first two sets $(6-7, 5-7)$.

However, they all went on to win these matches in five sets. There are several reasons for this:

It is much harder for an underdog to sustain a high level long enough to win three out of five sets.
A great player who starts below his usual level is likely to regress upward toward his mean.
A worse player who starts above his usual level is likely to regress downward toward his mean.

While there are many other variables to consider in any match (especially a best-of-five match), such as timing, fitness, mentality, feel, and ultimately skill, a large reason for the Big 3's success at the Grand Slam level is that they played five sets instead of three. The Law of Large Numbers says that as you take larger and larger samples from a population, the sample mean, $\bar{x}$, tends to get closer and closer to the true population mean, $\mu$. The same idea applies to these three players: a best-of-five match is a larger sample of sets, so its result reflects the true difference in ability between the two players more reliably than a best-of-three match does. Because the sample is larger, the true hierarchy among players is much clearer at the Grand Slams than at other tournaments (ATP 250s, 500s, Masters 1000s).
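As a rough illustration of this effect, the sketch below compares best-of-three and best-of-five matches. It assumes, purely for illustration, that a player wins every set independently with the same fixed probability `p_set`; the function name and parameters are hypothetical and are not taken from the project's code.

```python
from math import comb


def match_win_probability(p_set, best_of):
    """Probability of winning a best-of-`best_of` match.

    Illustrative assumption: each set is won independently with
    probability p_set.
    """
    sets_needed = best_of // 2 + 1
    # Sum over the number of sets lost before the clinching set is won.
    return sum(
        comb(sets_needed - 1 + lost, lost)
        * p_set ** sets_needed
        * (1 - p_set) ** lost
        for lost in range(sets_needed)
    )


for p in (0.55, 0.60, 0.70):
    best_of_3 = match_win_probability(p, 3)
    best_of_5 = match_win_probability(p, 5)
    print(f"p_set={p:.2f}  best of 3: {best_of_3:.4f}  best of 5: {best_of_5:.4f}")
```

Under this assumption, a player who wins 60% of sets wins about 64.8% of best-of-three matches but about 68.3% of best-of-five matches, so the longer format widens the favorite's edge.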

How can we measure the effects of different scoring systems in professional tennis? How do deuce-ad scoring and set scoring benefit certain players?

Methodology

First-Step Analysis

We can solve for the probability of the server or the returner winning a deuce game using a technique from the theory of stochastic processes called First-Step Analysis.

A Markov chain is a stochastic model that describes a sequence of possible events in which the probability of each event depends only on the current state of the chain.

First-Step Analysis is a technique for Markov chains that conditions on the first transition out of a state in order to compute the probability of eventually reaching a given state.

For this Markov chain we have the following state space:

$$\mathbb{S} = \{Deuce,\ Advantage\ Server,\ Advantage\ Returner,\ Game\ Server,\ Game\ Returner\}$$

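To calculate the probability of the server or the returner winning a deuce game we can apply First-Step Analysis with the function:

$$f(x) = P[Server\ Wins\ Deuce\ Game\ |\ X(0) = x]\ \ for\ all\ x \in \mathbb{S}$$

In this definition $x$ stands for a state in $\mathbb{S}$. In the rest of this section $x$ denotes the probability that the server wins a point, and $f$ is always evaluated at a named state such as $f(Deuce)$.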

In order to calculate the probability of winning a single point, the following assumptions will be made:

Each point is i.i.d. (independent and identically distributed) and has no effect on any other point.
The past does not matter (how the game got to deuce).
Player performance is independent of pressure.
The possible outcomes of an individual point are:
- $\{Server,\ Returner\}$
- $P[Server\ Wins\ a\ Point] = x \in [0, 1]$
- $P[Returner\ Wins\ a\ Point] = 1 - P[Server\ Wins\ a\ Point] = 1 - x$
- $P[Server\ Wins\ a\ Point] + P[Returner\ Wins\ a\ Point] = 1$

The tree diagram visualizes this Markov chain, where $x$ represents the probability of the server winning a point and $1 - x$ represents the probability of the returner winning a point.

With this, we can solve for the probability of a player winning a deuce game by applying First-Step Analysis to $f$.

Note the following absorbing states:

$$f(Game\ Server) = 1$$
$$f(Game\ Returner) = 0$$

We are interested in calculating the probability of the server winning.

Conditioning on the outcome of the next point gives one equation for each non-absorbing state of the Markov chain:

$$f(Deuce) = xf(Advantage\ Server)\ +\ (1-x)f(Advantage\ Returner)$$
$$f(Advantage\ Server) = (1-x)f(Deuce)\ +\ xf(Game\ Server)$$
$$f(Advantage\ Returner) = xf(Deuce)\ +\ (1-x)f(Game\ Returner)$$
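As a worked step, substituting the absorbing values $f(Game\ Server) = 1$ and $f(Game\ Returner) = 0$, and then the two advantage equations, into the deuce equation gives a closed form for the server's chance of winning from deuce:

$$f(Deuce) = x\left[(1-x)f(Deuce) + x\right] + (1-x)\,x\,f(Deuce) = x^2 + 2x(1-x)f(Deuce)$$
$$f(Deuce) = \frac{x^2}{1 - 2x(1-x)} = \frac{x^2}{x^2 + (1-x)^2}$$

For example, with $x = 0.75$ this gives $f(Deuce) = 0.5625 / 0.625 = 0.9$.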

We can create a transition matrix with the states ordered as:

$$\mathbb{S} = \{Deuce,\ Advantage\ Server,\ Advantage\ Returner,\ Game\ Server,\ Game\ Returner\}$$

Each row is the current state and each column is the next state, in that order.

$$\begin{bmatrix} 0 & x & 1-x & 0 & 0\\ 1-x & 0 & 0 & x & 0\\ x & 0 & 0 & 0 & 1-x\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
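One way to see what this matrix encodes is to multiply it by itself many times: the rows then approach the long-run probabilities of ending in each absorbing state. The sketch below does this in plain Python; the function names are hypothetical and chosen only for this illustration.

```python
def deuce_transition_matrix(x):
    """Transition matrix for a server point-win probability x.

    Row/column order: Deuce, Advantage Server, Advantage Returner,
    Game Server, Game Returner.
    """
    return [
        [0.0, x, 1 - x, 0.0, 0.0],
        [1 - x, 0.0, 0.0, x, 0.0],
        [x, 0.0, 0.0, 0.0, 1 - x],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def n_step_matrix(p, steps):
    """Return p raised to the power `steps` (the n-step transition matrix)."""
    size = len(p)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


p_100 = n_step_matrix(deuce_transition_matrix(0.75), 100)
# Row 0 is Deuce; column 3 is Game Server, column 4 is Game Returner.
print(p_100[0][3], p_100[0][4])
```

With $x = 0.75$, the Deuce row puts approximately 0.9 on Game Server and 0.1 on Game Returner after 100 points, with essentially no probability left on the non-absorbing states.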

More importantly, we can calculate the probability of the server winning a deuce game (and therefore of the returner winning it) for any point-win probability. Substituting $f(Game\ Server) = 1$ and $f(Game\ Returner) = 0$ into the first-step equations gives the following system of equations:

$$\begin{array}{l} f(Deuce) - xf(Advantage\ Server) - (1-x)f(Advantage\ Returner) = 0\\ f(Advantage\ Server) - (1-x)f(Deuce) = x\\ f(Advantage\ Returner) - xf(Deuce) = 0 \end{array}$$
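The system can also be solved directly in code. The following is a minimal sketch, not the project's own solver: it sets the three equations up as an augmented matrix and applies Gauss-Jordan elimination with exact fractions. The function name `solve_deuce_system` is hypothetical.

```python
from fractions import Fraction


def solve_deuce_system(x):
    """Solve the three first-step equations for point-win probability x.

    Returns (f(Deuce), f(Advantage Server), f(Advantage Returner)).
    """
    x = Fraction(x)
    # Augmented matrix [A | b], one row per equation, unknowns in the
    # order f(Deuce), f(Advantage Server), f(Advantage Returner).
    rows = [
        [Fraction(1), -x, -(1 - x), Fraction(0)],
        [-(1 - x), Fraction(1), Fraction(0), x],
        [-x, Fraction(0), Fraction(1), Fraction(0)],
    ]
    n = 3
    # Gauss-Jordan elimination using exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return tuple(row[n] for row in rows)


f_deuce, f_adv_server, f_adv_returner = solve_deuce_system(Fraction(3, 4))
print(f_deuce, f_adv_server, f_adv_returner)  # 9/10 39/40 27/40
```

For a server who wins 75% of points, the server wins from deuce 90% of the time, from advantage server 97.5% of the time, and from advantage returner 67.5% of the time.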

If you are interested in applying this system of equations, follow this link! (Assumes that $P[Server\ Wins\ a\ Point] = 0.75$ and $P[Returner\ Wins\ a\ Point] = 0.25$)

Assume that "x" represents "f(Deuce)" (here "x" is an unknown, not the point-win probability used above).
Assume that "y" represents "f(Advantage Server)".
Assume that "z" represents "f(Advantage Returner)".
"a" represents a "start" state, for which the following is calculated:
$$f(start) = P[Server\ Wins\ Deuce\ Game\ |\ X(0) = start]$$

Simulation Process

With the power of computing and player performance data from Jeff Sackmann, we can create a basic simulation of tennis matches that accounts for best-of-three or best-of-five matches and can incorporate a one-point deuce (no-ad scoring).

The simulation I have created takes into account:

Both players on the court.
The surface.
The number of sets played.
The type of scoring system (deuce-ad versus one deuce point).
The probability a serve is made.
The probability the server wins the point.
The probability the returner wins the point.

Once a serve is made, each point is "simulated" with a while loop that iterates until the random draw for either the server or the returner returns True (that player wins the point).
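The effect of the scoring system on a single service game can be sketched with a small Monte Carlo loop. This is a simplified illustration rather than the code in `app.py`: it collapses each point into one server point-win probability `p_server` (it does not model whether the serve lands in), and all function and parameter names are hypothetical.

```python
import random


def simulate_game(p_server, no_ad=False, rng=random):
    """Simulate one service game; return True if the server wins it."""
    server_points = returner_points = 0
    while True:
        # Each point is an independent draw with the same probability.
        if rng.random() < p_server:
            server_points += 1
        else:
            returner_points += 1
        leader_points = max(server_points, returner_points)
        lead = abs(server_points - returner_points)
        # No-ad: first to four points wins (one deciding point at deuce).
        # Ad: at least four points and a two-point lead are required.
        if leader_points >= 4 and (no_ad or lead >= 2):
            return server_points > returner_points


def estimate_game_win_rate(p_server, no_ad=False, n_games=10_000, seed=0):
    """Estimate the server's game-win rate over n_games simulated games."""
    rng = random.Random(seed)
    wins = sum(simulate_game(p_server, no_ad, rng) for _ in range(n_games))
    return wins / n_games


for scoring, no_ad in (("ad", False), ("no-ad", True)):
    print(scoring, estimate_game_win_rate(0.75, no_ad=no_ad))
```

Under these assumptions the stronger server holds slightly more often with ad scoring than with no-ad scoring, because a deuce game decided by a two-point margin gives the better player more points to assert an edge.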

Additionally, this simulation assumes:

Each point is i.i.d. (independent and identically distributed).
Player ability is independent of pressure.
A player's distribution is approximately normal if they have $\geq 1000$ service points.
All matches are played at the players' true serve/return ability.
A player's opponent does not affect the player's ability.
Surfaces behave the same across tournaments (e.g., the difference between a U.S. Open hard court and an Australian Open hard court is treated as negligible).
Player ability is independent of factors such as mental and physical state on each point.
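As an illustration of how per-player inputs could be derived under these assumptions, the sketch below aggregates service points played and won per player and surface, and keeps only pairs that reach the 1000-point threshold. It is a hypothetical sketch, not the project's code, and it assumes the column names used in Jeff Sackmann's match files (`winner_name`, `loser_name`, `surface`, `w_svpt`, `w_1stWon`, `w_2ndWon`, and the matching `l_` columns).

```python
import csv
from collections import defaultdict

MIN_SERVICE_POINTS = 1000  # threshold quoted in the assumptions above


def serve_win_rates(rows, min_points=MIN_SERVICE_POINTS):
    """Return {(player, surface): share of service points won}.

    `rows` is any iterable of dict-like match records, for example a
    csv.DictReader over a match file.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for row in rows:
        for name_col, prefix in (("winner_name", "w_"), ("loser_name", "l_")):
            try:
                key = (row[name_col], row["surface"])
                service_points = int(float(row[prefix + "svpt"]))
                points_won = int(float(row[prefix + "1stWon"])) + int(
                    float(row[prefix + "2ndWon"])
                )
            except (KeyError, TypeError, ValueError):
                continue  # skip records with missing serve statistics
            played[key] += service_points
            won[key] += points_won
    return {
        key: won[key] / played[key]
        for key in played
        if played[key] >= max(min_points, 1)
    }


def load_serve_win_rates(path, min_points=MIN_SERVICE_POINTS):
    """Read a match CSV from `path` and compute the rates above."""
    with open(path, newline="", encoding="utf-8") as handle:
        return serve_win_rates(csv.DictReader(handle), min_points)
```

A call such as `load_serve_win_rates("atp_matches_2023.csv")` would then return only the player and surface pairs with enough service points; pairs below the threshold are left out rather than estimated.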

References

All player performance data is sourced from Jeff Sackmann's GitHub.
