Consider the Monty Hall Dilemma example from the textbook (Dekking et al. 2005, sec. 1.3, p. 4). You are asked the following question:
Suppose you’re on a game show, and you’re given the choice of three doors; behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice? – Craig F. Whitaker. ‘Ask Marilyn’. 1990. Columbia, Md.
The question was inspired by Monty Hall’s “Let’s Make a Deal” game show. Here are some assumptions we will make as participants in the game show.
However, we also know the host always opens a door with a goat. When you first pick a door with a goat behind it, the host is forced to reveal the other goat, so you win the car when you switch. You can only win by not switching if your initial choice was correct.
As illustrated, you have twice the chance of winning the car if you switch. To quantify our chances of winning, we used probabilities.
The probabilities helped us express how likely it was that we chose the door with the car. It allowed us to make an informed decision to have a better chance of winning the car by quantifying our level of uncertainty about what’s behind each door. Another way of thinking about probability is in terms of long-term relative frequency. For example, if you play the Monty Hall game repeatedly over 1,000,000 times and switched the doors each time, you would expect to win approximately \(1,000,000 \times 2/3 \approx 666,667\) cars.
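The long-run relative frequency interpretation can be checked by simulation. Below is a minimal sketch in Python (the function name `play_monty_hall` and the choice of 100,000 rounds are illustrative, not from the text):

```python
import random

def play_monty_hall(switch, rng=random):
    """Play one round of the Monty Hall game; return True if the player wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)    # car is placed uniformly at random
    pick = rng.choice(doors)   # player's initial pick
    # The host opens a goat door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

n = 100_000
wins = sum(play_monty_hall(switch=True) for _ in range(n))
print(wins / n)  # close to 2/3
```

Running the simulation with `switch=False` instead gives a winning proportion close to 1/3, matching the analysis above.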
Probability is the science of uncertainty. It provides precise mathematical rules for understanding and analyzing our own ignorance (Evans and Rosenthal 2004).
The term probability refers to the study of randomness and uncertainty…the theory of probabilty provides methods for quantifying the chances, or likelihoods, associated with the various outcomes (Devore and Berk 2012).
So far, we discussed what a probability represents. We will now formulate the mathematical framework to formally define probability.
We use the term (random) experiment in a very general sense to describe mechanisms and phenomena where the outcomes are unpredictable, or random.
A sample space is the collection of all possible outcomes from an experiment. It’s often denoted \(\Omega\) (Omega).
An event is a subset of the sample space.
In the Monty Hall example, we can think of the game as a random experiment with the location of the car as the outcome. The location of the car is unknown and unpredictable to the participant until the car is revealed at the end. The sample space is \(\Omega=\){Door 1, Door 2, Door 3}. Assuming the participant chooses Door 2 at the end of the game, the event that the participant wins is {Door 2} and the event of losing is {Door 1, Door 3}.
Events are represented as subsets of the sample space. We will define notations and rules that are useful when working with such sets. To demonstrate the notions, we will use Venn diagrams where two events, \(A\) and \(B\), are represented with circles enclosed by a rectangle representing their sample space \(\Omega\).
The event that \(A\) or \(B\) occurs is called the union of \(A\) and \(B\). \(\cup\) is the set operator that represents a union.
\(A \cup B\)
The event that \(A\) and \(B\) both occur is called the intersection of \(A\) and \(B\). \(\cap\) is the set operator that represents an intersection.
\(A \cap B\)
The event that an event does not occur is called the complement of the event. \({(\phantom{A})}^c\) is the set operator that represents a complement. Another way to represent this is \(\Omega\setminus A\), where \(\setminus\) is the set minus operator. The complement operation is a special case of the set minus operation.
\(A^c\)

Suppose you roll a regular die once. Let \(A\) represent the event that you roll an even number and \(B\) the event that you roll a number less than 3.

\(\Omega\) is the collection of possible numbers from a regular six-faced die. That is,
$$\Omega=\left\{1,2,3,4,5,6\right\}.$$

\(A\cup B\) is the event that you roll a number that is even or less than 3. That is,

$$A\cup B=\left\{1,2,4,6\right\}.$$

\(A\cap B\) is the event that you roll a number that is even and less than 3. That is,

$$A \cap B=\left\{2\right\}.$$

\(\left(A\cup B\right)^c\) is the event that you do not roll a number that is even or less than 3. That is,

$$\left(A\cup B\right)^c = \left\{1,2,4,6\right\}^c=\left\{3,5\right\}.$$

Note that the last event can be rephrased as the event that you roll a number that is not even and not less than 3. In other words,
$$\left(A\cup B\right)^c = A^c \cap B^c.$$
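The worked die example can be verified directly with Python's built-in set operations (a small illustrative sketch; the variable names are chosen here, not from the text):

```python
# Events from the die-rolling example, represented as Python sets.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll an even number
B = {1, 2}      # roll a number less than 3

print(A | B)             # union: {1, 2, 4, 6}
print(A & B)             # intersection: {2}
print(omega - (A | B))   # complement of the union: {3, 5}
# The identity (A ∪ B)^c = A^c ∩ B^c holds on this example:
print(omega - (A | B) == (omega - A) & (omega - B))  # True
```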
The relationship is demonstrated with Venn diagrams below.
[Venn diagrams: the union \(A\cup B\); its complement \((A\cup B)^c\); the complements \(A^c\) and \(B^c\); and their intersection \(A^c\cap B^c\), which shades the same region as \((A\cup B)^c\).]
DeMorgan’s Laws generalize this relationship.
DeMorgan’s Laws state that we have
$$\left(A\cup B\right)^c=A^c \cap B^c\text{ and } \left(A\cap B\right)^c=A^c\cup B^c$$
for any two events \(A\) and \(B\).
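As a quick sanity check, both laws can be verified on small example sets (the particular sets below are arbitrary, chosen only for illustration):

```python
# Illustrative sample space and events; complement is taken relative to omega.
omega = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def comp(S):
    return omega - S

assert comp(A | B) == comp(A) & comp(B)   # (A ∪ B)^c = A^c ∩ B^c
assert comp(A & B) == comp(A) | comp(B)   # (A ∩ B)^c = A^c ∪ B^c
print("DeMorgan's Laws hold on this example")
```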
Disjoint events and subsets are two other relationships that are useful when discussing probability.
For any two events \(A\) and \(B\), we say they are disjoint, or mutually exclusive if they have no outcomes in common. Their intersection is an empty set denoted as \(\emptyset\).
\(A \cap B = \emptyset\)
For any two events \(A\) and \(B\), we say \(A\) is a subset of \(B\), or \(A\) implies \(B\), if all outcomes of \(A\) lie within \(B\). Their intersection is \(A\).
\(A \subset B\)

We are now ready to formally define probability. We provide two separate versions based on the size of the sample space: one for the simple case where the sample space is finite, or \(\lvert \Omega\rvert<\infty\), and one for the case where the sample space is infinite, or \(\lvert \Omega\rvert=\infty\).
For any event \(A\), \(\lvert A\rvert\) is called the cardinality of the event \(A\), or the size of the event. It is the count of all outcomes that belong to the event.
A probability function \(P\) defined on a finite sample space \(\Omega\) assigns each event \(A\) in \(\Omega\) a number \(P(A)\) such that
1. \(0\le P(A)\le 1\),
2. \(P(\Omega)=1\), and
3. \(P(A\cup B)=P(A) + P(B)\) if \(A\) and \(B\) are disjoint.

The number \(P(A)\) is called the probability that \(A\) occurs.
A probability function \(P\) defined on an infinite sample space \(\Omega\) assigns each event \(A\) in \(\Omega\) a number \(P(A)\) such that

1. \(0\le P(A)\le 1\),
2. \(P(\Omega)=1\), and
3. \(P(A_1\cup A_2\cup A_3\cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots\) if \(A_1,A_2,A_3,\ldots\) are disjoint.

The number \(P(A)\) is called the probability that \(A\) occurs.
Suppose we are interested in the probability of the union of two events \(A\) and \(B\) in sample space \(\Omega\). To compute the probability using the definition, we may start by identifying disjoint subsets of \(A\cup B\).
[Venn diagrams: the disjoint pieces \(A\cap B^c\), \(A^c\cap B\), and \(A\cap B\), and their union \(A \cup B\).]
\begin{equation} \implies P(A\cup B)=P(A\cap B^c) + P(A^c \cap B) + P(A\cap B) \tag{1} \end{equation}
It’s important to identify subsets that are disjoint.
Similarly, we can decompose events \(A\) and \(B\) into two disjoint subsets respectively.
\begin{equation} P(A) = P(A\cap B) + P(A\cap B^c) \end{equation}
\begin{equation} \implies P(A\cap B^c) = P(A) - P(A\cap B) \tag{2} \end{equation}
\begin{equation} P(B) = P(A\cap B) + P(A^c\cap B) \end{equation}
\begin{equation} \implies P(A^c \cap B) = P(B) - P(A\cap B) \tag{3} \end{equation}
Substituting Equations (2) and (3) into Equation (1), we can derive the following result.
\begin{equation} P(A\cup B)=P(A) - P(A\cap B) + P(B) - P(A\cap B) + P(A\cap B) \end{equation}
\begin{equation} \implies P(A\cup B)=P(A) + P(B) - P(A\cap B) \tag{4} \end{equation}
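Because probabilities on a finite sample space with equally likely outcomes reduce to counting, Equation (4) can be checked on the earlier die example by counting directly (a sketch; the events are the ones from that example):

```python
omega = {1, 2, 3, 4, 5, 6}
A, B = {2, 4, 6}, {1, 2}   # even; less than 3

# |A ∪ B| = |A| + |B| - |A ∩ B|; dividing through by |Ω| gives Equation (4).
assert len(A | B) == len(A) + len(B) - len(A & B)
print(len(A | B), len(A), len(B), len(A & B))  # 4 3 2 1
```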
This result for the union of any two sets holds true in general.

The Probability of a Union states that we have
$$P(A \cup B) = P(A) + P(B) - P(A\cap B)$$
for any two events \(A\) and \(B\).
A similar investigation between any event \(A\) and its sample space \(\Omega\) combined with the definition \(P(\Omega)=1\) yields the following general result for the probability of the complement of \(A\).
The Probability of a Complement states that we have
$$P(A^c) = 1 - P(A)$$
for any event \(A\).
Let’s consider the example of rolling a fair die. Recall, the sample space \(\Omega\) is \(\{1,2,3,4,5,6\}\).
Note that the events with a single outcome are disjoint from each other. Therefore,
$$P(\{1\}) + P(\{2\}) + P(\{3\}) + P(\{4\}) + P(\{5\}) + P(\{6\}) = P(\Omega) = 1.$$
Since it’s a fair die, all 6 outcomes have the same chance of being rolled. Let’s denote this common probability by \(p\). Then, we have
$$6p = 1 \implies p = \frac{1}{6}.$$
In general, we can compute \(P(A)\) for any event \(A\) of the sample space \(\Omega\) using
\begin{equation} P(A)=\frac{\text{number of outcomes that belong to }A}{ \text{total number of outcomes in }\Omega } \end{equation}
if the following conditions are satisfied:

1. \(\Omega\) is a finite sample space, and
2. all outcomes in \(\Omega\) are equally likely.

Recall \(B\) is the event that you roll a number less than 3 in the example of rolling a fair die. Using the counting method, we can compute the probability as:
$$P(B) = \frac{\lvert B\rvert}{\lvert\Omega\rvert} =\frac{2}{6}=\frac{1}{3}.$$
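The counting method and the complement rule can be wrapped in a small helper. This is an illustrative sketch; the name `prob` is chosen here and `Fraction` keeps the results exact:

```python
from fractions import Fraction

def prob(event, omega):
    """Counting method: P(A) = |A| / |Ω|, valid for a finite sample space
    whose outcomes are equally likely."""
    return Fraction(len(event), len(omega))

omega = {1, 2, 3, 4, 5, 6}
B = {1, 2}                      # roll a number less than 3
print(prob(B, omega))           # 1/3
print(prob(omega - B, omega))   # 2/3, agreeing with 1 - P(B)
```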
So far, we considered outcomes of running a single random experiment. Often, we are interested in outcomes of running multiple experiments.
Suppose you toss a coin twice, where the coin has an equal chance of landing heads and landing tails. How should we define the sample space?
We will assume it is not possible for the coin to land on its side. While one may argue it is possible, the probability is so small that it is often negligible.
We can first examine the sample space of each of the two tosses individually. Let \(\Omega_1\) be the sample space for the first toss and \(\Omega_2\) be the sample space for the second. Since the coin can only land heads or tails from the first toss, we have
$$\Omega_1=\left\{H,T\right\}.$$
We are using the same coin for the second toss and the sample space does not change:
$$\Omega_2=\left\{H,T\right\}.$$
If the coin lands heads on both tosses, we can denote the outcome as \((H,H)\). If the first heads is followed by tails, we can denote \((H,T)\), and so on. Note that for each outcome from the first toss, we have the same set of outcomes for the second. We can thus see that
$$\Omega = \Omega_1\times\Omega_2 =\{H,T\}\times\{H,T\}=\left\{ \left(H,H\right),\left(H,T\right), \left(T,H\right),\left(T,T\right) \right\}.$$
In general, the sample space of multiple experiments is the Cartesian product of the sample spaces for the individual experiments. In the case of two experiments, we have
$$\Omega=\Omega_1\times\Omega_2= \left\{\left(\omega_1,\omega_2\right): \omega_1\in\Omega_1,\omega_2\in\Omega_2\right\}.$$
Note that \(\lvert\Omega\rvert=\lvert\Omega_1\rvert\cdot\lvert\Omega_2\rvert\) when \(\Omega_1\) and \(\Omega_2\) are finite.
For the coin tossing example, we have
$$\lvert \Omega \rvert = \left| \{H,T\}\times\{H,T\} \right| = \left|\left\{ \left(H,H\right),\left(H,T\right), \left(T,H\right),\left(T,T\right) \right\}\right| = 4 = 2\cdot 2.$$
We also note that each outcome in \(\Omega\) is equally likely, so we can compute the probability of any event from the sample space by counting. For example,
$$P\left(\left\{\left(H,T\right)\right\}\right)=\frac{1}{4}.$$
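The two-toss sample space and this counting computation can be reproduced with `itertools.product` (a sketch; `Fraction` is used for an exact probability):

```python
from itertools import product
from fractions import Fraction

omega1 = ["H", "T"]                      # sample space of a single toss
omega = list(product(omega1, repeat=2))  # Cartesian product Ω1 × Ω2
print(omega)        # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(len(omega))   # 4 = 2 * 2
# Outcomes are equally likely, so counting gives the probability:
print(Fraction(1, len(omega)))           # 1/4 = P({(H, T)})
```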
In general, you can extend the counting method to multiple experiments when the combined sample space is finite and all of its outcomes are equally likely.
Dekking, Frederik Michel, Cornelis Kraaikamp, Hendrik Paul Lopuhaä, and Ludolf Erwin Meester. 2005. A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Science & Business Media.
Devore, Jay L, and Kenneth N Berk. 2012. Modern Mathematical Statistics with Applications. Springer.
Evans, Michael J, and Jeffrey S Rosenthal. 2004. Probability and Statistics: The Science of Uncertainty. Macmillan.