This post is a very brief introduction to some basic concepts in probability theory. We encounter uncertainty often in our everyday lives, for example, in the weather, games of chance (think of rolling dice or shuffling a deck of cards), financial markets, etc. Probability theory provides a language to quantify uncertainty, thereby allowing us to reason about future events whose outcomes are not yet known. In this post, we only consider experiments where the number of potential outcomes is finite.

The basic object of study in probability is a **probability space**. A (finite) probability space consists of a (finite) **sample space** \(\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}\) together with a function \(P : \Omega \to [0,1]\) which assigns probabilities to the **outcomes** \(\omega_i \in \Omega\). The probability function \(P\) satisfies \(P(\omega_i) \geq 0\) for all \(i\) and

\[

\sum_{i = 1}^n P(\omega_i) = 1.

\]

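The interpretation is that \(P(\omega_i)\) is the probability, or likelihood, that the outcome \(\omega_i\) occurs. More generally, an **event** is a set of outcomes \(A \subseteq \Omega\), and its probability is \(P(A) = \sum_{\omega \in A} P(\omega)\).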

**Example.** We model the randomness of tossing a coin. In this case, there are two possible outcomes of a coin toss, heads or tails, so we take our sample space to be \(\Omega = \{H, T\}\). Since heads and tails are equally likely, we have \(P(H) = P(T) = 1/2\).

**Example.** Rolling a standard (six-sided) die has six outcomes which are equally likely. Thus we take \(\Omega = \{1, 2, 3, 4, 5, 6\}\) and \(P(i) = 1/6\) for \(i = 1, 2, \ldots, 6\).
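The two examples above can be written down directly in code. The following is a minimal sketch, assuming a finite probability space is represented as a Python dictionary from outcomes to probabilities; the names `coin`, `die`, and `is_probability_space` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def is_probability_space(P):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    return all(0 <= p <= 1 for p in P.values()) and sum(P.values()) == 1

# Coin toss: two equally likely outcomes.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Standard die: six equally likely outcomes.
die = {i: Fraction(1, 6) for i in range(1, 7)}

print(is_probability_space(coin))  # True
print(is_probability_space(die))   # True
```

Exact fractions are used instead of floating-point numbers so that the check "the probabilities sum to 1" is not affected by rounding.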

**Definition.** Let \(\Omega\) be a sample space and \(P\) a probability function on \(\Omega\). A **random variable** is a function \(X : \Omega \to \mathbf{R}\).

**Example.** Going back to the coin toss example above, we can define a random variable \(C\) on the coin toss probability space by \(C(H) = 1\) and \(C(T) = -1\). This random variable may arise from a simple game: two players toss a coin and bet on the outcome. If the outcome is heads, player 1 wins one dollar, while if the outcome is tails, player 1 loses one dollar.

**Example.** For the die rolling example, we can define the random variable \(R\) by \(R(i) = i\). The value of \(R\) is simply the value showing on the die after it is rolled.
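Since a random variable is just a function on the sample space, it can be sketched as an ordinary Python function. The snippet below assumes outcomes are encoded as the strings `"H"`/`"T"` for the coin and the integers 1 through 6 for the die; that encoding is an illustrative choice.

```python
def C(outcome):
    """Player 1's winnings in the coin game: +1 on heads, -1 on tails."""
    return 1 if outcome == "H" else -1

def R(outcome):
    """The value showing on the die."""
    return outcome

# Tabulate each random variable on its own sample space.
for outcome in ("H", "T"):
    print(outcome, C(outcome))
for outcome in range(1, 7):
    print(outcome, R(outcome))
```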

**Definition.** A fundamental value assigned to a random variable \(X\) is its **expected value**, **expectation** or **average**, which is defined by the formula

\[

E(X) = \sum_{i = 1}^n X(\omega_i) P(\omega_i).

\]

The expected value of a random variable is the average of its values, weighted by their probabilities. It describes the average value we expect to observe if we repeat an experiment (e.g. a coin flip or die roll) many times over.

**Example.** Going back to the coin flip example, we can compute

\[

E(C) = C(H) P(H) + C(T) P(T) = 1 \cdot \frac{1}{2} + (-1) \cdot \frac{1}{2} = 0.

\]

This tells us that if we repeatedly play the coin flip betting game described above, neither player has an advantage. The players expect to win about as much as they lose.
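The claim about repeated play can be explored with a small simulation. This is only a sketch, assuming a fair coin is modeled by Python's pseudorandom generator; the function name `average_winnings` and the seed are arbitrary illustrative choices.

```python
import random

def average_winnings(num_games, rng):
    """Average of player 1's winnings over num_games simulated fair coin tosses."""
    total = 0
    for _ in range(num_games):
        # Heads (probability 1/2): win one dollar. Tails: lose one dollar.
        total += 1 if rng.random() < 0.5 else -1
    return total / num_games

rng = random.Random(0)  # fixed seed (arbitrary) so that runs are reproducible
for n in (10, 1_000, 100_000):
    print(n, average_winnings(n, rng))
```

For small numbers of games the average can be far from zero, but it should tend to settle near \(E(C) = 0\) as the number of games grows.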

For the die rolling example, we compute

\[

E(R) = R(1) P(1) + \cdots + R(6) P(6) = 1 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = 3.5.

\]

Thus an “average” die roll is \(3.5\) (even though \(3.5\) cannot be the outcome of a single die roll).
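The two computations above follow the same recipe, which can be sketched as a short function. The dictionary representation of the probability space and the name `expectation` are assumptions made for illustration.

```python
from fractions import Fraction

def expectation(X, P):
    """E(X): sum of X(outcome) * P(outcome) over all outcomes."""
    return sum(X(w) * p for w, p in P.items())

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {i: Fraction(1, 6) for i in range(1, 7)}

def C(outcome):
    return 1 if outcome == "H" else -1

def R(outcome):
    return outcome

print(expectation(C, coin))  # 0
print(expectation(R, die))   # 7/2
```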

Often, when speaking about random variables we omit reference to the underlying probability space. In this case, we speak only of the probability that a random variable \(X\) takes on various values. For the coin flipping example above, we could have just defined \(C\) by

\[

P(C = 1) = P(C = -1) = 1/2

\]

without saying anything about the underlying sample space \(\Omega\). The danger in this view is that if we don’t explicitly define \(C\) as a function on some probability space, it may make comparison of random variables difficult. To see an example of this, consider the random variable \(S\) defined on the die roll sample space, \(\Omega = \{1,\ldots, 6\}\) by

\[

S(i) =

\begin{cases}

1 &\text{ if } i \text{ is even}\\

-1 &\text{ if } i \text{ is odd}.

\end{cases}

\]

Notice that, like our variable \(C\) defined for coin flips, we have \(P(S = 1) = P(S = -1) = 1/2\), so in some sense \(C\) and \(S\) are “the same.” However, they are defined on different sample spaces: \(C\) is defined on the sample space of coin flips, while \(S\) is defined on the sample space of die rolls.
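The sense in which \(C\) and \(S\) are "the same" can be made concrete by tabulating the probability of each value. The sketch below assumes the dictionary representation of a probability space; the helper name `distribution` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def distribution(X, P):
    """Map each value x of X to P(X = x) by adding up the matching outcomes."""
    dist = defaultdict(Fraction)  # missing values start at probability 0
    for w, p in P.items():
        dist[X(w)] += p
    return dict(dist)

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {i: Fraction(1, 6) for i in range(1, 7)}

def C(outcome):
    return 1 if outcome == "H" else -1

def S(i):
    return 1 if i % 2 == 0 else -1

# Different sample spaces, identical distributions.
print(distribution(C, coin) == distribution(S, die))  # True
```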

Consider a game where the play is determined by a coin flip and a die roll. For the examples above, the random variable \(C\) depends only on the outcome of the coin flip, while \(R\) and \(S\) depend only on the outcome of the die roll. Since the outcome of the coin flip has no effect on the outcome of the die roll, the variables \(C\) and \(R\) are independent of one another, as are \(C\) and \(S\). However, \(R\) and \(S\) depend on the same outcome (the die roll) so their values may depend on each other. In fact, the value of \(S\) is completely determined by the value of \(R\)! So knowing the value of \(R\) allows us to determine the value of \(S\), and knowing the value of \(S\) tells us something about the value of \(R\) (namely whether \(R\) is even or odd).
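One way to model such a game is with a single sample space whose outcomes are (coin, die) pairs. The sketch below assumes a fair coin and a fair die that do not influence each other, so each of the 12 pairs receives probability \(1/2 \cdot 1/6 = 1/12\); this product construction and the names `game` and `prob` are illustrative assumptions rather than something defined in the text.

```python
from fractions import Fraction
from itertools import product

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {i: Fraction(1, 6) for i in range(1, 7)}

# Assumption: the pair (c, d) has probability P(c) * P(d).
game = {(c, d): coin[c] * die[d] for c, d in product(coin, die)}

# On this space, C reads the coin while R and S read the die.
def C(w):
    return 1 if w[0] == "H" else -1

def R(w):
    return w[1]

def S(w):
    return 1 if w[1] % 2 == 0 else -1

def prob(event, P):
    """Probability of the set of outcomes on which event(outcome) is true."""
    return sum(p for w, p in P.items() if event(w))

# The coin tells us nothing about the die ...
print(prob(lambda w: C(w) == 1 and R(w) == 6, game))  # 1/12
print(prob(lambda w: C(w) == 1, game) * prob(lambda w: R(w) == 6, game))  # 1/12

# ... but R = 1 forces S = -1, so R = 1 and S = 1 never occur together.
print(prob(lambda w: R(w) == 1 and S(w) == 1, game))  # 0
```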

**Definition.** Suppose \(X\) and \(Y\) are random variables defined on the same probability space (i.e., \(X, Y : \Omega \to \mathbf{R}\)). We say that \(X\) and \(Y\) are **independent** if for all possible values \(x\) of \(X\) and \(y\) of \(Y\) we have

\[

P(X = x \text{ and } Y = y) = P(X = x) P(Y = y).

\]

For our examples above with the coin flip and the die roll, this definition does not directly apply to \(C\) and \(R\) as originally defined, because they are defined on different probability spaces. To make their intuitive independence precise, both would first have to be defined on a common sample space, such as the set of (coin, die) pairs. The variables \(R\) and \(S\) are both defined on the die roll sample space, so the definition applies to them directly. However, they are not independent. For example, we have \(P(R = 1) = 1/6\) and \(P(S = 1) = 1/2\). Since \(S(i) = 1\) only when \(i\) is even, we have

\[

P(R = 1 \text{ and } S = 1) = 0 \neq \frac{1}{6} \cdot \frac{1}{2}.

\]

Let \(W\) be the random variable on the die roll sample space defined by

\[

W(i) =

\begin{cases}

1 & \text{ if } i = 1, 4\\

2 & \text{ if } i = 2, 5\\

3 & \text{ if } i = 3, 6.

\end{cases}

\]

We claim that \(W\) and \(S\) are independent. This can be verified by brute force calculation. For example, note that we have \(S = 1\) and \(W = 1\) only when the outcome of the die roll is 4. Therefore,

\[

P(S = 1 \text{ and } W = 1) = \frac{1}{6} = P(S = 1) P(W = 1).

\]

Analogous calculations show that the corresponding equality holds for each of the six pairs of values of \(S\) and \(W\), hence these random variables are independent.
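The brute force verification can be delegated to a few lines of code. This is a minimal sketch that checks the defining equation for every pair of values; the helper names `prob` and `are_independent` are hypothetical, and \(W\) is encoded with the formula `(i - 1) % 3 + 1`, which reproduces the case definition above.

```python
from fractions import Fraction

die = {i: Fraction(1, 6) for i in range(1, 7)}

def R(i):
    return i

def S(i):
    return 1 if i % 2 == 0 else -1

def W(i):
    return (i - 1) % 3 + 1  # 1 for i = 1, 4; 2 for i = 2, 5; 3 for i = 3, 6

def prob(event, P):
    """Probability of the set of outcomes on which event(outcome) is true."""
    return sum(p for w, p in P.items() if event(w))

def are_independent(X, Y, P):
    """Check P(X = x and Y = y) == P(X = x) * P(Y = y) for all values x, y."""
    xs = {X(w) for w in P}
    ys = {Y(w) for w in P}
    return all(
        prob(lambda w: X(w) == x and Y(w) == y, P)
        == prob(lambda w: X(w) == x, P) * prob(lambda w: Y(w) == y, P)
        for x in xs
        for y in ys
    )

print(are_independent(S, W, die))  # True
print(are_independent(R, S, die))  # False
```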

Given two random variables \(X\) and \(Y\) defined on the same probability space, we can define their sum \(X + Y\) and product \(X Y\).

**Proposition.** Suppose \(X, Y : \Omega \to \mathbf{R}\) are independent random variables. Then

\[

E(X + Y) = E(X) + E(Y) \quad\text{and}\quad E(X Y) = E(X) E(Y).

\]

**Proof.** For the first equality, by definition we compute

\[

E(X + Y) = \sum_{x, y} P(X = x \text{ and } Y = y) (x + y).

\]

Using the fact that \(X\) and \(Y\) are independent, we find

\[

\begin{align}

E(X + Y) &= \sum_{x, y} P(X = x \text{ and } Y = y) (x + y)\\

&= \sum_{x, y} P(X = x) P(Y = y) (x + y)\\

&= \sum_{x} P(X = x) x \sum_y P(Y = y) + \sum_y P(Y = y) y \sum_x P(X = x)\\

&= \sum_x P(X = x) x + \sum_y P(Y = y) y\\

&= E(X) + E(Y).

\end{align}

\]

The fourth equality holds because \(P\) satisfies \(\sum_x P(X = x) = 1\) and \(\sum_y P(Y = y) = 1\). Similarly, we compute

\[

\begin{align}

E(X Y) &= \sum_{x, y} P(X = x \text{ and } Y = y) x y\\

&= \sum_{x, y} P(X = x) P(Y = y) x y\\

&= \left(\sum_{x} P(X = x) x\right) \left(\sum_{y} P(Y = y) y\right)\\

&= E(X) E(Y).

\end{align}

\]

These equations give the desired results. ∎
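As a sanity check (not a substitute for the proof), both identities can be tested on the independent pair \(S\) and \(W\) from the die roll example. The sketch assumes the dictionary representation of the probability space; `expectation` is an illustrative helper name.

```python
from fractions import Fraction

die = {i: Fraction(1, 6) for i in range(1, 7)}

def S(i):
    return 1 if i % 2 == 0 else -1

def W(i):
    return (i - 1) % 3 + 1  # 1 for i = 1, 4; 2 for i = 2, 5; 3 for i = 3, 6

def expectation(X, P):
    """E(X): sum of X(outcome) * P(outcome) over all outcomes."""
    return sum(X(w) * p for w, p in P.items())

E_S = expectation(S, die)
E_W = expectation(W, die)
E_sum = expectation(lambda i: S(i) + W(i), die)
E_prod = expectation(lambda i: S(i) * W(i), die)

print(E_sum == E_S + E_W)   # True
print(E_prod == E_S * E_W)  # True
```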

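The equation \(E(X + Y) = E(X) + E(Y)\) is satisfied even if \(X\) and \(Y\) are not independent. This fundamental fact about probability is known as the **linearity of expectation**. In contrast, the equation \(E(X Y) = E(X) E(Y)\) can fail when \(X\) and \(Y\) are not independent; independence guarantees it. Independence is not strictly necessary, though: if \(X\) takes the values \(-1, 0, 1\) with probability \(1/3\) each and \(Y = X^2\), then \(E(X Y) = 0 = E(X) E(Y)\) even though \(Y\) is completely determined by \(X\).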
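As a concrete instance of linearity for dependent variables, take \(R\) and \(S\) from the die roll example, where \(S\) is completely determined by \(R\). Computing directly from the definition of expected value,

\[
E(R + S) = \sum_{i = 1}^6 \bigl(R(i) + S(i)\bigr) P(i) = \frac{0 + 3 + 2 + 5 + 4 + 7}{6} = \frac{21}{6} = 3.5,
\]

while \(E(R) = 3.5\) and \(E(S) = \frac{1}{6}(-1 + 1 - 1 + 1 - 1 + 1) = 0\), so indeed \(E(R + S) = E(R) + E(S)\).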

**Exercise.** Prove that \(E(X + Y) = E(X) + E(Y)\) without assuming that \(X\) and \(Y\) are independent.

**Exercise.** Give an example of random variables \(X\) and \(Y\) for which \(E(X Y) \neq E(X) E(Y)\). (Note that \(X\) and \(Y\) cannot be independent.)