Random Variables | Baeldung on Computer Science

1. Introduction

In this tutorial, we’ll explain random variables.

2. Background

Let’s say that $\Omega$ is the set of all possible outcomes of a random process we’re analyzing. We call $\Omega$ the sample space. For instance, when tossing a coin, there are two outcomes: head ( $H$ ) and tail ( $T$ ), so $\Omega = \{H, T\}$ . Similarly, when flipping a coin 4 times in a row, there are $2^4 = 16$ outcomes in the sample space: $\Omega = \{HHHH, HHHT, HHTH, \ldots, TTTT\}$ .

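An event is any subset of $\Omega$ . For example, in the four-flip experiment, the event “exactly one head” is the subset $\{HTTT, THTT, TTHT, TTTH\}$ .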

If we define a probability $P$ telling us how likely each event is, we get a probability space. More precisely, $P$ maps the events defined over $\Omega$ to $[0, 1]$ . That’s where random variables come into play.

2.1. Random Variables

Usually, we’re interested in the numerical values the events represent or can be assigned. For example, if we toss a coin 100 times, we may be interested only in the number of heads and not the exact sequence of the $H$ s and $T$ s.

Intuitively speaking, random variables are numerical interpretations of events. Those numerical values aren’t arbitrary. They represent precisely those quantities we’re interested in.

So, a random variable $\boldsymbol{X}$ maps the outcomes in $\Omega$ to numbers in a set $\mathcal{B}$ . Using the events’ probability $P$ , we derive the probability $P_X$ with which $X$ takes values from $\mathcal{B}$ .

For example, if our coin is fair, each outcome in $\Omega$ is equally likely. If we define $X$ as the number of heads in four flips, we get this $P_X$ :

$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ \frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \end{pmatrix}$
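The table can be reproduced by brute force. The following is a minimal sketch, assuming a fair coin; the names `omega`, `X`, and `pmf` are illustrative choices, not part of the tutorial. It enumerates the sample space, applies the random variable to each outcome, and counts how often each value appears:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2^4 sequences of H and T.
omega = list(product("HT", repeat=4))

# The random variable X maps each outcome to its number of heads.
def X(outcome):
    return outcome.count("H")

# With a fair coin every outcome is equally likely, so the probability
# that X takes the value x is the fraction of outcomes mapped to x.
counts = Counter(X(outcome) for outcome in omega)
pmf = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the fractions in the table.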

There are two main types of random variables: discrete and continuous.

3. Discrete Variables

We say that $\boldsymbol{X}$ is discrete if the set of values it can take with a non-zero probability is countable.

For instance, if $\mathcal{B}$ is finite, $X$ is a discrete random variable. But, a variable can take infinitely many values and still be discrete.

3.1. Countability

Let’s say we’re flipping a coin until we get two heads in a row. We can get $HH$ in the first two flips, but there may be a sequence of 100 $T$ s before we get the first $H$ . In fact, we may never get two $H$ s one after another since there’s always a non-zero chance to get a $T$ after an $H$ .

So, if our random variable $X$ represents the number of tosses until getting two $H$ s in a row, the set of its values $\mathcal{B}$ will be infinite:

$2, 3, 4, \ldots$

However, it’s still countable! That means we can list its elements one after another in a sequence. Since the set of values $X$ takes with a non-zero probability is countable, we say that $X$ is discrete.
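To get a feel for this variable, we can simulate it. The sketch below assumes a fair coin; the helper names `tosses_until_two_heads` and `fair_coin` are hypothetical. Every sampled value is an integer of at least 2, but there is no upper bound on what a run may produce:

```python
import random

def tosses_until_two_heads(flips):
    """Count the flips consumed from `flips` until two H's appear in a row."""
    count = 0
    previous = None
    for flip in flips:
        count += 1
        if flip == "H" and previous == "H":
            return count
        previous = flip
    raise ValueError("the sequence ended before two heads in a row")

def fair_coin(rng):
    """Endless stream of fair coin flips (assumed fair for this sketch)."""
    while True:
        yield rng.choice("HT")

rng = random.Random(0)  # seeded only to make the run repeatable
samples = [tosses_until_two_heads(fair_coin(rng)) for _ in range(10_000)]
print(min(samples), max(samples))
```

Because the function accepts any iterable of flips, we can also feed it a fixed sequence such as `"TTHTHH"` to check a single case by hand.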

3.2. The Probability Mass Function

Mathematically speaking, the probability $P_X$ is defined over subsets of $\mathcal{B}$ just as the probability $P$ is defined over subsets of $\Omega$ . The function mapping individual values $\boldsymbol{x \in \mathcal{B}}$ of $\boldsymbol{X}$ to their probabilities is known as the probability mass function (PMF) $\boldsymbol{p_X}$ .

The distinction is technical for the most part since we can define one using the other. Here’s how we get PMF from $P_X$ :

$p_X(x) = P_X(\{ x\}) \quad (\forall x \in \mathcal{B})$

and vice versa:

$P_X(E) = \sum_{x \in E}p_X(x) \quad (\forall E \subseteq \mathcal{B})$
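As an illustration of the two formulas, here is a small sketch that uses the PMF of the number of heads in four fair flips. The dictionary `p_X` and the function `P_X` are illustrative names chosen to mirror the notation:

```python
from fractions import Fraction

# PMF of the number of heads in four fair coin flips.
p_X = {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8),
       3: Fraction(1, 4), 4: Fraction(1, 16)}

def P_X(event):
    """Probability of an event E (a subset of B): the sum of the PMF over E."""
    return sum((p_X[x] for x in event), Fraction(0))

print(P_X({0, 1}))     # at most one head
print(P_X({2}))        # a singleton event recovers the PMF value
print(P_X(set(p_X)))   # the whole set B
```

The singleton case shows the first formula, and the other two calls show the second.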

3.3. The Cumulative Distribution Function

Let $x$ be any value $X$ can take. The cumulative distribution function (CDF) of $X$ is defined as:

$\mathrm{CDF}_X(x) = P_X(X \leq x)$

For discrete variables, we calculate the CDF by summing individual probabilities:

$\mathrm{CDF}_X(x) = \sum_{z \in \mathcal{B} \mid z \leq x} p_X(z)$

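In our example with 4 tosses and with $X$ denoting the number of heads, $\mathrm{CDF}_X(x)$ shows us the probability to get $x$ or fewer heads:

$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ \frac{1}{16} & \frac{5}{16} & \frac{11}{16} & \frac{15}{16} & 1 \end{pmatrix}$

Plotting $\mathrm{CDF}_X$ against sorted $\mathcal{B}$ reveals a non-decreasing staircase function. The probability that $X$ takes a value greater than $a$ and at most $b$ is the difference between the CDF values at those two points:

$P_X(a < X \leq b) = \mathrm{CDF}_X(b) - \mathrm{CDF}_X(a)$

To include the left endpoint $a$ as well, we add its probability $p_X(a)$ .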
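The following sketch computes the CDF of the four-flip example by summing the PMF. The function names are illustrative. Note that a difference of two CDF values covers the half-open interval $a < X \leq b$ , so the left endpoint $a$ itself is excluded:

```python
from fractions import Fraction

# PMF of the number of heads in four fair coin flips.
p_X = {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8),
       3: Fraction(1, 4), 4: Fraction(1, 16)}

def cdf(x):
    """P(X <= x): sum of the PMF over all values z with z <= x."""
    return sum((p for z, p in p_X.items() if z <= x), Fraction(0))

def prob_between(a, b):
    """P(a < X <= b); the left endpoint a is not included."""
    return cdf(b) - cdf(a)

print(cdf(2))               # probability of two or fewer heads
print(prob_between(1, 3))   # probability of exactly two or three heads
```

Evaluating `cdf` at a point between two values, such as 2.5, returns the same result as at 2, which is the flat part of the staircase.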

3.4. Examples

We differentiate between various types of variables depending on the shapes of their CDFs.

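For instance, a uniform discrete variable $X$ assigns equal probabilities to each value in $\mathcal{B}$ . If $\mathcal{B}$ has $n$ values, then $p_X(x) = \frac{1}{n}$ for each $x \in \mathcal{B}$ .

If there are only two values $X$ can take, which we usually denote as 0 and 1, we have a Bernoulli random variable. With $p$ denoting the probability of 1, its PMF is $p_X(1) = p$ and $p_X(0) = 1 - p$ .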
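Both families are easy to write down as PMFs. This is a minimal sketch with hypothetical helper names; it assumes a uniform variable over $n$ values gives each of them probability $\frac{1}{n}$ , and that a Bernoulli variable takes the value 1 with probability $p$ and 0 with probability $1 - p$ :

```python
from fractions import Fraction

def uniform_pmf(values):
    """Equal probability 1/n for each of the n given values."""
    values = list(values)
    return {x: Fraction(1, len(values)) for x in values}

def bernoulli_pmf(p):
    """Value 1 with probability p, value 0 with probability 1 - p."""
    return {0: 1 - p, 1: p}

die = uniform_pmf(range(1, 7))          # a fair six-sided die
coin = bernoulli_pmf(Fraction(3, 10))   # a biased coin, 1 = heads

print(die)
print(coin)
```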

4. Continuous Variables

Let $X$ model the time (in minutes) we spend waiting for an order in a restaurant. Let’s also say that the restaurant guarantees the waiting time is 15 minutes at most. So, the set of values $X$ can take is $\mathcal{B} = [0, 15]$ . In what ways is this $X$ different from the count of $H$ s discussed above?

First, there are uncountably many different values it can take: 10, 11, 10.5, 10.55, 10.555 minutes, and so on. But that’s not the most important difference.

The probability $\boldsymbol{P_X}$ is spread over the uncountably many values in the range $[0, 15]$ . Since all those values are possible, we need to allocate some probability to each one. However, because there are infinitely many of them, and the total probability is finite (=1), the allocated amounts get so small that they’re practically zero ( $=\frac{1}{\infty}$ ). So, if we single out an individual value $x \in [0, 15]$ , the probability of its realization $P_X(x)$ is zero.

That’s the definition of continuous variables. A random variable $\boldsymbol{X}$ is continuous if $\boldsymbol{P_X(x)=0}$ for every value $\boldsymbol{x}$ it can take.
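One way to see why single values get zero probability is to look at ever smaller intervals around a value. The sketch below assumes, purely for illustration, that the waiting time is spread evenly over $[0, 15]$ ; the tutorial doesn’t specify how it’s distributed. Under that assumption, the probability of landing in an interval is its length divided by 15:

```python
def prob_within(x, eps, low=0.0, high=15.0):
    """P(x - eps <= X <= x + eps), assuming X is spread evenly over [low, high]."""
    left = max(low, x - eps)
    right = min(high, x + eps)
    return max(0.0, right - left) / (high - low)

# Shrinking the interval around 10 minutes drives the probability to zero.
for eps in (1.0, 0.1, 0.001, 1e-6, 0.0):
    print(eps, prob_within(10.0, eps))
```

With `eps` equal to 0, the interval collapses to the single value 10 and the probability is exactly zero.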

4.1. Continuous CDF

The CDF of a continuous random variable is continuous everywhere.

The jumps and the staircase shape of a discrete variable’s CDF happen at the points at which $P_X(x) > 0$ . Since $P_X(x)=0$ for each $x \in \mathcal{B}$ if $X$ is continuous, there can be no jumps in the CDF plot. By definition, that means the corresponding CDF is continuous.

4.2. Probability Density Function

If $\mathrm{CDF}_X$ has a derivative $f_X$ , it holds that:

$\mathrm{CDF}_X(x) = \int_{-\infty}^{x}f_X(u)du$

We call such a function $f_X$ the probability density function (PDF) of $X$ . It’s zero outside the variable’s support, and the integral over it must be equal to 1:

$\int_{-\infty}^{\infty}f_X(u)du = 1$

Otherwise, it isn’t a proper density since we’ll get a total probability greater or lower than 100%, which doesn’t make sense.

The PDF of a continuous variable is analogous to the PMF of a discrete one. Both functions take the variable’s individual values as arguments. However, while the PMF gives their probabilities, the PDF shows us only how likely the values are relative to one another. The probability of any single value is still zero.

4.3. Examples

A continuous uniform variable’s PDF is constant over the range it’s defined over:

$f_X(u) = \frac{1}{b-a} \quad \text{ if } \mathcal{B} = [a, b]$

So, the corresponding CDF for $x \in [a, b]$ is:

$\mathrm{CDF}_X(x) = \frac{x - a}{b - a}$

It’s 0 for $x < a$ and 1 for $x > b$ .
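We can check numerically that integrating the uniform density gives this CDF. The following sketch uses a simple midpoint sum for the integral; the function names and the choice $[a, b] = [0, 15]$ are assumptions made for illustration:

```python
def uniform_pdf(u, a, b):
    """Constant density 1 / (b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= u <= b else 0.0

def uniform_cdf(x, a, b):
    """Closed-form CDF: 0 below a, (x - a) / (b - a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def integrate(f, lo, hi, steps=10_000):
    """Midpoint Riemann sum of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b = 0.0, 15.0
area = integrate(lambda u: uniform_pdf(u, a, b), a, 10.0)
print(area, uniform_cdf(10.0, a, b))   # the two values should agree closely
```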

Another example is the class of exponential variables. Their densities drop exponentially, so small values are more likely than larger ones. The rate at which the density decreases is controlled by a parameter we usually denote as $\lambda$ :

$f_X(u) = \lambda e^{-\lambda u} \quad \text{ for } u \geq 0$
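As a sketch, assuming the standard exponential density $\lambda e^{-\lambda u}$ for $u \geq 0$ and its closed-form CDF $1 - e^{-\lambda x}$ , we can confirm numerically that the density integrates to (almost exactly) 1 and that it decreases as values grow. The value $\lambda = 0.5$ and the helper names are arbitrary choices for illustration:

```python
import math

def exp_pdf(u, lam):
    """Exponential density: lam * exp(-lam * u) for u >= 0, zero otherwise."""
    return lam * math.exp(-lam * u) if u >= 0 else 0.0

def exp_cdf(x, lam):
    """Closed-form CDF: 1 - exp(-lam * x) for x >= 0, zero otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint Riemann sum of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

lam = 0.5
# The tail beyond 60 is negligible for this rate, so we stop integrating there.
total = integrate(lambda u: exp_pdf(u, lam), 0.0, 60.0)
print(total)
print(exp_pdf(1.0, lam), exp_pdf(2.0, lam))   # the density drops as u grows
```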

5. Discrete vs. Continuous Variables

Here’s a summary of the differences between discrete and continuous random variables:

Discrete	Continuous
Countably many values with positive probabilities	No values with positive probabilities
Non-continuous step CDF	Continuous CDF
Usually denote counts	Usually represent measurements

6. Determinism and Randomness

In a non-probabilistic context, a math variable holds an unknown but fixed value. So, chance plays no part in calculating it.

In programming, we can update a variable:

$x \leftarrow x + 1$

But at every point during our code’s execution, $x$ always holds one and only one value. So, each time we use a deterministic $x$ , it evaluates to a single value. Unless we update it, that value stays the same.

In contrast, a random variable models a random process or an event. It doesn’t hold values but samples them according to the underlying probability. Each time we “use” a random variable, it can generate a different value due to randomness in the process or phenomenon.
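The contrast can be sketched in code. Here, a plain variable `x` holds one value until we update it, while a random variable is modeled as a sampler that may return a different value on each call. The function `sample_heads` is a hypothetical stand-in for the number of heads in four fair flips:

```python
import random

# A deterministic variable: it holds exactly one value until we update it.
x = 5
x = x + 1
print(x, x)   # both reads give the same value

# A random variable modeled as a sampler: each use may give a different value.
def sample_heads(rng, flips=4):
    """Draw one value of X = number of heads in `flips` fair tosses."""
    return sum(rng.random() < 0.5 for _ in range(flips))

rng = random.Random()   # unseeded, so the printed values differ between runs
draws = [sample_heads(rng) for _ in range(10)]
print(draws)
```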

6.1. The Nature of Randomness

There are two main interpretations of randomness.

In the frequentist school of thought, randomness is a property of physical reality. From this viewpoint, some natural (and even human-driven) processes are governed by inherently random laws. Those laws determine the long-term frequencies of the processes’ possible outcomes through probability functions. In other words, a random law doesn’t define the outcomes but the chances they’ll materialize. The laws’ true analytical forms are unknown, and the goal of statistics and science is to uncover or approximate them.

In the subjectivist (or Bayesian) tradition, probabilities quantify and represent our uncertainty about the world. They aren’t laws of nature or human society but mathematical tools we use to formalize our belief states. Hence, probabilities don’t exist independently from us and aren’t unique. Each conscious being can develop its own beliefs about a process or an event and express them using a functional form different from those others choose. Therefore, randomness originates from our inability to understand the world completely and aligns with the limitations of our knowledge.

7. Mixed and Multivariate Variables

Apart from continuous and discrete variables, there are also mixed ones. A mixed variable’s CDF combines a continuous part and a step-like part:

$\mathrm{CDF}_X(x) = \int_{-\infty}^{x}f_X(u)du + \sum_{z_i \leq x}p_X(z_i)$

where the $z_i$ are those values at which $P_X$ is positive.
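As a hypothetical example of a mixed variable, suppose the order is already waiting for us with probability 0.2 (so the wait is exactly 0), and otherwise the wait is spread evenly over $[0, 15]$ minutes. Both numbers are assumptions made for this sketch. The CDF then has one jump at 0 followed by a continuous part:

```python
def mixed_cdf(x, p_zero=0.2, high=15.0):
    """CDF of a wait that is exactly 0 with probability p_zero
    and otherwise spread evenly over [0, high] (assumed model)."""
    if x < 0:
        return 0.0
    if x >= high:
        return 1.0
    # Step part: the jump of size p_zero at z = 0.
    # Continuous part: the integral of the density (1 - p_zero) / high up to x.
    return p_zero + (1.0 - p_zero) * x / high

for x in (-0.001, 0.0, 7.5, 15.0):
    print(x, mixed_cdf(x))
```

Just below 0 the CDF is 0, and at 0 it jumps to `p_zero`, which is the probability mass sitting on that single value.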

All the variables we discussed were univariate (one-dimensional). However, a random variable can have more than one dimension. In that case, we call it multivariate. We consider each dimension a univariate variable, so a multivariate variable is a vector of one-dimensional ones.
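For a small illustration, consider four fair coin flips and the two-dimensional variable whose first component counts the heads in the first two flips and whose second component counts the heads in the last two. This pairing is a made-up example. The sketch below tabulates the probabilities of the pairs and then recovers the distribution of the first component on its own:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=4))

def XY(outcome):
    """Two-dimensional variable: (heads in flips 1-2, heads in flips 3-4)."""
    return (outcome[:2].count("H"), outcome[2:].count("H"))

# Probability of each pair of values, assuming a fair coin.
counts = Counter(XY(outcome) for outcome in omega)
joint = {pair: Fraction(n, len(omega)) for pair, n in counts.items()}

# Summing over the second component leaves the first one as a univariate variable.
first = {}
for (x1, _), p in joint.items():
    first[x1] = first.get(x1, Fraction(0)) + p

print(joint[(1, 1)])
print(first)
```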

8. Conclusion

In this tutorial, we explained random variables. We use them to quantify our belief states or the outcomes of random processes and events.
