Probability: Joint vs. Marginal vs. Conditional

1. Overview

The probability of an event is a value between 0 and 1 inclusive. It indicates how likely the event is to occur. A value of 0 means the event cannot occur. On the other hand, a value of 1 means the event is sure to happen. Any value in between indicates an intermediate likelihood: the larger the value, the more likely the event. Now, let’s consider two events, $A$ and $B$ . Joint, marginal, and conditional probabilities are values we obtain by considering both events $A$ and $B$ .

In this tutorial, we’ll discuss the differences between joint, marginal, and conditional probability.

2. Probability of an Event

Let’s assume that we’ll perform a well-known chemical experiment. If we’re accurate and get all the ingredients right, then we already know the outcome. Such experiments are called deterministic. On the other hand, some experiments are not as predictable. We call them random.

2.1. Random Experiments and Their Outcomes

In a random experiment, we know the set (collection) of all possible outcomes. This set is called the sample space. However, we do not know (for sure) which outcome we’re going to obtain when we perform the experiment. As an example, let’s consider a weather station. It measures the temperature each day at 2 p.m. This is a random experiment. This is because we know the temperature range depending on the time of year. However, before actually reading the temperature, we cannot determine the value.

2.2. Events as Sets of Outcomes

An event is a set of outcomes of a random experiment. It is thus a subset (part) of the sample space. Using the experiment of the weather station, let’s define three events:

Event COLD: temperature $<$ 15 degrees Celsius
Event MILD: 15 $\leq$ temperature $\leq$ 28
Event HOT: temperature $>$ 28

Of course, each one of these events represents a set of temperature readings (outcomes of our experiment). Moreover, we may use Venn diagrams to visualize events. In these diagrams, an outcome is a point, an event is a circle, and the sample space $S$ is a rectangle. So, in our experiment, a point represents a temperature reading. Hence, the circle of the COLD event, for instance, contains all readings below 15 degrees Celsius. Finally, the sample space’s rectangle contains all recorded readings.

Let’s take a closer look at the figure above. We notice that we have six temperature readings. Two of these readings are occurrences of the event COLD, i.e., they’re below 15 degrees Celsius. However, the four other readings do not belong to the event COLD. In fact, they belong to the event NOT COLD. Actually, any event $A$ in a sample space $S$ has a complementary event $\overline{A}$ (NOT $A$ ). The event $\overline{A}$ contains all outcomes that do not belong to $A$ .
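To make the set view concrete, here is a minimal Python sketch. The six readings are hypothetical values, chosen only so that exactly two of them fall below 15 degrees Celsius, as in the description above. The names `readings`, `sample_space`, `cold`, and `not_cold` are illustrative.

```python
# Hypothetical sample: six temperature readings in degrees Celsius,
# chosen so that exactly two of them are below 15.
readings = [9.5, 13.0, 17.5, 22.0, 26.5, 31.0]

# The sample space S holds every recorded outcome.
sample_space = set(readings)

# An event is the subset of outcomes that satisfy its condition.
cold = {t for t in sample_space if t < 15}

# The complementary event holds every outcome of S that is not in COLD.
not_cold = sample_space - cold

print(sorted(cold))
print(sorted(not_cold))
```

An event and its complement never share an outcome, and together they cover the whole sample space.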

2.3. Probabilities of Events

Now, let’s suppose we performed some random experiment $N$ times. Out of the $N$ outcomes, the event $A$ occurred $N_A$ times. More precisely, an outcome that belongs to $A$ was obtained $N_A$ times. Of course, this implies that the event $\overline{A}$ occurred $N-N_A$ times. The probability of an event is defined as the number of occurrences of this event divided by the total number of outcomes:

$P(A) = \frac{N_A}{N} \qquad P(\overline{A}) = \frac{N-N_A}{N} = 1 - P(A)$

So, let’s assume that we performed the random experiment of our weather station 365 times. In other words, we started on day 1 (January 1st) and ended on day 365 (December 31st). Also, let’s assume that out of these 365 days, we had 97 days where the outcome (temperature) was less than 15 degrees Celsius. These are 97 occurrences of the event COLD, therefore:

$P(\text{COLD}) = \frac{97}{365} \qquad P(\overline{\text{COLD}}) = \frac{365-97}{365} = \frac{268}{365}$
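The same arithmetic can be sketched in Python. The counts 365 and 97 come from the example above. The variable names are illustrative, and `Fraction` is used only to keep the results exact.

```python
from fractions import Fraction

n_total = 365  # number of days the experiment was performed
n_cold = 97    # number of days on which the event COLD occurred

# Relative frequency of COLD and of its complement.
p_cold = Fraction(n_cold, n_total)
p_not_cold = 1 - p_cold

print(p_cold)      # 97/365
print(p_not_cold)  # 268/365
```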

3. Pairs of Events

Things get more interesting when we consider two events. In this case, we may ask questions like:

What is the probability of both events occurring?
What is the likelihood of either event occurring?
Knowing that one of the two events has happened, what is the probability of the other one occurring?

Consider the weather station experiment performed 365 times as described before. Let’s assume we recorded all outcomes in a table. Then, this table will contain the day number (1 to 365) and the corresponding temperature.

Now, let’s define another experiment. It is also performed each day of the calendar year. In the new experiment, we note if the weather is sunny or cloudy. Then, we add this information to the same table where we recorded the temperatures. Thus, the table will contain 365 columns, one for each day of the year. Each column will therefore have two pieces of data. The first is the temperature at 2 p.m. on that day. The second is the weather condition (sunny or cloudy). Moreover, we define the following events in the new experiment:

Event CLOUD: weather is cloudy
Event SUN: weather is sunny

3.1. Graphical Representation of Multiple Events

So far, we have defined three events over the first random experiment: COLD, MILD, and HOT. Also, we defined two events over the second experiment: CLOUD and SUN. Let’s consider a pair of these events, COLD and SUN, drawn as two overlapping circles in a Venn diagram.

In this case, we’re considering two random experiments. Therefore, an outcome in the sample space is actually a pair of outcomes. The first one indicates the measured temperature. The second one indicates whether it was sunny or cloudy.

Moreover, the outcomes are in 4 categories:

The event COLD occurred, but SUN did not occur (the area with diagonal lines).
The event SUN occurred, but COLD did not occur (the area with vertical lines).
Both events occurred (the gray area).
Neither event occurred (the white area).

3.2. Joint Probability

The joint probability of two events $A$ and $B$ is the probability that both events occur. It is represented as the gray area in the previous figure. It is written as $P(A,B)$ , $P(A \cap B)$ , or $P(A~ \text{and} ~B)$ . To demonstrate, let’s show a part of the table where we recorded temperatures and the weather condition.

$\begin{tabular}{|l|l|l|l|l|l|l|l|l|} \hline Day & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\ \hline \hline Temperature & COLD & COLD & HOT & MILD & MILD & HOT & COLD & COLD \\ \hline Condition & CLOUD & SUN & SUN & CLOUD & SUN & SUN & CLOUD & CLOUD \\ \hline \hline \end{tabular}$

The joint probability $P(\text{COLD}, \text{SUN})$ is computed as:

$P(\text{COLD}, \text{SUN}) = \frac{N_{\text{COLD}, \text{SUN}}}{N}$

$N_{\text{COLD}, \text{SUN}}$ is the number of times both events occurred, and $N$ is the total number of outcomes. So, to compute $P(\text{COLD}, \text{SUN})$ , we’ll be counting the table columns where we have both COLD and SUN. Then, we’ll divide this number by the total number of columns.
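The counting procedure can be sketched in Python on the 8-day excerpt shown above. The resulting values describe this excerpt only, not the full 365-day table. The helper `joint_probability` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

# The 8-day excerpt of the table: one entry per day (column).
temperature = ["COLD", "COLD", "HOT", "MILD", "MILD", "HOT", "COLD", "COLD"]
condition = ["CLOUD", "SUN", "SUN", "CLOUD", "SUN", "SUN", "CLOUD", "CLOUD"]


def joint_probability(temp_event, cond_event):
    """Count the columns where both events occur and divide by all columns."""
    days = list(zip(temperature, condition))
    n_both = sum(1 for t, c in days if t == temp_event and c == cond_event)
    return Fraction(n_both, len(days))


print(joint_probability("COLD", "SUN"))    # 1/8
print(joint_probability("COLD", "CLOUD"))  # 3/8
```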

3.3. Marginal Probabilities

Marginal probabilities are those of individual events. So, given the joint probability $P(A,B)$ , the probabilities $P(A)$ and $P(B)$ are its marginals. Now, let’s assume we used our 365-column table to compute joint probabilities. We already know how to do this. Then, we’ll end up with the table below.

$\begin{tabular}{|l|l|l|l|l} \cline{2-4} \multicolumn{1}{l|}{} & COLD & MILD & HOT & \\ \cline{1-4} SUN & 0.1 & 0.15 & 0.2 & 0.45\\ \cline{1-4} CLOUD & 0.3 & 0.15 & 0.1 & 0.55 \\ \cline{1-4} \multicolumn{1}{l}{} & \multicolumn{1}{l}{0.4} & \multicolumn{1}{l}{0.3} & \multicolumn{1}{l}{0.3} & \multicolumn{1}{l}{}\\ \end{tabular}$

We note that the sum of all probabilities is 1. To compute marginal probabilities, we use the rule:

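$P(A) = \sum_{X} P(A,X)$

Here, $X$ ranges over all events of the other experiment, which are mutually exclusive and together cover every outcome. This means that to compute $P(A)$ , we sum all joint probabilities where $A$ occurs. So, we have the following marginal probabilities: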

$P(\text{COLD}) = P(\text{COLD}, \text{SUN}) + P(\text{COLD}, \text{CLOUD}) = 0.4$
$P(\text{MILD}) = P(\text{MILD}, \text{SUN}) + P(\text{MILD}, \text{CLOUD}) = 0.3$
$P(\text{HOT}) = P(\text{HOT}, \text{SUN}) + P(\text{HOT}, \text{CLOUD}) = 0.3$

Furthermore, we can compute $P(\text{SUN}) = 0.45$ and $P(\text{CLOUD}) = 0.55$ .

They’re called marginals since they appear at the margins of the table of joint probabilities. This is shown in the previous table.
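As a minimal sketch, the summation rule can be applied to the joint table above in Python. The dictionary `joint` and the helper `marginal` are illustrative names. The probabilities are stored as `Fraction` values so that the sums are exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (temperature, condition).
joint = {
    ("COLD", "SUN"): Fraction("0.1"),
    ("MILD", "SUN"): Fraction("0.15"),
    ("HOT", "SUN"): Fraction("0.2"),
    ("COLD", "CLOUD"): Fraction("0.3"),
    ("MILD", "CLOUD"): Fraction("0.15"),
    ("HOT", "CLOUD"): Fraction("0.1"),
}


def marginal(event, position):
    """Sum every joint probability whose key has `event` at `position`.

    position 0 selects a temperature event, position 1 a weather condition.
    """
    return sum(p for key, p in joint.items() if key[position] == event)


for temp in ("COLD", "MILD", "HOT"):
    print(temp, float(marginal(temp, 0)))
for cond in ("SUN", "CLOUD"):
    print(cond, float(marginal(cond, 1)))
```

Summing along a column of the table gives a temperature marginal, and summing along a row gives a weather marginal.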

3.4. Conditional Probability

Let’s assume we have two events, $A$ and $B$ . The conditional probability of $A$ given $B$ is written as $P(A|B)$ . It is the probability of $A$’s occurrence, assuming that $B$ occurred. In other words, we focus only on the outcomes where $B$ occurs. Using these, we want to know the probability of $A$’s occurrence. So, we do not count occurrences of $A$ in all outcomes. Instead, we count occurrences of $A$ in outcomes where $B$ also occurred.

$P(A|B) = \frac{N_{A,B}}{N_B} = \frac{\text{Number of occurrences of both} ~A ~\text{and}~ B}{\text{Number of occurrences of} ~B} = \frac{N_{A,B}/N}{N_B/N} = \frac{P(A,B)}{P(B)}$

In the equation above, $N$ is the number of outcomes. This equation enables us to compute conditionals using counts or using probabilities. Now, let’s take a look at our small table above. To compute $P(\text{COLD}|\text{SUN})$ , we’ll count the number of occurrences of SUN. This is the count $N_{\text{SUN}} = 4$ . Then, out of these occurrences, we’ll count the number of COLD. This is the count $N_{\text{COLD},\text{SUN}} = 1$ . Now we can compute $P(\text{COLD}|\text{SUN})$ :

$P(\text{COLD}|\text{SUN}) = \frac{N_{\text{COLD},\text{SUN}}}{N_{\text{SUN}}} = \frac{1}{4}$

Moreover, we expect to have $P(\text{COLD}|\text{SUN}) < P(\text{COLD}|\text{CLOUD})$ . Indeed, cloudy days are usually colder than sunny ones.
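The two equivalent routes to a conditional probability, counts and probabilities, can be sketched in Python on the 8-day excerpt. The function names are hypothetical, and the values describe the excerpt only.

```python
from fractions import Fraction

# The 8-day excerpt of the table: one (temperature, condition) pair per day.
temperature = ["COLD", "COLD", "HOT", "MILD", "MILD", "HOT", "COLD", "COLD"]
condition = ["CLOUD", "SUN", "SUN", "CLOUD", "SUN", "SUN", "CLOUD", "CLOUD"]
days = list(zip(temperature, condition))
n = len(days)


def count(temp_event=None, cond_event=None):
    """Count the days matching the given events (None matches anything)."""
    return sum(
        1
        for t, c in days
        if (temp_event is None or t == temp_event)
        and (cond_event is None or c == cond_event)
    )


def conditional_by_counts(temp_event, cond_event):
    # N_{A,B} / N_B
    return Fraction(count(temp_event, cond_event), count(cond_event=cond_event))


def conditional_by_probabilities(temp_event, cond_event):
    # P(A,B) / P(B)
    p_joint = Fraction(count(temp_event, cond_event), n)
    p_cond = Fraction(count(cond_event=cond_event), n)
    return p_joint / p_cond


print(conditional_by_counts("COLD", "SUN"))           # 1/4
print(conditional_by_probabilities("COLD", "SUN"))    # 1/4
print(conditional_by_counts("COLD", "CLOUD"))         # 3/4
```

In this excerpt, the conditional probability of COLD is indeed smaller on sunny days than on cloudy ones.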

4. More Insights

The ideas we have seen so far are applicable in many situations. However, we need to understand how to interpret the values of probabilities.

4.1. An Example

For instance, let’s consider a study done by some car manufacturers. They’re interested in the reliability of some car models. In this case, they may categorize used cars according to mileage. For instance, they may have four categories:

Lightly used: less than 10000 km
Moderately used: between 10000 and 50000 km
Heavily used: between 50000 and 100000 km
Excessively used: more than 100000 km

Then, they may contact service centers. Their purpose is to gather information about car problems. Subsequently, they categorize these problems as:

Simple: car produces warnings but still runs
Average: car breaks down, but the engine is fine
Extreme: car breaks down, and the engine needs an overhaul

Thus, they may end up with data of, let’s say, 20000 car problems.

4.2. Values of Probabilities and Their Interpretations

The company can now compute all sorts of probabilities. For instance, let’s assume that the marginal probability $P(\text{Extreme})$ is high, e.g., equals 0.3. Then, they know that 30% of the recorded problems for this model require an engine overhaul. Of course, we expect $P(\text{Extreme}| \text{Excessively used})$ to be larger than $P(\text{Extreme}| \text{Lightly used})$ . Obviously, an excessively used car is subject to more wear and tear than a lightly used one.

In this respect, it is important to understand the difference between $P(\text{Extreme}, \text{Excessively used})$ and $P(\text{Extreme}|\text{Excessively used})$ . The former is computed relative to all cars in the data, whatever their mileage category. Therefore, it does not accurately indicate how prone excessively used cars are to extreme problems. For instance, if excessively used cars make up a low percentage of all cars in circulation, this value may be low even when most of them have extreme problems. On the other hand, in the latter, we focus only on excessively used cars. It, therefore, gives us a direct indication of their extreme problems.
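A small numeric sketch in Python shows how far apart the two values can be. Only the total of 20000 records comes from the example above. The other two counts are hypothetical numbers invented purely for illustration and are not results of any study.

```python
from fractions import Fraction

n_total = 20000          # total recorded problems (from the example above)
n_excessive = 1000       # HYPOTHETICAL: problems from excessively used cars
n_extreme_excessive = 600  # HYPOTHETICAL: of those, the extreme problems

# Joint probability: relative to ALL recorded problems.
p_joint = Fraction(n_extreme_excessive, n_total)

# Conditional probability: relative to excessively used cars only.
p_conditional = Fraction(n_extreme_excessive, n_excessive)

print(float(p_joint))        # 0.03
print(float(p_conditional))  # 0.6
```

Under these assumed counts, the joint probability is small because excessively used cars are rare in the data, while the conditional probability shows that most of their problems are extreme.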

5. Conclusion

In this article, we have explained ideas about marginal, joint, and conditional probabilities. We have also shown their use in practical scenarios. However, one point needs more clarification. We defined probabilities as the number of occurrences of an event divided by the total number of outcomes. This definition is fine only when the number of outcomes is large, since the relative frequency becomes a reliable estimate of the probability as the experiment is repeated many times. Intuitively, we cannot infer meaningful statistics from a small number of outcomes.
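The effect of the number of outcomes can be illustrated with a short simulation in Python. This is a sketch under stated assumptions: the event is simulated with an assumed true probability of 0.4 (borrowed from $P(\text{COLD})$ in the joint table), and `relative_frequency` is a hypothetical helper name.

```python
import random


def relative_frequency(p_true, n_trials, rng):
    """Simulate n_trials independent experiments and return N_A / N."""
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return hits / n_trials


rng = random.Random(0)  # fixed seed so that the run is repeatable
p_true = 0.4            # ASSUMED true probability of the simulated event

for n_trials in (10, 100, 1_000, 10_000, 100_000):
    estimate = relative_frequency(p_true, n_trials, rng)
    print(n_trials, estimate, abs(estimate - p_true))
```

With few trials, the estimate can be far from the assumed value, whereas with many trials, it typically settles close to it.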
