The Importance of Central Limit Theorem | Baeldung on Computer Science

1. Overview

In this tutorial, we’ll review the Central Limit Theorem (CLT), one of the most important results in probability theory.

First, we’ll explain the formal statement of the theorem and discuss its implications in the real world. Then we’ll use the Galton board as an example to better understand the CLT.

2. Formal Statement and Significance of CLT

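Let’s consider $n$ independent random variables, $x_k$ ( $k=1,\cdots,n$ ), with finite means, $\mu_k$ , and finite variances, $\sigma^2_k$ . The CLT states that, for large $n$ , the sum $\sum_k x_k$ is approximately a Gaussian random variable with mean $\sum_k\mu_k$ and variance $\sum_k\sigma^2_k$ . More precisely, as $n \rightarrow \infty$ , the standardized sum (the sum minus its mean, divided by its standard deviation) converges in distribution to a standard normal variable. This holds even if the original variables, $x_k$ , aren’t normally distributed.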

More simply, the CLT states that the sum of a large number of independent random variables, under fairly general conditions, is approximately normally distributed.
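The statement above can be checked numerically. The following is a minimal sketch, assuming the terms are independent Uniform(0, 1) variables (a choice made here for illustration, since this distribution is clearly not Gaussian); the function names, the number of terms, and the number of trials are all hypothetical. It compares the empirical mean and variance of the sum with $\sum_k\mu_k$ and $\sum_k\sigma^2_k$ , and checks how much of the mass falls within one standard deviation of the mean, which is about 68.3% for a Gaussian:

```python
import random
import statistics

def sum_of_uniforms(n, rng):
    """Return the sum of n independent Uniform(0, 1) draws."""
    return sum(rng.random() for _ in range(n))

def simulate_sums(n=50, trials=20_000, seed=0):
    """Repeat the experiment `trials` times and return all the sums."""
    rng = random.Random(seed)
    return [sum_of_uniforms(n, rng) for _ in range(trials)]

n = 50
sums = simulate_sums(n=n)

# Each Uniform(0, 1) term has mean 1/2 and variance 1/12,
# so the sum should have mean n/2 and variance n/12.
expected_mean = n * 0.5
expected_var = n / 12

empirical_mean = statistics.fmean(sums)
empirical_var = statistics.pvariance(sums)

# Fraction of sums within one standard deviation of the expected mean.
sd = expected_var ** 0.5
within_one_sd = sum(abs(s - expected_mean) <= sd for s in sums) / len(sums)

print(f"mean:     {empirical_mean:.3f} (expected {expected_mean:.3f})")
print(f"variance: {empirical_var:.3f} (expected {expected_var:.3f})")
print(f"within one standard deviation: {within_one_sd:.3f} (Gaussian: about 0.683)")
```

Swapping `rng.random()` for another distribution with finite variance should give the same qualitative picture, although the number of terms needed for a good approximation may differ.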

The importance of the CLT stems from the fact that, in several real applications, a random variable is the sum of a large number of independent random variables. Thus, the CLT explains why the Gaussian probability distribution is observed so commonly in nature:

In experimental physics, measurement errors are usually modelled by normal random variables.
In signal processing, the noise is modelled as Gaussian noise.
The cross-sectional intensity of a laser beam follows a Gaussian distribution.

Another important consequence of the CLT is observed when random sampling is performed. If we randomly pick a sufficiently large sample from a population with finite variance, the sample mean will be approximately normally distributed regardless of the distribution of the original population.
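As an illustration of the sampling consequence, here is a small sketch that assumes an exponentially distributed population with rate 1 (a strongly skewed population, chosen here only as an example, with mean 1 and standard deviation 1). The helper name `sample_means` and the sample sizes are hypothetical. If the CLT applies, the sample means should cluster around the population mean with a spread close to $\sigma/\sqrt{n}$ :

```python
import random
import statistics

def sample_means(sample_size, num_samples, rate=1.0, seed=1):
    """Draw num_samples samples from Exponential(rate) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

sample_size = 40
means = sample_means(sample_size=sample_size, num_samples=10_000)

# Exponential(1) has population mean 1 and standard deviation 1.
population_mean = 1.0
population_sd = 1.0

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)
predicted_sd = population_sd / sample_size ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"spread of sample means: {sd_of_means:.3f} (sigma / sqrt(n) = {predicted_sd:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from Gaussian.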

3. Assumptions Behind CLT

The CLT can be applied if the following conditions are met:

The random variables should be independent of each other.
The number of random variables, $\mathbf{n}$ , should be sufficiently large and each contribution, $\mathbf{x_k}$ , within the sum should be small compared to the total. How large $n$ should be depends on the distribution of the variables, $x_k$ . As a rule of thumb, a sample size of 30 is often considered sufficient when the distribution of the random variables, $\mathbf{x_k}$ , is roughly symmetric; strongly skewed distributions may require a larger $n$ .
When the random sampling is done without replacement, the sample size, $\mathbf{n}$ , should be no larger than 10% of the population.
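To see why the required $n$ depends on the distribution, we can track how quickly the asymmetry of the sum fades. The sketch below assumes independent Exponential(1) terms (a skewed distribution picked for illustration) and uses hypothetical helper names. A Gaussian has zero skewness, while the sum of $n$ such terms has skewness $2/\sqrt{n}$ , so some asymmetry is still visible at $n = 30$ :

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the third standardized moment (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def skewness_of_sums(n, trials=20_000, seed=3):
    """Estimate the skewness of the sum of n independent Exponential(1) draws."""
    rng = random.Random(seed)
    sums = [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]
    return skewness(sums)

results = {n: skewness_of_sums(n) for n in (2, 10, 30)}
for n, value in results.items():
    # Theory for Exponential(1) terms: the skewness of the sum is 2 / sqrt(n).
    print(f"n={n:3d}  simulated skewness={value:.3f}  theory={2 / n ** 0.5:.3f}")
```

For a symmetric distribution such as Uniform(0, 1), the skewness is already zero for a single term, which is one reason a smaller $n$ tends to be enough in that case.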

4. Example: the Galton Board

The Galton board, also known as the bean machine or quincunx, is a device invented by the English scientist Francis Galton to demonstrate the CLT.

The device consists of a vertical board with evenly spaced nails arranged in staggered order and placed in the upper half of the board; the lower half contains a certain number of evenly-spaced slots (bins). Glass covers the front of the device, allowing us to view the nails and slots. A funnel is placed in the middle of the upper edge.

The Galton board is schematically represented in the following figure:

A large number of balls are poured into the funnel and they fall through. Each time a ball hits a nail, it can bounce right or left with equal probability, since the nails are placed symmetrically. Finally, the balls are collected into the slots at the bottom. The filling of the slots closely approximates a bell curve.

Why is this bell distribution observed? The final position of each ball is a sum of random displacements to the right or left. Hence, the final position can be represented as a sum of discrete random variables, $x_k$ , with only two allowed values: 1 and -1, representing a bounce to the right or left, respectively.
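To make the link with the formal statement explicit, here is a short worked calculation. It assumes a board with $n$ rows of nails and independent bounces; the symbol $S_n$ for the final position is introduced here only for illustration:

$$
S_n = \sum_{k=1}^{n} x_k, \qquad E[x_k] = \tfrac{1}{2}(+1) + \tfrac{1}{2}(-1) = 0, \qquad \mathrm{Var}[x_k] = E[x_k^2] - E[x_k]^2 = 1 - 0 = 1
$$

$$
E[S_n] = \sum_{k=1}^{n} E[x_k] = 0, \qquad \mathrm{Var}[S_n] = \sum_{k=1}^{n} \mathrm{Var}[x_k] = n
$$

So, for a large number of rows, the CLT suggests that $S_n$ is approximately Gaussian with mean 0 and standard deviation $\sqrt{n}$ . The heap of balls is therefore centered under the funnel, and its width grows like $\sqrt{n}$ .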

The filling of the central bins is greater than the filling of the bins at the edges because there are more paths reaching the central bins than paths to the rightmost (or leftmost) bin.

It can be demonstrated that the filling of the slots follows a binomial distribution. If the number of rows and the number of balls are both sufficiently large, then the distribution approximates a Gaussian distribution according to the CLT.
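The behavior of the board can be sketched with a short simulation. This is a simplified model, assuming an idealized board in which every bounce is independent and goes right or left with probability 1/2; the function names, the number of rows, and the number of balls are hypothetical. The observed fraction of balls in each bin is compared with the binomial probability $\binom{n}{k}/2^n$ , where $n$ is the number of rows and $k$ the number of bounces to the right:

```python
import random
from math import comb

def galton_board(rows, balls, seed=2):
    """Drop `balls` balls through `rows` rows of nails; return the count per bin."""
    rng = random.Random(seed)
    bins = [0] * (rows + 1)
    for _ in range(balls):
        # Each bounce is +1 (right) or -1 (left) with equal probability.
        position = sum(rng.choice((-1, 1)) for _ in range(rows))
        # position + rows equals twice the number of right bounces,
        # so this maps {-rows, ..., rows} to a bin index in {0, ..., rows}.
        bins[(position + rows) // 2] += 1
    return bins

def binomial_pmf(rows, k):
    """Probability of exactly k right bounces out of `rows` fair bounces."""
    return comb(rows, k) / 2 ** rows

rows, balls = 12, 20_000
counts = galton_board(rows, balls)

for k, count in enumerate(counts):
    observed = count / balls
    expected = binomial_pmf(rows, k)
    bar = "#" * round(observed * 100)  # crude text histogram
    print(f"bin {k:2d}  observed={observed:.4f}  binomial={expected:.4f}  {bar}")
```

The text histogram printed on the right should show the familiar bell shape, with the central bins the fullest.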

5. Conclusion

In this article, we reviewed the Central Limit Theorem, a fundamental theorem of statistics. We explained the formal statement and the assumptions behind it. Then we discussed several real applications to highlight the importance of CLT. Finally, we examined the Galton board, a device invented ad hoc to demonstrate the CLT.
