Shuffling an Array | Baeldung on Computer Science

1. Introduction

In this tutorial, we’ll learn how to shuffle an array.

2. Shuffling

In many applications, we have an array $a = [a_1, a_2, \ldots, a_n]$ and need to shuffle it. For example, that’s the case when doing permutation tests to check feature importance in machine learning. Shuffling the array $a$ means getting a random permutation of its elements.

We require all permutations to be equally likely.

3. Naïve Approach

We could first try something like this. We could iterate over $i=1,2,\ldots, n$ , and for each $i$ randomly choose an index $j \in \{0, 1, \ldots, n\}$ and swap $a_i$ with $a_j$ :

algorithm NaiveShuffling(a):
    // INPUT
    //    a = an array [a_1, a_2, ..., a_n]
    // OUTPUT
    //    A random permutation of a

    for i <- 1 to n:
        j <- a random number from {1, 2, ..., n}
        Swap a[i] and a[j]

    return a

This may come as an intuitive thing to do, but the issue is that not all permutations are equally likely.

In each of the $n$ iterations, we can choose any of the $n$ elements of $a$ . So, there are $n^n$ equally likely executions but only $n!=n \cdot (n-1) \cdot \ldots \cdot 2 \cdot 1$ different permutations. Since $n$ isn’t divisible by $n-1$ for any $n > 2$ , we can’t spread out $n^n$ executions over $n!$ permutations evenly. So, some permutations are more likely than others.

3.1. Example

Here are all possible executions when the input is $[1, 2, 3]$ :

Shuffle [1, 2, 3] with the naive algorithm.

As we see, permutations $[1, 3, 2], [2, 1, 3], [2, 3, 1]$ occur with the probability $\frac{5}{27}$ each, whereas the rest with $\frac{4}{27}$ .

4. The Fisher-Yates Algorithm

Instead, we can use the Fisher-Yates algorithm. For each $i = 1, 2, \ldots, n-1$ , it randomly selects an index $j$ from $\{ i, i+1, \ldots, n\}$ to make a swap:

algorithm FisherYatesAlgorithm(a):
    // INPUT
    //    a = an array [a_1, a_2, ..., a_n]
    // OUTPUT
    //    A random permutation of a

    for i <- 1 to n - 1:
        j <- a random number from {i, i + 1, ..., n}
        Swap a[i] and a[j]

    return a

Basically, whenever we choose $a_j$ to swap with the current element $a_i$ , we discard it from participating in the swaps with $a_{i+1}, a_{i+2}, \ldots, a_{n}$ . We achieve that by restricting the range for $j$ .

Therefore, the Fisher-Yates algorithm splits the array into two parts: $a[1:(i-1)]$ and $a[i:n]$ . We’ll call them the inactive and active parts. In the $i$ th iteration, the algorithm can choose only active elements for the exchange with $a_i$ . Afterward, the inactive part grows to include the element at the $i$ th position.

So, for each $i=1,2,\ldots,n-1$ , there are $n-i+1$ active elements to choose from. As a result, the total number of possible executions is:

$n \codt (n-1) \cdot (n-2) \cdot \ldots \cdot 2 = n!$

which matches the number of permutations. Since we can choose an element for a swap only once, there is only one way to get each permutation. Consequently, they’re all equally likely and occur with the probability of $\frac{1}{n!}$ .

4.1. Example

Let’s say $a = [1, 2, 3, 4]$ . Here’s a possible execution where | denotes the split:

$\begin{matrix} | & 1 & 2 & 3 & 4 \\ 2 & | & 1 & 3 & 4 \\ 2 & 4 & | & 3 & 1 \\ 2 & 4 & 1 & | & 3 \end{matrix}$

In the last step, there’s only one possibility for $a_4$ , so we stop and return $[2, 4, 1, 3]$ .

4.2. Potential Issues

Random integer generation is at the core of the Fisher-Yates algorithm. If we can’t sample from a truly uniform distribution over $\boldsymbol{\{i, i+1, \ldots, n\}}$ , the algorithm will be biased toward some permutations. That can happen in two ways.

Let’s say we have a random number generator $rand$ that returns integers from $0$ to $N-1$ inclusive ( $N$ is usually the maximal integer). We can get $j \in \{i, i+1, \ldots, n\}$ like this:

$i + \left(rand() \bmod (n - i + 1) \right)$

For all the remainders to be equally likely, $N$ must be divisible by each possible $n-i+1$ , which we can’t assume to be the case. As some remainders are more likely than others, not all $j$ s we compute with the above formula are equally probable. However, the higher the $N$ , the less the discrepancy.

On the other hand, if we have a generator that returns random numbers from $[0, 1]$ , we can get $j$ like this:

$i + round\left( rand() \cdot (n - i) \right)$

Since computers have finite precision, $rand()$ can’t return all real numbers from $[0, 1]$ , so rounding won’t result in equally probable $0, 1, \ldots, n-i$ . Adding $i$ to the result of rounding does give us integers from the desired range, but not all are equally likely.

5. Conclusion

In this article, we showed how to shuffle an array.

The naïve approach is intuitive but incorrect since not all permutations are equally likely. The Fisher-Yates algorithm shuffles an array and returns each permutation with the same probability.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung