1. Introduction

When we look at sorting algorithms, we see that they can be divided into two main classes: those that use comparisons and those that count occurrences of elements.

In this tutorial, we’ll explore the latter class. More specifically, we’ll compare Counting sort, Bucket sort, and Radix sort. These algorithms typically take O(n+k) time, where n is the size of the array and k is the value of the largest number in the array.

2. Counting Sort

We’ll assume that we are given an array A with n elements ranging in value from 1 to k. We can allocate a counting array C of size k and initialize it to all zeroes:

[Figure: the counting array C of size k, initialized to all zeroes]

We then run through A, and for each element A[i], increment the value in C[A[i]] as illustrated in the following diagram:

[Figure: sweeping through A and incrementing C[A[i]] for each element A[i]]

Once this process is complete, each C[i] contains the number of times i occurs in A. We can now copy the values from C back into the array A, ensuring that i is copied into A[i], C[i] times in succession, as shown:

[Figure: copying the counted values from C back into A in sorted order]

2.1. Pseudocode

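Here’s a minimal sketch of the algorithm in Python, following the three steps described above; the function name counting_sort and the assumption that the input holds positive integers in the range 1..k are ours:

```python
def counting_sort(a):
    """Sort a list of positive integers, following the steps described above."""
    k = max(a)                      # finding the maximum k: O(n)
    c = [0] * (k + 1)               # allocating and zeroing out C: O(k)
    for x in a:                     # counting each occurrence: O(n)
        c[x] += 1
    i = 0
    for v in range(1, k + 1):       # writing each value v back c[v] times: O(n + k)
        for _ in range(c[v]):
            a[i] = v
            i += 1
    return a
```

For example, counting_sort([4, 5, 1, 7, 3]) returns [1, 3, 4, 5, 7].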

Let’s look into the complexity of this algorithm. The time required to find the maximum k is O(n) as it only requires running through A once. The time for allocating and zeroing out C is O(k). Finally, we’ll have to fill up the sorted array A, and this takes O(n+k) time, as we’ll have to look at each element of C (time O(k)) and fill up A appropriately (time O(n)). The total time is O(n)+O(k)+O(n+k)=O(n+k).

2.2. Critique

At first sight, this may look to us like a wonderful algorithm, but on careful examination, it has a number of drawbacks:

  • it can only work for integer arrays
  • it can require a huge counting array C. For example, A=\{4,5,1,7,1000000\} would require C to be of size 1000000, just to sort a five-element array
  • huge counting arrays require correspondingly large run times: if the maximum element in the array is O(n^3), then the time required is O(n+n^3)=O(n^3), which is worse than insertion or selection sort

Let’s look at variants of this algorithm that have better time and space requirements.

3. Bucket Sort

The first variant we’ll look at is Bucket sort. This algorithm works well if the numbers in the input array are uniformly distributed over a range, say 1\cdots k. Of course, we can’t guarantee that real inputs are perfectly uniform, but we can often come close to this condition. Here’s an example of a 20-element array whose elements are distributed over the range 1\cdots 100:

[Figure: a 20-element array with values in the range 1–100]

We can now create ten buckets to hold the elements of the array. These buckets hold elements 1-10, 11-20, \ldots, 91-100 as shown:

[Figure: ten empty buckets covering the ranges 1–10, 11–20, …, 91–100]

Since each bucket holds a variable number of elements, our best choice is to associate a linked list with each bucket. The bucket 51-60, for example, holds the linked list 56\rightarrow 55 \rightarrow 59. The order of the elements is the order in which the elements in the range 51-60 are encountered when sweeping array A from left to right. Buckets marked with a diagonal line (for example, 31-40) mean there are no elements in A that fall into the corresponding range:

[Figure: the buckets after distributing the elements of A, each holding an unsorted linked list]

Now comes the interesting part: we can sort the individual linked lists using insertion sort. This results in:

[Figure: the buckets with each linked list sorted by insertion sort]

In the figure above, we see that the buckets are in increasing order of magnitude. The linked list within each bucket is sorted.

We can now step through the buckets, copying the elements of each linked list into the original array A. This results in a sorted array A, as shown:

[Figure: the sorted array A after copying the bucket contents back in order]

3.1. Pseudocode

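Here’s a minimal sketch in Python; plain Python lists stand in for the linked lists described above, and the function names and the equal-width bucket mapping are ours:

```python
def insertion_sort(lst):
    """Standard insertion sort, used to sort each bucket's list."""
    for j in range(1, len(lst)):
        key, i = lst[j], j - 1
        while i >= 0 and lst[i] > key:
            lst[i + 1] = lst[i]       # shift larger elements right
            i -= 1
        lst[i + 1] = key


def bucket_sort(a, b=10):
    """Sort positive integers in the range 1..max(a) using b equal-width buckets."""
    k = max(a)
    buckets = [[] for _ in range(b)]         # creating the buckets: O(b)
    for x in a:                              # distributing the elements: O(n)
        buckets[(x - 1) * b // k].append(x)  # map values 1..k to buckets 0..b-1
    i = 0
    for bucket in buckets:
        insertion_sort(bucket)               # O(len(bucket)^2) per bucket
        for x in bucket:                     # copying back in order: O(n) overall
            a[i] = x
            i += 1
    return a
```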

Let’s analyze the running time of Bucket sort. Assuming the array A has n elements, there are b buckets, and the elements of A are uniformly distributed, each linked list will be of size n/b. The time complexity for this algorithm is then:

  1. O(b) to create the buckets and O(n/b) to build each linked list; the total time to create all linked lists is O(b\times n/b)=O(n)
  2. O((n/b)^2) to sort each linked list with insertion sort. Total time to sort all linked lists is O(b\times (n/b)^2)=O(n^2/b)
  3. O(n) time to copy from the linked lists back to the original array

The total time is dominated by the sorting time, which is O(n^2/b).

3.2. Critique

Let’s review the above analysis of Bucket sort. The major assumption is that the numbers are perfectly uniformly distributed across the buckets, which is unlikely to be true in practice. We can gather that the expression O(n^2/b) applies only to the uniformly distributed case.

In practice, the time will be O(\ell^2), where \ell is the length of the longest list. In the extreme case, all elements of the array fall into one bucket, leading to \ell=n and a run time of O(n^2). This means we would have been better off just using plain insertion sort.

At this point, we may be wondering if an algorithm better than insertion sort could be used to sort the lists and obtain a better time than O(n^2). Unfortunately, heapsort relies on random access and cannot efficiently be used with linked lists. Mergesort can be adapted to lists, but for the short lists we expect in each bucket, insertion sort’s low overhead makes it the usual practical choice.

4. Radix Sort

In the case of Counting sort, we required time O(n+k), but k was as large as the maximum number in the array to be sorted. This led to space and time complexity of O(n+k). For Bucket sort, we had a small number of buckets, but the linked list in a bucket could be as large as n, leading to O(n^2) runtime since insertion sort was required to sort the linked lists.

Can we find a compromise between the two? Indeed we can: Radix sort uses a fixed number of buckets and repeatedly sorts the numbers in the array one digit at a time. Let’s explore this powerful sorting algorithm.

In the following diagram, each number in the original array has four decimal digits: 1513, 5314, 1216, and so on. Radix sort proceeds by starting with the least significant digit (LSD) and sorting the numbers in the array using this digit only, using Counting sort. Since we are dealing with decimal numbers, the count array C need only be of size 10. We can see this in red in the following diagram, where the four-digit numbers are sorted on the least significant digit (which is digit 1).

We then repeat this process for digit 2 (blue), followed by digit 3 (green), and finally digit 4. We can see that the final array is correctly sorted:

[Figure: four passes of Radix sort, one per digit from least to most significant, each using Counting sort]

4.1. Pseudocode

Radix sort uses Counting sort as a subroutine. In the following, countingSort(A,d) means applying Counting sort to A using the dth digit, as described above. In our context, Counting sort is stable, which means that two numbers that match in the dth digit always remain in their original relative order. We can see in the figure above that 3480 and 2320 keep their original relative positions when sorting on digit 1. Stability is essential for Radix sort to work correctly:

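Here’s a minimal sketch in Python, where counting_sort_by_digit plays the role of countingSort(A,d); the function names and the assumption of non-negative integer input are ours:

```python
def counting_sort_by_digit(a, d, base=10):
    """Stable Counting sort of a on its dth digit (d = 0 is the least significant)."""
    c = [0] * base
    for x in a:                        # count occurrences of each digit value
        c[(x // base ** d) % base] += 1
    for v in range(1, base):           # prefix sums give each digit's final region
        c[v] += c[v - 1]
    out = [0] * len(a)
    for x in reversed(a):              # reverse scan keeps equal digits in order (stability)
        digit = (x // base ** d) % base
        c[digit] -= 1
        out[c[digit]] = x
    return out


def radix_sort(a, base=10):
    """LSD Radix sort for non-negative integers: one stable pass per digit."""
    d, largest = 0, max(a)
    while base ** d <= largest:        # d passes in total
        a = counting_sort_by_digit(a, d, base)
        d += 1
    return a
```

For example, radix_sort([1513, 5314, 1216, 3480, 2320]) performs four passes, one per digit, and returns the sorted list.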

If we are given an array of size n, with each element being a decimal number with at most d digits, we can see that the extra space required for the C array is constant at 10. The time required is O(nd).

If d is a fixed constant (e.g., zip codes or social security numbers), we can drop the d and rightly claim that Radix sort runs in O(n) time.

4.2. Extensions

Let’s think about what happens if the input array is not decimal. In the case of hexadecimal numbers, for example, the count array C would be of length 16. This would make no difference in the asymptotic complexity, which would remain unchanged at O(nd).

A popular trick is to combine adjacent digits into one so that a 6-digit decimal number 841567 is treated as 84|15|67 (each 2-digit group being considered a single base-100 digit). The count array then has to be of size 100, as it has to hold counts for 0\cdots 99. However, we now have the overhead of initializing a ten-times-larger count array.
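With the sketch above, these variants amount to changing the base parameter: radix_sort(a, base=16) handles hexadecimal digits, and radix_sort(a, base=100) implements the 2-digit grouping, halving the number of passes at the cost of the larger count array.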

We can also think about the case where the magnitudes of numbers in the input array are very large (e.g., 2^{27}=134217728). In this case, a careful analysis is needed to choose between Radix sort and more traditional comparison-based sorts. We do point out that we can use Radix sort on the binary representation of the numbers: if a number has at most 35 bits, we can use 5 groups of 7 bits each, with the C array having 2^7=128 entries. This may well turn out to be better than using a comparison-based sort.
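In the sketch above, this corresponds to radix_sort(a, base=128); a tuned implementation would replace the division and modulus with bit operations, extracting digit d as (x >> (7 * d)) & 127.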

5. Conclusions

In this article, we’ve looked at three interesting sort algorithms that don’t use comparisons. Counting sort is simple and straightforward and is used as a subroutine for Radix sort. Bucket sort is an interesting algorithm but has the limitation of unequally sized linked lists. Radix sort is widely used and has many interesting variants, some of which we’ve discussed above.
