Authors Top

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.


1. Overview

Heapsort is an elegant and robust algorithm for sorting. It’s optimal since it takes O(n\log n) time to sort n items, which is the best we can do with comparison-based sorting algorithms.

In this tutorial, we’ll intuitively explain Heapsort’s essential steps and the algorithm.

2. Heaps

A heap is a special kind of binary tree with the following properties:

  1. It’s a complete binary tree
  2. The value at each node is greater than or equal to the values of its child nodes

Here’s an example:

A 10 node heap

Given any complete binary tree, we can convert it into a heap using the procedure max-heapify in O(n) time.

A complete binary tree of n nodes can be represented compactly by an array A of size n. The children of a node at the ith position are A[2i+1] and A[2i+2]. For example, the children of node A[1] are A[3] and A[4]. This is how we store the above heap in an array:

A 10 node heap represented by an array

3. Sorting with Heaps

Let’s see how to sort this array with Heapsort. We’ll first show how to move the maximum element to its correct position.

3.1. Placing the Maximum at the Right Position

A heap’s maximum element is the one at position 0. So, if we want to sort our array in non-decreasing order, we need to put its maximal element at the end. So, we swap A[0] with A[n].

Our job isn’t done since the remaining elements A[0] \cdots A[n-1] are unsorted. If we found the maximum of the subarray A[0] \cdots A[n-1] and placed it at the second-to-last position, we would have the last two elements in their correct positions. So, we can repeat these steps until getting a fully sorted array.

For instance, here’s how the first step unfolds with our previous example:

Swapping the first and last elements in a heap

However, there’s a problem! When we put 11 at position 0, we violated the heap property since 11 is not greater than its children 42 and 30. Therefore, we need to correct the heap so that the new node 0 does indeed contain the next maximum element (which is 42).

3.2. Correcting the Heap With Sift-Down

To correct the error caused by replacing the root node, we use the procedure Sift-Down. We start at the root and work our way down to a leaf node. Our objective is to look at a node and its two children and swap the node with whichever child is greater (if any):

Rendered by QuickLaTeX.com

Sift-Down starts at the root of a subtree, indicated by start, and ends at stop. During each pass, it checks if either of the two children of the root are more significant than the root. If so, it swaps the root and the greater child and sets start to the former location of the greater child.

If neither of the two children has a value greater than their parent, the procedure proceeds with the left child.

3.3. An Example of Sift-Down

In the following diagrams, blue triangles denote node comparisons to their children.

After placing the maximal element at its position in the above example, we made node one greater than node 0. So, we interchange their values, as shown by the double-headed blue arrow:

Comparing 0, 1 & 2. Swapping 0 & 1.

Now, we compare node 1 to its children, nodes 3 and 4. We exchange the values in nodes 1 and 3 since node 3 is the greater child:

Comparing 1, 3 & 4. Swapping 1 & 3.

In the final step, we check nodes 3, 7, and 8. No swapping is required here, as 11 is already greater than 2 or 5:

Comparing 3, 7 & 8. No swap in this case.

We’ve now obtained a new heap of 9 elements. This heap is correct, as each node has a value greater than its two children.

A new heap of size 9. The last node is a sorted array of size 1.

Our next step is to move the 2^{\textrm{nd}} largest element (42) to its correct place:

The second largest element is placed in A[8].

That results in a sorted array of size 2 (purple) and an unsorted array of size 8. To re-heapify the remainder, we apply Sift-Down to the array A[0]\cdots A[7]. Afterward, we repeat the process until we sort the entire array.

3.5. Completing the Sorting Process

We’ll illustrate the remaining process using arrays. Green squares denote the heap, which shrinks in size from n to 0, while purple squares indicate the sorted array, which increases in size to n. The arrows indicate the transfer of the max element to its proper location. We’ve omitted other entries for clarity:

Heapsort: example with the array representation

3.6. Heapsort – the Complete Pseudocode

Here’s the complete pseudocode of Heapsort:

Rendered by QuickLaTeX.com

The algorithm repeatedly removes the maximal elements and re-heapifies the remainder with Sift-Down.

4. Complexity of Heapsort

We start our Heapsort by creating a heap using max-heapify with complexity O(n).

After this, we run Sift-Down n times. This procedure moves from the tree’s root to a leaf node and takes at most O(\log n) steps since we deal with complete binary trees, which have at most \log n levels. So, the total time for the sifts is O(n\log n).

The time for max-heapify, O(n), is masked by the time for the sifts. Therefore, the complexity of Heapsort is \boldsymbol{O(n\log n)}.

5. Heapsort vs. Selection Sort

It’s interesting to compare Heapsort with Selection Sort. Both pick and move the maximum element to its proper place at each step.

However, Selection Sort makes n-i comparisons in the ith iteration, which results in O(n^2) complexity. In the worst case, Heapsort makes O(\log n) comparisons and swaps per iteration, which is why its complexity is O(n \log n).

Heapsort uses the clever heap approach to find and place the maximal element, which is why we can consider Heapsort a refinement of Selection Sort.

6. Conclusion

In this article, we explained the Heapsort algorithm for sorting an array in non-descending order. It uses the fact that a Max-Heap’s maximum element is at its root.

It’s simple to modify this algorithm to use a Min-Heap, leading to sorting in non-ascending order. The complexity of Heapsort is O(n\log n), which is the best possible for sorting algorithms that use comparisons.

Authors Bottom

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

Comments are closed on this article!