1. Introduction

In this tutorial, we’ll show how to find the k-th smallest element in a binary search tree.

2. The k-th Smallest Element

Let’s suppose that a binary search tree T has n elements: a_1, a_2, \ldots, a_n. We want to find the k-th smallest. Formally, if a_{(1)} \leq a_{(2)} \leq \ldots \leq a_{(n)} is the sorted permutation of a_1, a_2, \ldots, a_n, we want to find a_{(k)} for the given k \in \{1, 2, \ldots, n\}.

Let’s take this tree as an example:

An example of a binary search tree

It’s a binary search tree since each node’s value is less than the values of its right descendants and greater than or equal to those in its left sub-tree. For instance, let k=5. Here’s the fifth smallest number in the tree:

The fifth smallest element in the tree

3. Finding the \boldsymbol{k}-th Smallest Element With the In-Order Traversal

Since the in-order traversal of a binary search tree outputs its elements in sorted (non-decreasing) order, we could run the traversal until we get k elements. At that point, we can stop because we’ve got the element we wanted:

algorithm findKthSmallestElementInOrder(T, k):
    // INPUT
    //    T = the tree to traverse
    //    k = the rank of the element we want to find
    // OUTPUT
    //    The k-th smallest element in the tree or null if the tree has fewer than k nodes.

    m <- 0
    for x in TRAVERSE-IN-ORDER(T.root):
        m <- m + 1
        if m = k:
            return x
    
    return null

function TRAVERSE-IN-ORDER(x):
    if x = null:
        return
    for y in TRAVERSE-IN-ORDER(x.left):
        yield y
    yield x
    for y in TRAVERSE-IN-ORDER(x.right):
        yield y

We used the traversal procedure as an iterator, hence the keyword yield. Of course, an ordinary recursive traversal that increments a counter and stops after visiting the k-th node would work just as well.
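To make this concrete, here’s a minimal Python sketch of the same idea. The Node class and the function names are ours for illustration, not part of any standard library:

from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def traverse_in_order(node: Optional[Node]) -> Iterator[Node]:
    # Yield the nodes of the sub-tree rooted at node in ascending order of value.
    if node is None:
        return
    yield from traverse_in_order(node.left)
    yield node
    yield from traverse_in_order(node.right)

def find_kth_smallest_in_order(root: Optional[Node], k: int) -> Optional[Node]:
    # Stop as soon as the k-th node is produced; return None if there are fewer than k nodes.
    for m, node in enumerate(traverse_in_order(root), start=1):
        if m == k:
            return node
    return None

# Example: the in-order sequence of this tree is 1, 3, 6, 8, so the 3rd smallest is 6.
root = Node(8, left=Node(3, left=Node(1), right=Node(6)))
print(find_kth_smallest_in_order(root, 3).value)  # prints 6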

3.1. Complexity

This approach works on every binary search tree, but its time complexity is linear: for k=n, the algorithm traverses the whole tree to reach the largest element.

4. Finding the \boldsymbol{k}-th Smallest Element by Jumping Over Sub-Trees

We can find the \boldsymbol{k}-th smallest element more efficiently if we store in each node the size of the sub-tree rooted at it, i.e., the node itself plus all its descendants. For instance:

Tree with sizes

The additional information can speed up the (in-order) traversal. If we’ve visited m nodes before reaching x, and x’s left sub-tree has \ell nodes, then we can skip traversing that sub-tree whenever m + \ell < k. Why? Because we can be sure the k-th smallest element isn’t in the sub-tree, so it’s safe to “jump” over it. In doing so, we distinguish two cases. If m + \ell = k - 1, then x is the k-th smallest element in the tree. If m + \ell < k - 1, we can skip x as well and go straight to the right sub-tree.

Conversely, if m + \ell \geq k, the k-th smallest element is certainly in the left sub-tree.

4.1. Pseudocode

Here’s the pseudocode:

algorithm JUMP-TRAVERSE(x, k, m):
    // INPUT
    //    x = the node we've reached (initially, the tree's root)
    //    k = the rank of the element we want to find
    //    m = the number of nodes we visited before calling the function with x as its argument
    // OUTPUT
    //    the k-th smallest element in the tree, or null if the tree has fewer than k nodes
    
    if x.left = null:
        l <- 0
    else:
        l <- x.left.size

    if m + l >= k and x.left != null:
        return JUMP-TRAVERSE(x.left, k, m)
    else if m + l = k - 1:
        return x
    else if m + l < k - 1 and x.right != null:
        return JUMP-TRAVERSE(x.right, k, m + l + 1)

    return null

Before making the first call, we can test whether T.root.size < k to check that the element exists at all.
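Here’s a hedged Python sketch of the same logic. The SizedNode class (with a size field counting the nodes in its sub-tree) and the wrapper find_kth_smallest are our own illustration:

from dataclasses import dataclass
from typing import Optional

@dataclass
class SizedNode:
    value: int
    left: Optional["SizedNode"] = None
    right: Optional["SizedNode"] = None
    size: int = 1  # number of nodes in the sub-tree rooted here, including the node itself

def jump_traverse(x: SizedNode, k: int, m: int) -> Optional[SizedNode]:
    # m is the number of nodes that precede x in the in-order sequence.
    l = x.left.size if x.left is not None else 0
    if m + l >= k and x.left is not None:
        return jump_traverse(x.left, k, m)            # the answer is in the left sub-tree
    if m + l == k - 1:
        return x                                      # x itself is the k-th smallest
    if m + l < k - 1 and x.right is not None:
        return jump_traverse(x.right, k, m + l + 1)   # jump over the left sub-tree and x
    return None                                       # fewer than k nodes below x

def find_kth_smallest(root: Optional[SizedNode], k: int) -> Optional[SizedNode]:
    # Check up front that the k-th smallest element exists at all.
    if root is None or root.size < k:
        return None
    return jump_traverse(root, k, 0)

# Example with the sizes filled in by hand; the in-order sequence is 1, 3, 6, 8.
root = SizedNode(8, left=SizedNode(3, left=SizedNode(1), right=SizedNode(6), size=3), size=4)
print(find_kth_smallest(root, 3).value)  # prints 6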

4.2. Example

To show the efficiency of the modified traversal, let’s check how many nodes it visits in the above tree before finding the fifth smallest element:

Finding the fifth smallest with jumps

In contrast, the usual in-order search visits all the nodes in the left sub-tree we jumped over:

The fifth smallest with no jumps

4.3. Complexity

In the worst case, the sub-trees we jump over are the smallest possible. This situation happens if k=n and each node in the tree has only a right child. Such a tree is essentially a linked list. We jump over zero nodes at each step, traversing all n nodes in total.

In general, each node whose child we jump over is on the path from the root to the node containing the sought value. So, this algorithm’s worst-case time complexity is \boldsymbol{O(h)}, where \boldsymbol{h} is the tree’s height, i.e., the longest path from the root to a leaf. If the tree is balanced, h \in O(\log n), so the algorithm has a logarithmic complexity. Without the additional information on the sizes of the sub-trees, the ordinary traversal would have an O(n) runtime even if the tree was balanced.

However, since keeping the size fields up to date and balancing the tree both add computational overhead to insertions and deletions, this approach pays off only if look-ups are more frequent than updates. The ideal use case for this algorithm is a search tree we build once and don’t change afterward.
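For instance, here’s one way the size fields could be kept consistent during a plain (unbalanced) insertion. This helper is only a sketch of the bookkeeping involved, reusing the hypothetical SizedNode class from the sketch above:

def insert(root: Optional[SizedNode], value: int) -> SizedNode:
    # Every node on the path from the root to the insertion point gains one descendant,
    # so its size field must be incremented.
    if root is None:
        return SizedNode(value)
    root.size += 1
    if value <= root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root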

5. Conclusion

In this article, we showed how to find the k-th smallest element in a binary search tree. We can use the usual in-order traversal, but it has an O(n) complexity. Storing the size of each sub-tree in its root allows for a more efficient O(h) approach, which becomes logarithmic if the tree is balanced.
