Calculate the Number of Binary Search Trees with N Distinct Elements

1. Introduction

A binary tree is a hierarchical data structure in which each node has at most two children. Also, a binary search tree (BST) is a more specific type of binary tree where the main characteristic is that it’s sorted. Therefore, a binary tree is a BST if and only if an in-order traversal of the binary tree results in a sorted sequence.

This tutorial will show how to compute the number of binary search trees based on the number of tree nodes.

2. Unique Number of Binary Search Trees

In a BST, each node contains a sortable key. For example, we can label each node with an integer number. Therefore, the key in each node of a BST is greater than or equal to any key stored in the left subtree and less than or equal to any key stored in the right subtree.

Given a list of $n$ distinct numbers, we are interested in computing the number of BSTs labeled by these numbers. Without the loss of generality, we can assume the numbers are $1, 2, 3, ..., n$ . For example, there are 5 possible binary search trees for $1, 2, 3$ :

Although it is easy to enumerate all possible tree structures when $n$ is small, the number of unique BST structures increases very fast when $n$ gets bigger. For example, there are $16796$ unique BST structures when $n=10$ . Therefore, we need to compute this number with an algorithm. In this tutorial, we’ll show different approaches to compute this number.

3. Recursive Approach

Let $f(n)$ denote the total number of unique BST structures for number $n$ . For a list of $n$ unique numbers, we can choose any number as the root. For example, we can choose the number $i$ ( $i\in[1,n]$ ) as the root. Then, numbers in $[1, i-1]$ will be in the left subtree. Also, numbers in $[i+1, n]$ will be in the right subtree.

Since we have $(i-1)$ numbers in the left subtree, the left subtree has $f(i-1)$ unique BST structures. Similarly, the right subtree has $f(n-i)$ unique tree structures. Also, the arrangements in the left and right subtrees are independent. Therefore, we have a total of $f(i-1) \times f(n-i)$ unique tree structures when $i$ is the root.

Since we have $n$ possible ways to choose the root, we can add together $f(i-1) \times f(n-i)$ for all possible values of $i$ :

If we don’t have any nodes in one subtree, then we’ll only consider the total number of unique tree structures in the other subtree. Therefore, the base case of the formula is $f(0) = 1$ .

As a result, we can construct a recursive algorithm based on the above formula:

algorithm recursiveBSTCount(n):
    // INPUT  
    //    n = an integer representing the number of unique nodes
    // OUTPUT 
    //    The number of distinct binary search trees (BSTs) 
    //    that can be formed with n unique nodes.

    if n = 0:
        return 1
    
    result <- 0
    for i <- 1 to n:
        result <- result + recursiveBSTCount(i - 1) * recursiveBSTCount(n - i)
    
    return result

4. Dynamic Programming Approach

In general, the recursive algorithm is straightforward based on the recursive formula. However, it has many repeat computations. To avoid that, we can use a dynamic programming approach to compute the number.

We can first compute $f(n)$ with smaller numbers and store the results into an array. Then, when we compute bigger numbers, we can reuse the previously computed results from the smaller numbers:

algorithm dpBSTCount(n):
    // INPUT  
    //    n = An integer representing the number of unique nodes
    // OUTPUT 
    //    The number of distinct binary search trees (BSTs) 
    //    that can be formed with n unique nodes using dynamic programming.
    
    f[0] <- 1
    for i <- 1 to n:
        f[i] <- 0
        for j <- 1 to i:
            f[i] <- f[i] + f[j - 1] * f[i - j]
    
    return f[n]

In this algorithm, we use a loop to compute our number step by step. For a number $f(i)$ , we use another loop to compute its value based on our recursive formula.

For example, our base case is $f(0) = 1$ . Then, we compute $f(1)$ based on the value of $f(0)$ . After that, the next step is to compute $f(2)$ based on the values of $f(0)$ and $f(1)$ .

We repeat this process for each number from 1 to $n$ . Therefore, when we compute $f(i)$ , all the required values for the computation are already available. Eventually, we’ll compute $f(n)$ and return this value.

Since we have a nested loop to do the computation, the overall time complexity of this algorithm is $O(n^2)$ .

5. Catalan Number Approach

The number of different BSTs is actually a Catalan Number:

Therefore, we can compute our number $f(n)$ directly from the Catalan Number. However, we need to compute 3 factorial numbers based on this formula.

To avoid the time consuming factorial computations, we can also use the following recurrence property of Catalan Number in our algorithm:

As a result, the algorithm based on this recurrence is:

algorithm catalanBSTCount(n):
    // INPUT  
    //    n = an integer representing the number of unique nodes
    // OUTPUT 
    //    The number of distinct binary search trees (BSTs)
    //    that can be formed with n unique nodes, 
    //    calculated using the Catalan number formula.
    
    C <- 1
    for i <- 1 to (n - 1):
        C <- 2 * (2 * i + 1) * C / (i + 2)
    
    return C

Since we use only one loop to do the computation, the overall running time is $O(n)$ .

6. Conclusion

This tutorial showed a recursive formula to compute the number of unique binary search trees with $n$ nodes. Also, we provided several approaches to compute this number. As a result, the Catalan Number approach is the fastest algorithm among them to compute this number. However, the dynamic programming approach is more intuitive as it directly comes from the analysis of BST structures.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung