In this article, we’ll discuss the Lowest Common Ancestors problem. We’ll start with the basic definition, then go through some of the methods used to find the lowest common ancestor of two nodes in a rooted tree.
The Lowest Common Ancestor (LCA) of two nodes and in a rooted tree is the lowest (deepest) node that is an ancestor of both and .
Remember that an ancestor of a node in a rooted tree is any node that lies on the path from the root to (including ).
For example, let’s look at the following tree, which is rooted at node 1:
Let’s find the LCA of the two highlighted nodes 7 and 11. The common ancestors of these two nodes are nodes 1 and 2. Since node 2 is deeper than node 1, .
3. Naive Approach
3.1. Theoretical Idea
The idea of this approach is to have two pointers, one on each node. We need to move these pointers towards the root in a way that allows them to meet at the LCA.
The first thing we can notice is that these pointers should be in the same depth. In addition to that, the deeper pointer can never be the LCA. Therefore, our first step should be to keep moving the deeper pointer to its parent until the two pointers are on the same depth.
Now, we have two pointers that are on the same depth. We can keep moving both pointers towards the root one step at a time until they meet at one node. Since this node is the deepest node that our pointers can meet at, therefore, it’s the deepest common ancestor of our starting nodes, which is the LCA.
In order to implement this solution, we will need to calculate the depth and the parent of each node. This can be done with a simple DFS traversal starting from the root.
Firstly, we set the current node as visited. Secondly, we set the parent and depth of the child node before calling the function recursively for it.
The complexity of the preprocessing step in this approach is , where is the number of nodes. The complexity is considered efficient because we only need to apply it once.
3.3. Finding the LCA
After calling the previous function starting from the root, we will have the depth and parent of each node. We can now apply the approach that we discussed earlier.
First of all, we keep moving the deeper node to its parent until both nodes have the same depth. After that, we move both nodes to their parents until they meet.
Although the approach is considered fairly simple, however, finding the LCA of a pair of nodes also has a complexity of , where is the height of the tree. This complexity can go as far as becoming in case of an almost linear tree (consider two long chains connected at the root).
4. Sparse Table Background
In order to enhance our approach discussed in section 3, we need to present some background on the sparse table data structure. The sparse table has to be built first. Then, it can then answer minimum range queries in a low complexity.
We’ll discuss the main idea of the sparse table. Later, we’ll update it to fit our needs.
4.1. Building the Sparse Table
A sparse table is a data structure capable of answering range queries on static arrays in with preprocessing. The intuition behind this data structure is that any range can be represented as a concatenation of at most ranges whose lengths are powers of two. This is equivalent to representing the length of the range in binary. For example, a range of length can be represented as the concatenation of three ranges with lengths 8, 4, and 1.
Let’s discuss a sparse table used to answer minimum range queries in an array . The sparse table will consist of a array of size . will store the minimum value in the range that starts at the element of the array and has a length of .
Let’s fill this array in increasing order of :
- is simply equal to .
- Let’s calculate for : We know that this value is the minimum value in the range with length starting at index . We can break this range into two equal parts, each of length , one starting at , and the other one starting at . This means that is equal to the minimum value between and .
Take a look at the implementation of the algorithm.
The complexity of building the sparse table is , where is the number of elements.
4.2. Querying the Sparse Table
Now that we have the sparse table built, let’s look at how we can answer range minimum queries using it. One way to convert a number into binary is to go through the powers of two in decreasing order. If the current power is less than or equal to , activate the corresponding bit and subtract it from . We’ll be doing something similar. Let’s say we want to find the minimum value in the range .
First, our current location is at . We want to find the largest such that (longest range whose length is a power of 2 and contained in ). After finding this value, we can get the minimum value in this range in since it is already stored in .
Next, we have to find the minimum value in the range in the same way. We keep doing this until we end up with an empty interval. The values of will be strictly decreasing, so one loop through values of in decreasing order will be enough to calculate the answer.
Let’s have a look at the described algorithm.
The complexity of querying the sparse table is , where is the number of elements.
Now that we have some background on how sparse tables work, we can take a look at a more efficient algorithm to find the LCA.
5. Binary Lifting Approach
5.1. Theoretical Idea
Let’s build a structure similar to a sparse table, but instead of storing the minimum/maximum value in a range whose length is a power of two, it will store the ancestor of each node for every .
In other words, if we had a pointer to some node , and moved this pointer to its parent times, we will end up at node . Let’s call this a jump from .
Before we start building this structure, we need an array that stores the immediate parent of each node.
Building the structure is similar to building a normal sparse table:
- To find the value of for , we need to do a jump from . However, we already know that if we do a jump from , we will end up at node . We also know that if we do a jump from , we will end up at node . Since two consecutive jumps are equivalent to a jump, this means that .
Now, let’s see how we can use the sparse table structure to optimize the naive approach. The naive approach consisted of two main steps.
The first step was to move the deeper pointer to its parent until both pointers are on the same depth.
Let’s assume that the deeper pointer is at node , and the other one is at node . We need to move the pointer a known amount of steps which is equal to . If we convert into binary, and do jumps for every such that the bit is activated, we would end up with a jump. This jump would put this pointer on the same depth as the other pointer.
Since each jump is and we need to check types of jumps, this whole step has a complexity of .
The second step was to move both pointers simultaneously until they meet at some node.
The main idea here is to try a huge jump at first. If doing a huge jump on both pointers leads them to the same node, then we have arrived at a common ancestor. However, we may have jumped too far. There could be some deeper common ancestor. So we can’t do this jump, let’s try something smaller.
If the current jump leads them to different nodes, then we should do this jump and update the pointers. This basically means that we made the largest possible jump that doesn’t reach any common ancestor yet. Therefore, the parent of both pointers will be the deepest common ancestor, which is the LCA. During this step, we also consider different jumps, so the complexity is also .
The complexity of this algorithm is , where is the number of nodes.
5.4. Binary Lifting Algorithm Steps
To recap, finding the LCA with this approach consists of two main parts:
- Preprocessing: This part will only be applied once on the tree, and has two steps:
- Find the depth and the parent of each node (Algorithm 1).
- Build the sparse table structure (Algorithm 5).
The complexity of this part is due to the complexity of building the sparse table.
- Answer LCA Query: This part will be applied for each pair of nodes that we want to find the LCA for (Algorithm 6).
The complexity of this part is since we are only doing jumps.
Let’s have a final comparison between the two approaches in this tutorial.
The main bonus when using the binary lifting approach is that we can handle a large number of queries. First, we have to apply the preprocessing once, and then we can have a large number of queries, where each query is answered in . Therefore, each query will be answered in a better complexity than when using the naive approach.
In this tutorial, we’ve discussed the Least Common Ancestors problem for tree data structures.
First, we took a look at the definition, then discussed two methods of finding the LCA. Finally, we did a simple comparison between the two methods.