Authors Top

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.


1. Introduction

In this tutorial, we’ll explain how to find common elements in two sorted arrays.

2. Common Elements of Two Sorted Arrays

In this problem, we have two sorted arrays: a=[a_1 \leq a_2  \leq \ldots \leq a_n] and b=[b_1 \leq b_2 \leq \ldots \leq b_m]. Our task is to find the common elements.

For instance, if a = [1, 2, 3, 4, 5] and b = [3, 4, 5, 6, 7], our algorithm should output [3, 4, 5] as the result.

To find the common elements efficiently, it should use the fact that a and b are already sorted. Additionally, we’ll also require the output array to be non-descending as the input arrays.

We’ll also assume that a and b can contain duplicates. So, if a common element is repeated twice in a, and thrice in b, we’ll include it twice in the result array. For instance:

    \begin{equation*} \begin{matrix} a&=& [1& 2& 2& 4& 5] \\ b&=& [2& 2& 2& 3& 5& 7] \\ result&=&  [2& 2& 5]\\ \end{matrix} \end{equation*}

3. Finding Common Elements in Linear Time

Let’s start with the naive algorithm that doesn’t use the fact that a and b are sorted:

Rendered by QuickLaTeX.com

For each a_i, the algorithm iterates over the entire b to check if a_i \in b. So, it has an O(mn) time complexity, which implies O(n^2) if m and n are comparable. What can we change in this algorithm to get common elements faster?

Well, since the arrays are sorted, there’s no point in iterating over \boldsymbol{b_{j+1}, b_{j+2}, \ldots, b_{m}} if \boldsymbol{a_i < b_j}. Each of those is greater than a_i, so we can conclude that a_i \not \in b. The converse is also true: if b_j < a_i, we can discard b_j.

That way, we iterate over each array only once. So, we don’t need nested loops. Instead, we can start with i=1 and j=1. When a_i < b_j, we discard a_i by incrementing i. Similar goes for j if b_j < a_i. If a_i=b_j, we append a_i to the result array and update both counters. The loop stops once we reach the end of a or b. This approach is similar to the merge step in the Merge Sort algorithm:

Rendered by QuickLaTeX.com

As a result, the algorithm’s time complexity is linear: O(m + n).

3.2. Example

Let’s use the above example to show how the algorithm works.

Initially, c is empty, and both i and j point to the first elements:

Common elements - step 1

 

Since a_1=1 < a_2=2, we increment i:

Common elements - step 2

Now, we have a match, so we append 2 to c and increment both counters:

Common elements - step 3

We have a match again, so we do the same as in the previous step:

Common elements - step 4

Now, a_3=4 > b_3=2, so we discard b_3 and move on to b_4:

Common elements - step 5

The situation is the same (a_3=4 > b_4=3), so we increment only j:

Common elements - step 6

Since a_3=4 < b_5 = 5, we increment only i:

Common elements - step 7

We found a new common element, so we add it to c and increment both counters:

Common elements - step 8

Finally, i>n=5, so we stop and output the result: c=[2, 2, 5].

4. Finding Common Elements in Logarithmic Time

Let’s suppose that m << n. Then, for each \boldsymbol{j=1,2,\ldots,m}, we can look for \boldsymbol{b_j} in \boldsymbol{a} with the logarithmic binary search. Since b_j \leq b_{j+1}, if the binary search finds b_j at the k-th position in a, we can discard all the element to the left and search for b_{j+1} in the remainder: [a_{k+1}, a_{k+2}, \ldots, a_n].

Similarly, let’s say the binary search doesn’t find b_j in a, but the last index it checks in a is k. If b_{j+1} = b_j, we search for b_{j+1} in [a_{k+1}, a_{k+2}, \ldots, a_{n}]. On the other hand, if b_{j+1} > b_j, it can still happen that b_{j+1}=a_{k}, so we don’t discard a_k.

4.1. Pseudocode

Here’s how the above algorithm works:

Rendered by QuickLaTeX.com

The worst-case time complexity is O(m\log n). If m << n, we can consider m to be a constant with respect to n. In that case, the algorithm has an approximate \boldsymbol{O(\log n)} complexity. However, if m \in \Theta(n), we get an O(n \log n) algorithm. That’s still better than the naive approach but worse than the O(n+m) runtime of the two-side linear search.

5. Conclusion

In this article, we presented two efficient ways of finding common elements of two sorted arrays. One approach has an O(n+m) complexity, whereas the other combines binary and linear searches and runs in an O(m \log n) time.

Authors Bottom

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

Comments are closed on this article!