Finding Strongly Connected Components: Tarjan’s Algorithm

1. Introduction

In this topic, we’ll discuss Tarjan’s algorithm for finding strongly connected components (SCCs) in directed graphs. We can check out Kosaraju’s algorithm for the definition of SCCs to start.

2. An Example Graph

Let’s pick an example graph, $G$ , for our discussion:

$G$ is a directed graph with four SCCs. We’ve depicted the SCCs with different colors for visual comprehension:

Further, we’ll use this graph to demonstrate the ideas of Tarjan’s algorithm.

3. The Spanning Forest of DFS

Before digging into the algorithm itself, we need to introduce the notion of the DFS spanning forest. When DFS traverses a directed graph, it defines a set of non-intersecting trees. We’ll call this set the spanning forest of DFS.

We’ll also classify the edges of the graph depending on how DFS discovers them. If DFS processes vertex $v$ and it discovers:

An unvisited neighbour, $u$ , then edge ( $v$ , $u$ ) is a tree edge.
A visited but not yet fully processed neighbor, $w$ , then edge ( $v$ , $w$ ) is a back edge.
A fully processed neighbour, $t$ , then edge ( $v$ , $t$ ) is a cross edge.

Let’s classify the edges of the sample graph. We first run DFS for vertex $A$ . DFS visits vertices $A$ , $B$ , $C$ , $D$ , $E$ , $F$ and exits. Then, let’s run DFS for $G$ . Now, DFS visits the remaining vertices: $G$ , $H$ , $I$ , and $J$ . The tree edges are depicted with solid lines, the back edges with dashed lines, and the cross edges with dotted lines:

The spanning forest of the graph consists of two trees:

The first tree consists of the set of vertices $\{A, B, C, D, E, F\}$ and the set of edges $\{(A, B), (B, C), (B, D), (D, E), (E, F)\}$ .
The second tree consists of the set of vertices $\{G, H, I, J\}$ and the set of edges $\{(G, H), (H, I), (I, J)\}$ .

Note that depending on which vertex we start DFS for, the classification of edges may change. For example, some back edges may become tree edges and vice versa, and some cross edges may become tree edges. Thus, the tree set in the spanning forest may also change. Fortunately, that doesn’t affect Tarjan’s algorithm in any way.

4. Tarjan’s Algorithm

4.1. Observations

Tarjan’s algorithm makes use of the observation that SCCs can be built out of the trees in the spanning forest. A single tree in the spanning forest may contain several SCCs, but no SCC can belong to more than one tree. If an SCC belonged to more than one tree, then those trees would have been reachable from each other during DFS traversal, thus forming a single tree.

If each SCC exactly matched a tree in the spanning forest, the problem of finding SCCs would have been solved by running a simple DFS and identifying trees in the spanning forest. But that approach only works for finding connected components in undirected graphs. For directed graphs, a tree in the spanning forest may contain multiple SCCs.

Let’s pay attention to another observation that is used by the algorithm. Any SCC can be treated as a directed cycle because any two vertices in an SCC are reachable from each other. If we start DFS for any of the SCC vertices, there will be a moment when DFS sees a back edge to that vertex. A back edge actually identifies a cycle. Thus, when we see a back edge during DFS, we conclude that either we’ve found an SCC or a small cycle inside of a bigger cycle.

4.2. Algorithm Description

Tarjan’s algorithm defines arrays $\boldsymbol{num[]}$ and $\boldsymbol{lowest[]}$ , which help in classifying edges. They also help in identifying the starting vertex of an SCC. The algorithm also uses a stack to keep the current DFS tree’s vertices and correctly fetches the vertices of SCCs afterward.

The steps of the algorithm are described below:

Select an unvisited vertex, $v$ . If there’re no unvisited vertices, the algorithm terminates
Run DFS for $v$
Go to step 1

Inside DFS:

$v$ is marked as visited
$num[v]$ is initialized to be the current value of the counter
$lowest[v]$ is initially equal to $num[v]$
Next, we go over the $v$ neighbors. If we see an unvisited neighbor, $u$ , we invoke DFS for it and, upon returning, update $lowest[v]$ with $lowest[u]$ if $lowest[v] > lowest[u]$
If we see a visited but not processed neighbor, $w$ , then we have a back edge. In this case, we update $lowest[v]$ with $num[w]$ if $lowest[v] > num[w]$
After we process $v$ ‘s neighbours, we mark $v$ as processed
After $v$ is processed, we check if $num[v] = lowest[v]$ . If that’s the case, $v$ is the starting vertex of its component. We unwind the stack until we retrieve $v$ . The unwound vertices belong to the $v$ ‘s SCC

4.3. Running the Algorithm for the Example Graph

In our example, the white vertices are unvisited ones. The light grey are visited but not yet processed ones and the dark grey are fully processed vertices. The processed edges are colored red. Also, we depict the vertex stack in the lower right corner.

We start by running DFS for $A$ . The image below shows the state after DFS has visited $A$ , $B$ , and $C$ , but hasn’t yet processed back edge $(C, A)$ :

The image below shows the state when DFS has processed back edge $(C, A)$ , updated $lowest[C]$ , backtracked, and updated $lowest[B]$ . Next, DFS visited the remaining vertices reachable from $B$ :

Then, DFS backtracks from $F$ , and in $E$ DFS finds the first SCC = $\{E, F\}$ :

Next, DFS backtracks to $D$ , and the second SCC = $\{D\}$ is found:

Now, DFS backtracks to $B$ , then to $A$ , and in $A$ DFS finds the third SCC = $\{A, B, C\}$ :

The DFS invocation terminates at this point as no reachable vertices are left. Next, we run DFS for an unvisited vertex, $G$ . DFS visits $G$ , $H$ , $I$ , and $J$ , processes back edge ( $J$ , $G$ ), and updates $lowest[J]$ :

Then, DFS backtracks to $G$ and updates all the vertices on the way:

Finally, when processing $G$ , DFS finds the last SCC = $\{G, H, I, J\}$ .

5. The Pseudocode of the Algorithm

In this section, we’ll implement Tarjan’s algorithm. We’re using a number of variables needed by the algorithm. Note that we could have added all those variables as parameters to the DFS procedure. But let’s keep the DFS implementation simple and have all the auxiliary variables as global data. Here’re all the additional variables used by the algorithm:

$\boldsymbol{i}$ – a counter used to assign sequential numbers to the vertices
$\boldsymbol{num[]}$ – an array of integers holding the vertice numbers, $num[v]$ is the number assigned to $v$
$\boldsymbol{lowest[]}$ – an array of integers holding the minimum reachable vertex numbers, $lowest[v]$ is the minimum number of a vertex reachable from $v$
$\boldsymbol{visited[]}$ – an array of booleans indicating which vertices have been visited by DFS so far. If $visited[v]$ is TRUE, then DFS has already seen $v$ , but it hasn’t necessarily finished processing $v$
$\boldsymbol{processed[]}$ – an array of booleans indicating which vertices have been already processed by DFS. If $processed[v]$ is TRUE, then DFS has already finished with $v$
$\boldsymbol{s}$ – a stack of vertices used to keep the working set of vertices. $s$ holds all the vertices reachable from the starting vertex. When the algorithm finds an SCC, it will unwind the stack until it gets all the vertices of that SCC

Tarjan’s algorithm now takes the form of a series of DFS invocations:

6. The Complexity Analysis

Tarjan’s algorithm is a modification of the DFS traversal. So, the complexity of the algorithm is linear: $\boldsymbol{O(N + M)}$ , where $\boldsymbol{N}$ is the number of vertices and $\boldsymbol{M}$ is the number of edges. Please note that to achieve the mentioned complexity, we must use the adjacency list representation of the graph.

7. Conclusion

In this topic, we’ve discussed Tarjan’s algorithm for finding strongly connected components in directed graphs. It’s an optimal linear time algorithm.

Furthermore, it’s easy to implement as it simply modifies the standard DFS traversal.

Full Archive

About Baeldung

Core Concepts

Operating Systems

Artificial Intelligence

Graph Theory

Latex