Shortest Path to Certain Nodes in a Graph | Baeldung on Computer Science

1. Overview

In graph theory, we might have a modified version of the shortest path problem. One of the versions is to find the shortest path that visits certain nodes in a weighted graph.

In this tutorial, we’ll explain the problem and provide multiple solutions to it. In addition, we’ll provide a comparison between the provided solutions.

2. Defining the Problem

In this problem, we’re given a weighted, undirected graph. The task is to find the shortest path that starts from a source node $S$ and ends at a goal node $G$ . In addition, the problem requires that the resulting path visits certain nodes, in any order.

Let’s take the following graph as an example:

We’ll assume that the source node is $S$ and the goal node is $G$ . Also, the nodes that we must visit are $E$ and $F$ . We need to find the shortest path for this graph.

We can notice that the shortest path, without visiting the needed nodes, is $SCEG$ with a total cost of 11. However, since we need to visit nodes $E$ and $F$ , the chosen path is different. We choose the path $SCEDF$ with a total cost of 17.

Note that the path we chose is the shortest among all paths that start from $S$ , end at $G$ , and visit $E$ and $F$ nodes. For example, we have another path $SBEDFG$ , with a total cost of 19. However, we didn’t choose this path since it has a larger cost.

3. 2D Dijkstra Solution

3.1. Theoretical Idea

The first solution that we’ll present is a modification to Dijkstra’s algorithm. In our case, it’s not enough to only store the distance of a node from the source. Instead, we must consider the must-visit nodes.

Suppose we arrived at some node $x$ . We can notice that it matters which nodes we have already visited. However, not all nodes matter to us. Specifically, we care about which nodes we have visited among the must-visit nodes.

Therefore, we need to update the $distance$ array that holds the distance of each node from the source. Instead of being a $1D$ array, we need the array to be $2D$ . The first dimension will correspond, as before, to the node itself. However, the second dimension will represent a visited set that can tell us which nodes have already been visited.

Therefore, each node will have many shortest paths from the $source$ node. Each of these shortest paths is the shortest one when visiting a specific set of the needed nodes.

In the end, the final answer will be stored inside the cell of the $distance$ array that corresponds to the $goal$ node and the visited set contains all the nodes we have to visit.

3.2. Algorithm

Take a look at the algorithm of the described approach:

algorithm dijkstra2DSolution(source, goal, mustVisitNodes, m):
    // INPUT
    //    source = The source node
    //    goal = The goal node
    //    mustVisitNodes = nodes that we must visit
    //    m = The number of nodes that we must visit
    // OUTPUT
    //    Returns the shortest path from source to goal 
    //    that visits all the nodes in mustVisitNodes
    
    distance <- initialize a matrix with infinity values
    Q <- initialize an empty queue
    
    Q <- push the tuple (source node, empty visited set, zero cost) onto Q
    distance[source, 0] <- 0
    
    while Q is not empty:
        u <- get from Q the node with the lowest cost
        if u.cost != distance[u.node, u.visited_set]:
            continue
        for v in neighbors(u.node):
            new_visited_set <- u.visited_set
            if v in mustVisitNodes:
                vid <- the ID of the must visit node
                add vid to new_visited_set
            newCost <- u.cost + edgeCost(u.node, v)
            if newCost < distance[v, new_visited_set]:
                distance[v, new_visited_set] <- newCost
                Q <- push the tuple (v, new_visited_set, new cost)
    min_distance <- find the minimal distance and the visited set contains all required nodes
    return min_distance

As we can see, the $createNode$ function creates a new node with the given node id, visited set, and cost.

First, we initialize the $distance$ array with $\infty$ and push the source node to the queue. The source node has an empty visited set because we haven’t visited any nodes yet. Also, it has a cost equal to zero. In addition, we set the distance of the source node to zero.

Next, we perform multiple steps. In each step, we get the node with the lowest cost from the queue. If we’ve found a shorter path to this node, we simply continue to the next node.

Otherwise, we iterate over all the neighbors of node $u$ . For each neighbor, we calculate the new visited set. In case $v$ is not a must-visit node, the visited set stays the same. Otherwise, we set the bit corresponding to node $v$ to one.

After that, we calculate the new cost. If the new cost is better than the stored one, we update the $distance$ array and push $v$ to the queue.

Finally, we return the distance for the $goal$ node with the visited set that contains all required nodes.

The complexity of the described approach is $O(2^m(V + E \cdot log(V \cdot 2^m)))$ , where $V$ is the number of vertices, $E$ is the number of edges and $m$ is the number of nodes that we must visit.

4. Permutations Solution

4.1. Theoretical Idea

Another approach is to consider the important nodes. By important nodes, we mean the $source$ node, the $goal$ node, and the must-visit nodes. Firstly, we need to calculate the shortest path between each pair of the important nodes.

Secondly, we need to extract all the possible permutations for the must-visit nodes. By permutations, we mean all the possible orders for the must-visit nodes.

Thirdly, for each permutation, we check the cost of this option. Each option means starting from the $source$ node and visiting the must-visit nodes one by one until we reach the $goal$ node.

Between each pair of nodes, we need to use the shortest path. Therefore, we’ll use the calculated shortest paths to find the shortest path between any pair of important nodes.

4.2. Calculate Shortest Paths

To calculate the shortest paths, we have two options:

Using Dijkstra’s algorithm multiple times. Each time, we run Dijkstra’s algorithm starting from one of the important nodes. This is helpful when the number of edges in the graph is not too large. In other words, it’s helpful when $E$ is a lot smaller than $V^2$ .
Using the Floyd-Warshall algorithm. The Floyd-Warshall algorithm calculates the shortest path between all pairs of nodes inside a graph. This approach is helpful when we don’t have a large number of nodes. In other words, it’s helpful when the $V$ is rather small.

4.3. Algorithm

Let’s take a look at the implementation of the described approach.

algorithm calculateShortestPathsPermutations(source, goal, mustVisitNodes, m):
    // INPUT
    //    source = The source node
    //    goal = The goal node
    //    mustVisitNodes = nodes that we must visit
    //    m = The number of nodes that we must visit
    // OUTPUT
    //    Returns the shortest path from source to goal
    //    that visits all the nodes in mustVisitNodes

    permutations <- calculatePermutations(mustVisitNodes, m)
    answer <- infinity
    for permutation in permutations:
        currentAnswer <- 0
        prev <- source
        for u in permutation:
            currentAnswer <- currentAnswer + shortestPath(prev, u)
            prev <- u
        currentAnswer <- currentAnswer + shortestPath(prev, goal)
        if currentAnswer < answer:
            answer <- currentAnswer
    return answer

First, we calculate the shortest paths using one of the methods mentioned in the previous subsection. Next, we calculate all the possible permutations of must-visit nodes.

After that, we perform multiple steps. For each permutation, we iterate over all the nodes inside it and calculate the answer. Between every two consecutive nodes, we take the calculated shortest path. When we finish, we calculate the shortest path from the last node to the $goal$ node.

Finally, we check whether we were able to get a better answer. If so, we store the calculated answer. In the end, we just return the best-calculated answer.

The complexity of the described approach depends on the algorithm used to calculate the shortest paths. The complexity of checking all the permutations is $O(m \cdot m!)$ , where $m$ is the number of nodes that we must visit and $m!$ is the factorial of $m$ . The reason for this is that the number of different permutations is $m!$ and we need to iterate over each permutation to calculate its cost.

The complexity of using Dijkstra’s algorithm to find the shortest paths is $O(m \cdot (V + E \cdot log(V)))$ , which is good in case the graph doesn’t have a lot of edges.

The complexity of using the Floyd-Warshall algorithm is $O(V^3)$ , which is useful when the graph has a small number of nodes.

5. Comparison

Let’s have a quick comparison between the described approaches.

Algorithm	Time Complexity	Useful
2D Dijkstra	$O(2^m (V + E \cdot log(V 2^m)))$	Small value of $m$
Perm. with Dijkstras	$O( (m \cdot m!) + m \cdot (V + E \cdot log(V)))$	Small value of $E$
Perm. with Floyd-Warshall	$O((m \cdot m!) + V^3)$	Small value of $V$

As we can see, all algorithms calculate the needed shortest path.

However, the main idea is to find the algorithm that is able to calculate it with the lowest complexity.

The factors to consider when choosing an approach are $V$ , $E$ , and $m$ , which correspond to the number of nodes, the number of edges, and the number of must-visit nodes, respectively.

6. Conclusion

In this tutorial, we presented a modified version of the shortest path problem that must visit certain nodes. Firstly, we gave an example of the described problem. Secondly, we presented multiple solutions to solve the problem.

Finally, we showed a simple comparison between the discussed solutions and illustrated when to use each approach.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung