1. Overview

One of the most important things to understand in graph theory is how to store them in memory.

In this tutorial, we’ll explain and compare three main data structures for graphs and show their advantages and disadvantages.

2. Defining the Problem

In graph theory, we refer to nodes as vertices (V) and connections between nodes as edges (E). When dealing with graph storage data structures, the comparison is done based on space and time complexities.

2.1. Complexities

With graph storage data structures, we usually pay attention to the following complexities:

  • Space Complexity: the approximate amount of memory needed to store a graph in the chosen data structure
  • Time Complexity
    1. Connection Checking Complexity: the approximate amount of time needed to find whether two different nodes are neighbors or not
    2. Neighbors Finding Complexity: the approximate amount of time needed to find all the neighboring nodes of some goal node

We call two different nodes “neighboring nodes” if there’s an edge that connects the first node with the second.

2.2. Types of Graphs

One thing that needs to be understood is that graphs are usually defined based on two factors.

The first factor is whether the graph is weighted or not. In graph theory, we sometimes care only about the fact that two nodes are connected. Other times, we also care about the cost of moving from node u to node v.

In this article, we’ll show the update that needs to be done in both cases for each data structure.

The second factor is whether the graph is directed or not. If there’s an edge from u to v, and we can only move from node u to node v, then the graph is called directed. However, in undirected graphs, an edge between nodes u and v means that we can move from node u to node v and vice-versa.

We can always transform any undirected graph to a directed graph by separating each edge between u and v to two edges. Therefore, in this article, we’ll discuss directed graphs since they’re a more general case.

3. Graph Data Structures

Three main data structures are used to store graphs in memory.

3.1. Adjacency Matrix

The first data structure is called the adjacency matrix. As the name suggests, adjacency matrices are helpful when we need to quickly find whether two nodes are adjacent (connected) or not. The adjacency matrix is a 2D boolean array of a size V^2.

Let’s name it G, then we should have:

G_{uv} = true if there’s a direct edge from u to v.

G_{uv} = false otherwise.

However, in case the handled graph was weighted, then each cell G_{uv} will be a 2D array that contains the weight of the direct edge between u and v.

If there is no edge between u and v, then G_{uv} will contain a special value indicating there is no direct connection between u and v. Usually, we can use a large value, indicating that moving directly between u and v costs a lot, or is impossible.

3.2. Adjacency List

The second data structure is the adjacency list. In this data structure, we don’t aim to store the value of all different pairs u and v. Instead, for each node, we store the neighboring nodes only.

To do this, we create an array G of size V. Each cell u will hold a linked list. Each object inside the linked list will store the index of node v that is connected to the node with index u. Therefore, each cell u will have a linked list of size N_u, where N_u corresponds to the number of nodes connected to node u.

In other words:

G_{ui}; i \in [1, N_u] such that G_{ui} equals to the ith neighbor of node u.

In case we’re dealing with weighted graphs, then each object inside the linked list will hold two pieces of information, the neighboring node v, and the cost of the edge between u and v.

3.3. Edges List

The last data structure is the edges list. From the name, we can infer that we store all the edges from our graph inside a linked list.

Let’s call this list as G. Each object inside the linked list will hold two things, node u and node v, indicating that an edge exists that connects node u with node v. From the above we can infer that:

G_{ui}; i \in [1, E] such that G_{ui} contains the information of the ith edge inside the graph.

If the graph is weighted then each object will hold a piece of third information, which is the weight of the edge between nodes u and v.

This data structure is especially helpful with graphs that have a large number of nodes, but only a small number of edges.

4. Complexity

Let’s look at the table below that shows an overview of the complexities of each graph storage data structure. Next, we’ll explain the reason behind each complexity:

Rendered by QuickLaTeX.com

4.1. Space Complexity

  • Adjacency Matrix: Since in adjacency matrices we store an array of size V^2, it means that the space complexity is O(V^2), where V is the number of vertices inside the graph.
  • Adjacency List: First, we store an array of size V, where each cell stores the information of one of our graph’s nodes. This means that first, we need a space complexity of O(V) to store an empty array. Next, we move to the sum of all linked lists’ sizes. Since we only create an extra linked list object in case of a new edge, it means that the sum of linked lists’ sizes is equal to O(E), where E is the number of edges in our graph. This leads us to a total space complexity of O(V+E).
  • Edges List: In this data structure we only have a linked list that stores all the possible edges between all nodes inside the graph. Therefore, the memory complexity is O(E).

4.2. Connection Checking Complexity

  • Adjacency Matrix: Checking whether two nodes u and v are connected or not is pretty efficient when using adjacency matrices. All we have to do is to look for the value of the cell G_{uv}. Therefore, the time complexity equals O(1).
  • Adjacency List: To find whether two nodes u and v are connected or not, we have to iterate over the linked list stored inside G_u. In the worst case, there won’t be an edge between u and v, and we’ll end up making N_u iterations. Hence, the total complexity is O(N_u), where N_u is the number of neighbors of u.
  • Edges List: In the edges list case, we don’t have any other option but to iterate over all the edges inside the linked list, and check whether we can find the edge needed between nodes u and v. The complexity is O(E) where E is the number of edges inside the graph.

4.3. Neighbors Finding Complexity

  • Adjacency Matrix: To find all the neighboring nodes of some node u, we have to iterate over all the cells in the row u to determine which nodes have a direct edge connecting it to u. In other words, we need to check all cells G{ui}, where i \in [1, V]. Therefore, the time complexity is O(V).
  • Adjacency List: Finding all the neighboring nodes quickly is what adjacency list was created for. Since cell G_u stores a linked list that contains all the nodes v connected to u, we just need to iterate over the linked list stored inside G_u. The time complexity of such iterating is O(N_u), where N_u represents the number of the neighbors of node u.
  • Edges List: Edges list is probably not the best solution to finding all neighboring nodes quickly. We need to iterate over all the stored objects inside the linked list and check if the stored nodes are u and v. Hence, the time complexity is O(E), where E is the number of edges in the graph.

5. Advantages and Disadvantages

Adjacency matrices are helpful when we need to quickly check if two nodes have a direct edge or not. However, the main disadvantage is its large memory complexity. The adjacency matrix is most helpful in cases where the graph doesn’t contain a large number of nodes. Also, when the graph is almost complete (every node is connected to almost all the other nodes), using adjacency matrices might be a good solution.

Adjacency lists, on the other hand, are a great option when we need to continuously access all the neighbors of some node u. In that case, we’ll only be iterating over the needed nodes. Adjacency list limitations show when we need to check if two nodes have a direct edge or not.

However, it’s worth noting that we can use an updated version of adjacency lists. Instead of storing all the neighboring nodes in a linked list, we can store them in a more complex data structure, like a set for example. This would allow us to iterate over the neighboring nodes efficiently. Also, we can check if two nodes are connected in logarithmic time complexity.

Edges lists are the least used data structure. Mainly, we use edges lists when we have an enormous amount of nodes that can’t be stored inside the memory, with only a few edges. In that case, we wouldn’t have any other option but to use the edges list. So, the only advantage of the edges list is its low memory space complexity.

6. Conclusion

In this article, we presented the three main data structures to store a graph in memory.

Next, we discussed the space and time complexities of the main operations that most graph algorithms perform.

Finally, we discussed the advantages and disadvantages of each data structure in terms of space and time complexity, and when to use each data structure.

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments