One of the most important things to understand in graph theory is how to store them in memory.
In this tutorial, we’ll explain and compare three main data structures for graphs and show their advantages and disadvantages.
2. Defining the Problem
In graph theory, we refer to nodes as vertices and connections between nodes as edges . When dealing with graph storage data structures, the comparison is done based on space and time complexities.
With graph storage data structures, we usually pay attention to the following complexities:
- Space Complexity: the approximate amount of memory needed to store a graph in the chosen data structure
- Time Complexity
- Connection Checking Complexity: the approximate amount of time needed to find whether two different nodes are neighbors or not
- Neighbors Finding Complexity: the approximate amount of time needed to find all the neighboring nodes of some goal node
We call two different nodes “neighboring nodes” if there’s an edge that connects the first node with the second.
2.2. Types of Graphs
One thing that needs to be understood is that graphs are usually defined based on two factors.
The first factor is whether the graph is weighted or not. In graph theory, we sometimes care only about the fact that two nodes are connected. Other times, we also care about the cost of moving from node to node .
In this article, we’ll show the update that needs to be done in both cases for each data structure.
The second factor is whether the graph is directed or not. If there’s an edge from to , and we can only move from node to node , then the graph is called directed. However, in undirected graphs, an edge between nodes and means that we can move from node to node and vice-versa.
We can always transform any undirected graph to a directed graph by separating each edge between and to two edges. Therefore, in this article, we’ll discuss directed graphs since they’re a more general case.
3. Graph Data Structures
Three main data structures are used to store graphs in memory.
3.1. Adjacency Matrix
The first data structure is called the adjacency matrix. As the name suggests, adjacency matrices are helpful when we need to quickly find whether two nodes are adjacent (connected) or not. The adjacency matrix is a boolean array of a size .
Let’s name it , then we should have:
if there’s a direct edge from to .
However, in case the handled graph was weighted, then each cell will be a array that contains the weight of the direct edge between and .
If there is no edge between and , then will contain a special value indicating there is no direct connection between and . Usually, we can use a large value, indicating that moving directly between u and v costs a lot, or is impossible.
3.2. Adjacency List
The second data structure is the adjacency list. In this data structure, we don’t aim to store the value of all different pairs and . Instead, for each node, we store the neighboring nodes only.
To do this, we create an array of size . Each cell will hold a linked list. Each object inside the linked list will store the index of node that is connected to the node with index . Therefore, each cell will have a linked list of size , where corresponds to the number of nodes connected to node .
In other words:
such that equals to the ith neighbor of node .
In case we’re dealing with weighted graphs, then each object inside the linked list will hold two pieces of information, the neighboring node , and the cost of the edge between and .
3.3. Edges List
The last data structure is the edges list. From the name, we can infer that we store all the edges from our graph inside a linked list.
Let’s call this list as . Each object inside the linked list will hold two things, node and node , indicating that an edge exists that connects node with node . From the above we can infer that:
such that contains the information of the ith edge inside the graph.
If the graph is weighted then each object will hold a piece of third information, which is the weight of the edge between nodes and .
This data structure is especially helpful with graphs that have a large number of nodes, but only a small number of edges.
Let’s look at the table below that shows an overview of the complexities of each graph storage data structure. Next, we’ll explain the reason behind each complexity:
4.1. Space Complexity
- Adjacency Matrix: Since in adjacency matrices we store an array of size , it means that the space complexity is , where is the number of vertices inside the graph.
- Adjacency List: First, we store an array of size , where each cell stores the information of one of our graph’s nodes. This means that first, we need a space complexity of to store an empty array. Next, we move to the sum of all linked lists’ sizes. Since we only create an extra linked list object in case of a new edge, it means that the sum of linked lists’ sizes is equal to , where is the number of edges in our graph. This leads us to a total space complexity of .
- Edges List: In this data structure we only have a linked list that stores all the possible edges between all nodes inside the graph. Therefore, the memory complexity is .
4.2. Connection Checking Complexity
- Adjacency Matrix: Checking whether two nodes and are connected or not is pretty efficient when using adjacency matrices. All we have to do is to look for the value of the cell . Therefore, the time complexity equals .
- Adjacency List: To find whether two nodes and are connected or not, we have to iterate over the linked list stored inside . In the worst case, there won’t be an edge between and , and we’ll end up making iterations. Hence, the total complexity is , where is the number of neighbors of .
- Edges List: In the edges list case, we don’t have any other option but to iterate over all the edges inside the linked list, and check whether we can find the edge needed between nodes and . The complexity is where is the number of edges inside the graph.
4.3. Neighbors Finding Complexity
- Adjacency Matrix: To find all the neighboring nodes of some node , we have to iterate over all the cells in the row u to determine which nodes have a direct edge connecting it to . In other words, we need to check all cells , where . Therefore, the time complexity is .
- Adjacency List: Finding all the neighboring nodes quickly is what adjacency list was created for. Since cell stores a linked list that contains all the nodes connected to , we just need to iterate over the linked list stored inside . The time complexity of such iterating is , where represents the number of the neighbors of node .
- Edges List: Edges list is probably not the best solution to finding all neighboring nodes quickly. We need to iterate over all the stored objects inside the linked list and check if the stored nodes are and . Hence, the time complexity is , where is the number of edges in the graph.
5. Advantages and Disadvantages
Adjacency matrices are helpful when we need to quickly check if two nodes have a direct edge or not. However, the main disadvantage is its large memory complexity. The adjacency matrix is most helpful in cases where the graph doesn’t contain a large number of nodes. Also, when the graph is almost complete (every node is connected to almost all the other nodes), using adjacency matrices might be a good solution.
Adjacency lists, on the other hand, are a great option when we need to continuously access all the neighbors of some node u. In that case, we’ll only be iterating over the needed nodes. Adjacency list limitations show when we need to check if two nodes have a direct edge or not.
However, it’s worth noting that we can use an updated version of adjacency lists. Instead of storing all the neighboring nodes in a linked list, we can store them in a more complex data structure, like a set for example. This would allow us to iterate over the neighboring nodes efficiently. Also, we can check if two nodes are connected in logarithmic time complexity.
Edges lists are the least used data structure. Mainly, we use edges lists when we have an enormous amount of nodes that can’t be stored inside the memory, with only a few edges. In that case, we wouldn’t have any other option but to use the edges list. So, the only advantage of the edges list is its low memory space complexity.
In this article, we presented the three main data structures to store a graph in memory.
Next, we discussed the space and time complexities of the main operations that most graph algorithms perform.
Finally, we discussed the advantages and disadvantages of each data structure in terms of space and time complexity, and when to use each data structure.