Last updated: June 29, 2024
One of the most important things to understand in graph theory is how to store graphs in memory.
In this tutorial, we’ll explain and compare three main data structures for graphs and show their advantages and disadvantages.
In graph theory, we refer to nodes as vertices and to connections between nodes as edges. When dealing with graph storage data structures, we compare them based on their space and time complexities.
With graph storage data structures, we usually pay attention to the following complexities: the space needed to store the graph, the time needed to check whether two nodes are connected, and the time needed to find all the neighbors of a given node.
We call two different nodes "neighboring nodes" if there's an edge that connects the first node with the second.
One thing that needs to be understood is that graphs are usually defined based on two factors.
The first factor is whether the graph is weighted or not. In graph theory, we sometimes care only about the fact that two nodes are connected. Other times, we also care about the cost of moving from one node to another.
In this article, we'll show the updates that need to be made in both cases for each data structure.
The second factor is whether the graph is directed or not. If there's an edge from node u to node v, and we can only move from node u to node v, then the graph is called directed. However, in undirected graphs, an edge between nodes u and v means that we can move from node u to node v and vice-versa.
We can always transform any undirected graph into a directed graph by replacing each edge between u and v with two directed edges. Therefore, in this article, we'll discuss directed graphs, since they're the more general case.
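The transformation above can be sketched as follows. This is a minimal illustration, assuming edges are represented as plain `int[]` pairs; the class and method names are illustrative choices, not from the article:

```java
import java.util.ArrayList;
import java.util.List;

class UndirectedToDirected {
    // For every undirected edge {u, v}, emit the two directed
    // edges (u -> v) and (v -> u).
    static List<int[]> toDirected(List<int[]> undirectedEdges) {
        List<int[]> directed = new ArrayList<>();
        for (int[] e : undirectedEdges) {
            directed.add(new int[] { e[0], e[1] });
            directed.add(new int[] { e[1], e[0] });
        }
        return directed;
    }
}
```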
Three main data structures are used to store graphs in memory.
The first data structure is called the adjacency matrix. As the name suggests, adjacency matrices are helpful when we need to quickly find whether two nodes are adjacent (connected) or not. The adjacency matrix is a boolean array of size n x n, where n is the number of nodes.
Let's name it M; then we should have:
M[u][v] = true if there's a direct edge from u to v.
M[u][v] = false otherwise.
However, if the handled graph is weighted, then each cell M[u][v] will contain the weight of the direct edge between u and v.
If there is no edge between u and v, then M[u][v] will contain a special value indicating there is no direct connection between u and v. Usually, we can use a large value, indicating that moving directly from u to v costs a lot, or is impossible.
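A minimal sketch of a weighted adjacency matrix is shown below. The node count, the INF sentinel, and the method names are illustrative assumptions, not prescribed by the article:

```java
import java.util.Arrays;

class AdjacencyMatrix {
    // Sentinel meaning "no direct edge between u and v"
    static final int INF = Integer.MAX_VALUE;

    final int[][] matrix;

    AdjacencyMatrix(int n) {
        matrix = new int[n][n];
        for (int[] row : matrix) {
            Arrays.fill(row, INF); // initially, no edges exist
        }
    }

    // O(1): record a directed edge u -> v with the given weight
    void addEdge(int u, int v, int weight) {
        matrix[u][v] = weight;
    }

    // O(1): connection checking is a single array lookup
    boolean hasEdge(int u, int v) {
        return matrix[u][v] != INF;
    }
}
```

Note that the matrix always occupies n x n cells, regardless of how many edges actually exist.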
The second data structure is the adjacency list. In this data structure, we don't aim to store the connection status of every pair of nodes (u, v). Instead, for each node, we store the neighboring nodes only.
To do this, we create an array of size n. Each cell u will hold a linked list. Each object inside the linked list will store the index of a node v that is connected to the node with index u. Therefore, each cell u will have a linked list of size k, where k corresponds to the number of nodes connected to node u.
In other words: G[u][i] = v such that v is the ith neighbor of node u.
In case we're dealing with weighted graphs, each object inside the linked list will hold two pieces of information: the neighboring node v, and the cost of the edge between u and v.
The last data structure is the edges list. From the name, we can infer that we store all the edges of our graph inside a linked list.
Let's call this list E. Each object inside the linked list will hold two nodes, u and v, indicating that an edge exists connecting node u with node v. From the above, we can infer that E[i] contains the information of the ith edge inside the graph.
If the graph is weighted, then each object will hold a third piece of information: the weight of the edge between nodes u and v.
This data structure is especially helpful with graphs that have a large number of nodes, but only a small number of edges.
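A weighted edges list can be sketched as below; the `Edge` record and method names are illustrative assumptions. Note how connection checking must scan every stored edge:

```java
import java.util.ArrayList;
import java.util.List;

class EdgesList {
    // Each entry stores both endpoints and the edge weight
    record Edge(int from, int to, int weight) { }

    final List<Edge> edges = new ArrayList<>();

    // O(1): append a directed edge u -> v
    void addEdge(int u, int v, int weight) {
        edges.add(new Edge(u, v, weight));
    }

    // O(m): checking a connection requires scanning all m edges
    boolean hasEdge(int u, int v) {
        return edges.stream()
            .anyMatch(e -> e.from() == u && e.to() == v);
    }
}
```

The memory used grows only with the number of edges, never with the number of nodes, which matches the use case described above.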
Let’s look at the table below that shows an overview of the complexities of each graph storage data structure. Next, we’ll explain the reason behind each complexity:
Data Structure | Space Complexity | Connection Checking | Neighbors Finding
---|---|---|---
Adjacency Matrix | O(n^2) | O(1) | O(n)
Adjacency List | O(n + m) | O(k) | O(k)
Edges List | O(m) | O(m) | O(m)

Here, n is the number of nodes, m is the number of edges, and k is the number of neighbors of the queried node.
Adjacency matrices are helpful when we need to quickly check if two nodes have a direct edge or not. However, the main disadvantage is its large memory complexity. The adjacency matrix is most helpful in cases where the graph doesn’t contain a large number of nodes. Also, when the graph is almost complete (every node is connected to almost all the other nodes), using adjacency matrices might be a good solution.
Adjacency lists, on the other hand, are a great option when we need to continuously access all the neighbors of some node u. In that case, we'll only be iterating over the needed nodes. The limitations of adjacency lists show when we need to check whether two nodes have a direct edge or not.
However, it’s worth noting that we can use an updated version of adjacency lists. Instead of storing all the neighboring nodes in a linked list, we can store them in a more complex data structure, like a set for example. This would allow us to iterate over the neighboring nodes efficiently. Also, we can check if two nodes are connected in logarithmic time complexity.
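The set-based variant mentioned above can be sketched with a sorted set. Here we assume a `TreeSet`, whose balanced-tree implementation gives logarithmic membership tests; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class AdjacencySets {
    final List<TreeSet<Integer>> sets;

    AdjacencySets(int n) {
        sets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            sets.add(new TreeSet<>()); // one sorted set per node
        }
    }

    void addEdge(int u, int v) {
        sets.get(u).add(v); // O(log k) insertion
    }

    // O(log k): membership test in a balanced-tree set
    boolean hasEdge(int u, int v) {
        return sets.get(u).contains(v);
    }
}
```

The trade-off is slower insertion and iteration constants compared to a plain linked list, in exchange for the faster connection check.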
Edges lists are the least used data structure. Mainly, we use edges lists when we have an enormous number of nodes that can't be stored inside memory, with only a few edges. In that case, we wouldn't have any other option but to use the edges list. So, the only advantage of the edges list is its low memory complexity.
In this article, we presented the three main data structures to store a graph in memory.
Next, we discussed the space and time complexities of the main operations that most graph algorithms perform.
Finally, we discussed the advantages and disadvantages of each data structure in terms of space and time complexity, and when to use each data structure.