In this tutorial, we’ll talk about skip lists. They are a popular probabilistic alternative to balanced trees. We’ll explain how the skip lists work and show how to implement three standard operations on them: insertion, search, and deletion.
Let’s imagine a sorted linked list with numbers:
If we search for a number in it, we’ll traverse all elements in the worst case (when the sought value is greater than the list’s maximum). We start from the first node and follow all the pointers to the last one.
Now, let’s add a pointer two nodes ahead to every other node:
Starting from the first node, we can skip two nodes at a time, so we won’t visit more than nodes in the worst case before we either find the number we seek or conclude it’s not in the list:
What would happen if we gave every fourth node a pointer four nodes ahead of it? Then, the number of nodes we visit in the worst case would be approximately . In general, if every -th node has a pointer to the node nodes ahead of it, we reduce the number of nodes we have to examine in the worst-case search scenario to .
That’s the main idea behind skip lists. They contain nodes with more than one forward pointer, thus allowing for fast lookup. A node with forward pointers is called a level- node. Half the nodes are level 1, of the nodes are level 2, are level 3, nodes are level , and so on.
3. Skip Lists as Probabilistic Structures
However, deletion and insertion would be impractical in the case of the above skip list. After deleting or inserting an element, we’ll have to update the pointers to keep pointing to the new nodes at appropriate positions.
A way to avoid the overhead is to keep the changes local. We can do that if we don’t fix the positions of the higher-level nodes. For example, we can decide the level of a new node randomly so that higher-level nodes aren’t necessarily () nodes from one another. Instead, a level- pointer will point to the next level- node wherever it is in the list. That way, insertions, and deletions won’t lead to rearranging the pointers other than that pointed to the node immediately after the inserted one. The analogous reasoning holds for deletion.
To keep the logarithmic complexity of the search, we can maintain the same ratio of the levels’ sizes from the above example. We do that by setting to the probability to advance a level-() node to the level while inserting it into the list. In that case, we’ll have approximately nodes at level .
Therefore, a skip list is a multilevel linked list with its nodes’ levels decided randomly.
3.1. How to Set the Levels
If we don’t limit the levels in advance, they’ll depend only on the chosen chance to advance a node during insertion. We don’t have to use as in the above example. Any can serve as the probability, and the time complexity of search will still be .
Here’s the pseudocode for selecting a new node’s level:
3.2. The Expected Level
A node cannot be at the level without first becoming a level- node. So, the probability of a node being at level is (we subtract 1 because we don’t toss the dice for deciding the first level). From there, we see that a node’s level minus 1 follows a geometric distribution with as the “success probability”. So, the expected value of a node’s level is:
3.3. The Number of Pointers
There are pointers in a single-linked list with nodes. One of them points to the head, while the rest form the links between the nodes. Since all the nodes’ levels are , we know there will be level-1 pointers.
At the second level, we expect to have pointers, the same as the expected number of level-2 nodes. The same applies to the higher levels. So, the expected number of pointers is (if is ):
With the maximal level set to , the number of pointers is bounded from above by . Since is independent of , the expected memory complexity of a skip list is , same as the space complexity of a linked list. Moreover, we see that we won’t ever use more than twice the pointers a single-linked list has (since ).
3.4. The Infrastructure
We’ll consider a node the data structure with the following attributes:
- : the value it contains
- : the array of pointers to the successors at various levels
Similarly, we’ll assume that the list is a data structure with two attributes:
- – the current maximal level of the list
- – the array of pointers to the heads at the levels
4. How to Insert Into a Skip List
To insert a node with value into a skip list, we have to do three things:
- find where to place it
- choose its level
- and update the pointer that should point to the new node
We’ve already seen how to choose the level of the new node . To find where to place it, we start from the head node at the top level and follow the pointers until we reach two consecutive nodes, and , such that . If such nodes exist and isn’t greater than the current level (i.e., the top-level), we should update to point to and set the new node to point to . Then, we repeat the steps until we reach the bottom layer.
If the new node’s level is higher than the current maximal level of the entire list, we should insert new levels. We do that by setting the new node as their head.
Here’s the pseudocode of insertion:
Let’s say that we want to insert the number into this skip list:
We follow these pointers to find where to place it:
Let’s also say that we chose to set it up as a level-2 node. This is what the list looks like after insertion (the new node and new pointers are in red):
The complexity of insertion, as well as deletion, is dominated by the complexity of the search. To see why, let’s observe that before inserting a new element, we need to locate where to put it. The same goes for deleting a node. We have to find it before deleting it.
5. How to Search Skip Lists
Let’s see how we search a skip list for some value . It’s similar to inserting into the list, but we don’t have to update any pointers. We can start at the top level and follow the pointers until we find or a value greater than . In the former case, we return the corresponding node. In the latter, we step one level down and repeat the steps. The search ends when we find or cascade to the bottom-level node greater than . The node right before it must contain if the value is in the list. Otherwise, we return .
Here’s the pseudocode of search:
Here are the steps we make while searching for 17 in the above list:
5.3. The Expected Complexity
We’ll express the expected complexity of the search as the expected length of a search path. It starts in the top-left node (whose level is ) and ends in the goal node ( levels below ).
It’s easier to derive the expected length by analyzing the path in reverse. We’ll split it into two parts:
- the path to the top-level
- the path connecting the top left node (the highest-level head, ) and the first top node we got to in reverse
Let be the expected length of the first part. Since skip lists are non-deterministic, we take the expectation over their possible structures. So, from a node levels below the top, we can either go back one level up with probability or move left with probability :
Since (no need to move to the top level if we’re already at it), we get:
We’ve already solved the second part when we analyzed the number of nodes at different levels. So, we expect the top level to contain nodes. Therefore, the total expected complexity of the search is:
5.4. How to Get the Logarithmic Expected Complexity of Skip Lists
The idea is to choose in such a way that becomes a constant . So, we solve the corresponding equation for :
So, the first part in Equation (2) becomes , and the second part reduces to . Therefore, the total expected complexity will be logarithmic if we set to a logarithm of .
In this analysis, we assumed that was the list’s max level and that we always start the search from the top-level head.
5.5. The Worst-Case Complexity of Skip Lists
In the worst case, the skip list degenerates to a single-linked list, and we’re looking for a value greater than the maximum. So, the worst-case complexity is because we traverse the whole list.
Even though it’s a pessimistic result, skip lists still pay off. The reasoning is similar to the case of Quicksort with random pivot selection. The theoretically worst possible performance may be poor, but it occurs rarely. In practice, skip lists are very efficient. The logarithmic expected complexity shows it.
6. How to Delete Elements From Skip Lists
Deletion is similar to insertion and search. We first find the node containing the value we want to delete. In the process, we memorize which nodes point to it. If we find the node, we unlink it from its predecessors and point them to the node’s successors at the appropriate levels:
In this article, we presented skip lists. We showed three algorithms for skip lists: insertion, search, and deletion. Also, we analyzed the lists as a non-deterministic data structure and showed how to get the logarithmic expected complexity of the search.
Since the complexity of search dominates that of insertion and deletion, the latter two operations can also be made logarithmic. Even though the worst-case complexity is linear, using skip lists still pays off as the worst-case scenario isn’t likely in practice.