In this tutorial, we’ll look at two very important data structures in computer science – the hash table and binary search tree (BST), specifically its self-balancing variants. We’re going to analyze the main differences between them and figure out which to use depending on the problem at hand.
2.1. Hash Table
A hash table (also called a hash map) is a data structure that is used to map keys to values in an unsorted way. A key here means a unique identifier that points to the values we are storing.
A simple example of the concept would be a phonebook. Imagine we have contact name strings as keys and their corresponding telephone numbers as the data we’re storing. However, computers can’t associate strings with data directly but rather work with numbers.
That is why every new entry into the hash table phonebook would need to go through a hashing function that will compute an index. This index is then used to point to a bucket inside an array of all our contacts data.
Later, when we need to call someone in the phonebook, we can easily retrieve their number by hashing their name and finding the corresponding value.
This concept is very efficient but also very reliant on the choice of hashing function. A perfect function will assign each key to a unique bucket, but in practice, hash table implementations use an imperfect hashing function, which might cause two different inputs to result in the same index.
In our example, this would mean that two people with different names will point to the same phone number. This is also called a collision and is typically accommodated using methods like open addressing or chaining.
2.2. Self-Balancing BST
A tree is called binary if each of its nodes has at most two children, referred to as the left child and right child.
A binary search tree is a binary tree whose internal nodes each store a key and each has two distinguished sub-trees, again commonly denoted left and right. Additionally, the key in each node is greater than all the keys in the node’s left subtree and less than those in its right subtree.
This particular structure allows storing data such as numbers in a neatly sorted way, but most operations on it take time directly proportional to the height of the tree. So to keep the height of the tree as small as possible, a tree rotation routine can be applied. If such a routine is present, the tree is called self-balancing. Typical variants include AVL trees, B-trees, Red-black trees, and others.
3.1. Speed Considerations
The speed of the hash table mainly depends on the choice of hashing function since collisions slow the access to the needed elements. Operations on a hash table such as insertion, deletion, and search are blazingly fast and take on average.
However, if we have a really bad hashing function that always points to the same bucket the worst we can get is . In this case, we iterate through all the elements like in an array or linked list:
Another situation when the hash table requires linear time is when the table grows bigger than the initially allocated size. Then it has to recreate itself into a bigger table and rehash every element inside it. This makes the hash map not very suitable when working with constantly growing live data.
The time complexity for the same insert, search and delete operations in a self-balancing Binary Search Tree is on average. This is significantly slower compared to the hash table when we consider only these three operations.
That said, if the application requires more complex tasks like finding the closest lower or greater elements to a number or doing range queries then the sorted structure of the BST will do the search much faster in comparison.
3.2. Memory Considerations
Binary Search Trees are generally memory-efficient since they do not reserve more memory than they need to. On the other hand, Hash tables can be a bit more demanding if we don’t know the exact number of elements we want to store.
Usually, the hash table will allocate an array and avoid collisions by hashing uniformly by the size of this array. So if we have a hash table with initially reserved memory for 100 elements but use just 20 and the remaining memory will just be wasted.
In this article, we reviewed hash tables and self-balancing binary search trees, looked into their individual specifics, and compared them in different scenarios.
The main advantage of the hash table over self-balancing binary search trees is the constant speed of access. It is partially efficient when the maximum number of entries is known so that the underlying bucket array can be allocated once and never be resized.
On the other hand, if we are working with dynamic data that requires a lot of resizing and complex queries it will be more efficient to exploit the sorted structure of the self-balancing BSTs.