Authors Top

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

1. Introduction

In this tutorial, we’ll look at two data structures: hash table and trie. We’ll define a problem that is solvable with a hash table or a trie data structure. Then, we’ll compare the two solutions and point out the similarities and differences between them.

2. Problem Description

Let’s start with the problem which can be solved by a hash table or a trie. Given a dictionary that contains a list of strings, \{s_1, s_2, ..., s_N\} and a string t, we want to check whether or not t is in the dictionary.

3. Hash Table Solution

We can use a hash table to construct the dictionary. A hash table (also called a hash map) is a data structure that is used to map keys to values in an unsorted way. In our problem, we can treat each string in the dictionary as a key to the hash table.

3.1. Hash Table Construction

When we put a string into a hash table, we need a hash function to compute the string’s hash value. A hash function can take a variable-length input string and produce a fixed-length output value.

Normally, the output of a hash function is an integer index that points to the location where we store the string in the hash table. For example, we can use the following polynomial rolling hash function to convert a string into an integer:

Rendered by QuickLaTeX.com

where p and m are some positive numbers.

Since a hash function needs to consider all characters of the input string, the time complexity of the hash function is O(n), where n is the length of the input string. Therefore, the time to construct the dictionary is linear to the total number of characters of all the strings.

3.2. Hash Table Lookup

When we check whether a string t is in the hash table, we can use the same hash function to compute the hash value of t. Then, we look up the hash table to see whether it contains the computed hash value.

The hash value computation takes O(n) time, where n is the length of the input string t. Normally, the lookup process takes constant time if we have a good hash function. Therefore, the overall lookup time complexity is O(n).

4. Trie Solution

We can also use a trie data structure to construct the dictionary. A trie or a prefix tree is a particular kind of search tree, where nodes are usually keyed by strings.

In a trie, a link between two nodes represents a character in the keyed string. For example, the following trie represents a dictionary of strings \{to, tea, ten, in, inn\}:

Rendered by QuickLaTeX.com

If a node represents a whole string in the dictionary, we mark it as a complete node.

4.1. Trie Construction

When we insert a string into a trie, we start from the root node and search for a link that corresponds to the first character. If the link already exists, we move down the tree following the link to the next child level and continue searching for the next character. If the link doesn’t exist, we create a new node and link it with the parent’s link matching the current key character.

We repeat this construction step for each character of the input string until we finish all the characters. Then, we mark the last node as a complete node.

For example, to add a new word, “teach“, into the above example trie, we first follow the “tea” path from the root node. Then, we create additional nodes and links for the characters c and h. Finally, we mark the last node as the complete node:

Rendered by QuickLaTeX.com

In the trie construction, we don’t need a hash function to compute the key. However, we still need to go through each character of the input string. Therefore, the time complexity to add a string into the trie is also O(n) where n is the length of the input string.

4.2. Trie Lookup

We can use a similar way to check whether a string is in the trie. We can start from the root and follow the character links based on the input string’s content. If we can find a matching path for all characters and the last node is a complete node, then the input string exists in the trie. Otherwise, the trie doesn’t contain the input string.

Since we traverse each character of the input string only once, the overall lookup time complexity is also O(n).

5. Comparisons

The two approaches are not the same. Let’s take some aspects where we can compare the two.

5.1. Lookup Speed

When we look up a string for a hash table, we first calculate the hash value of the string, which takes O(n) time. Then, it will take O(1) time to locate the hash value in the memory, assuming we have a good hash function. Therefore, the overall lookup time complexity is O(n).

When we look up a string for a trie, we go through each character of the string and locate its corresponding node in the trie. The overall lookup time complexity is also O(n).

However, the trie has some more overhead to retrieve the whole string. We need to access the memory multiple times to locate the trie nodes along the character path. For the hash table, we only need to compute the hash value for the input string once. Therefore, it is relatively faster when we look up a whole string in the hash table.

For a hash table, we always compute the hash value for the whole input string whether the string exists in the hash table or not. For a trie, we can stop the search early when we don’t find a matching character link. Therefore, the lookup speed could be faster for the trie if the input string doesn’t exist in the trie.

5.2. Memory Requirement

When we first construct a hash table, we normally pre-allocate a big chunk of memory to avoid collisions by hashing uniformly on the size of the memory. In the future when we insert a string into the hash table, we only need to store the string content.

For a trie data structure, we need to store extra data such as character link pointers and complete node flags. Therefore, the trie requires more memory to store the string data. However, if there are many common prefixes, the memory requirement becomes smaller as we can share the prefix nodes.

Overall, the memory requirement between a hash table and a trie is based on the size of pre-allocated hash table memory and input dictionary strings.

5.3. Applications

Although the hash table has a relatively faster lookup speed, it only supports the exact match of the whole string. The trie solution is more flexible to support more applications, such as auto-complete. Also, we can easily print all the words in the dictionary in alphabetic order with a trie.

Therefore, if we want a full-text lookup application, the hash table is better as it has a faster lookup speed. However, if we want to return all the words that match a prefix, the trie is our solution.

6. Conclusion

In this article, we discussed two data structures: hash table and trie. We compared the time complexity and memory requirement for these two data structures. Also, we discussed the applications where they are suitable to use.

Authors Bottom

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

Comments are closed on this article!