In this tutorial, we’ll discuss the problem of removing duplicates from a linked list. This problem can have many versions. Thus, we’ll explain the general idea of the problem and then discuss the naive approach that solves it.
After that, we’ll improve the naive approach to get a faster solution. Also, we’ll present some of its other versions and describe how to solve them.
In the end, we’ll present a comparison between all the provided approaches.
2. Defining the Problem
Suppose we have a linked list , which has elements. We want to remove the duplicates from it to get a new linked list that has unique elements.
In other words, the resulting linked list mustn’t have any element repeated more than once.
3. Naive Approach
First of all, we’ll describe the naive approach.
Let’s take a look at its implementation:
In this approach, we’re checking all the possible values for each of the already added elements. Hence, we call this the brute force approach.
Firstly, we define as an empty list, and we’ll use it to store the resulting linked list.
Next, we use the iterator to iterate over all possible values in . For each value, we use the iterator to iterate over all the previously taken values. We’ll check whether there is any value equal to the current one.
If so, we don’t add the element to the answer. Otherwise, it means this is the first time we’ve seen this element. Therefore, we add it to the answer.
In the end, we return the linked list that doesn’t have any duplicate.
The complexity of the naive approach is , where is the size of the linked list. The reason is that, in the worst-case, all elements are different. Thus, we’ll have to iterate over all taken elements for each .
However, in the special case, when all elements are equal, the complexity is reduced to thanks to the break statement that stops the check loop after finding a matching element.
4. Faster Approach
Let’s try to improve our naive approach.
4.1. General Idea
In the naive approach, we iterated over all previously taken values to check whether there are any duplicates. However, let’s think about how to do this check faster.
In other words, we need a data structure to hold the taken elements. Next, we’ll use this data structure to check whether we have duplicates or not.
For example, the TreeSet data structure is the right choice because we can add and search for elements in it and check if it contains an element in time.
Take a look at the implementation of the Faster approach:
Similarly to the naive approach, we define as an empty list that will hold the final answer. Also, we define as a new empty TreeSet that will hold the added elements to check for duplicates later.
Same as before, we’ll use the iterator to iterate over all possible values in . However, in this approach, we’re using the data structure to check whether there are any previously added values that equal to the current element.
If we don’t find the element inside the data structure, then we should add it to the list. Also, we need to add it to the so that it won’t be added if presented later.
Finally, we return the linked list that doesn’t have any duplicate.
The complexity of the faster approach is , where is the size of the linked list.
The reason is that, in the worst-case, when all elements are different, we’ll need to add all the elements to . Also, we’d need to perform a check for all values inside the list .
However, in the special case, when all elements are equal, the complexity is reduced to . This is because the complexity of the is affected by the number of different elements added to it. In this case, it’d be just one element.
5. Small Integer Values Approach
In this special case, we have a linked list named that contains elements. Each element, has an integer value in the range .
In a particular case, when is relatively small, we can solve this problem using the following approach:
- Define a new boolean array with size . Let’s call it and initialize all values in this array with .
- If we want to check whether value is added before, we need only to check if is .
- When we add the value to , we have to assign to .
Take a look at how the implementation looks like:
Firstly, let’s define as an empty list that will hold the resulting answer. Also, let’s define as the array used to check the existence of a certain element. The initial value of this array should be .
Secondly, we’ll iterate over all elements in . For each element, we’ll check whether we added the current value before. To do this, we use the array. If we didn’t add this value before, we should add it and assign to its location in .
Finally, we return the linked list.
Note that, in case we have negative values, such that the element’s range is , we can still use this approach after shifting all values by . Hence, the array’s range will and the position of element will be .
The complexity of this approach is , where is the size of the linked list, and is the maximum value of the elements.
6. Sorted Linked List Approach
Suppose the linked list was sorted. Let’s take advantage of the sort to get an even faster solution.
6.1. General Idea
When the linked list is sorted, all equal elements will be adjacent to each other. So, when we’re checking for duplicates, we can suffice by checking the last added element to the list .
Let’s take a look at the implementation of the sorted linked list approach:
At first, we defined as an empty list, similar to before. Then, we start iterating over all the elements and checking whether the current value was added before.
To do this, we compare the current value with the last element added to the list . Also, we pay attention to the case when we’re at the first element inside the list . In this case, the list will be empty.
As a result, we add the value to if we haven’t added any elements before, or if it’s not equal to the last added element.
The complexity of this approach is , where is the size of the linked list.
Take a look at the differences between previous approaches:
As we can see, the naive approach is simpler to implement and easier to understand. But, when considering the time complexity, the faster approach generally has a lower complexity.
However, in the special case when the list is sorted, it’s preferred to use the sorted linked list approach because it has a better complexity.
Also, in the case when we have small integer values inside the list, it’s better to use the small integers approach.
In this tutorial, we explained the problem of removing duplicates from a linked list.
Firstly, we presented the naive approach and improved it to obtain a faster approach. Then, we discussed two special cases and showed how to solve the problem in these cases.
Finally, we presented a summary comparison between all the presented approaches and showed when to use each one.