Remove Duplicates From a Linked List | Baeldung on Computer Science

1. Overview

In this tutorial, we’ll discuss the problem of removing duplicates from a linked list. This problem can have many versions. Thus, we’ll explain the general idea of the problem and then discuss the naive approach that solves it.

After that, we’ll improve the naive approach to get a faster solution. Also, we’ll present some of its other versions and describe how to solve them.

In the end, we’ll present a comparison between all the provided approaches.

2. Defining the Problem

Suppose we have a linked list $A$ , which has $n$ elements. We want to remove the duplicates from it to get a new linked list that has unique elements.

In other words, the resulting linked list mustn’t have any element repeated more than once.

3. Naive Approach

First of all, we’ll describe the naive approach.

Let’s take a look at its implementation:

algorithm removeDuplicatesNaive(A):
    // INPUT
    //   A = a linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates

    answer <- an empty linked list
    iIter <- A.head

    while iIter != null:
        jIter <- answer.head
        isAdded <- false

        while jIter != null:
            if iIter.value = jIter.value:
                isAdded <- true
                break
            jIter <- jIter.next

        if not isAdded:
            answer.add(iIter.value)

        iIter <- iIter.next

    return answer

In this approach, we’re checking all the possible values for each of the already added elements. Hence, we call this the brute force approach.

Firstly, we define $answer$ as an empty list, and we’ll use it to store the resulting linked list.

Next, we use the iterator $iIter$ to iterate over all possible values in $A$ . For each value, we use the iterator $\boldsymbol{jIter}$ to iterate over all the previously taken values. We’ll check whether there is any value equal to the current one.

If so, we don’t add the element to the answer. Otherwise, it means this is the first time we’ve seen this element. Therefore, we add it to the answer.

In the end, we return the $answer$ linked list that doesn’t have any duplicate.

The complexity of the naive approach is $\boldsymbol{O(n^2)}$ , where $n$ is the size of the linked list. The reason is that, in the worst-case, all elements are different. Thus, we’ll have to iterate over all taken elements for each $i$ .

However, in the special case, when all elements are equal, the complexity is reduced to $O(n)$ thanks to the break statement that stops the check loop after finding a matching element.

4. Faster Approach

Let’s try to improve our naive approach.

4.1. General Idea

In the naive approach, we iterated over all previously taken values to check whether there are any duplicates. However, let’s think about how to do this check faster.

In other words, we need a data structure to hold the taken elements. Next, we’ll use this data structure to check whether we have duplicates or not.

For example, the TreeSet data structure is the right choice because we can add and search for elements in it and check if it contains an element in $O(log (n))$ time.

4.2. Implementation

Take a look at the implementation of the faster approach:

algorithm removeDuplicatesFaster(A):
    // INPUT
    //   A = a linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates

    answer <- an empty linked list
    set <- an empty set
    iIter <- A.head

    while iIter != null:
        if not set.contains(iIter.value):
            answer.add(iIter.value)
            set.add(iIter.value)

        iIter <- iIter.next

    return answer

Similarly to the naive approach, we define $answer$ as an empty list that will hold the final answer. Also, we define $set$ as a new empty TreeSet that will hold the added elements to check for duplicates later.

Same as before, we’ll use the iterator $iIter$ to iterate over all possible values in $A$ . However, in this approach, we’re using the $\boldsymbol{set}$ data structure to check whether there are any previously added values that equal to the current element.

If we don’t find the element inside the $set$ data structure, then we should add it to the $answer$ list. Also, we need to add it to the $set$ so that it won’t be added if presented later.

Finally, we return the $answer$ linked list that doesn’t have any duplicate.

The complexity of the faster approach is $\boldsymbol{O(n \times log(n))}$ , where $n$ is the size of the linked list.

The reason is that, in the worst-case, when all elements are different, we’ll need to add all the elements to $set$ . Also, we’d need to perform a check for all values inside the list $A$ .

However, in the special case, when all elements are equal, the complexity is reduced to $O(n)$ . This is because the complexity of the $set$ is affected by the number of different elements added to it. In this case, it’d be just one element.

5. Small Integer Values Approach

In this special case, we have a linked list named $A$ that contains $n$ elements. Each element, has an integer value in the range $[0,MaxVal]$ .

In a particular case, when $MaxVal$ is relatively small, we can solve this problem using the following approach:

Define a new boolean array with size $MaxVal + 1$ . Let’s call it $isAdded[MaxVal + 1]$ and initialize all values in this array with $false$ .
If we want to check whether value $\boldsymbol{x}$ is added before, we need only to check if $\boldsymbol{isAdded[x]}$ is $\boldsymbol{true}$ .
When we add the value $\boldsymbol{x}$ to $\boldsymbol{answer}$ , we have to assign $\boldsymbol{true}$ to $\boldsymbol{isAdded[x]}$ .

Take a look at what the implementation looks like:

algorithm removeDuplicatesSmallIntegers(A):
    // INPUT
    //   A = a linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates

    answer <- an empty linked list
    isAdded <- array of false (size based on possible integer values)

    iIter <- A.head

    while iIter != null:
        if not isAdded[iIter.value]:
            answer.add(iIter.value)
            isAdded[iIter.value] <- true

        iIter <- iIter.next

    return answer

Firstly, let’s define $answer$ as an empty list that will hold the resulting answer. Also, let’s define $isAdded$ as the array used to check the existence of a certain element. The initial value of this array should be $false$ .

Secondly, we’ll iterate over all elements in $A$ . For each element, we’ll check whether we added the current value before. To do this, we use the $isAdded$ array. If we didn’t add this value before, we should add it and assign $true$ to its location in $isAdded$ .

Finally, we return the $answer$ linked list.

Note that, in case we have negative values, such that the element’s range is $[-a,+b]$ , we can still use this approach after shifting all values by $a$ . Hence, the array’s range will $[0,a+b]$ and the position of element $x$ will be $x+a$ .

The complexity of this approach is $\boldsymbol{O(n+MaxVal)}$ , where $n$ is the size of the linked list, and $MaxVal$ is the maximum value of the elements.

6. Sorted Linked List Approach

Suppose the linked list $A$ was sorted. Let’s take advantage of the sort to get an even faster solution.

6.1. General Idea

When the linked list is sorted, all equal elements will be adjacent to each other. So, when we’re checking for duplicates, we can suffice by checking the last added element to the list $answer$ .

6.2. Implementation

Let’s take a look at the implementation of the sorted linked list approach:

algorithm removeDuplicatesSorted(A):
    // INPUT
    //   A = a sorted linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates

    answer <- an empty linked list
    iIter <- A.head

    while iIter != null:
        if answer.isEmpty() or answer.getLast() != iIter.value:
            answer.add(iIter.value)

        iIter <- iIter.next

    return answer

At first, we defined $answer$ as an empty list, similar to before. Then, we start iterating over all the elements and checking whether the current value was added before.

To do this, we compare the current value with the last element added to the list $\boldsymbol{answer}$ . Also, we pay attention to the case when we’re at the first element inside the list $\boldsymbol{A}$ . In this case, the list $answer$ will be empty.

As a result, we add the value to $answer$ if we haven’t added any elements before, or if it’s not equal to the last added element.

The complexity of this approach is $\boldsymbol{O(n)}$ , where $n$ is the size of the linked list.

7. Comparison

Take a look at the differences between previous approaches:

	Naive Approach	Faster Approach	Small Integer Values Approach	Sorted Linked List Approach
Worst Time Complexity	$O(n^2)$	$O(n \times log(n))$	$O(n)$	$O(n)$
Best Time Complexity	$O(n)$	$O(n)$	$O(n)$	$O(n)$
Memory Complexity	$O(n)$	$O(n)$	$O(n+MaxVal)$	$O(n)$
Use Cases	General approach	General approach	Only when the values are relatively small	Only for sorted linked lists

As we can see, the naive approach is simpler to implement and easier to understand. But, when considering the time complexity, the faster approach generally has a lower complexity.

However, in the special case when the list is sorted, it’s preferred to use the sorted linked list approach because it has a better complexity.

Also, in the case when we have small integer values inside the list, it’s better to use the small integers approach.

8. Conclusion

In this tutorial, we explained the problem of removing duplicates from a linked list.

Firstly, we presented the naive approach and improved it to obtain a faster approach. Then, we discussed two special cases and showed how to solve the problem in these cases.

Finally, we presented a summary comparison between all the presented approaches and showed when to use each one.

Core Concepts

Operating Systems

Artificial Intelligence

Graph Theory

Latex

Full Archive

About Baeldung