Java Top

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE

1. Overview

In this tutorial, we’ll discuss several techniques in Java on how to remove repeated characters from a string.

For each technique, we’ll also talk briefly about its time and space complexity.

2. Using distinct

Let’s start by removing the duplicates from our string using the distinct method introduced in Java 8.

Below, we’re obtaining an instance of an IntStream from a given string object. Then, we’re using the distinct method to remove the duplicates. Finally, we’re calling the forEach method to loop over the distinct characters and append them to our StringBuilder:

StringBuilder sb = new StringBuilder();
str.chars().distinct().forEach(c -> sb.append((char) c));

Time Complexity: O(n) – runtime of the loop is directly proportional to the size of the input string

Auxiliary Space: O(n) – since distinct uses a LinkedHashSet internally and we’re also storing the resulting string in a StringBuilder object

Maintains Order: Yes – since the LinkedHashSet maintains the order of its elements

And, while it’s nice that Java 8 does this task for us so nicely, let’s compare it to efforts to roll our own.

3. Using indexOf

The naive approach to removing duplicates from a string simply involves looping over the input and using the indexOf method to check whether the current character already exists in the resulting string:

StringBuilder sb = new StringBuilder();
int idx;
for (int i = 0; i < str.length(); i++) {
    char c = str.charAt(i);
    idx = str.indexOf(c, i + 1);
    if (idx == -1) {
        sb.append(c);
    }
}

Time Complexity: O(n * n) – for each character, the indexOf method runs through the remaining string

Auxiliary Space: O(n) – linear space is required since we’re using the StringBuilder to store the result

Maintains Order: Yes

This method has the same space complexity as the first approach but performs much slower.

4. Using a Character Array

We can also remove duplicates from our string by converting it into a char array and then looping over each character and comparing it to all subsequent characters.

As we can see below, we’re creating two for loops and we’re checking whether each element is repeated in the string. If a duplicate is found, we don’t append it to the StringBuilder:

char[] chars = str.toCharArray();
StringBuilder sb = new StringBuilder();
boolean repeatedChar;
for (int i = 0; i < chars.length; i++) {
    repeatedChar = false;
    for (int j = i + 1; j < chars.length; j++) {
        if (chars[i] == chars[j]) {
            repeatedChar = true;
            break;
        }
    }
    if (!repeatedChar) {
        sb.append(chars[i]);
    }
}

Time Complexity: O(n * n) – we have an inner and an outer loop both traversing the input string

Auxiliary Space: O(n) – linear space is required since the chars variable stores a new copy of the string input and we’re also using the StringBuilder to save the result

Maintains Order: Yes

Again, our second attempt performs poorly compared to the Core Java offering, but let’s see where we get with our next attempt.

5. Using Sorting

Alternatively, repeated characters can be eliminated by sorting our input string to group duplicates.  In order to do that, we have to convert the string to a char array and sort it using the Arrays.sort method. Finally, we’ll iterate over the sorted char array.

During every iteration, we’re going to compare each element of the array with the previous element. If the elements are different then we’ll append the current character to the StringBuilder:

StringBuilder sb = new StringBuilder();
if(!str.isEmpty()) {
    char[] chars = str.toCharArray();
    Arrays.sort(chars);

    sb.append(chars[0]);
    for (int i = 1; i < chars.length; i++) {
        if (chars[i] != chars[i - 1]) {
            sb.append(chars[i]);
        }
    }
}

Time Complexity: O(n log n) – the sort uses a dual-pivot Quicksort which offers O(n log n) performance on many data sets

Auxiliary Space: O(n) – since the toCharArray method makes a copy of the input String

Maintains Order: No

Let’s try that again with our final attempt.

6. Using a Set

Another way to remove repeated characters from a string is through the use of a Set. If we do not care about the order of characters in our output string we can use a HashSet. Otherwise, we can use a LinkedHashSet to maintain the insertion order.

In both cases, we’ll loop over the input string and add each character to the Set. Once the characters are inserted into the set, we’ll iterate over it to add them to the StringBuilder and return the resulting string:

StringBuilder sb = new StringBuilder();
Set<Character> linkedHashSet = new LinkedHashSet<>();

for (int i = 0; i < str.length(); i++) {
    linkedHashSet.add(str.charAt(i));
}

for (Character c : linkedHashSet) {
    sb.append(c);
}

Time Complexity: O(n) – runtime of the loop is directly proportional to the size of the input string

Auxiliary Space: O(n) – space required for the Set depends on the size of the input string; also, we’re using the StringBuilder to store the result

Maintains Order: LinkedHashSet – Yes, HashSet – No

And now, we’ve matched the Core Java approach! It’s not very shocking to find out that this is very similar to what distinct already does.

7. Conclusion

In this article, we covered a few ways to remove repeated characters from a string in Java. We also looked at the time and space complexity of each of these methods.

As always, code snippets can be found over on GitHub.

Java bottom

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE