Remove Characters From a String That Are in the Other String

Last updated: July 6, 2024

Written by: Kai Yuan

Reviewed by: Eric Martin

Java String

Refactor Java code safely — and automatically — with OpenRewrite.

Refactoring big codebases by hand is slow, risky, and easy to put off. That’s where OpenRewrite comes in. The open-source framework for large-scale, automated code transformations helps teams modernize safely and consistently.

Each month, the creators and maintainers of OpenRewrite at Moderne run live, hands-on training sessions — one for newcomers and one for experienced users. You’ll see how recipes work, how to apply them across projects, and how to modernize code with confidence.

Join the next session, bring your questions, and learn how to automate the kind of work that usually eats your sprint time.

Regression testing is an important step in the release process, to ensure that new code doesn't break the existing functionality. As the codebase evolves, we want to run these tests frequently to help catch any issues early on.

The best way to ensure these tests run frequently on an automated basis is, of course, to include them in the CI/CD pipeline. This way, the regression tests will execute automatically whenever we commit code to the repository.

In this tutorial, we'll see how to create regression tests using Selenium, and then include them in our pipeline using GitHub Actions:, to be run on the LambdaTest cloud grid:

>> How to Run Selenium Regression Tests With GitHub Actions

1. Overview

When we work with Java, we often encounter tasks that require precision and a collaborative effort between elements. Removing characters from a string based on their presence in another string is one such problem.

In this tutorial, we’ll explore various techniques to achieve this task.

2. Introduction to the Problem

As usual, an example can help us understand the problem quickly. Let’s say we have two strings:

String STRING = "a b c d e f g h i j";
String OTHER = "bdfhj";

Our goal is to eliminate characters from the STRING string if they are present in the string OTHER. Thus, we expect to get this string as the result:

"a  c  e  g  i "

We’ll learn various approaches to solving this problem in this tutorial. Also, we’ll unit test these solutions to verify whether they produce the expected result.

3. Using Nested Loops

We know a string can be easily split into a char array using the standard toCharArray() method. So, a straightforward and classic approach is first converting the two strings to two char arrays. Then, for each character in STRING, we decide whether to remove it or not by checking if it’s present in OTHER.

We can use nested for loops to implement this logic:

String nestedLoopApproach(String theString, String other) {
    StringBuilder sb = new StringBuilder();
    for (char c : theString.toCharArray()) {
        boolean found = false;
        for (char o : other.toCharArray()) {
            if (c == o) {
                found = true;
                break;
            }
        }
        if (!found) {
            sb.append(c);
        }
    }
    return sb.toString();
}

It’s worth noting since Java strings are immutable objects, we use StringBuilder instead of the ‘+’ operator to concatenate strings to gain better performance.

Next, let’s create a test:

String result = nestedLoopApproach(STRING, OTHER);
assertEquals("a  c  e  g  i ", result);

The test passes if we give it a run, so the method does the job.

Since for each character in STRING, we check through the string OTHER, the time complexity of this solution is O(n²).

4. Replacing the Inner Loop With the indexOf() Method

In the nested loops solution, we created the boolean flag found to store if the current character has been found in the OTHER String and then decided if we need to keep or discard this character by checking the found flag.

Java provides the String.indexOf() method that allows us to locate a given character in a string. Further, if the string doesn’t contain the given character, the method returns -1.

So, if we make use of the String.indexOf() method, the inner loop and the found flag aren’t required:

String loopAndIndexOfApproach(String theString, String other) {
    StringBuilder sb = new StringBuilder();
    for (char c : theString.toCharArray()) {
        if (other.indexOf(c) == -1) {
            sb.append(c);
        }
    }
    return sb.toString();
}

As we can see, this method’s code is easier to understand than the nested loops one, and it passes the test as well:

String result = loopAndIndexOfApproach(STRING, OTHER);
assertEquals("a  c  e  g  i ", result);

Although this implementation is compact and easy to read, as the String.indexOf() method internally searches the target character through the string by a loop, its time complexity is still O(n²).

Next, let’s see if we can find a solution with lower time complexity.

5. Using a HashSet

HashSet is a commonly used collection data structure. It stores the elements in an internal HashMap.

Since the hash function’s time complexity is O(1), HashSet‘s contains() method is an O(1) operation.

Therefore, we can first store all characters in the OTHER string in a HashSet and then check each character from STRING in the HashSet:

String hashSetApproach(String theString, String other) {
    StringBuilder sb = new StringBuilder();
    Set<Character> set = new HashSet<>(other.length());
    for (char c : other.toCharArray()) {
        set.add(c);
    }

    for (char i : theString.toCharArray()) {
        if (set.contains(i)) {
            continue;
        }
        sb.append(i);
    }
    return sb.toString();
}

As the code above shows, the implementation is quite straightforward. Now, let’s delve into its performance.

Initially, we iterate through one string to populate the Set object, making it an O(n) operation. Subsequently, for each character in the other string, we utilize the set.contains() method. This results in n times O(1), becoming another O(n) complexity. Therefore, the entire solution comprises two O(n) operations.

However, since the factor of two is a constant, the overall time complexity of the solution remains O(n). This stands out as a significant improvement compared to previous O(n²) solutions, demonstrating a considerably faster execution.

Finally, if we test the hashSetApproach() method, it gives the expected result:

String result = hashSetApproach(STRING, OTHER);
assertEquals("a  c  e  g  i ", result);

6. Conclusion

In this article, we explored three different approaches to removing characters from one string based on their presence in another.

Furthermore, we conducted a performance analysis, explicitly focusing on time complexity. The results revealed that both nested loops and loops utilizing indexOf() exhibit equivalent time complexities, while solutions employing HashSet to be the most efficient.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.