Non-Capturing Regex Groups in Java

1. Overview

Non-capturing groups are important constructs within Java Regular Expressions. They create a sub-pattern that functions as a single unit but does not save the matched character sequence. In this tutorial, we’ll explore how to use non-capturing groups in Java Regular Expressions.

2. Regular Expression Groups

Regular expression groups can be one of two types: capturing and non-capturing.

Capturing groups save the matched character sequence. Their values can be used as backreferences in the pattern and/or retrieved later in code.
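As a point of comparison before we turn to non-capturing groups, here is a minimal sketch of both uses of a capturing group. The class name, method name, and sample sentences are assumptions made for illustration only. The pattern captures a word and then requires the same word again through the backreference "\1":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; none of these names come from the article.
class CapturingGroupDemo {

    // Returns the text saved by capturing group 1, or null when nothing matches.
    static String firstRepeatedWord(String input) {
        // (\w+) captures a word; \1 is a backreference that must match the same text again.
        Pattern repeatedWord = Pattern.compile("\\b(\\w+) \\1\\b");
        Matcher matcher = repeatedWord.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test"));
        System.out.println(firstRepeatedWord("no repeats here"));
    }
}
```

The first call retrieves the captured word in code, while the backreference uses the same saved text inside the pattern itself.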

Although they don’t save the matched character sequence, non-capturing groups can alter pattern matching modifiers within the group. Some non-capturing groups can even discard backtracking information after a successful sub-pattern match.

Let’s explore some examples of non-capturing groups in action.

3. Non-Capturing Groups

A non-capturing group is created with the operator “(?:X)”. The “X” is the pattern for the group:

Pattern.compile("[^:]+://(?:[.a-z]+/?)+")

This pattern has a single non-capturing group. It will match a value if it is URL-like. A full regular expression for a URL would be much more involved. We’re using a simple pattern to focus on non-capturing groups.

The pattern “[^:]+://” matches the protocol and its separator, for example “http://”. The non-capturing group “(?:[.a-z]+/?)” matches the domain name with an optional slash. Since the “+” operator matches one or more occurrences of the group, we’ll match the subsequent path segments as well. Let’s test this pattern on a URL:

Pattern simpleUrlPattern = Pattern.compile("[^:]+://(?:[.a-z]+/?)+");
Matcher urlMatcher
  = simpleUrlPattern.matcher("http://www.microsoft.com/some/other/url/path");
    
Assertions.assertThat(urlMatcher.matches()).isTrue();

Let’s see what happens when we try to retrieve the text matched by the group:

Pattern simpleUrlPattern = Pattern.compile("[^:]+://(?:[.a-z]+/?)+");
Matcher urlMatcher = simpleUrlPattern.matcher("http://www.microsoft.com/");
    
Assertions.assertThat(urlMatcher.matches()).isTrue();
Assertions.assertThatThrownBy(() -> urlMatcher.group(1))
  .isInstanceOf(IndexOutOfBoundsException.class);

The regular expression is compiled into a java.util.regex.Pattern object. Then, we create a java.util.regex.Matcher to apply our Pattern to the provided value.

Next, we assert that matches() returns true.

We used a non-capturing group to match the domain name in the URL. Since non-capturing groups do not save matched text, we cannot retrieve the matched text “www.microsoft.com/”. Attempting to retrieve the domain name will result in an IndexOutOfBoundsException.
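One way to see the difference is to count the groups the Matcher reports. The following is a sketch under a few assumptions: the class and method names are hypothetical, and the second pattern is a variation we introduce here by wrapping the protocol in a capturing group. Non-capturing groups are not numbered, so the capturing group around the protocol becomes group 1:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the second pattern are assumptions.
class GroupCountDemo {

    // The article's pattern: a single non-capturing group.
    static final String NON_CAPTURING = "[^:]+://(?:[.a-z]+/?)+";

    // Assumed variation: the protocol is wrapped in a capturing group.
    static final String WITH_CAPTURE = "([^:]+)://(?:[.a-z]+/?)+";

    // Number of capturing groups in the pattern; non-capturing groups are not counted.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns the captured protocol, or null when the URL does not match.
    static String protocolOf(String url) {
        Matcher matcher = Pattern.compile(WITH_CAPTURE).matcher(url);
        return matcher.matches() ? matcher.group(1) : null;
    }

    // group() with no argument returns the whole match, even without capturing groups.
    static String wholeMatch(String url) {
        Matcher matcher = Pattern.compile(NON_CAPTURING).matcher(url);
        return matcher.matches() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount(NON_CAPTURING));
        System.out.println(groupCount(WITH_CAPTURE));
        System.out.println(protocolOf("http://www.microsoft.com/"));
        System.out.println(wholeMatch("http://www.microsoft.com/"));
    }
}
```

Note that the whole match stays available through group() or group(0). Only the numbered sub-match for the non-capturing group is missing.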

3.1. Inline Modifiers

Regular expressions are case-sensitive by default. If we apply our pattern to a mixed-case URL, the match will fail:

Pattern simpleUrlPattern
  = Pattern.compile("[^:]+://(?:[.a-z]+/?)+");
Matcher urlMatcher
  = simpleUrlPattern.matcher("http://www.Microsoft.com/");
    
Assertions.assertThat(urlMatcher.matches()).isFalse();

In the case where we want to match uppercase letters as well, there are a few options we could try.

One option is to add the uppercase character range to the pattern:

Pattern.compile("[^:]+://(?:[.a-zA-Z]+/?)+")

Another option is to use modifier flags. So, we can compile the regular expression to be case-insensitive:

Pattern.compile("[^:]+://(?:[.a-z]+/?)+", Pattern.CASE_INSENSITIVE)

Non-capturing groups allow for a third option: we can change the modifier flags for just the group. Let’s add the case-insensitive modifier flag (“i”) to the group:

Pattern.compile("[^:]+://(?i:[.a-z]+/?)+");

Now that we’ve made the group case-insensitive, let’s apply this pattern to a mixed-case URL:

Pattern scopedCaseInsensitiveUrlPattern
  = Pattern.compile("[^:]+://(?i:[.a-z]+/?)+");
Matcher urlMatcher
  = scopedCaseInsensitiveUrlPattern.matcher("http://www.Microsoft.com/");
    
Assertions.assertThat(urlMatcher.matches()).isTrue();

When a pattern is compiled to be case-insensitive, we can turn case-insensitivity off for a single group by placing the “-” operator in front of the modifier. Let’s apply such a pattern to another mixed-case URL:

Pattern scopedCaseSensitiveUrlPattern
  = Pattern.compile("[^:]+://(?-i:[.a-z]+/?)+/ending-path", Pattern.CASE_INSENSITIVE);
Matcher urlMatcher
  = scopedCaseSensitiveUrlPattern.matcher("http://www.Microsoft.com/ending-path");
  
Assertions.assertThat(urlMatcher.matches()).isFalse();

In this example, the final path segment “/ending-path” is case-insensitive. The “/ending-path” portion of the pattern will match uppercase and lowercase characters.

When we turned off the case-insensitive option within the group, the non-capturing group only supported lowercase characters. Therefore, the mixed-case domain name did not match.
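To make the scope of an inline modifier easier to see, here is a small sketch that uses a literal “http://” prefix instead of “[^:]+”. That pattern, along with the class and method names, is an assumption made for this illustration. Only the text matched inside the group ignores case:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern with a literal protocol is an assumption.
class ScopedFlagDemo {

    // The "i" flag applies only inside the group; "http://" stays case-sensitive.
    static final Pattern SCOPED = Pattern.compile("http://(?i:[.a-z]+/?)+");

    static boolean matches(String url) {
        return SCOPED.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Mixed case inside the group is accepted.
        System.out.println(matches("http://WWW.Microsoft.COM/"));
        // Uppercase outside the group is rejected.
        System.out.println(matches("HTTP://www.microsoft.com/"));
    }
}
```

The first URL matches and the second does not, because the uppercase protocol falls outside the case-insensitive group.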

4. Independent Non-Capturing Groups

Independent non-capturing groups are a type of regular expression group. These groups discard backtracking information after finding a successful match. When using this type of group, we need to be aware of when backtracking can occur. Otherwise, our patterns may not match the values we think they should.

Backtracking is a feature of Nondeterministic Finite Automaton (NFA) regular expression engines. When the engine fails to match text, the NFA engine can explore alternatives in the pattern. The engine will fail the match after exhausting all available alternatives. We only cover backtracking as it relates to independent non-capturing groups.

An independent non-capturing group is created with the operator “(?>X)” where X is the sub-pattern:

Pattern.compile("[^:]+://(?>[.a-z]+/?)+/ending-path");

We have added “/ending-path” as a constant path segment. Having this additional requirement forces a backtracking situation. The domain name and other path segments can match the slash character. To match “/ending-path”, the engine will need to backtrack. By backtracking, the engine can remove the slash from the group and apply it to the “/ending-path” portion of the pattern.
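For contrast, the sketch below uses the same pattern with an ordinary non-capturing group, “(?:X)”, in place of the independent one. This variation and the class and method names are assumptions introduced for illustration. Because the ordinary group keeps its backtracking information, the engine can give the slash back and the match succeeds:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; this ordinary-group variant is an assumption for contrast.
class BacktrackingDemo {

    // Ordinary non-capturing group: the engine may backtrack into it.
    static final Pattern ORDINARY_GROUP
      = Pattern.compile("[^:]+://(?:[.a-z]+/?)+/ending-path");

    static boolean matches(String url) {
        return ORDINARY_GROUP.matcher(url).matches();
    }

    public static void main(String[] args) {
        // The group first consumes the slash, then releases it so "/ending-path" can match.
        System.out.println(matches("http://www.microsoft.com/ending-path"));
        System.out.println(matches("http://www.microsoft.com/some/ending-path"));
        // No amount of backtracking helps when the required segment is absent.
        System.out.println(matches("http://www.microsoft.com/other"));
    }
}
```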

Let’s apply our independent non-capturing group pattern to a URL:

Pattern independentUrlPattern
  = Pattern.compile("[^:]+://(?>[.a-z]+/?)+/ending-path");
Matcher independentMatcher
  = independentUrlPattern.matcher("http://www.microsoft.com/ending-path");
    
Assertions.assertThat(independentMatcher.matches()).isFalse();

The group matches the domain name and the slash successfully. So, we leave the scope of the independent non-capturing group.

This pattern requires a slash to appear before “ending-path”. However, our independent non-capturing group has matched the slash.

Normally, the NFA engine would try backtracking here. Since the slash is optional at the end of the group, the engine would remove the slash from the group and try again. However, the independent non-capturing group has discarded the backtracking information, so the NFA engine cannot backtrack into it.
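The same effect can be reproduced with a much smaller pattern. The sketch below is illustrative only: the patterns, input, and names are assumptions, not part of the article's code. Both patterns need the group to give up one “a” so that the trailing “ab” can match, and only the ordinary group is able to do so:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns and input are assumptions for illustration.
class IndependentGroupDemo {

    // Ordinary non-capturing group: "a+" can give back one "a" through backtracking.
    static boolean ordinary(String input) {
        return Pattern.matches("(?:a+)ab", input);
    }

    // Independent group: once "a+" has matched, its backtracking information is discarded.
    static boolean independent(String input) {
        return Pattern.matches("(?>a+)ab", input);
    }

    public static void main(String[] args) {
        System.out.println(ordinary("aaab"));
        System.out.println(independent("aaab"));
    }
}
```

With the input “aaab”, the ordinary group matches and the independent group does not. The greedy “a+” consumes every “a”, which leaves nothing for the “a” that the rest of the pattern requires.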

4.1. Backtracking Inside the Group

Backtracking can occur within an independent non-capturing group. While the NFA engine is matching the group, the backtracking information has not been discarded. The backtracking information is not discarded until after the group matches successfully:

Pattern independentUrlPatternWithBacktracking
  = Pattern.compile("[^:]+://(?>(?:[.a-z]+/?)+/)ending-path");
Matcher independentMatcher
  = independentUrlPatternWithBacktracking.matcher("http://www.microsoft.com/ending-path");
    
Assertions.assertThat(independentMatcher.matches()).isTrue();

Now we have a non-capturing group within an independent non-capturing group. We still have a backtracking situation involving the slash in front of “ending-path”. However, we have enclosed the backtracking portion of the pattern, including that slash, inside the independent non-capturing group. The backtracking therefore occurs before the group has finished matching, while the NFA engine still has the information it needs, and the pattern matches the provided URL.

5. Conclusion

We’ve shown that non-capturing groups are different from capturing groups. However, they function as a single unit like their capturing counterparts. We have also shown that non-capturing groups can enable or disable the modifiers for the group instead of the pattern as a whole.

Similarly, we’ve shown how independent non-capturing groups discard backtracking information. Without this information, the NFA engine cannot explore alternatives to make a successful match. However, backtracking can occur within the group.

The code backing this article is available on GitHub.