An Overview of Regular Expressions Performance in Java

Last updated: January 8, 2024

Written by: baeldung

Reviewed by: Michal Aibin

1. Overview

In this quick tutorial, we’ll show how the pattern-matching engine works. We’ll also present different ways to optimize regular expressions in Java.

For an introduction to the use of regular expressions, please refer to this article here.

2. The Pattern-Matching Engine

The java.util.regex package uses a type of pattern-matching engine called a Nondeterministic Finite Automaton (NFA). It’s considered nondeterministic because while trying to match a regular expression on a given string, each character in the input might be checked several times against different parts of the regular expression.

In the background, the engine mentioned above uses backtracking. This general algorithm tries to exhaust all possibilities until it declares failure. Consider the following example to better understand the NFA:

"tra(vel|ce|de)m"

With the input String “travel”, the engine will first look for “tra” and find it immediately.

After that, it’ll try to match “vel” starting from the fourth character. This will match, so it will go forward and try to match “m”.

That won’t match, so it’ll go back to the fourth character and search for “ce”. Again, this won’t match, so it’ll go back to the fourth position once more and try “de”. That won’t match either, so it’ll move on to the second character of the input string and search for another “tra”.

Once every remaining starting position has failed as well, the engine reports that there’s no match.

Even in this simple example, the engine had to backtrack several times while trying to match the input String against the regular expression. Because of that, it’s important to minimize the amount of backtracking it has to do.
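
The walkthrough above can be reproduced with a few lines of Java. This is only a minimal sketch: the class and method names are hypothetical and not part of the article's code. Note that matches() anchors the attempt at the start of the input, while find() also retries from later starting positions, which is the behavior described in the walkthrough.

```java
import java.util.regex.Pattern;

class BacktrackingDemo {
    // The pattern from the example above, compiled once.
    static final Pattern TRA = Pattern.compile("tra(vel|ce|de)m");

    // True only if the whole input matches the pattern.
    static boolean matchesFully(String input) {
        return TRA.matcher(input).matches();
    }

    // True if the pattern occurs anywhere in the input;
    // the engine retries from each starting position in turn.
    static boolean foundAnywhere(String input) {
        return TRA.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesFully("travel"));   // false: no trailing "m"
        System.out.println(foundAnywhere("travel"));  // false: every start position fails
        System.out.println(matchesFully("tracem"));   // true: "vel" fails, "ce" succeeds
    }
}
```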

3. Ways to Optimize Regular Expressions

3.1. Avoid Recompilation

Regular expressions in Java are compiled into an internal data structure. This compilation is a time-consuming process.

Each time we invoke the String.matches(String regex) method, the specified regular expression is recompiled:

if (input.matches(regexPattern)) {
    // do something
}

As we can see, every time the condition is evaluated, the regular expression is recompiled.

To optimize, we can compile the pattern once and then create a Matcher to look for matches in each value:

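// Compile once, outside the loop
Pattern pattern = Pattern.compile(regexPattern);
for (String value : values) {
    // Only the Matcher is created per value; the pattern isn't recompiled
    Matcher matcher = pattern.matcher(value);
    if (matcher.matches()) {
        // do something
    }
}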
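
To make the difference concrete, here's a self-contained sketch that puts both approaches side by side. The class name, method names, and the digit pattern are assumptions chosen for illustration. Both methods return the same result; the second one simply avoids recompiling the expression for every value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledPatternDemo {
    // Hypothetical pattern for illustration: one or more digits.
    private static final String REGEX = "\\d+";
    private static final Pattern DIGITS = Pattern.compile(REGEX);

    // String.matches compiles the regular expression on every call.
    static long countWithStringMatches(List<String> values) {
        long count = 0;
        for (String value : values) {
            if (value.matches(REGEX)) {
                count++;
            }
        }
        return count;
    }

    // The compiled Pattern is reused; only a Matcher is created per value.
    static long countWithPrecompiled(List<String> values) {
        long count = 0;
        for (String value : values) {
            if (DIGITS.matcher(value).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("123", "abc", "42", "4x2");
        System.out.println(countWithStringMatches(values)); // 2
        System.out.println(countWithPrecompiled(values));   // 2
    }
}
```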

An alternative to the above optimization is using the same Matcher instance with its reset() method:

Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher("");
for (String value : values) {
    matcher.reset(value);
    if (matcher.matches()) {
        // do something
    }
}

Because Matcher isn’t thread-safe, we have to be cautious with this variation. It can be dangerous in multi-threaded scenarios.

To summarize, whenever we’re sure there’s only one user of the Matcher at any point in time, it’s OK to reuse it with reset(). Otherwise, reusing the precompiled Pattern is enough.
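
For the multi-threaded case, one possible arrangement is to share the compiled Pattern, which is immutable and safe for concurrent use, and create a fresh Matcher inside each call. The sketch below assumes a hypothetical lowercase-word pattern and uses a parallel stream only as a convenient way to involve several threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SharedPatternDemo {
    // Pattern instances are immutable, so one shared instance is fine.
    // The pattern itself is a hypothetical example.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    static boolean isLowercaseWord(String value) {
        // Each call gets its own Matcher, so no mutable state is shared.
        return WORD.matcher(value).matches();
    }

    static long countLowercaseWords(List<String> values) {
        // The parallel stream may evaluate the predicate on several threads.
        return values.parallelStream()
                .filter(SharedPatternDemo::isLowercaseWord)
                .count();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("abc", "ABC", "x1", "def");
        System.out.println(countLowercaseWords(values)); // 2
    }
}
```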

3.2. Working with Alternation

As we saw in Section 2, careless use of alternation can hurt performance. It’s important to place the options that are more likely to occur first so they can be matched faster.

Also, we should extract the parts that the options have in common. There’s a difference between writing:

(travel|trade|trace)

and:

tra(vel|de|ce)

The latter is faster because the NFA will try to match “tra” and won’t try any of the alternatives if it doesn’t find it.
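
The following sketch checks that the two forms accept exactly the same inputs, so the common prefix can be factored out without changing what matches. It doesn't measure speed, and the class and method names are hypothetical. The patterns are written without spaces because, unless the Pattern.COMMENTS flag is set, whitespace in a Java pattern is matched literally.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Each alternative repeats the common prefix.
    static final Pattern REPEATED_PREFIX = Pattern.compile("(travel|trade|trace)");
    // The common prefix is matched once, before the alternation.
    static final Pattern FACTORED_PREFIX = Pattern.compile("tra(vel|de|ce)");

    // True if both patterns give the same answer for the input.
    static boolean sameResult(String input) {
        return REPEATED_PREFIX.matcher(input).matches()
                == FACTORED_PREFIX.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"travel", "trade", "trace", "trap", "tra"}) {
            System.out.println(input + " -> " + FACTORED_PREFIX.matcher(input).matches()
                    + ", same as unfactored: " + sameResult(input));
        }
    }
}
```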

3.3. Capturing Groups

Each time we capture a group, we incur a small time penalty.

If we don’t need to capture the text inside a group, we should consider using a non-capturing group. Instead of “(M)”, we can use “(?:M)”.
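
As a small illustration, the sketch below compares a capturing and a non-capturing version of the same group. The class name and the sample pattern are assumptions. Both versions match the same input, but only the capturing one records the group's text for later retrieval.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingGroupDemo {
    static final Pattern CAPTURING = Pattern.compile("tra(vel|de|ce)");
    static final Pattern NON_CAPTURING = Pattern.compile("tra(?:vel|de|ce)");

    public static void main(String[] args) {
        Matcher capturing = CAPTURING.matcher("travel");
        Matcher nonCapturing = NON_CAPTURING.matcher("travel");

        // Both patterns accept the same input.
        System.out.println(capturing.matches());     // true
        System.out.println(nonCapturing.matches());  // true

        // groupCount() reports capturing groups only (group 0 is not counted).
        System.out.println(capturing.groupCount());    // 1
        System.out.println(nonCapturing.groupCount()); // 0

        // The captured text is available only from the capturing version.
        System.out.println(capturing.group(1));      // vel
    }
}
```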

4. Conclusion

In this quick article, we briefly revisited how the NFA engine works. We then explored how to optimize the performance of our regular expressions by precompiling our patterns and reusing a Matcher.

Finally, we pointed out a couple of considerations to keep in mind while we work with alternations and groups.

The code backing this article is available on GitHub.