Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

Checking whether a String complies with business rules is crucial for most applications. Often, we need to check whether a name contains only allowed characters, whether an email is in the correct format, or whether a password meets certain restrictions.

In this tutorial, we’ll learn how to check if a String is alphanumeric, which can be helpful in many cases.

2. Alphanumeric Characters

First, let’s define the term explicitly to avoid any confusion. Alphanumeric characters are a combination of letters and digits; more specifically, Latin letters and Arabic digits. Thus, we won’t consider special characters or underscores to be alphanumeric.

3. Checking Approaches

In general, we have two main approaches to this problem. The first uses a regex pattern, and the second checks all the characters individually.

3.1. Using Regex

This is the simplest approach, which requires us to provide the correct regex pattern. In our case, we’ll be using this one:

String REGEX = "^[a-zA-Z0-9]*$";

Technically, we could use the \w shorthand to match “word characters,” but unfortunately, it doesn’t meet our requirement: by default, this pattern is equivalent to [a-zA-Z0-9_], so it also accepts an underscore.
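To make the difference concrete, here is a minimal sketch that runs the same input through both patterns. The class and method names are hypothetical and only serve the illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares \w with the explicit character class.
class WordCharacterDemo {

    // By default, \w is shorthand for [a-zA-Z0-9_], so underscores pass.
    static final Pattern WORD = Pattern.compile("^\\w*$");

    // The explicit class used in this tutorial: no underscore allowed.
    static final Pattern ALPHANUMERIC = Pattern.compile("^[a-zA-Z0-9]*$");

    static boolean matchesWord(String input) {
        return WORD.matcher(input).matches();
    }

    static boolean matchesAlphanumeric(String input) {
        return ALPHANUMERIC.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWord("user_name"));         // true
        System.out.println(matchesAlphanumeric("user_name")); // false
    }
}
```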

After identifying a correct pattern, the next step is to check a given String against it. It can be done directly on the String itself:

boolean result = TEST_STRING.matches(REGEX);

However, it’s not the best way, especially if we need to do such checks regularly. The String recompiles the regex on every invocation of the matches(String) method. Thus, it’s better to compile the Pattern once, keep it in a static field, and reuse it:

Pattern PATTERN = Pattern.compile(REGEX);
Matcher matcher = PATTERN.matcher(TEST_STRING);
boolean result = matcher.matches();

Overall, it’s a straightforward, flexible approach that makes the code simple and understandable.
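Putting the pieces together, the regex check could be wrapped in a small utility such as the one below. This is a sketch under a few assumptions: the class name is hypothetical, and treating null as non-alphanumeric is our own choice, not something the regex dictates. Note that the empty String matches because the pattern uses the * quantifier:

```java
import java.util.regex.Pattern;

// Hypothetical utility class wrapping the precompiled pattern.
class AlphanumericRegexChecker {

    // Compiled once and shared across all calls.
    private static final Pattern PATTERN = Pattern.compile("^[a-zA-Z0-9]*$");

    // Assumption: null is treated as "not alphanumeric" instead of throwing.
    static boolean isAlphanumeric(String input) {
        return input != null && PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumeric("abc123"));  // true
        System.out.println(isAlphanumeric("abc-123")); // false
        System.out.println(isAlphanumeric(""));        // true, because of the * quantifier
    }
}
```

If empty input should be rejected, replacing * with + in the pattern is one way to do it.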

3.2. Checking Characters One-by-One

Another approach is to check each character in the String. We can use any approach to iterate over a given String. For demonstration purposes, let’s go with a simple for loop:

boolean result = true;
for (int i = 0; i < TEST_STRING.length(); ++i) {
    int codePoint = TEST_STRING.codePointAt(i);
    if (!isAlphanumeric(codePoint)) {
        result = false;
        break;
    }
}

We can implement isAlphanumeric(int) in several ways, but overall, we must match the character code against the ASCII table. We use ASCII because we initially restricted ourselves to Latin letters and Arabic digits. The ranges below correspond to A-Z (65-90), a-z (97-122), and 0-9 (48-57):

boolean isAlphanumeric(final int codePoint) {
    return (codePoint >= 65 && codePoint <= 90) ||
           (codePoint >= 97 && codePoint <= 122) ||
           (codePoint >= 48 && codePoint <= 57);
}
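The numeric ranges can also be written with character literals, which some readers find easier to verify at a glance. The following sketch combines that variant with the loop from above into one self-contained class; the class and method names are hypothetical:

```java
// Hypothetical class: ASCII range check written with character literals.
class AsciiAlphanumericChecker {

    // Same ranges as 65-90, 97-122, and 48-57, expressed as literals.
    static boolean isAsciiAlphanumeric(int codePoint) {
        return (codePoint >= 'A' && codePoint <= 'Z')
            || (codePoint >= 'a' && codePoint <= 'z')
            || (codePoint >= '0' && codePoint <= '9');
    }

    // Walks the String without copying it and stops at the first mismatch.
    static boolean containsOnlyAlphanumeric(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (!isAsciiAlphanumeric(input.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyAlphanumeric("Test123"));  // true
        System.out.println(containsOnlyAlphanumeric("Test 123")); // false
    }
}
```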

Alternatively, we can use Character.isAlphabetic(int) and Character.isDigit(int). These are well-optimized JDK methods and may improve the performance of the application:

boolean result = true;
for (int i = 0; i < TEST_STRING.length(); ++i) {
    final int codePoint = TEST_STRING.codePointAt(i);
    if (!Character.isAlphabetic(codePoint) && !Character.isDigit(codePoint)) {
        result = false;
        break;
    }
}
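One caveat is worth illustrating: Character.isAlphabetic(int) and Character.isDigit(int) are Unicode-aware, so they accept letters and digits beyond the Latin letters and Arabic digits we defined in Section 2. The sketch below, with hypothetical class and method names, shows that a Cyrillic word and an Arabic-Indic digit both pass this check:

```java
// Hypothetical demo class: the Character-based check is broader than ASCII.
class UnicodeBreadthDemo {

    // A code point passes if it is alphabetic OR a digit in any script.
    static boolean isUnicodeAlphanumeric(String input) {
        return input.codePoints()
            .allMatch(cp -> Character.isAlphabetic(cp) || Character.isDigit(cp));
    }

    public static void main(String[] args) {
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Cyrillic letters
        String arabicIndicThree = "\u0663";                       // ARABIC-INDIC DIGIT THREE

        System.out.println(isUnicodeAlphanumeric("abc123"));         // true
        System.out.println(isUnicodeAlphanumeric(cyrillic));         // true
        System.out.println(isUnicodeAlphanumeric(arabicIndicThree)); // true
        System.out.println(isUnicodeAlphanumeric("abc_123"));        // false
    }
}
```

If the stricter ASCII-only definition is required, the explicit range check is the safer choice.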

This approach requires more code and is highly imperative. At the same time, it gives us the benefit of a transparent implementation. However, some implementations might unintentionally worsen the space complexity:

boolean result = true;
for (final char c : TEST_STRING.toCharArray()) {
    if (!isAlphanumeric(c)) {
        result = false;
        break;
    }
}

The toCharArray() method creates a separate array to hold the characters of the String, degrading the space complexity from O(1) to O(n). The Stream API approach doesn’t copy the characters, but it adds the overhead of building and running a stream pipeline:

boolean result = TEST_STRING.chars().allMatch(this::isAlphanumeric);

Please pay attention to these pitfalls, especially if the performance is crucial to the application.
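The stream one-liner above uses this::isAlphanumeric, which assumes it lives inside an instance of the class that defines that method. As a self-contained sketch, the same idea can be written with a static method reference; the class and method names here are hypothetical:

```java
// Hypothetical class: stream-based check using a static method reference.
class AlphanumericStreamChecker {

    // ASCII ranges for A-Z, a-z, and 0-9.
    static boolean isAsciiAlphanumeric(int c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9');
    }

    // allMatch short-circuits on the first character that fails the check.
    static boolean containsOnlyAlphanumeric(String input) {
        return input.chars().allMatch(AlphanumericStreamChecker::isAsciiAlphanumeric);
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyAlphanumeric("Stream42")); // true
        System.out.println(containsOnlyAlphanumeric("Stream!"));  // false
    }
}
```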

4. Pros and Cons

From the previous examples, it’s clear that the first approach is simpler to write and read, while the second one requires more code and could potentially contain more bugs. However, let’s compare them from the performance perspective with JMH. The benchmarks are set to run for only a minute, as that’s enough to compare their throughput.

We get the following results. The score shows the number of operations per second. Thus, a higher score indicates a more performant solution:

Benchmark                                                                   Mode  Cnt           Score   Error  Units
AlphanumericPerformanceBenchmark.alphanumericIteration                     thrpt        165036629.641          ops/s
AlphanumericPerformanceBenchmark.alphanumericIterationWithCharacterChecks  thrpt       2350726870.739          ops/s
AlphanumericPerformanceBenchmark.alphanumericIterationWithCopy             thrpt        129884251.890          ops/s
AlphanumericPerformanceBenchmark.alphanumericIterationWithStream           thrpt         40552684.681          ops/s
AlphanumericPerformanceBenchmark.alphanumericRegex                         thrpt         23739293.608          ops/s
AlphanumericPerformanceBenchmark.alphanumericRegexDirectlyOnString         thrpt         10536565.422          ops/s

As we can see, there’s a readability-performance tradeoff. More readable and more declarative solutions tend to be less performant. At the same time, please note that unnecessary optimization may do more harm than good. Thus, for most applications, regex is a good and clean solution that can be easily extended.

However, the iterative approach would perform better if an application relies on matching a high volume of text against specific rules, which ultimately reduces CPU usage and downtime and increases throughput.

5. Conclusion

There are a couple of ways to check if a String is alphanumeric. Both have pros and cons, which should be carefully considered. The choice comes down to extensibility versus performance.

Optimize the code when there’s a real need for performance, as optimized code is often less readable and more prone to hard-to-debug bugs.

As always, the code is available over on GitHub.
