Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

In this tutorial, we’ll learn how to check whether a string contains non-alphanumeric characters. This check is useful in many scenarios, such as estimating the strength of a password or rejecting special characters entered in an application. The requirement becomes more interesting when we want to restrict the allowed characters to a particular language script, which we’ll also cover here.

2. Using Regular Expressions

Regular expressions are arguably the most flexible way to implement this requirement. Let’s consider a simple use case, where the application must accept only English letters and digits. To achieve this, we use the regex [^a-zA-Z0-9], which matches any single non-alphanumeric character:

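import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonAlphaNumRegexChecker {
    // Matches any single character outside a-z, A-Z, and 0-9
    private static final Pattern PATTERN_NON_ALPHNUM_USASCII = Pattern.compile("[^a-zA-Z0-9]");
    
    // Returns true if the input has at least one non-alphanumeric character
    public static boolean containsNonAlphanumeric(String str) {
        Matcher matcher = PATTERN_NON_ALPHNUM_USASCII.matcher(str);
        return matcher.find();
    }
}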
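As a quick illustration of how this check behaves, here is a minimal sketch. The class name AsciiAlphanumericDemo and the sample inputs are assumptions made for this example and are not part of the article's code:

```java
import java.util.regex.Pattern;

class AsciiAlphanumericDemo {
    // Same character class as above: anything outside a-z, A-Z, and 0-9
    private static final Pattern NON_ALPHANUMERIC_ASCII = Pattern.compile("[^a-zA-Z0-9]");

    static boolean containsNonAlphanumeric(String input) {
        return NON_ALPHANUMERIC_ASCII.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonAlphanumeric("Baeldung2024")); // false
        System.out.println(containsNonAlphanumeric("p@ssw0rd"));     // true: '@'
        System.out.println(containsNonAlphanumeric("hello world"));  // true: the space
        System.out.println(containsNonAlphanumeric(""));             // false: nothing to match
    }
}
```

Note that whitespace counts as non-alphanumeric here, and that an empty string contains no match at all.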

But if the application needs to accept letters from other languages, we must tweak the regular expression so that it also covers Unicode letters and digits. For more details, see the “Unicode Support” section of the Pattern Javadoc. Here, we’re using the regex binary property classes IsAlphabetic and IsDigit:

public class NonAlphaNumRegexChecker {
    private static final Pattern PATTERN_NON_ALPHNUM_ANYLANG = Pattern.compile("[^\\p{IsAlphabetic}\\p{IsDigit}]");
    
    public static boolean containsNonAlphanumeric(String input) {
        Matcher matcher = PATTERN_NON_ALPHNUM_ANYLANG.matcher(input);
        return matcher.find();
    }
}
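To make the difference between the two patterns concrete, the following sketch runs both against the same inputs. The class name, method names, and sample strings are assumptions chosen for this example. The non-ASCII text is written with Unicode escapes so the source stays encoding-independent:

```java
import java.util.regex.Pattern;

class AsciiVsUnicodeDemo {
    private static final Pattern NON_ALNUM_ASCII = Pattern.compile("[^a-zA-Z0-9]");
    private static final Pattern NON_ALNUM_UNICODE =
        Pattern.compile("[^\\p{IsAlphabetic}\\p{IsDigit}]");

    static boolean hasNonAlnumAscii(String s) {
        return NON_ALNUM_ASCII.matcher(s).find();
    }

    static boolean hasNonAlnumUnicode(String s) {
        return NON_ALNUM_UNICODE.matcher(s).find();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";            // "cafe" with an acute accent on the final e
        String digits = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three

        System.out.println(hasNonAlnumAscii(cafe));     // true: the accented e is not US-ASCII
        System.out.println(hasNonAlnumUnicode(cafe));   // false: it is an alphabetic character
        System.out.println(hasNonAlnumAscii(digits));   // true
        System.out.println(hasNonAlnumUnicode(digits)); // false: they are decimal digits
        System.out.println(hasNonAlnumUnicode("a-b"));  // true: the hyphen
    }
}
```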

Let’s consider another use case, where the application accepts only characters from a particular Unicode script such as Cyrillic, Georgian, or Greek. For such cases, regular expressions support Unicode script classes such as IsCyrillic, IsGreek, IsGeorgian, and others. Let’s see an example:

public class NonAlphaNumRegexChecker {
    public static boolean containsNonAlphanumeric(String input, String script) {
        // Script class, e.g. \p{IsCyrillic}, combined with the binary property \p{IsDigit}
        String regexScriptClass = "\\p{Is" + script + "}";
        Pattern pattern = Pattern.compile("[^" + regexScriptClass + "\\p{IsDigit}]");
        Matcher matcher = pattern.matcher(input);
        return matcher.find();
    }
}
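The sketch below shows how a script-based check like this behaves for a few inputs, including what happens when the script name is not recognized. The class name ScriptRegexDemo, the sample strings, and the invalid name "NoSuchScript" are hypothetical and exist only for illustration:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ScriptRegexDemo {
    // Mirrors the method above: a script class plus the IsDigit binary property
    static boolean containsNonAlphanumeric(String input, String script) {
        Pattern pattern = Pattern.compile("[^\\p{Is" + script + "}\\p{IsDigit}]");
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String privet = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Privet" in Cyrillic letters

        System.out.println(containsNonAlphanumeric(privet + "2024", "Cyrillic")); // false
        System.out.println(containsNonAlphanumeric(privet + "2024", "Greek"));    // true
        System.out.println(containsNonAlphanumeric("Privet", "Cyrillic"));        // true: Latin letters

        try {
            containsNonAlphanumeric(privet, "NoSuchScript"); // hypothetical invalid name
        } catch (PatternSyntaxException e) {
            System.out.println("Unknown script name rejected by Pattern.compile");
        }
    }
}
```

Because the script name is concatenated into the regex, an unrecognized name surfaces as a PatternSyntaxException at compile time, so callers may want to validate it first.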

Since the above method takes the language script as a parameter, it has to compile the pattern on every call. This could become a performance bottleneck. To avoid it, we can cache a compiled Pattern object for each script listed in the enum Character.UnicodeScript in a map and look it up using the script as the key.
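One possible way to implement such a cache is sketched below. It is an assumption-laden variant, not the article's code: the class name ScriptPatternCache and its method names are hypothetical, and the map is filled lazily with computeIfAbsent() rather than pre-populated for every script. The script name is resolved with Character.UnicodeScript.forName(), which throws an IllegalArgumentException for unknown names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

class ScriptPatternCache {
    // One compiled pattern per script, created on first use and then reused
    private static final Map<Character.UnicodeScript, Pattern> CACHE = new ConcurrentHashMap<>();

    static Pattern patternFor(Character.UnicodeScript script) {
        return CACHE.computeIfAbsent(script,
            s -> Pattern.compile("[^\\p{Is" + s.name() + "}\\p{IsDigit}]"));
    }

    static boolean containsNonAlphanumeric(String input, String scriptName) {
        // forName() is case-insensitive and rejects unknown script names
        Character.UnicodeScript script = Character.UnicodeScript.forName(scriptName);
        return patternFor(script).matcher(input).find();
    }
}
```

Lazy population keeps startup cheap when only a handful of scripts are ever used. Pre-building the map for all enum values, as described above, trades that for predictable latency on the first call.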

3. Using the isLetterOrDigit() Method of the Class Character

Now, let’s look at the Character class, which can help implement all the use cases discussed in the last section. The first solution checks for non-alphanumeric characters in a string written in any language by using the method isLetterOrDigit():

public class NonAlphaNumericChecker {
    public static boolean isNonAlphanumericAnyLangScript(String str) {
        for (int i = 0; i < str.length(); i++) {
            // charAt() returns a UTF-16 code unit; surrogate halves are not letters or digits
            char c = str.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                return true;
            }
        }
        return false;
    }
}
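One caveat of iterating with charAt() is that characters outside the Basic Multilingual Plane are stored as two surrogate code units, and neither half is a letter or a digit. The sketch below is a hypothetical variant (the class and method names are assumptions) that iterates over code points instead, next to a char-based loop for comparison:

```java
class CodePointChecker {
    // Char-based loop: sees each UTF-16 code unit separately
    static boolean containsNonAlphanumericByChar(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetterOrDigit(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Code-point-based check: classifies supplementary characters as a whole
    static boolean containsNonAlphanumericByCodePoint(String str) {
        return str.codePoints().anyMatch(cp -> !Character.isLetterOrDigit(cp));
    }

    public static void main(String[] args) {
        String deseret = "\uD801\uDC00"; // U+10400, a letter of the Deseret script

        System.out.println(containsNonAlphanumericByChar(deseret));      // true: surrogates are not letters
        System.out.println(containsNonAlphanumericByCodePoint(deseret)); // false: U+10400 is a letter
    }
}
```

For text limited to the Basic Multilingual Plane, both variants return the same result.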

But if we want to allow only a particular language script, we have to tweak the method a little. Here, we consider a character to be non-alphanumeric when it neither belongs to that script nor is a digit:

public class NonAlphaNumericChecker {
    public static boolean isNonAlphanumericInLangScript(String str, String script) {
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (!Character.UnicodeScript.of(c).toString().equalsIgnoreCase(script)
              && !Character.isDigit(c)) {
                return true;
            }
        }
        return false;
    }
}
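As a possible refinement, the script name can be resolved to a Character.UnicodeScript constant once, before the loop, instead of comparing strings for every character. The following sketch assumes that behavior is acceptable. The class name ScriptChecker and its method name are hypothetical, and unlike the method above it throws an IllegalArgumentException when the script name is unknown:

```java
class ScriptChecker {
    static boolean containsNonAlphanumericInScript(String str, String scriptName) {
        // Resolve the name once; forName() is case-insensitive and rejects unknown names
        Character.UnicodeScript expected = Character.UnicodeScript.forName(scriptName);

        // A code point is acceptable if it belongs to the script or is a digit
        return str.codePoints().anyMatch(cp ->
            Character.UnicodeScript.of(cp) != expected && !Character.isDigit(cp));
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma

        System.out.println(containsNonAlphanumericInScript(greek + "123", "Greek")); // false
        System.out.println(containsNonAlphanumericInScript("abc", "Greek"));         // true: Latin letters
        System.out.println(containsNonAlphanumericInScript(greek + "!", "Greek"));   // true: '!'
    }
}
```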

4. Using the StringUtils Class from Apache Commons Lang Library

This is the least flexible of the techniques covered here. The isAlphanumeric() method in StringUtils accepts any Unicode letter or digit, but it can’t restrict the check to a particular language script. Also note that in Commons Lang 3 it returns false for null and empty strings, so the wrapper below reports both as non-alphanumeric. Let’s see it in action:

public static boolean isNonAlphanumericAnyLangScriptV2(String str) {
    return !StringUtils.isAlphanumeric(str);
}

5. Conclusion

In this tutorial, we discussed a few use cases where we must check for the presence of non-alphanumeric characters in a string. We conclude that the regex technique is the most flexible of all available options. The code snippets used here, along with associated JUnit test cases, are available over on GitHub.
