Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

In this tutorial, we’ll explore different ways to check if a string contains only Unicode letters.

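Unicode is a character encoding standard that covers most of the world’s writing systems. Every Java String is already a sequence of Unicode characters, so the question here is narrower: whether all of those characters are letters, as opposed to digits, whitespace, punctuation, or symbols. Checking this up front helps maintain data integrity and avoid unexpected behavior.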

2. Character Class

Java’s Character class provides a set of static methods that can be used to check various properties of characters. To determine if a string contains only Unicode letters, we can iterate through each character in the string and verify it using the Character.isLetter() method:

public class UnicodeLetterChecker {
    public boolean characterClassCheck(String input) {
        for (char c : input.toCharArray()) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }
        return true;
    }
}

This approach checks each char one by one and returns false as soon as it encounters a non-letter. Note that an empty string yields true, since there’s no character to reject:

@Test
public void givenString_whenUsingIsLetter_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.characterClassCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
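
The test above only covers ASCII input. As a hedged sketch, the following standalone class (the CharLoopDemo and allLettersByChar names are hypothetical, not part of the article’s source) repeats the same char-based loop on a few other inputs. It also illustrates one limitation: characters outside the Basic Multilingual Plane are stored as two surrogate char values, and Character.isLetter(char) doesn’t treat a lone surrogate as a letter.

```java
class CharLoopDemo {

    // Same logic as characterClassCheck(): test each UTF-16 char individually
    static boolean allLettersByChar(String input) {
        for (char c : input.toCharArray()) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "Privet" written in Cyrillic, using escapes to keep the source ASCII-only
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(allLettersByChar(cyrillic));      // true
        System.out.println(allLettersByChar("Hello World")); // false: contains a space
        System.out.println(allLettersByChar("Java17"));      // false: contains digits

        // U+1D49C MATHEMATICAL SCRIPT CAPITAL A is a letter, but it is encoded
        // as a surrogate pair, so the char-by-char loop rejects it
        String scriptA = "\uD835\uDC9C";
        System.out.println(Character.isLetter(scriptA.codePointAt(0))); // true
        System.out.println(allLettersByChar(scriptA));                  // false
    }
}
```

If input may contain such supplementary characters, a code-point-based check is the safer choice.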

3. Regular Expressions

Java provides powerful regular expression support for string manipulation. We can compile a pattern with the Pattern class and call matches() on the resulting Matcher to verify that a string consists solely of Unicode letters:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeLetterChecker {
    public boolean regexCheck(String input) {
        Pattern pattern = Pattern.compile("^\\p{L}+$");
        Matcher matcher = pattern.matcher(input);
        return matcher.matches();
    }
}

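In this example, the pattern \p{L}+ (written as \\p{L}+ in a Java string literal) matches one or more Unicode letters, and the ^ and $ anchors tie the match to the whole input. If the string is non-empty and contains only Unicode letters, the method returns true: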

@Test
public void givenString_whenUsingRegex_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.regexCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
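
Because a Pattern is immutable and can be shared, a common idiom is to compile it once and reuse it across calls. The sketch below assumes a hypothetical RegexLetterDemo class with a precompiled LETTERS constant; it isn’t taken from the article’s source. Since matches() already requires the entire input to match, the anchors are left out here.

```java
import java.util.regex.Pattern;

class RegexLetterDemo {

    // Compiled once and reused; Pattern instances are immutable and thread-safe
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    static boolean onlyLetters(String input) {
        // matches() succeeds only if the entire input matches the pattern
        return LETTERS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(onlyLetters("HelloWorld"));    // true
        System.out.println(onlyLetters("\u00C9t\u00E9")); // true: precomposed accented letters
        System.out.println(onlyLetters("Hello World"));   // false: contains a space
        System.out.println(onlyLetters(""));              // false: + needs at least one letter

        // "e" followed by U+0301 COMBINING ACUTE ACCENT: the accent is a mark,
        // not a letter, so the decomposed form is rejected
        System.out.println(onlyLetters("e\u0301"));       // false
    }
}
```

The last case shows that the result can depend on how accented text is normalized, which is worth keeping in mind when validating user input.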

4. Apache Commons Lang Library

The Apache Commons Lang library offers the StringUtils.isAlpha() method, which checks whether a string contains only Unicode letters. It’s also null-safe: it returns false for null as well as for an empty string:

import org.apache.commons.lang3.StringUtils;

public class UnicodeLetterChecker {
    public boolean isAlphaCheck(String input) {
        return StringUtils.isAlpha(input);
    }
}

The above method provides a convenient way to check if a string contains only letters, including Unicode letters, without writing custom logic:

@Test
public void givenString_whenUsingIsAlpha_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.isAlphaCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
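
Per the Commons Lang Javadoc, isAlpha() returns false for null and for an empty string, whereas the loop-based check from section 2 throws a NullPointerException on null and returns true for an empty string. To make that contract concrete without the library on the classpath, here’s a plain-Java approximation. The IsAlphaSketch and isAlphaLike names are hypothetical, and this is a sketch of the documented behavior rather than the library’s actual source.

```java
class IsAlphaSketch {

    // Approximates the documented contract of StringUtils.isAlpha():
    // null-safe, false for empty input, true only if every char is a letter
    static boolean isAlphaLike(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isLetter(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAlphaLike(null));   // false
        System.out.println(isAlphaLike(""));     // false
        System.out.println(isAlphaLike("abc"));  // true
        System.out.println(isAlphaLike("ab2c")); // false
        System.out.println(isAlphaLike("ab-c")); // false
    }
}
```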

5. Java Streams

Java Streams provide a concise way to run the same check on code points rather than on individual char values.

By combining the String’s codePoints() method with allMatch(), we can test every code point in the input with Character.isLetter(int), which also handles supplementary characters that are encoded as surrogate pairs:

public class UnicodeLetterChecker {
    public boolean streamsCheck(String input) {
        return input.codePoints().allMatch(Character::isLetter);
    }
}

The above example uses the codePoints() method to convert the String into a stream of Unicode code points and then uses the allMatch() method to ensure that all code points are letters:

@Test
public void givenString_whenUsingStreams_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.streamsCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
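
Two details of the stream-based check are easy to miss. First, codePoints() combines surrogate pairs, so a letter outside the Basic Multilingual Plane is tested as a single code point. Second, allMatch() returns true for an empty stream, so an empty string passes unless we guard against it. A minimal sketch of both points, using hypothetical class and method names:

```java
class CodePointDemo {

    // Same stream-based check, plus an explicit guard for empty input
    static boolean onlyLettersNonEmpty(String input) {
        return !input.isEmpty()
            && input.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        // U+1D49C MATHEMATICAL SCRIPT CAPITAL A, stored as a surrogate pair
        String scriptA = "\uD835\uDC9C";

        System.out.println(scriptA.length());              // 2 (char values)
        System.out.println(scriptA.codePoints().count());  // 1 (code point)
        System.out.println(onlyLettersNonEmpty(scriptA));  // true

        // allMatch() on an empty stream is vacuously true
        System.out.println("".codePoints().allMatch(Character::isLetter)); // true
        System.out.println(onlyLettersNonEmpty(""));                       // false
    }
}
```

Whether an empty string should count as valid depends on the use case, so the guard is a design choice rather than a requirement.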

6. Conclusion

In this article, we’ve explored various methods for determining if a string comprises solely Unicode letters.

Regular expressions offer a concise, declarative check, while the Character class gives us fine-grained control over each character. Apache Commons Lang handles null and empty input for us, and Java Streams offer a functional approach that works on code points. The approaches differ in how they treat null, empty strings, and supplementary characters, so the right choice depends on our specific use case.

As always, the full source code is available over on GitHub.
