Course – LS (cat=Java)

Get started with Spring 5 and Spring Boot 2, through the Learn Spring course:

> CHECK OUT THE COURSE

1. Overview

In Java, a String can be seen as a concatenation of multiple substrings. It's also common to store a collection of substrings in a single string by joining them with a whitespace delimiter.

In this tutorial, we'll learn how to split a String by whitespace characters, such as space, tab, or newline.

2. String Samples

First, we need a few String samples to use as input for splitting by whitespace. Let's start by defining some whitespace characters as String constants so that we can reuse them conveniently:

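private static final String SPACE = " ";
private static final String TAB = "\t"; // escape sequence instead of an invisible literal tab
private static final String NEW_LINE = "\n";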

Next, let's use these as delimiters for defining String samples containing names of different fruits:

private static final String FRUITS_TAB_SEPARATED = "Apple" + TAB + "Banana" + TAB + "Mango" + TAB + "Orange";
private static final String FRUITS_SPACE_SEPARATED = "Apple" + SPACE + "Banana" + SPACE + "Mango" + SPACE + "Orange";
private static final String FRUITS_NEWLINE_SEPARATED = "Apple" + NEW_LINE + "Banana" + NEW_LINE + "Mango" + NEW_LINE + "Orange";

Finally, let's write a verifySplit() helper that we'll reuse to check the result of splitting these strings by whitespace characters:

private void verifySplit(String[] fruitArray) {
    assertEquals(4, fruitArray.length);
    assertEquals("Apple", fruitArray[0]);
    assertEquals("Banana", fruitArray[1]);
    assertEquals("Mango", fruitArray[2]);
    assertEquals("Orange", fruitArray[3]);
}

Now that we've built the input strings, we're ready to explore different strategies for splitting them and verifying the results.

3. Split Using Delimiter Regex

The split() method of the String class is the de facto standard for splitting strings. It accepts a delimiter regex and returns the resulting substrings as an array of Strings:

String[] split(String regex);
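Because the argument is a regex, the JDK Javadoc documents that str.split(regex) yields the same result as Pattern.compile(regex).split(str). Here's a minimal sketch, assuming we split many strings with the same delimiter, that compiles the pattern once and reuses it; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledSplitDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe
    private static final Pattern SPACE_PATTERN = Pattern.compile(" ");

    static String[] splitOnSpace(String input) {
        return SPACE_PATTERN.split(input);
    }

    public static void main(String[] args) {
        String fruits = "Apple Banana Mango Orange";
        String[] viaString = fruits.split(" ");
        String[] viaPattern = splitOnSpace(fruits);
        // Both calls produce the same array
        System.out.println(Arrays.equals(viaString, viaPattern)); // true
    }
}
```

For a single literal character like this one, the gain is likely small; precompiling mainly pays off when a more complex regex is applied to many inputs.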

First, let's split the FRUITS_SPACE_SEPARATED String by a single space character:

@Test
public void givenSpaceSeparatedString_whenSplitUsingSpace_shouldGetExpectedResult() {
    String fruits = FRUITS_SPACE_SEPARATED;
    String[] fruitArray = fruits.split(SPACE);
    verifySplit(fruitArray);
}

Similarly, we can split FRUITS_TAB_SEPARATED and FRUITS_NEWLINE_SEPARATED by passing TAB and NEW_LINE, respectively, as the delimiter regex.
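As a minimal standalone sketch of those two splits, the class below re-declares the sample constants so it runs on its own and prints the arrays instead of using JUnit; the class name is hypothetical:

```java
import java.util.Arrays;

class SingleDelimiterSplitDemo {
    static final String TAB = "\t";
    static final String NEW_LINE = "\n";

    static final String FRUITS_TAB_SEPARATED =
        "Apple" + TAB + "Banana" + TAB + "Mango" + TAB + "Orange";
    static final String FRUITS_NEWLINE_SEPARATED =
        "Apple" + NEW_LINE + "Banana" + NEW_LINE + "Mango" + NEW_LINE + "Orange";

    public static void main(String[] args) {
        // Tab and newline have no special meaning in a regex,
        // so each constant matches itself literally
        String[] byTab = FRUITS_TAB_SEPARATED.split(TAB);
        String[] byNewLine = FRUITS_NEWLINE_SEPARATED.split(NEW_LINE);

        System.out.println(Arrays.toString(byTab));     // [Apple, Banana, Mango, Orange]
        System.out.println(Arrays.toString(byNewLine)); // [Apple, Banana, Mango, Orange]
    }
}
```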

Next, let's build a more generic regex that matches a space, a tab, or a newline, and split all the string samples with it:

@Test
public void givenWhiteSpaceSeparatedString_whenSplitUsingWhiteSpaceRegex_shouldGetExpectedResult() {
    String whitespaceRegex = SPACE + "|" + TAB + "|" + NEW_LINE;
    String[] allSamples = new String[] { FRUITS_SPACE_SEPARATED, FRUITS_TAB_SEPARATED, FRUITS_NEWLINE_SEPARATED };
    for (String fruits : allSamples) {
        String[] fruitArray = fruits.split(whitespaceRegex);
        verifySplit(fruitArray);
    }
}

So far, it looks like we've got this right!

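Finally, let's simplify our approach by using the predefined whitespace character class \s (written as "\\s" in a Java string literal). By default, it matches any single ASCII whitespace character: space, tab, newline, vertical tab, form feed, or carriage return: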

@Test
public void givenWhiteSpaceSeparatedString_whenSplitUsingWhiteSpaceMetaChar_shouldGetExpectedResult() {
    String whitespaceMetaChar = "\\s";
    String[] allSamples = new String[] { FRUITS_SPACE_SEPARATED, FRUITS_TAB_SEPARATED, FRUITS_NEWLINE_SEPARATED };
    for (String fruits : allSamples) {
        String[] fruitArray = fruits.split(whitespaceMetaChar);
        verifySplit(fruitArray);
    }
}

Using \s is more convenient and less error-prone than building a custom regex for whitespace. Further, if the input can contain a run of several whitespace characters between two substrings, we can use \s+ instead of \s without changing the rest of the code. With \s alone, each additional whitespace character in a run produces an empty string in the result.
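Here's a small sketch contrasting \s and \s+ on a hypothetical input that mixes a Windows-style line ending, a double space, and a tab. It also shows that leading whitespace still yields an empty first element with \s+, which trimming the input first avoids:

```java
import java.util.Arrays;

class WhitespaceRunsDemo {
    // Hypothetical inputs: "\r\n" and the double space are runs of two whitespace chars
    static final String MESSY = "Apple\r\nBanana  Mango\tOrange";
    static final String PADDED = "  Apple Banana";

    public static void main(String[] args) {
        // \s consumes one character per match, so each run leaves an empty token behind
        System.out.println(Arrays.toString(MESSY.split("\\s")));  // [Apple, , Banana, , Mango, Orange]

        // \s+ treats the whole run as a single delimiter
        System.out.println(Arrays.toString(MESSY.split("\\s+"))); // [Apple, Banana, Mango, Orange]

        // A leading run of whitespace still produces an empty first element...
        System.out.println(Arrays.toString(PADDED.split("\\s+"))); // [, Apple, Banana]

        // ...so trimming the input first avoids it
        System.out.println(Arrays.toString(PADDED.trim().split("\\s+"))); // [Apple, Banana]
    }
}
```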

4. Split Using StringTokenizer

Splitting a string by whitespace is such a common use case that many Java libraries, including the JDK itself, offer a way to do it without specifying the delimiter explicitly. In this section, let's see how we can use the java.util.StringTokenizer class for this:

@Test
public void givenSpaceSeparatedString_whenSplitUsingStringTokenizer_shouldGetExpectedResult() {
    String fruits = FRUITS_SPACE_SEPARATED;
    StringTokenizer tokenizer = new StringTokenizer(fruits);
    String[] fruitArray = new String[tokenizer.countTokens()];
    int index = 0;
    while (tokenizer.hasMoreTokens()) {
        fruitArray[index++] = tokenizer.nextToken();
    }
    verifySplit(fruitArray);
}

We didn't provide any delimiter because StringTokenizer splits on whitespace by default, namely the space, tab, newline, carriage return, and form feed characters. Also, the code follows the iterator design pattern: the hasMoreTokens() method decides when the loop terminates, and nextToken() returns the next split.

We used the countTokens() method only to size the array upfront. It isn't required if we consume the splits one at a time, which is where StringTokenizer is most useful: with a long input string, we can process each token as soon as it's available instead of waiting for the whole string to be split. That said, the JDK Javadoc describes StringTokenizer as a legacy class and recommends String.split() or the java.util.regex package for new code.
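A minimal sketch of that one-at-a-time style might hand each token to a callback as soon as nextToken() returns it; the class and the forEachToken() method are hypothetical. It also shows that StringTokenizer never returns empty tokens for leading, trailing, or repeated delimiters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.function.Consumer;

class TokenizerStreamingDemo {
    // Passes each token to the callback as soon as it is found,
    // without counting all tokens upfront
    static void forEachToken(String input, Consumer<String> action) {
        // The single-argument constructor uses the default delimiters " \t\n\r\f"
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            action.accept(tokenizer.nextToken());
        }
    }

    public static void main(String[] args) {
        List<String> fruits = new ArrayList<>();
        // Runs of delimiters are skipped, so no empty tokens appear
        forEachToken("  Apple\t\tBanana \n Mango Orange  ", fruits::add);
        System.out.println(fruits); // [Apple, Banana, Mango, Orange]
    }
}
```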

5. Split Using Apache Commons

The StringUtils class of the org.apache.commons.lang3 package provides a split() utility method for splitting a String. Like the StringTokenizer class, it uses whitespace (as defined by Character.isWhitespace()) as the default delimiter:

public static String[] split(String str);

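Let's start by adding the commons-lang3 dependency to the project's pom.xml file. We use version 3.12.0 here; a newer release may be available on Maven Central, and the version can be omitted only if a parent POM already manages it:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>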

Next, let's see this in action by splitting the String samples:

@Test
public void givenWhiteSpaceSeparatedString_whenSplitUsingStringUtils_shouldGetExpectedResult() {
    String[] allSamples = new String[] { FRUITS_SPACE_SEPARATED, FRUITS_TAB_SEPARATED, FRUITS_NEWLINE_SEPARATED };
    for (String fruits : allSamples) {
        String[] fruitArray = StringUtils.split(fruits);
        verifySplit(fruitArray);
    }
}

One advantage of StringUtils.split() is that the caller doesn't need an explicit null check. Calling split() on a null String reference throws a NullPointerException, whereas StringUtils.split() handles null input gracefully. Let's see this in action:

@Test
public void givenNullString_whenSplitUsingStringUtils_shouldReturnNull() {
    String fruits = null;
    String[] fruitArray = StringUtils.split(fruits);
    assertNull(fruitArray);
}

As expected, the method returns a null value for null input.
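If adding a dependency isn't an option, a JDK-only helper can approximate this behavior. The sketch below is hypothetical and not part of Commons Lang: it returns null for null input and an empty array for blank input, like StringUtils.split(). Because \s covers only ASCII whitespace while StringUtils relies on Character.isWhitespace(), results can differ for some Unicode characters:

```java
import java.util.Arrays;

class NullSafeSplitDemo {
    private static final String[] EMPTY = new String[0];

    // Hypothetical JDK-only approximation of StringUtils.split(String)
    static String[] splitOnWhitespace(String input) {
        if (input == null) {
            return null; // null in, null out
        }
        String trimmed = input.trim(); // avoid a leading empty element
        if (trimmed.isEmpty()) {
            return EMPTY; // "".split(...) would return [""] instead
        }
        return trimmed.split("\\s+"); // runs of whitespace act as one delimiter
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace(" Apple\tBanana\nMango  Orange "))); // [Apple, Banana, Mango, Orange]
        System.out.println(splitOnWhitespace(null) == null); // true
        System.out.println(splitOnWhitespace("   ").length);  // 0
    }
}
```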

6. Conclusion

In this tutorial, we learned multiple approaches for splitting strings by whitespace. We also noted the advantages of some of these strategies and the best practices for using them.

As always, the complete source code for the tutorial is available over on GitHub.
