1. Introduction

Programmers often need to split strings. Sometimes the requirement goes further: split a string on one or more distinct delimiters and also return the delimiters as part of the result.

Let's discuss in detail the different available solutions to this String split problem.

2. Fundamentals

The Java ecosystem offers several options (the core java.lang.String class, Guava, and Apache Commons, to name a few) for splitting strings in simple and fairly complex cases. Additionally, regular expressions provide extra flexibility for splitting problems that revolve around matching a specific pattern.

3. Look-Around Assertions

In regular expressions, look-around assertions check whether another pattern matches immediately ahead of (lookahead) or behind (lookbehind) the current position in the source string. They are zero-width: they test a condition without consuming any characters. Let's understand this better with an example.

The lookahead assertion Java(?=Baeldung) matches “Java” only if it is immediately followed by “Baeldung”; “Baeldung” itself is not part of the match.

Likewise, the negative lookbehind assertion (?<!#)\d+ matches a run of digits only where the match doesn't start immediately after ‘#’.
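
The two assertions above can be tried out with java.util.regex directly. The following is a minimal sketch; the class LookAroundDemo, the helper findAll(), and the sample inputs are illustrative assumptions, not part of the original examples:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class (hypothetical name).
class LookAroundDemo {

    // Collects every substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Lookahead: only the "Java" directly followed by "Baeldung" matches,
        // and "Baeldung" itself is not included in the match.
        System.out.println(findAll("Java(?=Baeldung)", "JavaBaeldung JavaScript"));
        // prints [Java]

        // Negative lookbehind: a match may not start right after '#'.
        System.out.println(findAll("(?<!#)\\d+", "room 42 and #7"));
        // prints [42]
    }
}
```

One subtlety worth keeping in mind: the lookbehind only constrains where the match starts, so on the input "#123" the pattern (?<!#)\d+ still matches "23".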

Let's use such look-around assertion regular expressions and devise a solution to our problem.

In all of the examples explained in this article, we're going to use two simple Strings:

String text = "Hello@World@This@Is@A@Java@Program";
String textMixed = "@HelloWorld@This:Is@A#Java#Program";

4. Using String.split()

Let's begin by using the split() method from the String class of the core Java library.

Moreover, we'll evaluate appropriate lookahead assertions, lookbehind assertions, and combinations of them to split the strings as desired.

4.1. Positive Lookahead

First of all, let's use the lookahead assertion “((?=@))” and split the string text around its matches:

String[] splits = text.split("((?=@))");

The lookahead regex matches the empty position just before each “@” symbol, so every delimiter stays attached to the start of the token that follows it. The content of the resulting array is:

[Hello, @World, @This, @Is, @A, @Java, @Program]

Using this regex doesn't return the delimiters separately in the splits array. Let's try an alternate approach.

4.2. Positive Lookbehind

We can also use a positive lookbehind assertion “((?<=@))” to split the string text:

String[] splits = text.split("((?<=@))");

However, the resulting output still won't contain the delimiters as individual elements of the array:

[Hello@, World@, This@, Is@, A@, Java@, Program]

4.3. Positive Lookahead or Lookbehind

We can combine the two look-arounds above with a logical OR, so that the string is split both before and after each delimiter.

The resulting regex “((?=@)|(?<=@))” gives us the desired results. The below code snippet demonstrates this:

String[] splits = text.split("((?=@)|(?<=@))");

The above regular expression splits the string, and the resulting array contains the delimiters:

[Hello, @, World, @, This, @, Is, @, A, @, Java, @, Program]
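
For a side-by-side view, the sketch below runs the three patterns against the same input. The class and method names are illustrative assumptions, and the outer grouping parentheses used in the regexes above are optional, so they are omitted here. Because look-arounds consume no characters, joining the pieces back together reproduces the input:

```java
import java.util.Arrays;

// Illustrative comparison class (hypothetical name).
class SplitComparison {

    static final String TEXT = "Hello@World@This@Is@A@Java@Program";

    // Splits just before each '@', so the delimiter leads the next token.
    static String[] splitBefore(String input) {
        return input.split("(?=@)");
    }

    // Splits just after each '@', so the delimiter trails the previous token.
    static String[] splitAfter(String input) {
        return input.split("(?<=@)");
    }

    // Splits on both sides, so each '@' becomes its own element.
    static String[] splitAround(String input) {
        return input.split("(?=@)|(?<=@)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBefore(TEXT)));
        // prints [Hello, @World, @This, @Is, @A, @Java, @Program]
        System.out.println(Arrays.toString(splitAfter(TEXT)));
        // prints [Hello@, World@, This@, Is@, A@, Java@, Program]
        System.out.println(Arrays.toString(splitAround(TEXT)));
        // prints [Hello, @, World, @, This, @, Is, @, A, @, Java, @, Program]

        // Zero-width matches remove nothing, so the pieces join back to the input.
        System.out.println(String.join("", splitAround(TEXT)).equals(TEXT));
        // prints true
    }
}
```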

Now that we understand the required look-around assertion regular expression, we can modify it based on the different types of delimiters present in the input string.

Let's attempt to split the textMixed as defined previously using a suitable regex:

String[] splitsMixed = textMixed.split("((?=:|#|@)|(?<=:|#|@))");

Executing the above line of code produces the following result:

[@, HelloWorld, @, This, :, Is, @, A, #, Java, #, Program]
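
When the set of delimiters varies, the look-around regex can be assembled programmatically. The following is a minimal sketch, assuming the delimiters are non-empty literal strings; DelimiterSplitter and its methods are hypothetical names, not part of the JDK or any library. Pattern.quote() is used so that delimiters such as “|” or “.” are treated literally rather than as regex metacharacters:

```java
import java.util.Arrays;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or any library.
class DelimiterSplitter {

    // Builds "(?=d1|d2|...)|(?<=d1|d2|...)" from literal delimiters.
    static String keepDelimitersRegex(String... delimiters) {
        if (delimiters.length == 0) {
            throw new IllegalArgumentException("At least one delimiter is required");
        }
        StringJoiner alternatives = new StringJoiner("|");
        for (String delimiter : delimiters) {
            if (delimiter.isEmpty()) {
                throw new IllegalArgumentException("Delimiters must not be empty");
            }
            // Quote each delimiter so regex metacharacters are taken literally.
            alternatives.add(Pattern.quote(delimiter));
        }
        return "(?=" + alternatives + ")|(?<=" + alternatives + ")";
    }

    // Splits the input and keeps each delimiter as a separate array element.
    static String[] splitKeepingDelimiters(String input, String... delimiters) {
        return input.split(keepDelimitersRegex(delimiters));
    }

    public static void main(String[] args) {
        String textMixed = "@HelloWorld@This:Is@A#Java#Program";
        System.out.println(Arrays.toString(splitKeepingDelimiters(textMixed, ":", "#", "@")));
        // prints [@, HelloWorld, @, This, :, Is, @, A, #, Java, #, Program]
    }
}
```

The guard clauses are a design choice of this sketch: with no delimiters, or with an empty one, the look-arounds would match at every position and the input would be split into single characters.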

5. Using Guava Splitter

Now that we understand the look-around assertions from the previous section, let's apply them with a Java library offered by Google.

The Splitter class from Guava offers methods on() and onPattern() to split a string using a regular expression pattern as a separator.

To start with, let's see them in action on the string text containing a single delimiter “@”:

List<String> splits = Splitter.onPattern("((?=@)|(?<=@))").splitToList(text);
List<String> splits2 = Splitter.on(Pattern.compile("((?=@)|(?<=@))")).splitToList(text);

The results from executing the above lines of code are quite similar to the ones generated by the split method, except we now have Lists instead of arrays.

Likewise, we can also use these methods to split a string containing multiple distinct delimiters:

List<String> splitsMixed = Splitter.onPattern("((?=:|#|@)|(?<=:|#|@))").splitToList(textMixed);
List<String> splitsMixed2 = Splitter.on(Pattern.compile("((?=:|#|@)|(?<=:|#|@))")).splitToList(textMixed);

The two methods differ only in the argument they take: the on() method accepts a java.util.regex.Pattern, whereas the onPattern() method accepts the separator regex as a String.

6. Using Apache Commons StringUtils

We can also take advantage of the Apache Commons Lang project's StringUtils method splitByCharacterType().

Note that this method splits the input string by character type, as returned by java.lang.Character.getType(char), and groups consecutive characters of the same type into one token. Here, we don't get to pick or extract the delimiters of our choosing.

Furthermore, it delivers the best results when the source string has a constant case, either upper or lower, throughout, because uppercase and lowercase letters count as different character types:

String[] splits = StringUtils.splitByCharacterType("pg@no;10@hello;world@this;is@a#10words;Java#Program");

The character types that appear in the above string are uppercase letters, lowercase letters, digits, and special characters (@, ; and #).

Hence, the resulting array splits, as expected, looks like:

[pg, @, no, ;, 10, @, hello, ;, world, @, this, ;, is, @, a, #, 10, words, ;, J, ava, #, P, rogram]
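
To see why the array breaks where it does, the grouping can be approximated with the JDK alone. The following is a rough sketch of the idea, not Apache Commons Lang's actual implementation; the class CharacterTypeGrouping and its method are hypothetical names. It starts a new token whenever Character.getType(char) changes:

```java
import java.util.ArrayList;
import java.util.List;

// Rough approximation for illustration only (hypothetical class name).
class CharacterTypeGrouping {

    // Groups consecutive characters that share the same Character.getType value.
    static List<String> groupByCharacterType(String input) {
        List<String> tokens = new ArrayList<>();
        if (input.isEmpty()) {
            return tokens;
        }
        int tokenStart = 0;
        int currentType = Character.getType(input.charAt(0));
        for (int i = 1; i < input.length(); i++) {
            int type = Character.getType(input.charAt(i));
            if (type != currentType) {
                // The character type changed, so close the current token here.
                tokens.add(input.substring(tokenStart, i));
                tokenStart = i;
                currentType = type;
            }
        }
        tokens.add(input.substring(tokenStart));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(groupByCharacterType("a#10words;Java#Program"));
        // prints [a, #, 10, words, ;, J, ava, #, P, rogram]
    }
}
```

The uppercase “J” and “P” land in their own tokens because uppercase and lowercase letters have different character types, which is why mixed-case input gets split inside words.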

7. Conclusion

In this article, we've seen how to split a string in such a way that the delimiters are also available in the resulting array.

First, we discussed look-around assertions and used them to get the desired results. Later, we used the methods provided by the Guava library to achieve similar results.

Finally, we wrapped up with the Apache Commons Lang library, whose splitByCharacterType() method solves a related problem: it splits a string by character type, so the delimiters come back as separate tokens, although we can't choose them.

As always, the code used in this article can be found over on GitHub.
