1. Overview
Suppose we need to remove all non-numeric characters from a String that contains alphanumeric and special characters, while leaving the decimal separator in place. For instance, given the text “The price of this bag is 100.5$”, we want to extract just “100.5”, which is the price.
In this tutorial, we'll explore four distinct approaches for doing so in Java.
2. Using Regular Expression and String's replaceAll() Method
The easiest way is to use the built-in replaceAll() method of the String class. It takes two arguments, a regular expression and a replacement, and returns a new String in which every substring that matches the regular expression is replaced with the replacement.
Therefore, if we pass a regex that matches everything we want to drop, together with an empty string as the replacement, we achieve our purpose.
For the sake of simplicity, we'll define a unit test to verify the expected result:
String s = "Testing abc123.555abc";
s = s.replaceAll("[^\\d.]", "");
assertEquals("123.555", s);
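In the above test case, the regex [^\d.] (written as "[^\\d.]" in Java source, since the backslash must be escaped in a string literal) is a negated character class: it matches any character that is neither a digit (0-9) nor the “.” character. Inside a character class, the dot is a literal and needs no escaping.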
The above test successfully executes and thus verifies that the final result only comprises the numeric characters and a decimal separator.
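As a minimal sketch of the same idea outside a unit test, the regex can be compiled once into a Pattern and reused. The class name NonNumericStripper and the helper keepDigitsAndDots() are hypothetical names chosen for illustration. The second call shows a limitation worth keeping in mind: every dot is retained, not only the ones that act as decimal separators.

```java
import java.util.regex.Pattern;

public class NonNumericStripper {

    // Same negated character class as above: anything that is not a digit or a dot.
    private static final Pattern NON_NUMERIC = Pattern.compile("[^\\d.]");

    // Hypothetical helper (not part of any library): removes every match of NON_NUMERIC.
    public static String keepDigitsAndDots(String input) {
        return NON_NUMERIC.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The example from the introduction.
        System.out.println(keepDigitsAndDots("The price of this bag is 100.5$")); // 100.5

        // All dots survive, including the trailing full stop.
        System.out.println(keepDigitsAndDots("v1.2 costs 3.5.")); // 1.23.5.
    }
}
```

If the input may contain several numbers or stray dots, the result likely needs further validation before it is parsed as a number.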
3. Using Java 8 Stream
Using Java 8 Streams, we have the power to define a series of operations on data in different small steps:
String s = "Testing abc123.555abc";
StringBuilder sb = new StringBuilder();
s.chars()
  .mapToObj(c -> (char) c)
  .filter(c -> Character.isDigit(c) || c == '.')
  .forEach(sb::append);
assertEquals("123.555", sb.toString());
First, we created a StringBuilder instance to hold the final outcome. Then, we iterated over the individual characters of the String using the chars() method, which returns an IntStream of char values rather than a stream of Character objects. To work with characters, we used mapToObj() to convert each int into a Character, giving us a Stream<Character>.
Finally, we used the filter() method to select only those characters that are either a digit or a decimal point, and appended each of them to the StringBuilder with forEach(). Note that Character.isDigit() also accepts non-ASCII Unicode digits, whereas \d matches only 0-9 by default.
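One possible variation, shown here as a sketch rather than as the approach used in the test above, is to skip the boxing step and let the stream build the result itself through IntStream.collect(). The class name StreamFilterSketch and the helper keepDigitsAndDots() are hypothetical. The predicate is unchanged, so it relies on Character.isDigit() in the same way.

```java
public class StreamFilterSketch {

    // Hypothetical helper: filters the int stream directly and collects into a StringBuilder.
    public static String keepDigitsAndDots(String input) {
        return input.chars()
          // Character.isDigit(int) is true for any Unicode decimal digit, not only 0-9.
          .filter(c -> Character.isDigit(c) || c == '.')
          // Supplier, accumulator, and combiner for the mutable reduction.
          .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
          .toString();
    }

    public static void main(String[] args) {
        System.out.println(keepDigitsAndDots("Testing abc123.555abc")); // 123.555
    }
}
```

Collecting this way avoids mutating a variable declared outside the stream pipeline, which keeps the helper free of side effects.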
4. Using External Libraries
We can also solve our problem with external libraries such as Guava and Apache Commons Lang, both of which provide ready-made utility classes for this kind of task.
4.1. Guava
To remove all non-numeric characters but keep the decimal separator in a Java String using Guava, we'll use methods from the CharMatcher utility class in the com.google.common.base package.
To include Guava, we first need to update our pom.xml file:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>
Next, let's rewrite the unit test using methods from the CharMatcher class:
String s = "Testing abc123.555abc";
String result = CharMatcher.inRange('0', '9')
  .or(CharMatcher.is('.'))
  .retainFrom(s);
assertEquals("123.555", result);
If we run the test, it executes successfully and returns the expected outcome. To make it clear, let's go over the methods we've used:
- The inRange() method takes two char arguments, startInclusive and endInclusive, and matches characters defined in the given range.
- The or() method takes a single parameter of the CharMatcher type. It returns a matcher that matches any character matched by either the given matcher or the one it's called on.
- The is() method takes a single parameter, char match. It matches only one specified character.
- The retainFrom() method takes a single parameter, CharSequence sequence. It returns a String containing, in their original order, all the characters of the sequence that match.
4.2. Apache Commons
To include Apache Commons Lang, we need to update our pom.xml file:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>
If we look at the RegExUtils class, we'll see that its removeAll() method can help us solve our problem:
String s = "Testing abc123.555abc";
String result = RegExUtils.removeAll(s, "[^\\d.]");
assertEquals("123.555", result);
RegExUtils.removeAll() requires two String parameters, text and regex. Here, we've defined regex in the same way as in the String.replaceAll example above.
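To illustrate what the utility adds over calling String.replaceAll() directly, here is a plain-JDK sketch that approximates its null handling as described in the library's Javadoc, where a null text or regex is treated as a no-op. The class NullSafeRemoveSketch and its removeAll() method are hypothetical stand-ins written for this example. They are not the library's source code.

```java
public class NullSafeRemoveSketch {

    // Hypothetical stand-in: assumes a null text or a null regex should leave the input untouched.
    public static String removeAll(String text, String regex) {
        if (text == null || regex == null) {
            return text;
        }
        // Otherwise, delegate to the JDK and replace every match with an empty string.
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(removeAll("Testing abc123.555abc", "[^\\d.]")); // 123.555
        System.out.println(removeAll(null, "[^\\d.]"));                    // null
    }
}
```

The practical benefit is that callers don't need their own null check before stripping characters from the input.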
5. Conclusion
In this article, we explored four different approaches for removing all non-numeric characters from a Java String while keeping the decimal separator.
As usual, all code snippets presented here can be found over on GitHub.