Remove Punctuation From a String in Java

Last updated: August 22, 2023

Written by: Kai Yuan

Reviewed by: Eric Martin

1. Overview

It’s a common practice in text processing and analysis to eliminate punctuation from a string.

In this quick tutorial, let’s explore how to easily remove punctuation from a given string.

2. Introduction to the Problem

Let’s say we have a string:

static final String INPUT = "It's 1 W o r d (!@#$%^&*{}[];':\")<>,.";

As we can see, the string INPUT contains digits, letters, whitespace, and various punctuation marks.

Our goal is to remove punctuation marks from the string only and leave letters, digits, and whitespace in the result:

static final String EXPECTED = "Its 1 W o r d ";

In this tutorial, we’ll mainly use the String.replaceAll() method, which is shipped with the Java standard library, to solve the problem.

For simplicity, we’ll use unit test assertions to verify whether the result is as expected.

So next, let’s see how the punctuation marks get removed.

3. Using the Regex Patterns “[^\sa-zA-Z0-9]” and “\p{Punct}”

We’ve mentioned using the String.replaceAll() method to remove punctuation from the input string. The replaceAll() method does regex-based string substitution. It scans the input string and replaces every part that matches our regex pattern with a replacement string.

Therefore, the regex pattern is the key to solving this problem.

As we want to leave letters, digits, and whitespace in the result, we can replace any character that’s not a digit, a letter, or a whitespace character with an empty string. We can match these unwanted characters with the negated character class [^\sa-zA-Z0-9].

Next, let’s create a test to check if it works:

String result = INPUT.replaceAll("[^\\sa-zA-Z0-9]", "");
assertEquals(EXPECTED, result);

The test passes if we execute it. The regex pattern is pretty straightforward. For those not familiar with the syntax, it may be helpful to note a couple of points:

[^…] – Not one of the characters in […]. For example, [^0-9] matches any non-digit.
\s – Matches any whitespace character, such as a space or a TAB.
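
To make the two syntax points above concrete, here is a minimal sketch that applies each construct on its own. The class name CharClassDemo, the helper methods, and the sample string are made up for illustration and aren't part of the article's code:

```java
class CharClassDemo {

    // [^0-9] matches every character that is NOT an ASCII digit,
    // so replacing those matches with "" keeps only the digits.
    static String keepDigits(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // \s matches whitespace characters such as space and TAB.
    // In a Java string literal, the backslash must be doubled.
    static String dropWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        String sample = "a1 b2\tc3"; // hypothetical sample input
        System.out.println(keepDigits(sample));     // 123
        System.out.println(dropWhitespace(sample)); // a1b2c3
    }
}
```

Combining both ideas, [^\sa-zA-Z0-9] reads as “any character that is not whitespace, not an ASCII letter, and not an ASCII digit”.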

Moreover, Java’s regex engine supports POSIX character classes. Therefore, we can directly use the \p{Punct} character class (written as "\\p{Punct}" in a Java string literal) to match any character in !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~:

String result = INPUT.replaceAll("\\p{Punct}", "");
assertEquals(EXPECTED, result);

When we run the test above, it passes too.
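
String.replaceAll() compiles its regex argument on every call. If the same removal runs many times, one possible variation is to compile the pattern once and reuse it. The following is a minimal sketch under that assumption; the class name PunctuationRemover and the method removePunctuation() are hypothetical names, not part of the article's code:

```java
import java.util.regex.Pattern;

class PunctuationRemover {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // Same effect as input.replaceAll("\\p{Punct}", ""), without recompiling the regex.
    static String removePunctuation(String input) {
        return PUNCT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "It's 1 W o r d (!@#$%^&*{}[];':\")<>,.";
        // Brackets make the trailing space visible.
        System.out.println("[" + removePunctuation(input) + "]"); // [Its 1 W o r d ]
    }
}
```

For a one-off call, the plain replaceAll() version is shorter and just as correct.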

4. When the Input Is a Unicode String

We’ve seen two approaches to removing punctuation from the input string successfully. If we take a closer look at the INPUT string, we realize that it consists only of ASCII characters.

A question may come up – will the solutions still work if we receive a string like this:

static final String UNICODE_INPUT = "3 March März 三月 březen маршировать (!@#$%^&*{}[];':\")<>,.";

Apart from the digit ‘3’, whitespace characters, and punctuation marks, this input includes the word “March” in English, German, Chinese, Czech, and Russian. So, unlike the previous INPUT string, the UNICODE_INPUT variable contains non-ASCII characters.

After removing punctuation, the expected result should look like this:

static final String UNICODE_EXPECTED = "3 March März 三月 březen маршировать ";

So next, let’s test if our two solutions still work with this input:

String result1 = UNICODE_INPUT.replaceAll("[^\\sa-zA-Z0-9]", "");
assertNotEquals(UNICODE_EXPECTED, result1);

The test above passes. But we should note that the assertion is assertNotEquals(). So the “removing [^\sa-zA-Z0-9]” approach doesn’t produce the expected result. Let’s see what result it actually produces:

String actualResult1 = "3 March Mrz  bezen  ";
assertEquals(actualResult1, result1);

So, all non-ASCII characters have been removed together with punctuation marks. Apparently, the “removing [^\sa-zA-Z0-9]” approach doesn’t work for Unicode strings.

But we can fix it by replacing the “a-zA-Z” range with “\p{L}”:

String result3 = UNICODE_INPUT.replaceAll("[^\\s\\p{L}0-9]", "");
assertEquals(UNICODE_EXPECTED, result3);

It’s worth mentioning that \p{L} matches any Unicode letter in any script, not only the ASCII letters a-z and A-Z. Digits are still limited to the ASCII range 0-9 in this pattern.
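
One caveat is that \p{L} matches letters but not combining marks. If an accented letter arrives in decomposed form (a base letter followed by a separate combining mark), the mark is removed along with the punctuation. The sketch below illustrates this under the assumption that normalizing the input to NFC first is acceptable for our use case. The class and method names are hypothetical:

```java
import java.text.Normalizer;

class UnicodeLetterDemo {

    // The Unicode-aware pattern: keep whitespace, letters, and ASCII digits.
    static String strip(String s) {
        return s.replaceAll("[^\\s\\p{L}0-9]", "");
    }

    // Composes "base letter + combining mark" pairs into single letters first,
    // so that \p{L} sees them as letters.
    static String normalizeThenStrip(String s) {
        return strip(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // The Czech word for March, written with a decomposed letter:
        // 'r' followed by U+030C (combining caron), plus a trailing '!'.
        String decomposed = "br\u030Cezen!";

        // The combining caron is not a letter, so it is dropped too.
        System.out.println(strip(decomposed).equals("brezen"));                    // true

        // After NFC normalization, 'r' + caron becomes the single letter U+0159.
        System.out.println(normalizeThenStrip(decomposed).equals("b\u0159ezen")); // true
    }
}
```

An alternative, if normalization isn't wanted, would be to add \p{M} to the character class so that combining marks are kept as they are.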

On the other hand, the “removing \p{Punct}” approach still works with this Unicode input:

String result2 = UNICODE_INPUT.replaceAll("\\p{Punct}", "");
assertEquals(UNICODE_EXPECTED, result2);

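This is because \p{Punct} matches only the ASCII punctuation characters listed earlier and leaves every other character, including non-ASCII letters, untouched. By the same token, it doesn’t remove non-ASCII punctuation marks by default.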
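
To illustrate this scope, the following sketch compares \p{Punct} with the Unicode general categories \p{P} (punctuation) and \p{S} (symbols) on a Spanish phrase that contains guillemets and an inverted question mark. The class name, method names, and sample text are assumptions made for illustration. Non-ASCII characters are written as Unicode escapes so that the source file stays ASCII-only:

```java
class UnicodePunctuationDemo {

    // POSIX class: in the default mode, it covers ASCII punctuation only.
    static String removeAsciiPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // Unicode categories: P covers punctuation in any script, and S covers
    // symbols such as '$' and '+', which \p{Punct} also treats as punctuation.
    static String removeUnicodePunct(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]", "");
    }

    public static void main(String[] args) {
        // U+00AB / U+00BB are guillemets, U+00BF is the inverted question mark,
        // and U+00E9 is the letter e with an acute accent.
        String input = "\u00ABHola\u00BB, \u00BFqu\u00E9 tal? $5";

        // Only ',', '?' and '$' are removed; the guillemets and U+00BF remain.
        String ascii = removeAsciiPunct(input);
        System.out.println(ascii.equals("\u00ABHola\u00BB \u00BFqu\u00E9 tal 5")); // true

        // All punctuation marks and the '$' symbol are removed.
        String unicode = removeUnicodePunct(input);
        System.out.println(unicode.equals("Hola qu\u00E9 tal 5"));                 // true
    }
}
```

Which variant fits depends on whether non-ASCII punctuation counts as punctuation for the task at hand.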

5. Conclusion

In this article, we’ve learned how to remove punctuation from a string using the standard String.replaceAll() method:

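String.replaceAll("[^\\sa-zA-Z0-9]", "") – works only for input strings with ASCII characters, since it also removes non-ASCII letters
String.replaceAll("\\p{Punct}", "") – removes ASCII punctuation from both ASCII and Unicode strings and leaves all other characters, including non-ASCII punctuation, in place
String.replaceAll("[^\\s\\p{L}0-9]", "") – works for both ASCII and Unicode strings, keeping only whitespace, letters of any script, and the digits 0-9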

The code backing this article is available on GitHub.
