Expand Authors Top

If you have a few years of experience in the Java ecosystem and you’d like to share that with the community, have a look at our Contribution Guidelines.

November Discount Launch 2022 – Top
We’re finally running a Black Friday launch. All Courses are 30% off until next Friday:


November Discount Launch 2022 – TEMP TOP (NPI)
We’re finally running a Black Friday launch. All Courses are 30% off until next Friday:


Expanded Audience – Frontegg – Security (partner)
announcement - icon User management is very complex, when implemented properly. No surprise here.

Not having to roll all of that out manually, but instead integrating a mature, fully-fledged solution - yeah, that makes a lot of sense.
That's basically what Frontegg is - User Management for your application. It's focused on making your app scalable, secure and enjoyable for your users.
From signup to authentication, it supports simple scenarios all the way to complex and custom application logic.

Have a look:

>> Elegant User Management, Tailor-made for B2B SaaS

1. Overview

When dealing with Strings in Java, we sometimes need to encode them into a specific charset.

Further reading:

Guide to Character Encoding

Explore character encoding in Java and learn about common pitfalls.

Guide to Java URL Encoding/Decoding

The article discusses URL encoding in Java, some pitfalls, and how to avoid them.

Java Base64 Encoding and Decoding

How to do Base64 encoding and decoding in Java, using the new APIs introduced in Java 8 as well as Apache Commons.

This tutorial is a practical guide showing different ways to encode a String to the UTF-8 charset.

For a more technical deep-dive, see our Guide to Character Encoding.

2. Defining the Problem

To showcase the Java encoding, we'll work with the German String “Entwickeln Sie mit Vergnügen”:

String germanString = "Entwickeln Sie mit Vergnügen";
byte[] germanBytes = germanString.getBytes();

String asciiEncodedString = new String(germanBytes, StandardCharsets.US_ASCII);

assertNotEquals(asciiEncodedString, germanString);

This String encoded using US_ASCII gives us the value “Entwickeln Sie mit Vergn?gen” when printed because it doesn't understand the non-ASCII ü character.

But when we convert an ASCII-encoded String that uses all English characters to UTF-8, we get the same string:

String englishString = "Develop with pleasure";
byte[] englishBytes = englishString.getBytes();

String asciiEncondedEnglishString = new String(englishBytes, StandardCharsets.US_ASCII);

assertEquals(asciiEncondedEnglishString, englishString);

Let's see what happens when we use the UTF-8 encoding.

3. Encoding With Core Java

Let's start with the core library.

Strings are immutable in Java, which means we cannot change a String character encoding. To achieve what we want, we need to copy the bytes of the String and then create a new one with the desired encoding.

First, we get the String bytes, and then we create a new one using the retrieved bytes and the desired charset:

String rawString = "Entwickeln Sie mit Vergnügen";
byte[] bytes = rawString.getBytes(StandardCharsets.UTF_8);

String utf8EncodedString = new String(bytes, StandardCharsets.UTF_8);

assertEquals(rawString, utf8EncodedString);

4. Encoding With Java 7 StandardCharsets

Alternatively, we can use the StandardCharsets class introduced in Java 7 to encode the String.

First, we'll decode the String into bytes, and second, we'll encode the String to UTF-8:

String rawString = "Entwickeln Sie mit Vergnügen";
ByteBuffer buffer = StandardCharsets.UTF_8.encode(rawString); 

String utf8EncodedString = StandardCharsets.UTF_8.decode(buffer).toString();

assertEquals(rawString, utf8EncodedString);

5. Encoding With Commons-Codec

Besides using core Java, we can alternatively use Apache Commons Codec to achieve the same results.

Apache Commons Codec is a handy package containing simple encoders and decoders for various formats.

First, let's start with the project configuration.

When using Maven, we have to add the commons-codec dependency to our pom.xml:


Then, in our case, the most interesting class is StringUtils, which provides methods to encode Strings.

Using this class, getting a UTF-8 encoded String is pretty straightforward:

String rawString = "Entwickeln Sie mit Vergnügen"; 
byte[] bytes = StringUtils.getBytesUtf8(rawString);
String utf8EncodedString = StringUtils.newStringUtf8(bytes);

assertEquals(rawString, utf8EncodedString);

6. Conclusion

Encoding a String into UTF-8 isn't difficult, but it's not that intuitive. This article presents three ways of doing it, using either core Java or Apache Commons Codec.

As always, the code samples can be found over on GitHub.

November Discount Launch 2022 – Bottom
We’re finally running a Black Friday launch. All Courses are 30% off until next Friday:


Generic footer banner
Comments are closed on this article!