1. Introduction

We frequently need to convert between String and byte array in Java. In this tutorial, we’ll examine these operations in detail.

First, we’ll look at various ways to convert a String to a byte array. Then, we’ll look at similar operations in reverse.

2. Converting String to Byte Array

In Java, a String is stored as a sequence of Unicode characters. To convert it to a byte array, we translate that sequence of characters into a sequence of bytes. For this translation, we use an instance of Charset, a class that specifies a mapping between a sequence of chars and a sequence of bytes.

We refer to the above process as encoding.

We can encode a String into a byte array in Java in multiple ways. Let’s look at each of them in detail with examples.

2.1. Using String.getBytes()

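The String class provides three overloaded getBytes methods to encode a String into a byte array:

  • getBytes() – encodes using the platform’s default charset
  • getBytes(String charsetName) – encodes using the named charset
  • getBytes(Charset charset) – encodes using the provided charset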

Firstly, let’s encode a string using the platform’s default charset:

String inputString = "Hello World!";
byte[] byteArray = inputString.getBytes();

The above method is platform-dependent as it uses the platform’s default charset. We can get this charset by calling Charset.defaultCharset().
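As a minimal sketch of this relationship, the example below compares the no-argument getBytes() with an explicit call that passes Charset.defaultCharset(). The class and method names (DefaultCharsetDemo, encodeWithDefault, encodeWithExplicitDefault) are hypothetical and exist only for illustration:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of the article's source code.
class DefaultCharsetDemo {

    // Encodes with the no-argument getBytes(), which relies on the platform default.
    static byte[] encodeWithDefault(String input) {
        return input.getBytes();
    }

    // Encodes with the same default charset, but passed explicitly.
    static byte[] encodeWithExplicitDefault(String input) {
        return input.getBytes(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The printed name varies from one platform or JVM configuration to another.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(Arrays.toString(encodeWithDefault("Hello World!")));
    }
}
```

Because the printed charset can differ between machines, the same call may produce different bytes for non-ASCII input, which is why passing an explicit charset is usually the safer choice.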

Secondly, let’s encode a string using a named charset:

@Test
public void whenGetBytesWithNamedCharset_thenOK() 
  throws UnsupportedEncodingException {
    String inputString = "Hello World!";
    String charsetName = "IBM01140";

    byte[] byteArray = inputString.getBytes(charsetName);
    
    assertArrayEquals(
      new byte[] { -56, -123, -109, -109, -106, 64, -26,
        -106, -103, -109, -124, 90 },
      byteArray);
}

This method throws UnsupportedEncodingException if the named charset is not supported.

The behavior of the above two versions is unspecified if the input contains characters that the charset cannot encode. In contrast, the third version always replaces malformed input and unmappable characters with the charset’s default replacement byte array.

Next, let’s call the third version of the getBytes() method and pass an instance of Charset:

@Test
public void whenGetBytesWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = Charset.forName("ASCII");

    byte[] byteArray = inputString.getBytes(charset);

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63,
        63, 63, 33 },
      byteArray);
}

Here, we are using the factory method Charset.forName to get an instance of the Charset. This method throws a runtime exception (IllegalCharsetNameException) if the name of the requested charset is invalid. It also throws a runtime exception (UnsupportedCharsetException) if the charset is not supported in the current JVM.
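One way to guard against both failure modes of Charset.forName is to catch IllegalCharsetNameException and UnsupportedCharsetException and fall back to an empty result. The following is only a sketch; the class CharsetLookup and its find method are hypothetical helpers introduced here for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

// Hypothetical helper, not part of the JDK or the article's source code.
class CharsetLookup {

    // Returns the charset if the name is legal and supported, otherwise an empty Optional.
    static Optional<Charset> find(String name) {
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Illegal name (e.g. contains a space) or a charset this JVM does not provide.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(find("UTF-8").isPresent());
        System.out.println(find("no-such-charset-xyz").isPresent());
        System.out.println(find("bad name").isPresent());
    }
}
```

Returning an Optional is just one design choice; rethrowing a domain-specific exception would work equally well when a missing charset should be treated as a hard error.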

However, some charsets are guaranteed to be available on every Java platform. The StandardCharsets class defines constants for these charsets.

Finally, let’s encode using one of the standard charsets:

@Test
public void whenGetBytesWithStandardCharset_thenOK() {
    String inputString = "Hello World!";
    Charset charset = StandardCharsets.UTF_16;

    byte[] byteArray = inputString.getBytes(charset);

    assertArrayEquals(
      new byte[] { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0,
        111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 },
      byteArray);
}

Thus, we complete the review of the various getBytes versions. Next, let’s look into the method provided by Charset itself.

2.2. Using Charset.encode()

The Charset class provides encode(), a convenience method that encodes Unicode characters into bytes. This method always replaces malformed input and unmappable characters with the charset’s default replacement byte array.

Let’s use the encode method to convert a String into a byte array:

@Test
public void whenEncodeWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = StandardCharsets.US_ASCII;

    byte[] byteArray = charset.encode(inputString).array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63, 63, 63, 33 },
      byteArray);
}

As we can see above, unsupported characters have been replaced with the charset’s default replacement byte 63.
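One caveat worth keeping in mind: ByteBuffer.array() returns the buffer’s entire backing array, while the encoded content is only the region between the buffer’s position and limit. Depending on the charset and the JDK implementation, the backing array may be larger than the encoded data. A more defensive sketch copies exactly the remaining bytes; the class ByteBufferExtraction and the method encodeExact are hypothetical names used for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the article's source code.
class ByteBufferExtraction {

    // Encodes the input and copies only the bytes between position and limit.
    static byte[] encodeExact(String input, Charset charset) {
        ByteBuffer buffer = charset.encode(input);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeExact("Hello World!", StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes));
    }
}
```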

The approaches used so far use the CharsetEncoder class internally to perform encoding. Let’s examine this class in the next section.

2.3. CharsetEncoder

CharsetEncoder transforms Unicode characters into a sequence of bytes for a given charset. Moreover, it provides fine-grained control over the encoding process.

Let’s use this class to convert a String into a byte array:

@Test
public void whenUsingCharsetEncoder_thenOK()
  throws CharacterCodingException {
    String inputString = "Hello ਸੰਸਾਰ!";
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
    encoder.onMalformedInput(CodingErrorAction.IGNORE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith(new byte[] { 0 });

    byte[] byteArray = encoder.encode(CharBuffer.wrap(inputString))
                         .array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 0, 0, 0, 0, 0, 33 },
      byteArray);
}

Here, we’re creating an instance of CharsetEncoder by calling the newEncoder method on a Charset object.

Then, we are specifying actions for error conditions by calling the onMalformedInput() and onUnmappableCharacter() methods. We can specify the following actions:

  • IGNORE – drop the erroneous input
  • REPLACE – replace the erroneous input
  • REPORT – report the error by returning a CoderResult object or throwing a CharacterCodingException

Furthermore, we are using the replaceWith() method to specify the replacement byte array.
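To illustrate the REPORT action, here is a minimal sketch of a strict encoder that rejects input it cannot represent instead of silently replacing it. The class StrictAsciiEncoder and the method encodeStrict are hypothetical names, and wrapping the result in an Optional is only one possible way to surface the failure:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the article's source code.
class StrictAsciiEncoder {

    // Returns the ASCII bytes, or an empty Optional if any character cannot be encoded.
    static Optional<byte[]> encodeStrict(String input) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(input));
            // Copy only the encoded region rather than the whole backing array.
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return Optional.of(bytes);
        } catch (CharacterCodingException e) {
            // Thrown by encode() when REPORT is configured and the input is not encodable.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeStrict("Hello").isPresent());
        // U+0A38 is a Gurmukhi letter, which US-ASCII cannot represent.
        System.out.println(encodeStrict("Hello \u0A38!").isPresent());
    }
}
```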

Thus, we complete the review of various approaches to convert a String to a byte array. Let’s next look at the reverse operation.

3. Converting Byte Array to String

We refer to the process of converting a byte array to a String as decoding. Similar to encoding, this process requires a Charset.

However, we cannot just use any charset for decoding a byte array. We should use the charset that was used to encode the String into the byte array.
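A short sketch shows what can go wrong when the charsets do not match. Here, the text is encoded as UTF-8 and then decoded once with UTF-8 and once with ISO-8859-1. The class CharsetMismatchDemo and the method roundTrip are hypothetical names chosen for this illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the article's source code.
class CharsetMismatchDemo {

    // Encodes with one charset and decodes with another.
    static String roundTrip(String input, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = input.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String input = "caf\u00E9"; // "café", written with an escape to avoid source-encoding issues

        // Matching charsets: the original text is recovered.
        System.out.println(roundTrip(input, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Mismatched charsets: the two UTF-8 bytes of U+00E9 are decoded as two separate characters.
        System.out.println(roundTrip(input, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

In the mismatched case, no exception is thrown; the result is simply a different, garbled string, which makes this kind of bug easy to miss.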

We can convert a byte array to a String in many ways. Let’s examine each of them in detail.

3.1. Using the String Constructor

The String class has a few constructors that take a byte array as input. They mirror the getBytes methods but work in reverse.

First, let’s convert a byte array to String using the platform’s default charset:

@Test
public void whenStringConstructorWithDefaultCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
      108, 100, 33 };
    
    String string = new String(byteArray);
    
    assertNotNull(string);
}

Note that we don’t assert anything here about the contents of the decoded string. This is because it may decode to something different, depending on the platform’s default charset.

For this reason, we should generally avoid this method.

Secondly, let’s use a named charset for decoding:

@Test
public void whenStringConstructorWithNamedCharset_thenOK()
    throws UnsupportedEncodingException {
    String charsetName = "IBM01140";
    byte[] byteArray = { -56, -123, -109, -109, -106, 64, -26, -106,
      -103, -109, -124, 90 };

    String string = new String(byteArray, charsetName);
        
    assertEquals("Hello World!", string);
}

This constructor throws an UnsupportedEncodingException if the named charset is not available on the JVM.

Thirdly, let’s use a Charset object to do decoding:

@Test
public void whenStringConstructorWithCharSet_thenOK() {
    Charset charset = Charset.forName("UTF-8");
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
      108, 100, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}

Finally, let’s use a standard Charset for the same:

@Test
public void whenStringConstructorWithStandardCharSet_thenOK() {
    Charset charset = StandardCharsets.UTF_16;
        
    byte[] byteArray = { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0,
      111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}

So far, we have converted a byte array into a String using the constructor. Let’s now look into the other approaches.

3.2. Using Charset.decode()

The Charset class provides the decode() method, which decodes a ByteBuffer into a CharBuffer that we can then convert to a String:

@Test
public void whenDecodeWithCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111,
      114, 108, -63, 33 };
    Charset charset = StandardCharsets.US_ASCII;
    String string = charset.decode(ByteBuffer.wrap(byteArray))
                      .toString();

    assertEquals("Hello �orl�!", string);
}

Here, each invalid byte is replaced with the decoder’s default replacement character, U+FFFD (�).

3.3. CharsetDecoder

All of the previous approaches for decoding rely on the same mechanism that the CharsetDecoder class exposes. We can use this class directly for fine-grained control over the decoding process:

@Test
public void whenUsingCharsetDecoder_thenOK()
  throws CharacterCodingException {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111, 114,
      108, -63, 33 };
    CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();

    decoder.onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith("?");

    String string = decoder.decode(ByteBuffer.wrap(byteArray))
                      .toString();

    assertEquals("Hello ?orl?!", string);
}

Here, we are replacing invalid inputs and unsupported characters with “?”.

If we want to be informed of invalid input instead, we can configure the decoder to report errors, so that decode() throws a CharacterCodingException:

decoder.onMalformedInput(CodingErrorAction.REPORT)
  .onUnmappableCharacter(CodingErrorAction.REPORT);
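
As a rough sketch of how reporting might be used in practice, the helper below decodes strictly and returns an empty result when the bytes are not valid US-ASCII. The class StrictAsciiDecoder and the method decodeStrict are hypothetical names introduced only for this example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the article's source code.
class StrictAsciiDecoder {

    // Returns the decoded text, or an empty Optional if any byte is not valid US-ASCII.
    static Optional<String> decodeStrict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            // Thrown by decode() when REPORT is configured and the input is invalid.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeStrict(new byte[] { 72, 101, 108, 108, 111 }).isPresent());
        // -10 is outside the 7-bit ASCII range, so strict decoding fails.
        System.out.println(decodeStrict(new byte[] { 72, -10 }).isPresent());
    }
}
```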

4. Conclusion

In this article, we investigated multiple ways to convert a String to a byte array and back. We should choose the appropriate method based on the input data as well as the level of control required over invalid input.

As usual, the full source code can be found over on GitHub.

