1. Overview

In this short tutorial, we'll see how to do Base64 encoding and decoding of a PDF file using Java 8 and Apache Commons Codec.

But first, let's take a quick peek at the basics of Base64.

2. Basics of Base64

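When we send data over the wire, it ultimately travels as binary. But many protocols and formats, such as email, JSON or XML, were designed for text, so raw bytes passing through them may be misinterpreted or altered, and our data might get corrupted in flight.

So, to have portability and a common standard when transferring binary data, Base64 came into the picture. It represents every 3 bytes of input as 4 characters drawn from a 64-character ASCII alphabet, which text-based channels carry unchanged.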

Since the sender and receiver both understand and have agreed upon using the standard, the probability of our data getting lost or misinterpreted is greatly reduced.
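To make this concrete, here is a minimal sketch of how the encoding behaves on tiny inputs: each group of 3 bytes becomes 4 characters, and a shorter final group is padded with '='. The class and method names (Base64Basics, encode, encodedLength) are our own, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Basics {

    // Encodes the ASCII bytes of the given text with the Basic Base64 alphabet.
    static String encode(String text) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(raw);
    }

    // Size of the padded encoding for n input bytes:
    // every started group of 3 bytes yields 4 characters.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        System.out.println(encode("Man")); // TWFu (3 bytes, no padding)
        System.out.println(encode("Ma"));  // TWE= (2 bytes, one '=')
        System.out.println(encode("M"));   // TQ== (1 byte, two '=')
        System.out.println(encodedLength(1_000_000)); // 1333336
    }
}
```

The last line shows the price of the text-safe representation: the encoded form is roughly one third larger than the original bytes.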

Now let's see a couple of ways to apply this to a PDF.

3. Conversion Using Java 8

Starting with Java 8, we have the utility class java.util.Base64, which provides encoders and decoders for the Base64 encoding scheme. It supports the Basic, URL-safe and MIME variants, as specified in RFC 4648 and RFC 2045.
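As a rough illustration of how the three variants differ, the sketch below encodes the same bytes with each of them. The class name Base64Flavors and its helper methods are hypothetical; only the java.util.Base64 factory methods come from the JDK.

```java
import java.util.Base64;

public class Base64Flavors {

    // Basic variant: alphabet includes '+' and '/', no line breaks.
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL-safe variant: '-' and '_' replace '+' and '/'.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // MIME variant: Basic alphabet, lines wrapped at 76 characters with CRLF.
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xfb, (byte) 0xff};
        System.out.println(basic(sample));   // +/8=
        System.out.println(urlSafe(sample)); // -_8=

        // 100 bytes encode to 136 characters, so MIME output spans two lines
        System.out.println(mime(new byte[100]));
    }
}
```

For a PDF that we simply write to a file or send in a request body, the Basic variant used in the rest of this article is usually enough.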

3.1. Encoding

To convert a PDF into Base64, we first need to get it in bytes and pass it through java.util.Base64.Encoder's encode method:

byte[] inFileBytes = Files.readAllBytes(Paths.get(IN_FILE)); 
byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);

Here, IN_FILE is the path to our input PDF.
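If we need the result as text, for instance to embed it in a JSON or XML payload, the encoder's encodeToString method returns a String directly. The following is a small sketch under that assumption; the class PdfToBase64String, its method, and the temporary stand-in file are hypothetical and not part of the article's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class PdfToBase64String {

    // Reads the whole file into memory and returns its Base64 text form.
    static String encodeFileToString(Path pdf) throws IOException {
        byte[] bytes = Files.readAllBytes(pdf);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real PDF: a temp file holding only a PDF-style header
        Path sample = Files.createTempFile("sample", ".pdf");
        Files.write(sample, "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));

        System.out.println(encodeFileToString(sample));
        Files.delete(sample);
    }
}
```

Like the snippet above, this still loads the complete file into memory, so it's best suited to small documents.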

3.2. Streaming Encoding

For larger files or systems with limited memory, it's much more efficient to perform the encoding using a stream instead of reading all the data in memory. Let's look at how to accomplish this:

try (OutputStream os = java.util.Base64.getEncoder().wrap(new FileOutputStream(OUT_FILE));
  FileInputStream fis = new FileInputStream(IN_FILE)) {
    byte[] bytes = new byte[1024];
    int read;
    while ((read = fis.read(bytes)) > -1) {
        os.write(bytes, 0, read);
    }
}

Here, IN_FILE is the path to our input PDF, and OUT_FILE is the path to the file that will contain the Base64-encoded document. Instead of reading the entire PDF into memory and then encoding the full document in memory, we read up to 1 KB (1,024 bytes) of data at a time and pass it through the encoder into the OutputStream. The try-with-resources block closes the wrapped stream, which is what flushes the final, possibly padded, characters.
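The manual read loop can also be delegated to the JDK. The sketch below is one possible variation, not the article's implementation: it uses Files.copy(Path, OutputStream) to push the file through the encoding stream. The class StreamingPdfEncoder and the temporary files in main are assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class StreamingPdfEncoder {

    // Streams source through a Base64 encoder into target without
    // holding the whole file in memory.
    static void encode(Path source, Path target) throws IOException {
        // Closing the wrapped stream matters: the encoder writes the
        // final (possibly padded) group only when it is closed.
        try (OutputStream out = Base64.getEncoder().wrap(Files.newOutputStream(target))) {
            Files.copy(source, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input: 10,000 zero bytes instead of a real PDF
        Path source = Files.createTempFile("input", ".pdf");
        Path target = Files.createTempFile("encoded", ".b64");
        Files.write(source, new byte[10_000]);

        encode(source, target);
        System.out.println(Files.size(source) + " bytes in, "
          + Files.size(target) + " bytes out");

        Files.delete(source);
        Files.delete(target);
    }
}
```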

3.3. Decoding

At the receiving end, we get the encoded file.

So we now need to decode it to get back our original bytes and write them to a FileOutputStream to get the decoded PDF:

byte[] decoded = java.util.Base64.getDecoder().decode(encoded);

// try-with-resources closes the stream even if the write fails
try (FileOutputStream fos = new FileOutputStream(OUT_FILE)) {
    fos.write(decoded);
}

Here, OUT_FILE is the path to our PDF to be created.
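Decoding can be streamed in the same way as encoding, since the JDK decoder can wrap an InputStream. The following is a minimal sketch under that assumption; StreamingPdfDecoder and the temporary files used in main are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class StreamingPdfDecoder {

    // Streams a Base64-encoded file through a decoder into target,
    // overwriting target if it already exists.
    static void decode(Path encoded, Path target) throws IOException {
        try (InputStream in = Base64.getDecoder().wrap(Files.newInputStream(encoded))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input: the Base64 form of 10,000 zero bytes
        Path encoded = Files.createTempFile("encoded", ".b64");
        Path target = Files.createTempFile("decoded", ".pdf");
        Files.write(encoded, Base64.getEncoder().encode(new byte[10_000]));

        decode(encoded, target);
        System.out.println(Files.size(target) + " bytes decoded");

        Files.delete(encoded);
        Files.delete(target);
    }
}
```

This assumes the input was produced by the Basic encoder, without line breaks. For MIME-style input with line separators, Base64.getMimeDecoder() would be the matching decoder.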

4. Conversion Using Apache Commons

Next, we'll be using the Apache Commons Codec package to achieve the same. It's based on RFC 2045 and predates the Java 8 implementation we discussed earlier. So, when we need to support multiple JDK versions (including legacy ones) or vendors, this comes in handy as a third-party API.

4.1. Maven

To be able to use the Apache library, we need to add a dependency to our pom.xml:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.14</version>
</dependency>

The latest version of the above can be found on Maven Central.

4.2. Encoding

The steps are the same as for Java 8, except that this time, we pass on our original bytes to the encodeBase64 method of the org.apache.commons.codec.binary.Base64 class:

byte[] inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);

4.3. Streaming Encoding

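The static helpers of the Base64 class work on whole byte arrays, so they don't stream. However, the library also ships Base64OutputStream and Base64InputStream in the same org.apache.commons.codec.binary package, which encode and decode on the fly when wrapped around another stream. Note that, by default, Base64OutputStream inserts a line break after every 76 characters, so its output isn't byte-for-byte identical to that of encodeBase64.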

4.4. Decoding

Again, we simply call the decodeBase64 method and write the result to a file:

byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);

// try-with-resources closes the stream even if the write fails
try (FileOutputStream fos = new FileOutputStream(OUT_FILE)) {
    fos.write(decoded);
}

5. Testing

Now we'll test our encoding and decoding using a simple JUnit test:

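import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.BeforeClass;
import org.junit.Test;

public class EncodeDecodeUnitTest {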

    private static final String IN_FILE = // path to file to be encoded from;
    private static final String OUT_FILE = // path to file to be decoded into;
    private static byte[] inFileBytes;

    @BeforeClass
    public static void fileToByteArray() throws IOException {
        inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
    }

    @Test
    public void givenJavaBase64_whenEncoded_thenDecodedOK() throws IOException {
        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
        writeToFile(OUT_FILE, decoded);

        assertNotEquals(encoded.length, decoded.length);
        assertEquals(inFileBytes.length, decoded.length);
        assertArrayEquals(decoded, inFileBytes);
    }

    @Test
    public void givenJavaBase64_whenEncodedStream_thenDecodedStreamOK() throws IOException {
        try (OutputStream os = java.util.Base64.getEncoder().wrap(new FileOutputStream(OUT_FILE));
          FileInputStream fis = new FileInputStream(IN_FILE)) {
            byte[] bytes = new byte[1024];
            int read;
            while ((read = fis.read(bytes)) > -1) {
                os.write(bytes, 0, read);
            }
        }

        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] encodedOnDisk = Files.readAllBytes(Paths.get(OUT_FILE));
        assertArrayEquals(encoded, encodedOnDisk);

        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
        byte[] decodedOnDisk = java.util.Base64.getDecoder().decode(encodedOnDisk);
        assertArrayEquals(decoded, decodedOnDisk);
    }

    @Test
    public void givenApacheCommons_whenEncoded_thenDecodedOK() throws IOException {
        byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);
        byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);

        writeToFile(OUT_FILE, decoded);

        assertNotEquals(encoded.length, decoded.length);
        assertEquals(inFileBytes.length, decoded.length);

        assertArrayEquals(decoded, inFileBytes);
    }

    private void writeToFile(String fileName, byte[] bytes) throws IOException {
        // try-with-resources closes the stream even if the write fails
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bytes);
        }
    }
}

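As we can see, we first read the input bytes in a @BeforeClass method. Then, in the two in-memory @Test methods (one for the JDK, one for Apache Commons), we verified that:

  • encoded and decoded byte arrays are of different lengths
  • inFileBytes and decoded byte arrays are of the same length and have the same contents

The streaming test additionally checks that the file produced through the wrapped OutputStream matches the in-memory encoding byte for byte, and that both decode to the same content.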

Of course, we can also open up the decoded PDF file that we created and see that the contents are the same as the file we gave as input.

6. Conclusion

In this quick tutorial, we learned more about Java's Base64 utility.

We also saw code samples for converting a PDF into and from Base64 using Java 8 and Apache Commons Codec. The JDK implementation is commonly reported to be faster than the Apache one, although we didn't measure it here.

As always, source code is available over on GitHub.

2 Comments
David
8 months ago

It is a bad practice to read files completely in memory, so it would be nicer if you showed a streaming version of the code instead.

Loredana Crusoveanu
7 months ago
Reply to  David

Hi David,
Good point! Even though that's not the main focus of the article, it's still worth presenting how to handle larger files.
We’ll update the article.
Thanks for the suggestion!
