Java Convert PDF to Base64

Last updated: January 8, 2024

Written by: Sampada Wagde

Reviewed by: Michal Aibin

1. Overview

In this short tutorial, we’ll see how to do Base64 encoding and decoding of a PDF file using Java 8 and Apache Commons Codec.

But first, let’s take a quick peek at the basics of Base64.

2. Basics of Base64

When we send binary data over the wire, it often has to pass through channels that were designed for text, such as email bodies or JSON payloads. Such channels may interpret or alter certain byte values, so our raw data might get corrupted in flight.

So, to have portability and a common standard for transferring binary data, Base64 came into the picture. It represents every 3 bytes of input as 4 characters drawn from a 64-character ASCII alphabet.

Since the sender and receiver both understand and have agreed upon using the standard, the probability of our data getting lost or misinterpreted is greatly reduced.
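
To make this concrete, here is a minimal sketch that encodes the five-byte signature found at the start of a PDF file. The class name Base64Basics and the helper toBase64 are hypothetical names chosen for illustration; only java.util.Base64 comes from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of the article's code base
class Base64Basics {

    // Encodes arbitrary bytes into Base64 text (letters, digits, '+', '/' and '=' padding)
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // "%PDF-" is the signature at the start of a PDF file
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toBase64(header)); // prints "JVBERi0="
    }
}
```

Because the signature is followed by a version digit, a Base64-encoded PDF typically begins with the characters JVBERi0, which is a handy sanity check when inspecting encoded payloads.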

Now let’s see a couple of ways to apply this to a PDF.

3. Conversion Using Java 8

Starting with Java 8, we have the java.util.Base64 utility class, which provides encoders and decoders for the Base64 encoding scheme. It supports the Basic, URL-safe, and MIME variants as specified in RFC 4648 and RFC 2045.
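
The difference between the three variants is easiest to see side by side. The following is a small sketch, assuming a hypothetical class Base64Flavors; the getEncoder, getUrlEncoder, and getMimeEncoder factory methods are the ones the JDK provides.

```java
import java.util.Base64;

// Hypothetical demo class comparing the three JDK encoder variants
class Base64Flavors {

    // Basic: standard alphabet with '+' and '/', no line breaks
    static String basic(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // URL-safe: '-' and '_' replace '+' and '/'
    static String urlSafe(byte[] raw) {
        return Base64.getUrlEncoder().encodeToString(raw);
    }

    // MIME: standard alphabet, output split into lines of at most 76 characters
    static String mime(byte[] raw) {
        return Base64.getMimeEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xfb, (byte) 0xff};
        System.out.println(basic(sample));   // prints "+/8="
        System.out.println(urlSafe(sample)); // prints "-_8="
        // 100 bytes encode to 136 characters, so the MIME encoder inserts a line break
        System.out.println(mime(new byte[100]).contains("\r\n")); // prints "true"
    }
}
```

For a PDF that is written to a file or sent in a request body, the Basic encoder used in the rest of this tutorial is usually the natural choice.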

3.1. Encoding

To convert a PDF into Base64, we first need to get it in bytes and pass it through java.util.Base64.Encoder’s encode method:

byte[] inFileBytes = Files.readAllBytes(Paths.get(IN_FILE)); 
byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);

Here, IN_FILE is the path to our input PDF.
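
When the encoded PDF has to travel as text, for example inside a JSON field, a String is often more convenient than a byte array. Here is a minimal sketch, assuming a hypothetical helper class PdfToBase64String and a caller-supplied path; encodeToString and decode(String) are methods of the JDK encoder and decoder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the article's code base
class PdfToBase64String {

    // Reads the whole file into memory and returns its Base64 form as a String
    static String encodeFile(Path pdf) throws IOException {
        byte[] inFileBytes = Files.readAllBytes(pdf);
        return Base64.getEncoder().encodeToString(inFileBytes);
    }

    // Reverses encodeFile: turns the Base64 text back into the original bytes
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }
}
```

Like the snippet above, this loads the entire file into memory, so it suits small and medium-sized documents.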

3.2. Streaming Encoding

For larger files or systems with limited memory, it’s much more efficient to stream the data through the encoder instead of reading all of it into memory. Let’s look at how to accomplish this:

try (OutputStream os = java.util.Base64.getEncoder().wrap(new FileOutputStream(OUT_FILE));
  FileInputStream fis = new FileInputStream(IN_FILE)) {
    byte[] bytes = new byte[1024];
    int read;
    while ((read = fis.read(bytes)) > -1) {
        os.write(bytes, 0, read);
    }
}

Here, IN_FILE is the path to our input PDF, and OUT_FILE is the path to a file containing the Base64-encoded document. Instead of reading the entire PDF into memory and then encoding the full document in memory, we read up to 1 KB of data at a time and pass it through the encoder into the OutputStream. Closing the wrapped stream also writes out any leftover bytes and padding, which is why we open it in a try-with-resources block.

3.3. Decoding

At the receiving end, we get the encoded file.

So we now need to decode it to get back our original bytes and write them to a FileOutputStream to get the decoded PDF:

byte[] decoded = java.util.Base64.getDecoder().decode(encoded);

// try-with-resources closes the stream even if the write fails
try (FileOutputStream fos = new FileOutputStream(OUT_FILE)) {
    fos.write(decoded);
}

Here, OUT_FILE is the path to our PDF to be created.
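
Decoding can be streamed in the same way as encoding, since the JDK decoder offers a wrap method for an InputStream. The following is a sketch under stated assumptions: the class StreamingBase64Decode and the method decodeFile are hypothetical names, and the encoded input is assumed to have been produced by the Basic encoder, without line breaks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the article's code base
class StreamingBase64Decode {

    // Decodes a Base64 file into a binary file without loading either one fully into memory
    static void decodeFile(Path encodedFile, Path decodedFile) throws IOException {
        try (InputStream in = Base64.getDecoder().wrap(Files.newInputStream(encodedFile));
             OutputStream out = Files.newOutputStream(decodedFile)) {
            byte[] buffer = new byte[1024];
            int read;
            // Each read returns already-decoded bytes; -1 signals the end of the input
            while ((read = in.read(buffer)) > -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

If the input may contain line breaks, as MIME-style output does, Base64.getMimeDecoder() would be the decoder to wrap instead.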

4. Conversion Using Apache Commons

Next, we’ll be using the Apache Commons Codec package to achieve the same. It’s based on RFC 2045 and predates the Java 8 implementation we discussed earlier. So, when we need to support multiple JDK versions (including legacy ones) or vendors, this comes in handy as a third-party API.

4.1. Maven

To be able to use the Apache library, we need to add a dependency to our pom.xml:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.16.0</version>
</dependency>

The latest version of the above can be found on Maven Central.

4.2. Encoding

The steps are the same as for Java 8, except that this time, we pass on our original bytes to the encodeBase64 method of the org.apache.commons.codec.binary.Base64 class:

byte[] inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);

4.3. Streaming Encoding

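The static encodeBase64 and decodeBase64 methods work on whole byte arrays. For streaming, the same package provides the Base64OutputStream and Base64InputStream wrapper classes. Note that, by default, Base64OutputStream splits its output into 76-character lines, so the result can differ from the output of the JDK’s Basic encoder.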

4.4. Decoding

Again, we simply call the decodeBase64 method and write the result to a file:

byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);

// try-with-resources closes the stream even if the write fails
try (FileOutputStream fos = new FileOutputStream(OUT_FILE)) {
    fos.write(decoded);
}

5. Testing

Now we’ll test our encoding and decoding using a simple JUnit test:

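import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.BeforeClass;
import org.junit.Test;

public class EncodeDecodeUnitTest {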

    private static final String IN_FILE = // path to file to be encoded from;
    private static final String OUT_FILE = // path to file to be decoded into;
    private static byte[] inFileBytes;

    @BeforeClass
    public static void fileToByteArray() throws IOException {
        inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
    }

    @Test
    public void givenJavaBase64_whenEncoded_thenDecodedOK() throws IOException {
        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
        writeToFile(OUT_FILE, decoded);

        assertNotEquals(encoded.length, decoded.length);
        assertEquals(inFileBytes.length, decoded.length);
        assertArrayEquals(decoded, inFileBytes);
    }

    @Test
    public void givenJavaBase64_whenEncodedStream_thenDecodedStreamOK() throws IOException {
        try (OutputStream os = java.util.Base64.getEncoder().wrap(new FileOutputStream(OUT_FILE));
          FileInputStream fis = new FileInputStream(IN_FILE)) {
            byte[] bytes = new byte[1024];
            int read;
            while ((read = fis.read(bytes)) > -1) {
                os.write(bytes, 0, read);
            }
        }

        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] encodedOnDisk = Files.readAllBytes(Paths.get(OUT_FILE));
        assertArrayEquals(encoded, encodedOnDisk);

        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
        byte[] decodedOnDisk = java.util.Base64.getDecoder().decode(encodedOnDisk);
        assertArrayEquals(decoded, decodedOnDisk);
    }

    @Test
    public void givenApacheCommons_givenJavaBase64_whenEncoded_thenDecodedOK() throws IOException {
        byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);
        byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);

        writeToFile(OUT_FILE, decoded);

        assertNotEquals(encoded.length, decoded.length);
        assertEquals(inFileBytes.length, decoded.length);

        assertArrayEquals(decoded, inFileBytes);
    }

    private void writeToFile(String fileName, byte[] bytes) throws IOException {
        // try-with-resources closes the stream even if the write fails
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bytes);
        }
    }
}

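As we can see, we first read the input bytes in a @BeforeClass method. The streaming test checks that the file written through the wrapped stream matches the in-memory encoding, while the other two @Test methods verify that: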

encoded and decoded byte arrays are of different lengths
inFileBytes and decoded byte arrays are of the same length and have the same contents
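
The length difference in the first check follows from how padded Base64 works: every 3 input bytes become 4 output characters, rounded up to a full group. As a rough sketch, with Base64Length and encodedLength being hypothetical names introduced only for this illustration, we can predict the encoded size of a non-empty input and compare it with what the JDK encoder produces:

```java
import java.util.Base64;

// Hypothetical demo class, not part of the article's test suite
class Base64Length {

    // Padded Base64 emits 4 characters for every group of up to 3 input bytes
    static int encodedLength(int rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] sample = new byte[1000];
        int actual = Base64.getEncoder().encode(sample).length;
        System.out.println(encodedLength(sample.length) + " " + actual); // prints "1336 1336"
    }
}
```

This also means the assertNotEquals check only holds for a non-empty input file, since an empty file encodes to an empty result.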

Of course, we can also open up the decoded PDF file that we created and see that the contents are the same as the file we gave as input.

6. Conclusion

In this quick tutorial, we learned more about Java’s Base64 utility.

We also saw code samples for converting a PDF into and from Base64 using Java 8 and Apache Commons Codec. The JDK implementation is often reported to be faster than the Apache one, although we didn’t benchmark the two here.

The code backing this article is available on GitHub.
