1. Introduction

In this tutorial, we’ll learn how to use Apache Commons Compress to compress, archive, and extract files. We’ll also learn about its supported formats and some of its limitations.

2. What Is Apache Commons Compress

Apache Commons Compress is a library that provides a standard interface for the most widely used compression and archiving formats. These range from the ubiquitous TAR, ZIP, and GZIP to lesser-known but still commonly used formats like BZIP2, XZ, LZMA, and Snappy.

2.1. Difference Between Compressors and Archivers

An archiver (such as TAR) bundles a directory structure into a single file, while a compressor takes a stream of bytes and makes them smaller, saving space. Some formats (like ZIP) can act as an archiver and a compressor but are considered archivers by the library.

We can check the supported archive formats by looking at some of the static fields of the ArchiveStreamFactory class provided by Commons Compress. Conversely, we can look at CompressorStreamFactory for supported compressor formats.
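To make the archiver/compressor distinction concrete, here's a minimal sketch. It deliberately uses only the JDK's java.util.zip classes rather than Commons Compress, so it runs without extra dependencies. The class and method names (ArchiverVsCompressor, gzip(), zip(), entryNames()) are our own, chosen for illustration. GZIP acts as a compressor (one stream of bytes in, one out), while ZIP acts as an archiver that stores several named entries:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiverVsCompressor {

    // Compressor: a single stream of bytes goes in, a single stream comes out.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Archiver: many named entries are bundled into a single container.
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Lists the entry names in the order they are stored in the archive.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Note that the GZIP output carries no entry names at all, which is why a compressor is usually paired with an archiver such as TAR when we need to preserve a directory structure.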

2.2. Commons Compress and Additional Dependencies

Let’s start by adding commons-compress in our project:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.26.1</version>
</dependency>

Out of the box, Commons Compress works with TAR, ZIP, BZIP2, CPIO, and GZIP. But, for other formats, we need additional dependencies. Let’s add XZ, 7z, and LZMA support:

<dependency>
    <groupId>org.tukaani</groupId>
    <artifactId>xz</artifactId>
    <version>1.9</version>
</dependency>

Finally, for ZSTD (LZ4 is implemented by the library itself and needs no extra dependency):

<dependency>
    <groupId>com.github.luben</groupId>
    <artifactId>zstd-jni</artifactId>
    <version>1.5.5-11</version>
</dependency>

With these, we’ll avoid errors when reading or writing files of these types.

3. Compressing and Decompressing Streams

While the library creates an abstraction for the operations these different formats have in common, each format also has unique functionality, which we access through specific implementations like GzipCompressorInputStream and LZMACompressorInputStream. Here, though, we’ll focus on CompressorStreamFactory, which gives us an implementation without naming the specific class and so helps us write format-agnostic code.

3.1. Compressing a File

When compressing a file, we must pass the desired compression format to the factory method. Commons Compress contains a FileNameUtils class that we’ll use to get the extension of our destination file and pass it as the format. Then, we open an output stream, get a compressor instance, and write the bytes from our Path to it:

public class CompressUtils {
    public static void compressFile(Path file, Path destination)
      throws IOException, CompressorException {
        String format = FileNameUtils.getExtension(destination);

        // The source stream is opened here as well, so it's always closed
        try (InputStream in = Files.newInputStream(file);
          OutputStream out = Files.newOutputStream(destination);
          BufferedOutputStream buffer = new BufferedOutputStream(out);
          CompressorOutputStream compressor = new CompressorStreamFactory()
            .createCompressorOutputStream(format, buffer)) {
            IOUtils.copy(in, compressor);
        }
    }

    // ...
}

Let’s test it with a simple text file:

@Test
void givenFile_whenCompressing_thenCompressed() throws Exception {
    Path destination = Paths.get("/tmp/simple.txt.gz");

    CompressUtils.compressFile(Paths.get("/tmp/simple.txt"), destination);

    assertTrue(Files.isRegularFile(destination));
}

Note that we’re using GZIP here, which is denoted by the “gz” extension. We can use any other supported format just by changing the extension of the desired destination. Also, we can use any file type as input.

3.2. Decompressing a Compressed File

Let’s decompress a file compressed with any of the supported formats. First, we open a buffered input stream for the file and create a compressor input stream, which detects the compression format by reading the first bytes of the file. Then, we copy the compressor input to an output stream, resulting in a decompressed file or archive:

public static void decompress(Path file, Path destination)
  throws IOException, CompressorException {
    try (InputStream in = Files.newInputStream(file);
      BufferedInputStream inputBuffer = new BufferedInputStream(in);
      OutputStream out = Files.newOutputStream(destination);
      CompressorInputStream decompressor = new CompressorStreamFactory()
        .createCompressorInputStream(inputBuffer)) {
        IOUtils.copy(decompressor, out);
    }
}

Let’s test it with a “tar.gz” file, which indicates it’s a TAR archive compressed with GZIP:

@Test
void givenCompressedArchive_whenDecompressing_thenArchiveAvailable() throws Exception {
    Path destination = Paths.get("/tmp/decompressed-archive.tar");

    // decompress() expects a Path, not a String
    CompressUtils.decompress(Paths.get("/tmp/archive.tar.gz"), destination);

    assertTrue(Files.isRegularFile(destination));
}

Note that any combination of supported archivers and compressors would work here without changing any code. For instance, we could use an “archive.cpio.xz” file as input instead. We could even decompress a GZIP’ed ZIP file. Most importantly, this method isn’t exclusive to archive files. Any compressed file can be decompressed with it.
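To illustrate how detection by the first bytes works, here's a simplified sketch that uses only JDK classes. It isn't the library's implementation: the SignatureSniffer class is hypothetical and knows just three well-known signatures (GZIP, BZIP2, and XZ). It also shows why the stream has to support mark/reset, which is the reason we wrap the file stream in a BufferedInputStream before handing it to the factory:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class SignatureSniffer {

    // Peeks at the first bytes without consuming them, so the caller can
    // still pass the untouched stream to a decompressor afterwards.
    static String detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException(
              "Stream must support mark/reset; wrap it in a BufferedInputStream");
        }
        byte[] head = new byte[6];
        int read;
        try {
            in.mark(head.length);
            read = in.readNBytes(head, 0, head.length);
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(head, read, 0x1f, 0x8b)) {
            return "gz";
        }
        if (startsWith(head, read, 'B', 'Z', 'h')) {
            return "bzip2";
        }
        if (startsWith(head, read, 0xfd, '7', 'z', 'X', 'Z', 0x00)) {
            return "xz";
        }
        return "unknown";
    }

    // Compares the bytes actually read against an expected signature.
    private static boolean startsWith(byte[] head, int read, int... signature) {
        if (read < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((head[i] & 0xff) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the stream is reset after peeking, the bytes used for detection are still available to whatever reads the stream next.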

4. Creating and Manipulating Archives

To create archives, we need to specify the format we want. To simplify things, the Archiver class has a convenient method that archives a whole directory to a destination file:

public static void archive(Path directory, Path destination)
  throws IOException, ArchiveException {
    String format = FileNameUtils.getExtension(destination);
    new Archiver().create(format, destination, directory);
}

4.1. Combining an Archiver With a Compressor

We can also combine archivers and compressors to create a compressed archive in a single operation. To simplify this, we’ll consider the last extension as the compressor format and the extension preceding it as the archiver format. Then, we open a buffered output stream for the resulting compressed archive, create a compressor based on our compression format, and instantiate an ArchiveOutputStream that writes into our compressor:

public static void archiveAndCompress(Path directory, Path destination)
  throws IOException, CompressorException, ArchiveException {
    String compressionFormat = FileNameUtils.getExtension(destination);
    // Same FileNameUtils class as above, this time with the String overload
    String archiveFormat = FileNameUtils.getExtension(
      destination.getFileName().toString().replace("." + compressionFormat, ""));

    try (OutputStream archive = Files.newOutputStream(destination);
      BufferedOutputStream archiveBuffer = new BufferedOutputStream(archive);
      CompressorOutputStream compressor = new CompressorStreamFactory()
        .createCompressorOutputStream(compressionFormat, archiveBuffer);
      ArchiveOutputStream<?> archiver = new ArchiveStreamFactory()
        .createArchiveOutputStream(archiveFormat, compressor)) {
        new Archiver().create(archiver, directory);
    }
}

In the end, we still use the Archiver, but now using a version of create() that receives an ArchiveOutputStream.
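The extension handling can also be isolated into a small helper. The following is a sketch under our own assumptions: the ArchiveFileNames class and its splitFormats() method are hypothetical and use only plain String operations. Unlike String.replace(), which removes every occurrence of the compression extension in the name, it strips only the trailing one:

```java
class ArchiveFileNames {

    // Returns {archiveFormat, compressionFormat} for names like "backup.tar.gz".
    static String[] splitFormats(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        if (lastDot <= 0) {
            throw new IllegalArgumentException("No compression extension in: " + fileName);
        }
        String compressionFormat = fileName.substring(lastDot + 1);

        // Strip only the trailing extension before looking for the archive format.
        String withoutCompression = fileName.substring(0, lastDot);
        int previousDot = withoutCompression.lastIndexOf('.');
        if (previousDot <= 0) {
            throw new IllegalArgumentException("No archive extension in: " + fileName);
        }
        String archiveFormat = withoutCompression.substring(previousDot + 1);

        return new String[] { archiveFormat, compressionFormat };
    }
}
```

With this helper, a name with a single extension such as “archive.gz” is rejected early instead of producing an empty archive format.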

4.2. Unarchiving an Archive

With the Expander class, we can unarchive our uncompressed archive in a single line:

public static void extract(Path archive, Path destination)
  throws IOException, ArchiveException {
    new Expander().expand(archive, destination);
}

We pass the archive file and the directory where we want our files extracted to. This utility method takes care of opening (and closing) an input stream, detecting the archive type, iterating over all entries in the archive, and copying them to the directory we chose.

4.3. Extracting an Entry From an Existing Archive

Let’s write a method that extracts a single entry from an archive instead of the whole content:

public static void extractOne(Path archivePath, String fileName, Path destinationDirectory)
  throws IOException, ArchiveException {
    try (InputStream input = Files.newInputStream(archivePath);
      BufferedInputStream buffer = new BufferedInputStream(input);
      ArchiveInputStream<?> archive = new ArchiveStreamFactory()
        .createArchiveInputStream(buffer)) {

        ArchiveEntry entry;
        while ((entry = archive.getNextEntry()) != null) {
            if (entry.getName().equals(fileName)) {
                Path outFile = destinationDirectory.resolve(fileName);
                Files.createDirectories(outFile.getParent());
                try (OutputStream os = Files.newOutputStream(outFile)) {
                    // The archive stream is positioned at the current entry's data
                    IOUtils.copy(archive, os);
                }
                break;
            }
        }
    }
}

After opening an ArchiveInputStream, we keep calling getNextEntry() on our archive until we find an entry with the same name. If necessary, any parent directories are created. Then, the entry’s contents are written to our destination directory. Note that the file name can include a sub-directory inside the archive. Let’s test it, assuming our archive contains a file named “some.txt” under “sub-directory”:

@Test
void givenExistingArchive_whenExtractingSingleEntry_thenFileExtracted() throws Exception {
    // extractOne() reads the archive directly, so it must not be compressed
    Path archive = Paths.get("/tmp/archive.tar");
    String targetFile = "sub-directory/some.txt";

    CompressUtils.extractOne(archive, targetFile, Paths.get("/tmp/"));

    assertTrue(Files.isRegularFile(Paths.get("/tmp/sub-directory/some.txt")));
}
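When an entry name comes from an archive we don't control, resolving it against the destination directory can point outside that directory (for example, with a name like “../other/file.txt”). As a hedged sketch, and not something the original method does, a small guard could look like this. The EntryPaths class and resolveInside() method are hypothetical names, and only java.nio.file is used:

```java
import java.nio.file.Path;

class EntryPaths {

    // Resolves an archive entry name against the destination directory and
    // rejects names that would end up outside of it (for example "../x").
    static Path resolveInside(Path destinationDirectory, String entryName) {
        Path root = destinationDirectory.toAbsolutePath().normalize();
        Path target = root.resolve(entryName).normalize();
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException(
              "Entry escapes destination directory: " + entryName);
        }
        return target;
    }
}
```

Such a check would replace the plain destinationDirectory.resolve(fileName) call before any directories or files are created.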

4.4. Adding an Entry to an Existing Archive

Unfortunately, the library doesn’t give us an easy way to include a new entry into an existing archive. If we open the archive and call putArchiveEntry(), we’ll overwrite its contents. So, it’d also be necessary to rewrite all the existing entries before inserting a new one. Instead of creating a new method with the logic for this, we’ll reuse the methods we’ve created. We’ll extract the archive, copy the new file to the directory structure, archive the directory again, and then delete the old archive:

@Test
void givenExistingArchive_whenAddingSingleEntry_thenArchiveModified() throws Exception {
    Path archive = Paths.get("/tmp/archive.tar");
    Path newArchive = Paths.get("/tmp/modified-archive.tar");
    Path tmpDir = Paths.get("/tmp/extracted-archive");

    Path newEntry = Paths.get("/tmp/new-entry.txt");

    CompressUtils.extract(archive, tmpDir);
    assertTrue(Files.isDirectory(tmpDir));

    Files.copy(newEntry, tmpDir.resolve(newEntry.getFileName()));
    CompressUtils.archive(tmpDir, newArchive);
    assertTrue(Files.isRegularFile(newArchive));

    FileUtils.deleteDirectory(tmpDir.toFile());
    Files.delete(archive);
    Files.move(newArchive, archive);
    assertTrue(Files.isRegularFile(archive));
}

This replaces the original archive, so it’s advisable to keep a backup of it.
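The idea of rewriting all existing entries before inserting a new one can also be done stream to stream, without extracting to disk. The sketch below illustrates it with the JDK's ZipInputStream and ZipOutputStream instead of Commons Compress, so it runs without extra dependencies. The ZipAppender class and its methods are hypothetical names, and the archive is kept in memory for brevity:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipAppender {

    // Copies every existing entry into a new archive, then appends one more.
    static byte[] withExtraEntry(byte[] archive, String name, byte[] content) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive));
          ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry existing;
            while ((existing = in.getNextEntry()) != null) {
                // A fresh ZipEntry lets the output stream compute sizes and CRCs itself
                out.putNextEntry(new ZipEntry(existing.getName()));
                in.transferTo(out);
                out.closeEntry();
            }
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    // Lists the entry names in the order they are stored in the archive.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

The original bytes are never modified here. The caller decides when, and whether, to replace the old archive with the returned one.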

4.5. Using a Concrete Implementation Directly for Exclusive Features

We can use the specific implementation class directly if we want exclusive features from each format. For example, instead of using ArchiveOutputStream, we’ll instantiate a ZipArchiveOutputStream so we can set its compression method and level directly:

public static void zip(Path file, Path destination) throws IOException {
    try (InputStream input = Files.newInputStream(file);
      OutputStream output = Files.newOutputStream(destination);
      ZipArchiveOutputStream archive = new ZipArchiveOutputStream(output)) {
        // ZIP-specific settings that the generic ArchiveOutputStream doesn't expose
        archive.setMethod(ZipEntry.DEFLATED);
        archive.setLevel(Deflater.BEST_COMPRESSION);

        archive.putArchiveEntry(new ZipArchiveEntry(file.getFileName().toString()));
        IOUtils.copy(input, archive);
        archive.closeArchiveEntry();
    }
}

It requires more code than just using the Archiver but gives us more control.

5. Limitations

While Apache Commons Compress offers a versatile toolkit for file compression and archiving, it’s essential to acknowledge certain limitations and considerations. Firstly, while the library provides extensive support for various compression and archive formats, handling multi-volume archives may pose challenges that need careful consideration. Additionally, encoding issues may arise, mainly when dealing with diverse file systems or non-standardized data.

Moreover, although the library provides comprehensive functionality, its documentation suggests using ZipFile instead of a ZIP input stream where possible, as it gives more control when reading ZIP archives. Finally, the TAR format also has a dedicated documentation page with its own considerations.

6. Conclusion

In this article, we saw how Apache Commons Compress is a valuable resource for efficient file compression and archiving solutions. By understanding its capabilities, limitations, and best practices, we can leverage this library effectively to streamline file management processes in a format-independent way.

The code backing this article is available on GitHub.