1. Introduction

In this tutorial, we’ll learn how to use Apache Commons Compress to compress, archive, and extract files. We’ll also learn about its supported formats and some of its limitations.

2. What Is Apache Commons Compress

Apache Commons Compress is a library that creates a standard interface for the most widely used compression and archiving formats. It covers everything from the ubiquitous TAR, ZIP, and GZIP to lesser-known but still commonly used formats, like BZIP2, XZ, LZMA, and Snappy.

2.1. Difference Between Compressors and Archivers

An archiver (such as TAR) bundles a directory structure into a single file, while a compressor takes a stream of bytes and makes them smaller, saving space. Some formats (like ZIP) can act as an archiver and a compressor but are considered archivers by the library.

We can check the supported archive formats by looking at some of the static fields of the ArchiveStreamFactory class provided by Commons Compress. Conversely, we can look at CompressorStreamFactory for supported compressor formats.
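To make the archiver/compressor distinction concrete, here's a minimal sketch that uses only the JDK's java.util.zip classes instead of Commons Compress. The class and method names (ArchiverVsCompressor, gzip(), zip()) are made up for this illustration. The compressor takes a single stream of bytes, while the archiver bundles several named entries:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, JDK-only: not part of Commons Compress
final class ArchiverVsCompressor {

    // Compressor: one stream of bytes in, one (usually smaller) stream of bytes out
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(data);
        }
        return bytes.toByteArray();
    }

    // Archiver: many named entries bundled into a single output
    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

Note that gzip() has no notion of file names or directories, which is why GZIP is typically paired with an archiver like TAR.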

2.2. Commons Compress and Additional Dependencies

Let’s start by adding commons-compress to our project:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.26.1</version>
</dependency>

Out of the box, Commons Compress works with TAR, ZIP, BZIP2, CPIO, and GZIP. But, for other formats, we need additional dependencies. Let’s add XZ, 7z, and LZMA support:

<dependency>
    <groupId>org.tukaani</groupId>
    <artifactId>xz</artifactId>
    <version>1.9</version>
</dependency>

Finally, for Zstandard (ZSTD) support (LZ4 is implemented by the library itself and needs no extra dependency):

<dependency>
    <groupId>com.github.luben</groupId>
    <artifactId>zstd-jni</artifactId>
    <version>1.5.5-11</version>
</dependency>

With these, we’ll avoid errors when reading or writing files of these types.

3. Compressing and Decompressing Streams

While the library creates an abstraction for the operations these formats have in common, each format also has unique functionality, which we access through specific implementations like GzipCompressorInputStream and LZMACompressorInputStream. Here, though, we’ll focus on CompressorStreamFactory, which gives us an implementation without naming the specific class and so helps us write format-agnostic code.

3.1. Compressing a File

When compressing a file, we must pass the desired compression format to the factory method. Commons Compress contains a FileNameUtils class that we’ll use to get our file extension and pass it as the format. Then, we open an output stream, get a compressor instance, and write the bytes from our Path to it:

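import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.compress.compressors.CompressorException;
import org.apache.commons.compress.compressors.CompressorOutputStream;
import org.apache.commons.compress.compressors.CompressorStreamFactory;
import org.apache.commons.compress.utils.FileNameUtils;
import org.apache.commons.compress.utils.IOUtils;

public class CompressUtils {
    public static void compressFile(Path file, Path destination)
      throws IOException, CompressorException {
        String format = FileNameUtils.getExtension(destination);

        try (InputStream in = Files.newInputStream(file);
          OutputStream out = Files.newOutputStream(destination);
          BufferedOutputStream buffer = new BufferedOutputStream(out);
          CompressorOutputStream compressor = new CompressorStreamFactory()
            .createCompressorOutputStream(format, buffer)) {
            IOUtils.copy(in, compressor);
        }
    }

    // ...
}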

Let’s test it with a simple text file:

@Test
void givenFile_whenCompressing_thenCompressed() throws Exception {
    Path destination = Paths.get("/tmp/simple.txt.gz");

    CompressUtils.compressFile(Paths.get("/tmp/simple.txt"), destination);

    assertTrue(Files.isRegularFile(destination));
}

Note that we’re using GZIP here, which is denoted by the “gz” extension. We can use any other supported format just by changing the extension of the desired destination. Also, we can use any file type as input.

3.2. Decompressing a Compressed File

Let’s decompress a file compressed with any of the supported formats. First, we need to open a buffered input stream for the file and create a compressor input stream (which detects the compression format by reading the first bytes of the file). Then, we copy the decompressed bytes to an output stream, resulting in a decompressed file or archive:

public static void decompress(Path file, Path destination)
  throws IOException, CompressorException {
    try (InputStream in = Files.newInputStream(file);
      BufferedInputStream inputBuffer = new BufferedInputStream(in);
      OutputStream out = Files.newOutputStream(destination);
      CompressorInputStream decompressor = new CompressorStreamFactory()
        .createCompressorInputStream(inputBuffer)) {
        IOUtils.copy(decompressor, out);
    }
}

Let’s test it with a “tar.gz” file, which indicates it’s a TAR archive compressed with GZIP:

@Test
void givenCompressedArchive_whenDecompressing_thenArchiveAvailable() throws Exception {
    Path destination = Paths.get("/tmp/decompressed-archive.tar");

    CompressUtils.decompress(Paths.get("/tmp/archive.tar.gz"), destination);

    assertTrue(Files.isRegularFile(destination));
}

Note that any combination of supported archivers and compressors would work here without changing any code. For instance, we could use an “archive.cpio.xz” file as input instead. We could even decompress a GZIP’ed ZIP file. Most importantly, this method isn’t exclusive to archive files. Any compressed file can be decompressed with it.
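To illustrate what detecting a format “by reading the first bytes” can look like, here's a simplified, JDK-only sketch. It's not the library's actual detection code: the class name FormatSniffer and the returned labels are assumptions for this example, and it only knows the well-known signatures of GZIP, BZIP2, and XZ. It also shows why a stream that supports mark() and reset(), such as a BufferedInputStream, is useful here, since the header bytes have to be read and then "unread":

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: a simplified illustration, not Commons Compress code
final class FormatSniffer {

    // Returns "gz", "bzip2", "xz", or null when the signature is unknown.
    // Reads at most 6 bytes and rewinds the stream afterwards.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Stream must support mark/reset");
        }
        byte[] header = new byte[6];
        in.mark(header.length);
        int read = in.readNBytes(header, 0, header.length); // Java 9+
        in.reset();

        // GZIP signature: 0x1F 0x8B
        if (read >= 2 && (header[0] & 0xFF) == 0x1F && (header[1] & 0xFF) == 0x8B) {
            return "gz";
        }
        // BZIP2 signature: "BZh"
        if (read >= 3 && header[0] == 'B' && header[1] == 'Z' && header[2] == 'h') {
            return "bzip2";
        }
        // XZ signature: 0xFD '7' 'z' 'X' 'Z' 0x00
        if (read >= 6 && (header[0] & 0xFF) == 0xFD && header[1] == '7' && header[2] == 'z'
          && header[3] == 'X' && header[4] == 'Z' && header[5] == 0) {
            return "xz";
        }
        return null;
    }
}
```

Because the stream is reset after the check, a decompressor created afterwards still sees the file from its first byte.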

4. Creating and Manipulating Archives

To create archives, we need to specify the format we want. To simplify things, the Archiver class has a convenient method that archives a whole directory to a destination file:

public static void archive(Path directory, Path destination)
  throws IOException, ArchiveException {
    String format = FileNameUtils.getExtension(destination);
    new Archiver().create(format, destination, directory);
}

4.1. Combining an Archiver With a Compressor

We can also combine archivers and compressors to create a compressed archive in a single operation. To simplify this, we’ll consider the extension as the compressor format and the extension preceding it as the archiver format. Then, we open a buffered output stream for the resulting compressed archive, create a compressor based on our compression format, and instantiate an ArchiveOutputStream that writes into our compressor:

public static void archiveAndCompress(Path directory, Path destination)
  throws IOException, ArchiveException, CompressorException {
    String compressionFormat = FileNameUtils.getExtension(destination);
    String archiveFormat = FileNameUtils.getExtension(
      destination.getFileName().toString().replace("." + compressionFormat, ""));

    try (OutputStream archive = Files.newOutputStream(destination);
      BufferedOutputStream archiveBuffer = new BufferedOutputStream(archive);
      CompressorOutputStream compressor = new CompressorStreamFactory()
        .createCompressorOutputStream(compressionFormat, archiveBuffer);
      ArchiveOutputStream<?> archiver = new ArchiveStreamFactory()
        .createArchiveOutputStream(archiveFormat, compressor)) {
        new Archiver().create(archiver, directory);
    }
}

In the end, we still use the Archiver, but now using a version of create() that receives an ArchiveOutputStream.
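The double-extension convention used above can also be sketched as a small standalone helper. This is only an alternative illustration with hypothetical names (DoubleExtension, split()), using plain string operations from the JDK. Unlike the replace() approach, it rejects file names that don't have two extensions before any stream is opened:

```java
import java.nio.file.Path;

// Hypothetical helper for illustration only
final class DoubleExtension {

    // Returns { archiveFormat, compressionFormat }, e.g. { "tar", "gz" } for "archive.tar.gz"
    static String[] split(Path destination) {
        String name = destination.getFileName().toString();
        int last = name.lastIndexOf('.');
        int previous = last > 0 ? name.lastIndexOf('.', last - 1) : -1;
        if (previous < 0) {
            throw new IllegalArgumentException("Expected <name>.<archive>.<compression>: " + name);
        }

        String archiveFormat = name.substring(previous + 1, last);
        String compressionFormat = name.substring(last + 1);
        if (archiveFormat.isEmpty() || compressionFormat.isEmpty()) {
            throw new IllegalArgumentException("Empty extension in: " + name);
        }
        return new String[] { archiveFormat, compressionFormat };
    }
}
```

The two returned strings could then be passed as the archive and compression formats in a method like archiveAndCompress().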

4.2. Unarchiving an Archive

With the Expander class, we can unarchive our uncompressed archive in a single line:

public static void extract(Path archive, Path destination)
  throws IOException, ArchiveException {
    new Expander().expand(archive, destination);
}

We pass the archive file and the directory where we want our files extracted to. This utility method takes care of opening (and closing) an input stream, detecting the archive type, iterating over all entries in the archive, and copying them to the directory we chose.

4.3. Extracting an Entry From an Existing Archive

Let’s write a method that extracts a single entry from an archive instead of the whole content:

public static void extractOne(Path archivePath, String fileName, Path destinationDirectory)
  throws IOException, ArchiveException {
    try (InputStream input = Files.newInputStream(archivePath);
      BufferedInputStream buffer = new BufferedInputStream(input);
      ArchiveInputStream<?> archive = new ArchiveStreamFactory()
        .createArchiveInputStream(buffer)) {

        ArchiveEntry entry;
        while ((entry = archive.getNextEntry()) != null) {
            if (entry.getName().equals(fileName)) {
                Path outFile = destinationDirectory.resolve(fileName);
                Files.createDirectories(outFile.getParent());
                try (OutputStream os = Files.newOutputStream(outFile)) {
                    IOUtils.copy(archive, os);
                }
                break;
            }
        }
    }
}

After opening an ArchiveInputStream, we keep calling getNextEntry() on our archive until we find an entry with the given name. We create any missing parent directories and then write the entry’s contents to our destination directory. Note that the file name can include a sub-directory inside the archive. For example, let’s assume our archive contains a file named “some.txt” under “sub-directory”:

@Test
void givenExistingArchive_whenExtractingSingleEntry_thenFileExtracted() throws Exception {
    // extractOne() reads the archive directly, so we pass an uncompressed TAR
    Path archive = Paths.get("/tmp/archive.tar");
    String targetFile = "sub-directory/some.txt";

    CompressUtils.extractOne(archive, targetFile, Paths.get("/tmp/"));

    assertTrue(Files.isRegularFile(Paths.get("/tmp/sub-directory/some.txt")));
}
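Since an entry name can contain directory components, it can in principle also contain “..” segments or be an absolute path. As a hedged sketch (the SafeEntryPath class is hypothetical and uses only java.nio.file), we could resolve the entry name against the destination directory and reject any result that ends up outside of it before writing:

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper for illustration only
final class SafeEntryPath {

    // Resolves entryName inside destinationDirectory, or throws if it would escape it
    static Path resolve(Path destinationDirectory, String entryName) throws IOException {
        Path base = destinationDirectory.toAbsolutePath().normalize();
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IOException("Entry would be written outside the destination: " + entryName);
        }
        return target;
    }
}
```

In a method like extractOne(), the returned path could replace the direct call to destinationDirectory.resolve(fileName).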

4.4. Adding an Entry to an Existing Archive

Unfortunately, the library doesn’t give us an easy way to include a new entry into an existing archive. If we open the archive and call putArchiveEntry(), we’ll overwrite its contents. So, it’d also be necessary to rewrite all the existing entries before inserting a new one. Instead of creating a new method with the logic for this, we’ll reuse the methods we’ve created. We’ll extract the archive, copy the new file to the directory structure, archive the directory again, and then delete the old archive:

@Test
void givenExistingArchive_whenAddingSingleEntry_thenArchiveModified() throws Exception {
    Path archive = Paths.get("/tmp/archive.tar");
    Path newArchive = Paths.get("/tmp/modified-archive.tar");
    Path tmpDir = Paths.get("/tmp/extracted-archive");

    Path newEntry = Paths.get("/tmp/new-entry.txt");

    CompressUtils.extract(archive, tmpDir);
    assertTrue(Files.isDirectory(tmpDir));

    Files.copy(newEntry, tmpDir.resolve(newEntry.getFileName()));
    CompressUtils.archive(tmpDir, newArchive);
    assertTrue(Files.isRegularFile(newArchive));

    FileUtils.deleteDirectory(tmpDir.toFile());
    Files.delete(archive);
    Files.move(newArchive, archive);
    assertTrue(Files.isRegularFile(archive));
}

This destroys the old archive, so it’s advisable to keep a backup of it instead of deleting it.
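One possible way to keep such a backup is sketched below. It's a minimal example with hypothetical names (ArchiveSwap, replaceKeepingBackup()) and an assumed “.bak” suffix. It renames the old archive instead of deleting it and restores it if moving the new archive into place fails:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only
final class ArchiveSwap {

    // Moves the old archive to "<name>.bak", then moves the new archive into its place.
    // Returns the path of the backup file.
    static Path replaceKeepingBackup(Path archive, Path newArchive) throws IOException {
        Path backup = archive.resolveSibling(archive.getFileName() + ".bak");
        Files.move(archive, backup, StandardCopyOption.REPLACE_EXISTING);
        try {
            Files.move(newArchive, archive);
        } catch (IOException e) {
            // Roll back so the original archive is still available under its old name
            Files.move(backup, archive);
            throw e;
        }
        return backup;
    }
}
```

With this helper, the Files.delete() and Files.move() calls at the end of the test could be replaced by a single call.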

4.5. Using a Concrete Implementation Directly for Exclusive Features

We can use the specific implementation class directly if we want exclusive features from each format. For example, instead of using ArchiveOutputStream, we’ll instantiate a ZipArchiveOutputStream so we can set its compression method and level directly:

public static void zip(Path file, Path destination) throws IOException {
    try (InputStream input = Files.newInputStream(file);
      OutputStream output = Files.newOutputStream(destination);
      ZipArchiveOutputStream archive = new ZipArchiveOutputStream(output)) {
        archive.setMethod(ZipEntry.DEFLATED);
        archive.setLevel(Deflater.BEST_COMPRESSION);

        archive.putArchiveEntry(new ZipArchiveEntry(file.getFileName().toString()));
        IOUtils.copy(input, archive);
        archive.closeArchiveEntry();
    }
}

It requires more code than just using the Archiver but gives us more control.
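To get a feel for what the compression level changes, here's a small JDK-only sketch. It uses java.util.zip.ZipOutputStream rather than ZipArchiveOutputStream, and the ZipLevelDemo class is made up for this example. It zips the same bytes at a given level and returns the size of the resulting archive:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, JDK-only: not part of Commons Compress
final class ZipLevelDemo {

    // Zips the data as a single entry at the given Deflater level and returns the archive size
    static int zippedSize(byte[] data, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.setMethod(ZipOutputStream.DEFLATED);
            zip.setLevel(level);
            zip.putNextEntry(new ZipEntry("data.bin"));
            zip.write(data);
            zip.closeEntry();
        }
        return bytes.size();
    }
}
```

For highly repetitive input, calling zippedSize() with Deflater.BEST_COMPRESSION yields a much smaller archive than with Deflater.NO_COMPRESSION, at the cost of more CPU time.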

5. Limitations

While Apache Commons Compress offers a versatile toolkit for file compression and archiving, it has some limitations worth knowing. First, although the library supports many compression and archive formats, handling multi-volume archives can be challenging. Encoding issues may also arise, mainly when dealing with diverse file systems or non-standardized data.

Moreover, for ZIP archives that are available as files, the Commons Compress documentation recommends using ZipFile instead of a ZIP input stream, as it gives us more control. Finally, the TAR format also has a dedicated documentation page with its own considerations.

6. Conclusion

In this article, we saw how Apache Commons Compress is a valuable resource for efficient file compression and archiving solutions. By understanding its capabilities, limitations, and best practices, we can leverage this library effectively to streamline file management processes in a format-independent way.

The code backing this article is available on GitHub.