Extracting a Tar File in Java

1. Introduction

In this tutorial, we’ll explore different Java libraries that we can use to extract tar archives. The tar format originated as a Unix-based utility to package files together, uncompressed. But today, it’s very common to compress tar archives with gzip. So, we’ll see how compressed vs. uncompressed tar archives affect our code.

2. Creating a Base Class for Implementations

To avoid boilerplate, let’s start with an abstract class we’ll use as the basis for our implementations. This class will define a single abstract method, untar(), which will perform the extraction:

public abstract class TarExtractor {

    private InputStream tarStream;
    private boolean gzip;
    private Path destination;

    // ...

    public abstract void untar() throws IOException;
}

Now, let’s define a couple of constructors for our base class. The primary constructor will receive a tar archive as an InputStream, a flag indicating whether the contents are compressed, and the Path where the files will be extracted:

protected TarExtractor(InputStream in, boolean gzip, Path destination) throws IOException {
    this.tarStream = in;
    this.gzip = gzip;
    this.destination = destination;

    Files.createDirectories(destination);
}

Most importantly, we create the base directory structure for the files we’re extracting with Files.createDirectories(). This way, we don’t need to create the destination folder ourselves. For the sake of simplicity, we’re using a boolean to define if our archive is using gzip or not. So, we don’t need to write code to detect the actual file type by its contents.
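For reference, detecting gzip by content is not much code either. The following is a minimal sketch, not part of our TarExtractor, that checks for the two gzip magic bytes (0x1f 0x8b) at the start of a stream. The class and method names are hypothetical, and the sketch assumes the stream supports mark/reset (for example, a BufferedInputStream):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the article's TarExtractor
class GzipSniffer {

    // Returns true if the stream starts with the gzip magic bytes 0x1f 0x8b.
    // The stream is rewound afterwards, so the caller can still read it from the start.
    static boolean looksLikeGzip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset();
        return first == 0x1f && second == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzip payload in memory for the demonstration
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("hello".getBytes());
        }

        InputStream gzipped = new BufferedInputStream(
          new ByteArrayInputStream(compressed.toByteArray()));
        InputStream plain = new BufferedInputStream(
          new ByteArrayInputStream("hello".getBytes()));

        System.out.println(looksLikeGzip(gzipped)); // true
        System.out.println(looksLikeGzip(plain));   // false
    }
}
```

A check like this only tells us that the outer layer is gzip. It says nothing about whether the decompressed content is actually a tar archive.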

Then, in our second constructor, we’ll accept a Path to a tar archive and determine if it’s compressed based on the file name. Note that this relies on the file name being correct:

protected TarExtractor(Path tarFile, Path destination) throws IOException {
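    // Path.endsWith() compares whole path elements, so we check the file name as a String
    this(Files.newInputStream(tarFile), tarFile.getFileName().toString().endsWith("gz"), destination);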
}
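One detail is worth keeping in mind when checking a file extension on a Path: Path.endsWith() compares complete name elements, while String.endsWith() compares characters. Here is a small sketch of the difference, using a hypothetical helper and hypothetical file names:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating extension checks on a Path
class ExtensionCheck {

    // Checks the suffix on the file name as text, so both ".gz" and ".tgz" match
    static boolean hasGzipExtension(Path tarFile) {
        return tarFile.getFileName().toString().endsWith("gz");
    }

    public static void main(String[] args) {
        Path archive = Paths.get("backup", "test.tar.gz"); // hypothetical file

        // Path.endsWith() only matches whole name elements
        System.out.println(archive.endsWith("gz"));          // false
        System.out.println(archive.endsWith("test.tar.gz")); // true

        // The String-based check looks at the characters of the file name
        System.out.println(hasGzipExtension(archive));       // true
    }
}
```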

Finally, to simplify tests, we’ll create an interface with a static method that returns a tar archive from our resources folder:

public interface Resources {
    
    static InputStream tarGzFile() {
        return Resources.class.getResourceAsStream("/untar/test.tar.gz");
    }
}

This can be any tar archive compressed with gzip. We return it from a method so that each call opens a fresh stream, which avoids “stream closed” errors when several tests read the same resource.

3. Extraction Using Apache Commons Compression

In our first implementation, we’ll use the Apache Commons library commons-compress:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.23.0</version>
</dependency>

The solution involves instantiating a TarArchiveInputStream, which will read our archive stream. If the archive uses gzip, we first wrap the archive stream in a GzipCompressorInputStream so that TarArchiveInputStream receives decompressed data:

public class TarExtractorCommonsCompress extends TarExtractor {

    protected TarExtractorCommonsCompress(InputStream in, boolean gzip, Path destination) throws IOException {
        super(in, gzip, destination);
    }

    public void untar() throws IOException {
        try (BufferedInputStream inputStream = new BufferedInputStream(getTarStream());
          TarArchiveInputStream tar = new TarArchiveInputStream(
          isGzip() ? new GzipCompressorInputStream(inputStream) : inputStream)) {
            ArchiveEntry entry;
            while ((entry = tar.getNextEntry()) != null) {
                Path extractTo = getDestination().resolve(entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(extractTo);
                } else {
                    Files.copy(tar, extractTo);
                }
            }
        }
    }
}

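First, we iterate over our TarArchiveInputStream by calling getNextEntry() until it returns null, which signals the end of the archive. Then, if the entry is a directory, we create it relative to our destination folder. This way, we don’t get an error when writing a file inside it. Otherwise, we use Files.copy() to write the entry contents from our tar stream to where we want to extract it. Note that Files.copy() fails if the target file already exists, so the destination should be empty before extracting.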
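Because entry names come from the archive itself, a name such as "../../etc/passwd" would resolve to a location outside our destination. The implementations in this tutorial don’t guard against that. As a hedged sketch, a hypothetical helper (not part of any of the libraries used here) could validate each entry before writing it:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of commons-compress or the article's classes
class SafePaths {

    // Resolves an entry name against the destination and rejects names
    // that would end up outside of it after normalization
    static Path resolveInside(Path destination, String entryName) throws IOException {
        Path root = destination.toAbsolutePath().normalize();
        Path target = root.resolve(entryName).normalize();
        if (!target.startsWith(root)) {
            throw new IOException("Entry is outside of the target directory: " + entryName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path destination = Paths.get("/tmp/extract-demo"); // hypothetical folder

        System.out.println(resolveInside(destination, "docs/readme.txt"));

        try {
            resolveInside(destination, "../../etc/passwd");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With such a helper, the extraction loop would call resolveInside(getDestination(), entry.getName()) instead of resolving the name directly.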

Let’s test it by extracting the archive contents into an arbitrary folder:

@Test
public void givenTarGzFile_whenUntar_thenExtractedToDestination() throws IOException {
    Path destination = Paths.get("/tmp/commons-compress-gz");

    new TarExtractorCommonsCompress(Resources.tarGzFile(), true, destination).untar();

    try (Stream<Path> files = Files.list(destination)) {
        assertTrue(files.findFirst().isPresent());
    }
}

If our archive weren’t using gzip, we’d only need to pass false when instantiating our TarExtractorCommonsCompress object. Also, note that GzipCompressorInputStream only handles gzip. For other compression formats, commons-compress provides separate classes, such as BZip2CompressorInputStream.

4. Extraction Using Apache Ant

With Apache Ant, we can get close to a core Java implementation, as we can use GZIPInputStream from java.util.zip in case our archive is using gzip:

<dependency>
    <groupId>org.apache.ant</groupId>
    <artifactId>ant</artifactId>
    <version>1.10.13</version>
</dependency>

We’ll have a very similar implementation:

public class TarExtractorAnt extends TarExtractor {

    // standard delegate constructor

    public void untar() throws IOException {
        try (TarInputStream tar = new TarInputStream(new BufferedInputStream(
          isGzip() ? new GZIPInputStream(getTarStream()) : getTarStream()))) {
            TarEntry entry;
            while ((entry = tar.getNextEntry()) != null) {
                Path extractTo = getDestination().resolve(entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(extractTo);
                } else {
                    Files.copy(tar, extractTo);
                }
            }
        }
    }
}

The logic is the same here, but we use TarInputStream and TarEntry from Apache Ant instead of TarArchiveInputStream and ArchiveEntry. We can test it the same way as the previous solution:

@Test
public void givenTarGzFile_whenUntar_thenExtractedToDestination() throws IOException {
    Path destination = Paths.get("/tmp/ant-gz");

    new TarExtractorAnt(Resources.tarGzFile(), true, destination).untar();

    try (Stream<Path> files = Files.list(destination)) {
        assertTrue(files.findFirst().isPresent());
    }
}
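Since the gzip layer in this implementation comes from the JDK itself, we can try it out without any third-party library. The following sketch, with hypothetical class and method names, compresses some bytes with GZIPOutputStream and reads them back with GZIPInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class using only java.util.zip
class GzipRoundTrip {

    // Compresses the given bytes into gzip format
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Decompresses gzip data by reading it through a GZIPInputStream
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gz.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "tar content goes here".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```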

5. Extraction Using Apache VFS

In our last example, we’ll use Apache commons-vfs2, which supports different file system schemes with a single API. One of them is tar archives:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-vfs2</artifactId>
    <version>2.9.0</version>
</dependency>

But, since we’re reading from an input stream, we’ll first need to save our stream to a temp file so we can generate a URI afterward:

public class TarExtractorVfs extends TarExtractor {

    // standard delegate constructor

    public void untar() throws IOException {
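        Path tmpTar = Files.createTempFile("temp", isGzip() ? ".tar.gz" : ".tar");
        // createTempFile() already creates an empty file, so we must allow overwriting it
        Files.copy(getTarStream(), tmpTar, StandardCopyOption.REPLACE_EXISTING);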

        // ...

        Files.delete(tmpTar);
    }
}

We’ll delete our temp file at the end of our extraction. Next, we’ll get an instance of a FileSystemManager and resolve our file URI into a FileObject, which we’ll then use to iterate over our archive contents:

FileSystemManager fsManager = VFS.getManager();
String uri = String.format("%s:file://%s", isGzip() ? "tgz" : "tar", tmpTar);
FileObject tar = fsManager.resolveFile(uri);

Note that, for resolveFile(), we construct our URI differently if we’re using gzip, prefixing it with “tgz” (which means tar+gzip) instead of “tar”. Then, at last, we iterate over our archive contents, extracting each file:

for (FileObject entry : tar) {
    Path extractTo = Paths.get(
      getDestination().toString(), entry.getName().getPath());

    if (entry.isReadable() && entry.getType() == FileType.FILE) {
        Files.createDirectories(extractTo.getParent());

        try (FileContent content = entry.getContent(); 
          InputStream stream = content.getInputStream()) {
            Files.copy(stream, extractTo);
        }
    }
}

And, because we might receive our items out of order, we’ll check if our entry is a file and call createDirectories() on its parent. This way, we don’t risk creating a file before creating its directory. Lastly, since the entry path is returned with a leading slash, we won’t use Path.resolve() to create our destination files, like in previous implementations, because resolving an absolute path returns that path unchanged and ignores our destination. Let’s test it:

@Test
public void givenTarGzFile_whenUntar_thenExtractedToDestination() throws IOException {
    Path destination = Paths.get("/tmp/vfs-gz");

    new TarExtractorVfs(Resources.tarGzFile(), true, destination).untar();

    try (Stream<Path> files = Files.list(destination)) {
        assertTrue(files.findFirst().isPresent());
    }
}
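The effect of the leading slash is easy to check in isolation. This is a small sketch with hypothetical names and paths, assuming a Unix-like file system where a leading slash makes a path absolute:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class comparing two ways of combining paths
class LeadingSlashDemo {

    // Path.resolve() returns the argument as is when it is absolute
    static Path viaResolve(Path destination, String entryPath) {
        return destination.resolve(entryPath);
    }

    // Paths.get() joins its arguments into a single path string
    static Path viaGet(Path destination, String entryPath) {
        return Paths.get(destination.toString(), entryPath);
    }

    public static void main(String[] args) {
        Path destination = Paths.get("/tmp/vfs-gz");
        String entryPath = "/docs/readme.txt"; // entry path with a leading slash

        // On a Unix-like system: /docs/readme.txt
        System.out.println(viaResolve(destination, entryPath));
        // On a Unix-like system: /tmp/vfs-gz/docs/readme.txt
        System.out.println(viaGet(destination, entryPath));
    }
}
```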

This solution makes the most sense if we already use VFS in our project, as it requires more code and a temporary copy of the archive.

6. Conclusion

In this article, we learned how to extract tar archives using different libraries. Our implementations extended from a base class, reducing our code and making them simpler to use.

And as always, the source code is available over on GitHub.
