Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

We often need to pack multiple files into a single compressed archive to make them easier to transfer and to save space. Zip is a widely used archive format for this purpose.

Java provides a standard set of classes like ZipFile and ZipInputStream to access zip files. In this tutorial, we’ll learn how to use them to read zip files. Also, we’ll explore their functional differences and evaluate their performance.

2. Create a Zip File

Before we dive into the code for reading zip files, let us review the process of creating a zip file first.

In the following code snippet, we’ll have two variables. data stores the content to be compressed, and file represents our destination file:

String data = "..."; // a very long String

try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file));
  ZipOutputStream zos = new ZipOutputStream(bos)) {
    ZipEntry zipEntry = new ZipEntry("zip-entry.txt");
    zos.putNextEntry(zipEntry);
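    // ZipOutputStream writes bytes, so we encode the String first (java.nio.charset.StandardCharsets)
    zos.write(data.getBytes(StandardCharsets.UTF_8));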
    zos.closeEntry();
}

This snippet archives the data to a zip entry called zip-entry.txt and then writes the entry to the target file.
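To try this end to end, here is a minimal, self-contained sketch that writes several entries in one pass. The class name ZipWriterSketch, the writeZip helper, and the entry names and contents are illustrative assumptions rather than part of the original example:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipWriterSketch {

    // Writes each map entry as one zip entry: the key is the entry name, the value its text content.
    static void writeZip(Path target, Map<String, String> contents) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target));
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> e : contents.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: ten small text entries
        Map<String, String> contents = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            contents.put("str-data-" + i + ".txt", "content of entry " + i);
        }
        Path target = Files.createTempFile("sample", ".zip");
        writeZip(target, contents);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Calling putNextEntry(…) and closeEntry() once per entry is what lets a single ZipOutputStream hold many entries.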

3. Read via ZipFile

First, let’s see how we read all entries from a zip file via the ZipFile class:

try (ZipFile zipFile = new ZipFile(compressedFile)) {
    Enumeration<? extends ZipEntry> zipEntries = zipFile.entries();
    while (zipEntries.hasMoreElements())  {
        ZipEntry zipEntry = zipEntries.nextElement();
        try (InputStream inputStream = new BufferedInputStream(zipFile.getInputStream(zipEntry))) {
            // Read data from InputStream
        }
    }
}

We create an instance of ZipFile to read the compressed file. ZipFile.entries() returns all zip entries in the zip file. For each ZipEntry, we then call ZipFile.getInputStream(…) to obtain an InputStream and read the entry's content.
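As a sketch of what the reading step could look like, the following collects every entry's text into a map. It assumes Java 9 or later (for InputStream.readAllBytes()) and UTF-8 text content. The class name, the readAll method, and the writeSample helper are hypothetical names introduced only to make the example runnable:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipFileReadAllSketch {

    // Reads every non-directory entry and returns its text, keyed by entry name.
    static Map<String, String> readAll(File compressedFile) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipFile zipFile = new ZipFile(compressedFile)) {
            Enumeration<? extends ZipEntry> zipEntries = zipFile.entries();
            while (zipEntries.hasMoreElements()) {
                ZipEntry zipEntry = zipEntries.nextElement();
                if (zipEntry.isDirectory()) {
                    continue; // directory entries carry no content
                }
                try (InputStream in = zipFile.getInputStream(zipEntry)) {
                    result.put(zipEntry.getName(), new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return result;
    }

    // Hypothetical helper: builds a small two-entry archive so the sketch can run on its own.
    static File writeSample() throws IOException {
        File file = Files.createTempFile("sample", ".zip").toFile();
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(file.toPath()))) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File sample = writeSample();
        readAll(sample).forEach((name, text) -> System.out.println(name + " -> " + text));
        sample.delete();
    }
}
```

Loading whole entries into memory is fine for small files. For large entries, we'd process the InputStream in chunks instead.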

In addition to entries(), ZipFile has a method getEntry(…), which allows us to randomly access a specific ZipEntry based on the entry name:

ZipEntry zipEntry = zipFile.getEntry("str-data-10.txt");
try (InputStream inputStream = new BufferedInputStream(zipFile.getInputStream(zipEntry))) {
    // Read data from InputStream
}
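One detail worth handling is that getEntry(…) returns null when no entry with the given name exists, so passing the result straight to getInputStream(…) would fail. The sketch below wraps the lookup in an Optional. The class, the readEntry method, and the sample entry names are our own assumptions for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipFileGetEntrySketch {

    // Returns the text of the named entry, or Optional.empty() if the archive has no such entry.
    static Optional<String> readEntry(File compressedFile, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(compressedFile)) {
            ZipEntry zipEntry = zipFile.getEntry(entryName);
            if (zipEntry == null) {
                return Optional.empty();
            }
            try (InputStream in = zipFile.getInputStream(zipEntry)) {
                return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }

    // Hypothetical helper: writes a one-entry archive so the sketch can run on its own.
    static File writeSample() throws IOException {
        File file = Files.createTempFile("sample", ".zip").toFile();
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(file.toPath()))) {
            zos.putNextEntry(new ZipEntry("str-data-10.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File sample = writeSample();
        System.out.println(readEntry(sample, "str-data-10.txt").orElse("<not found>"));
        System.out.println(readEntry(sample, "missing.txt").orElse("<not found>"));
        sample.delete();
    }
}
```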

4. Read via ZipInputStream

Next, we’ll go through a typical example of reading all entries from a zip file via the ZipInputStream:

try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(compressedFile));
  ZipInputStream zipInputStream = new ZipInputStream(bis)) {
    ZipEntry zipEntry;
    while ((zipEntry = zipInputStream.getNextEntry()) != null) {
        // Read data from ZipInputStream
    }
}

We create a ZipInputStream to wrap the source of data, which is compressedFile in our case. After that, we iterate over the entries by calling getNextEntry() until it returns null.

Within the loop, we read the data of each ZipEntry directly from the ZipInputStream. Once we finish reading an entry, we call getNextEntry() again to move on to the next one.
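To make the loop body concrete, here is a minimal sketch that copies each entry into a byte array. While an entry is open, read(…) on the ZipInputStream returns -1 at the end of that entry rather than at the end of the archive. The class, the readAll method, and the in-memory sample archive are assumptions added for illustration:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipInputStreamReadAllSketch {

    // Reads all entries sequentially and returns their raw bytes keyed by entry name.
    static Map<String, byte[]> readAll(InputStream source) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zipInputStream = new ZipInputStream(new BufferedInputStream(source))) {
            ZipEntry zipEntry;
            byte[] buffer = new byte[8192];
            while ((zipEntry = zipInputStream.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                // -1 here means "end of the current entry", so the outer loop can continue
                while ((n = zipInputStream.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                result.put(zipEntry.getName(), content.toByteArray());
            }
        }
        return result;
    }

    // Hypothetical helper: builds a two-entry archive in memory.
    static byte[] sampleZipBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> entries = readAll(new ByteArrayInputStream(sampleZipBytes()));
        entries.forEach((name, data) ->
            System.out.println(name + " -> " + new String(data, StandardCharsets.UTF_8)));
    }
}
```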

5. Functional Differences

Although both classes can serve the purpose of reading entries from a zip file, they have two distinct functional differences.

5.1. Access Type

The major difference between them is that ZipFile supports random access, whereas ZipInputStream supports sequential access only.

With ZipFile, we can extract a specific entry by calling ZipFile.getEntry(…). This is particularly useful when we need only one specific entry from the archive. To achieve the same with ZipInputStream, we have to loop through the entries one by one until we find a match.
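The sequential lookup with ZipInputStream could be sketched as follows, assuming Java 9 or later for readAllBytes(). The findEntry method, the class name, and the sample archive are hypothetical and serve only to illustrate the scan:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipInputStreamFindSketch {

    // Scans entries in archive order and returns the bytes of the first entry whose name matches.
    static Optional<byte[]> findEntry(InputStream source, String entryName) throws IOException {
        try (ZipInputStream zipInputStream = new ZipInputStream(source)) {
            ZipEntry zipEntry;
            while ((zipEntry = zipInputStream.getNextEntry()) != null) {
                if (zipEntry.getName().equals(entryName)) {
                    // readAllBytes() stops at the end of the current entry
                    return Optional.of(zipInputStream.readAllBytes());
                }
                // no match: the next getNextEntry() call skips the rest of this entry
            }
        }
        return Optional.empty();
    }

    // Hypothetical helper: builds a three-entry archive in memory.
    static byte[] sampleZipBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZipBytes();
        String found = findEntry(new ByteArrayInputStream(zip), "c.txt")
            .map(data -> new String(data, StandardCharsets.UTF_8))
            .orElse("<not found>");
        System.out.println(found);
    }
}
```

Even when the match is the last entry, every preceding entry still has to be passed over, which is the cost that ZipFile.getEntry(…) avoids.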

5.2. Data Source

ZipFile requires the data source to be a physical file, whereas ZipInputStream only requires an InputStream. Sometimes our data isn't a file at all; for example, it may arrive over a network stream. In such a case, we must first save the whole InputStream to a file before we can process it using ZipFile.
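One way to bridge the gap, sketched here under our own assumptions, is to spool the stream into a temporary file and open that file with ZipFile. The method and class names are hypothetical, and a ByteArrayInputStream stands in for the network stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class StreamToZipFileSketch {

    // Copies the stream to a temporary file, lists the entry names via ZipFile, then cleans up.
    static List<String> entryNamesViaTempFile(InputStream source) throws IOException {
        Path temp = Files.createTempFile("download", ".zip");
        try {
            // createTempFile already created the file, so we must allow replacing it
            Files.copy(source, temp, StandardCopyOption.REPLACE_EXISTING);
            try (ZipFile zipFile = new ZipFile(temp.toFile())) {
                return zipFile.stream().map(ZipEntry::getName).collect(Collectors.toList());
            }
        } finally {
            Files.deleteIfExists(temp);
        }
    }

    // Hypothetical helper: builds a two-entry archive in memory to stand in for remote data.
    static byte[] sampleZipBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream pretendNetworkStream = new ByteArrayInputStream(sampleZipBytes());
        System.out.println(entryNamesViaTempFile(pretendNetworkStream));
    }
}
```

The extra copy costs disk I/O and space, so when we only need one sequential pass over the data, wrapping the stream in a ZipInputStream is usually the simpler choice.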

6. Performance Comparison

We’ve gone through the functional differences between ZipFile and ZipInputStream. Now, let’s explore further differences in terms of performance.

We’ll use JMH (Java Microbenchmark Harness) to compare the processing speed of the two classes. JMH is a framework designed for measuring the performance of code snippets.

Before we proceed to the benchmarking, we need to include the following Maven dependencies in our pom.xml:

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
</dependency>

The latest versions of JMH Core and the JMH annotation processor can be found in Maven Central.

6.1. Read All Entries

In this experiment, we aim to assess the performance of reading all entries from a zip file. In our setup, we have a zip file containing 10 entries, and each comprises 200KB of data. We’ll read them via ZipFile and ZipInputStream separately:

Class            Running time (in milliseconds)
ZipFile          11.072
ZipInputStream   11.642

From the results, we cannot see any significant performance difference between the two classes. Their running times differ by roughly 5%, so both demonstrate comparable efficiency when reading all entries from a zip file.
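The article doesn't show how the benchmark archive was built, so the following is only a guess at a comparable fixture: 10 entries of 200KB each, filled with pseudo-random lowercase letters. The entry naming (str-data-N.txt), the interpretation of 200KB as 200 * 1024 bytes, the fixed seed, and all class and method names are assumptions:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class BenchmarkFixtureSketch {

    static final int ENTRY_COUNT = 10;
    static final int ENTRY_SIZE_BYTES = 200 * 1024; // assumption: 200KB means 200 * 1024 bytes

    // Writes ENTRY_COUNT entries of ENTRY_SIZE_BYTES each to the target file.
    static void writeFixture(Path target) throws IOException {
        Random random = new Random(42); // fixed seed keeps the fixture reproducible
        try (ZipOutputStream zos =
                 new ZipOutputStream(new BufferedOutputStream(Files.newOutputStream(target)))) {
            for (int i = 1; i <= ENTRY_COUNT; i++) {
                byte[] data = new byte[ENTRY_SIZE_BYTES];
                for (int j = 0; j < data.length; j++) {
                    data[j] = (byte) ('a' + random.nextInt(26));
                }
                zos.putNextEntry(new ZipEntry("str-data-" + i + ".txt"));
                zos.write(data);
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("benchmark-fixture", ".zip");
        writeFixture(target);
        try (ZipFile zipFile = new ZipFile(target.toFile())) {
            System.out.println(zipFile.size() + " entries, archive size "
                + Files.size(target) + " bytes");
        }
        Files.delete(target);
    }
}
```

Absolute timings will depend on how compressible the data is, so numbers measured against this fixture may differ from the ones reported here.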

6.2. Read the Last Entry

Next, we’ll specifically target reading the last entry from the same zip file:

Class            Running time (in milliseconds)
ZipFile          1.016
ZipInputStream   12.830

There is a huge difference between them this time. ZipFile needs only about a tenth of the time to read a single entry out of 10 compared with reading all entries, while ZipInputStream takes roughly as long as before.

The results reflect the sequential nature of ZipInputStream. The input stream must be read from the beginning of the zip file until the target entry is located, whereas ZipFile can jump straight to the target entry without reading the entire file.

This shows why ZipFile is preferable to ZipInputStream when we need only a few entries from an archive that contains many.

7. Conclusion

In software development, it’s common to work with files compressed in the zip format. Java offers two different classes, ZipFile and ZipInputStream, to read zip files.

In this article, we’ve explored their usage and functional differences. We also evaluated the performance between them.

The choice between them depends on our requirements. We’ll choose ZipFile when we’re dealing with a limited number of entries within a large zip archive to ensure optimal performance. In contrast, we’ll choose ZipInputStream if our source of data isn’t a file.

As always, the full source code of our examples can be found over on GitHub.
