Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

We may wish to work with compressed files in Java. A common format is .gz, as generated by the GZIP utility.

Java’s standard library can read .gz files, which are commonly used for logs, through the java.util.zip package.

In this tutorial, we’ll explore reading compressed (.gz) files line by line in Java using the GZIPInputStream class.

2. Reading a GZipped File

Let’s imagine we want to read the contents of a file into a List. First, we need to find the file on our classpath:

String filePath = Objects.requireNonNull(Main.class.getClassLoader().getResource("myFile.gz")).getFile();
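One caveat with this lookup is that getFile() returns the URL's path still percent-encoded, so a directory name containing a space appears as %20 and the file can't be opened. A minimal sketch of a more forgiving conversion follows, assuming the resource sits on the file system rather than inside a JAR. The class name ResourcePaths and the method toPath are hypothetical and not part of the original example:

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

class ResourcePaths {

    // Converts a file: URL, such as one returned by ClassLoader.getResource(),
    // into a Path. Going through toURI() decodes escapes like %20.
    // Assumption: the URL uses the file: scheme (not a resource inside a JAR).
    static Path toPath(URL resource) throws URISyntaxException {
        Objects.requireNonNull(resource, "resource not found on the classpath");
        return Paths.get(resource.toURI());
    }
}
```

Calling toPath(Main.class.getClassLoader().getResource("myFile.gz")).toString() would then give a value we can use as filePath.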

Next, let’s get ready to read from this file into an empty list:

List<String> lines = new ArrayList<>();
try (FileInputStream fileInputStream = new FileInputStream(filePath);
     GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
     InputStreamReader inputStreamReader = new InputStreamReader(gzipInputStream);
     BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

    //...
}

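Inside our try-with-resources block, we’ve defined a FileInputStream that reads the raw bytes of the GZIP file. A GZIPInputStream wraps it and decompresses those bytes, and an InputStreamReader decodes the decompressed bytes into characters. Since we pass no charset, it uses the JVM’s default. Finally, a BufferedReader lets us read the text one line at a time.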

Now, we can loop through the file to read line by line:

String line;
while ((line = bufferedReader.readLine()) != null) {
    lines.add(line);
}
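Putting the pieces of this section together, the following is a minimal, self-contained sketch rather than the article's exact code. It assumes the uncompressed content is UTF-8 text and passes that charset explicitly. The class name GzipLineReader and the method readAllLines are hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

class GzipLineReader {

    // Reads every line of a gzip-compressed text file into a list.
    // Assumption: the uncompressed content is UTF-8 encoded.
    static List<String> readAllLines(String filePath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream fileInputStream = new FileInputStream(filePath);
             GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
             InputStreamReader inputStreamReader =
                 new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8);
             BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

            String line;
            // readLine() returns null once the end of the stream is reached
            while ((line = bufferedReader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Since the whole list is held in memory, this version suits small and medium-sized files.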

3. Handling Large GZipped Files With Java Stream API

When confronted with large GZIP-compressed files, we may not have enough memory to load the whole file. However, the streaming approach allows us to process the content line-by-line as it’s read from the stream.

3.1. Standalone Method

Let’s build a routine to collect lines from our file that match a specific substring:

try (InputStream inputStream = new FileInputStream(filePath);
     GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
     InputStreamReader inputStreamReader = new InputStreamReader(gzipInputStream);
     BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

    return bufferedReader.lines()
      .filter(line -> line.contains(toFind))
      .collect(toList());
}

This approach uses the lines() method to create a lazy stream of lines from the file. The filter() operation then selects the lines of interest, and collect() gathers them into a list. Because collect() is a terminal operation, it runs inside the try block while the reader is still open.

The use of try-with-resources ensures the various file and input streams are correctly closed when everything is done.
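The snippet above omits the enclosing method. As a sketch under stated assumptions, it could be wrapped as shown here. The class name GzipLineFilter, the method name findInGzipFile, its signature, and the explicit UTF-8 charset are illustrative choices, not taken from the original:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

class GzipLineFilter {

    // Returns only the lines that contain the given substring.
    // Lines are decompressed and tested one at a time; only matches are kept.
    // Assumption: the uncompressed content is UTF-8 encoded.
    static List<String> findInGzipFile(String filePath, String toFind) throws IOException {
        try (InputStream inputStream = new FileInputStream(filePath);
             GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
             InputStreamReader inputStreamReader =
                 new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8);
             BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

            // collect() is terminal, so the stream is fully consumed before the reader closes
            return bufferedReader.lines()
                .filter(line -> line.contains(toFind))
                .collect(Collectors.toList());
        }
    }
}
```

Memory use then grows with the number of matching lines rather than with the size of the file.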

3.2. Using Consumer<Stream<String>>

In the previous example, we benefit from the surrounding try-with-resources to look after our .gz stream resources. However, we may wish to generalize the method for operating on a Stream<String> read from a .gz file on the fly:

try (InputStream inputStream = new FileInputStream(filePath);
     GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
     InputStreamReader inputStreamReader = new InputStreamReader(gzipInputStream);
     BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

    consumer.accept(bufferedReader.lines());
}

This approach allows the caller to pass in a Consumer<Stream<String>> to operate on the stream of uncompressed lines. The code calls accept() on that Consumer to hand over the Stream. Since the reader closes as soon as accept() returns, the consumer must finish its work on the stream before returning. Within that limit, we can pass in anything we like to operate on the lines:

useContentsOfZipFile(testFilePath, linesStream -> {
  linesStream.filter(line -> line.length() > 10).forEach(line -> count.incrementAndGet());
});

In this example, we’re providing a consumer that counts all of the lines over a certain length.
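For reference, here is one way the two fragments of this section could fit together. This is a sketch, not the article's confirmed implementation. The method name useContentsOfZipFile comes from the call above, but its signature is an assumption, as are the class name GzipLineStreams, the helper countLinesLongerThan, the AtomicInteger type of count, and the explicit UTF-8 charset:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

class GzipLineStreams {

    // Opens the .gz file, hands its lines to the consumer, then closes everything.
    // Assumption: the uncompressed content is UTF-8 encoded.
    static void useContentsOfZipFile(String filePath, Consumer<Stream<String>> consumer)
            throws IOException {
        try (InputStream inputStream = new FileInputStream(filePath);
             GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
             InputStreamReader inputStreamReader =
                 new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8);
             BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

            consumer.accept(bufferedReader.lines());
        }
    }

    // Hypothetical caller: counts the lines longer than minLength characters.
    static int countLinesLongerThan(String filePath, int minLength) throws IOException {
        // A lambda can't reassign a local int, so a mutable counter holds the total
        AtomicInteger count = new AtomicInteger();
        useContentsOfZipFile(filePath, linesStream ->
            linesStream.filter(line -> line.length() > minLength)
                .forEach(line -> count.incrementAndGet()));
        return count.get();
    }
}
```

If reading fails while the stream is being consumed, lines() reports the problem as an UncheckedIOException rather than a checked IOException, so callers may want to handle both.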

4. Conclusion

In this short article, we’ve looked at how to read .gz files in Java.

First, we looked at how to read the files into a list using BufferedReader and readLine(). Then, we looked at ways to treat the file as a Stream<String> to process the lines without having to load them all in memory at once.

As always, the implementation of the examples can be found over on GitHub.