Last updated: March 18, 2024
Previously, we’ve looked at Zip and 7-Zip in Linux. In this short tutorial, we focus on gzip and gunzip for compressing and uncompressing files from the Linux command line.
Zip has two advantages over gzip:
So, if Zip has these advantages, why would we use gzip in Linux then? One word: ubiquity. No matter which Linux distribution we use, tar and gzip are always installed. In Linux, Zip is two different programs: zip and unzip. We can’t rely on either to be installed.
Granted, we could easily install zip and unzip on most Linux systems with yum or apt. But in the age of running our Spring Boot applications in Docker containers, we want to keep our Docker images small. And that means installing as little additional software as possible.
Let’s use gzip to compress a single file:
gzip -v data.csv
This compresses the file data.csv and replaces it with the file data.csv.gz. The -v option lets gzip display the compression ratio.
gzip has compression levels 1-9, where 9 gives us maximum compression but at the slowest speed. The default compression level is 6 and is a good compromise between speed and compression ratio.
Using higher compression levels significantly increases compression time, but often yields only a slight improvement, if any, in the compression ratio.
Here’s how we compress a file with maximum compression level:
gzip -v9 data.csv
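To see the trade-off on our own data, we can compress the same file at several levels and compare the sizes. This is a minimal sketch: the temporary directory and the generated data.csv are placeholders for illustration, not part of the original example.

```shell
# Hypothetical sample data; replace it with a real file.
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/data.csv"

# -c writes the compressed stream to stdout, so the original file stays untouched.
for level in 1 6 9; do
  gzip -c -"$level" "$workdir/data.csv" > "$workdir/data.$level.gz"
  size=$(wc -c < "$workdir/data.$level.gz")
  echo "level $level: $size bytes"
done
```

The exact sizes depend entirely on the input, so it's worth running this against a representative file before settling on a level.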
Now, let’s use gunzip to decompress a single file from a gzip file:
gunzip -v data.csv.gz
This decompresses the file data.csv.gz and replaces it with data.csv.
As with gzip, the -v option shows the compression ratio after the file was uncompressed.
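Sometimes we only want to inspect a compressed file rather than replace it. The following sketch uses the -t, -l, and -c options for that. The sample file and temporary directory are assumptions made for illustration.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'id,name\n1,alice\n2,bob\n' > "$workdir/data.csv"
gzip -c "$workdir/data.csv" > "$workdir/data.csv.gz"

# -t checks the integrity of the compressed file without writing any output.
gzip -t "$workdir/data.csv.gz" && echo "archive OK"

# -l lists the compressed size, uncompressed size, and ratio.
gzip -l "$workdir/data.csv.gz"

# -c decompresses to stdout and leaves data.csv.gz in place.
gunzip -c "$workdir/data.csv.gz" | head -n 1
```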
gzip compresses just a single file. That’s why we have to use gzip together with the tar archiving utility to compress multiple files or entire directories. We can archive with tar and compress with gzip in one step:
tar czvf archive.tar.gz *.csv
As we recall from the previous section, gzip offers various compression levels. Which compression level does tar pick? It depends on our version of tar, but it probably is the default compression level 6.
tar allows setting the compression program through the --use-compress-program option. We use this option to also set the compression level. Here, we specify the maximum gzip compression level of 9:
tar cvf archive.tar.gz --use-compress-program='gzip -9' *.csv
Please note that we had to remove the z option here because --use-compress-program already sets the compression program.
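An alternative that doesn't depend on how tar handles the compression program is to pipe the archive into gzip ourselves. This is a sketch with made-up sample files in a temporary directory.

```shell
# Hypothetical sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a,b\n1,2\n' > one.csv
printf 'c,d\n3,4\n' > two.csv

# "-" as the archive name makes tar write to stdout; gzip -9 compresses the stream.
tar cf - *.csv | gzip -9 > archive.tar.gz

# List the contents to confirm that the result is a regular .tar.gz archive.
tar tzf archive.tar.gz
```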
Uncompressing a tar archive with gzip is also a single step:
tar xvf archive.tar.gz
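Before extracting, it's often useful to look inside the archive, and to extract into a separate directory instead of the current one. The sketch below assumes a small sample archive and a target directory named extracted, both chosen just for illustration.

```shell
# Hypothetical sample archive in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a,b\n1,2\n' > one.csv
tar czf archive.tar.gz one.csv

# t lists the archive contents without extracting anything.
tar tzf archive.tar.gz

# -C extracts into an existing directory instead of the current one.
mkdir extracted
tar xzf archive.tar.gz -C extracted
ls extracted
```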
gzip and gunzip, like most Linux tools, only use a single CPU core. So, compressing large files can take a while.
That’s why pigz exists, a “parallel implementation of gzip”. pigz takes advantage of both multiple CPUs and multiple CPU cores for higher compression and decompression speed. pigz is an anagram of gzip and is pronounced “pig-zee”. We can install it with either yum or apt.
pigz is compatible with gzip, and unpigz is compatible with gunzip. As such, pigz produces files that gunzip can decompress and uses the same options as gzip. Likewise, unpigz decompresses files that gzip created and also uses the same options as gunzip.
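We can check this compatibility with a quick round trip: compress with pigz, then decompress with plain gunzip and compare the result with the original. In this sketch, the sample file is an assumption, and the script falls back to gzip where pigz isn't installed, so that it still runs.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'id,name\n1,alice\n' > "$workdir/data.csv"

if command -v pigz >/dev/null 2>&1; then
  pigz -c "$workdir/data.csv" > "$workdir/data.csv.gz"
else
  echo "pigz not installed, using gzip for this sketch" >&2
  gzip -c "$workdir/data.csv" > "$workdir/data.csv.gz"
fi

# gunzip decompresses the file; cmp confirms it matches the original byte for byte.
gunzip -c "$workdir/data.csv.gz" | cmp - "$workdir/data.csv" && echo "round trip OK"
```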
How much faster is pigz?
To find out, we ran a quick test on a modern computer with six CPU cores and hyperthreading. The test data was an 818 MB CSV file. We used the maximum compression level 9 with both gzip and pigz.
First, we compressed a file with pigz:
pigz -v9 data.csv
And then, we decompressed this file using unpigz:
unpigz -v data.csv.gz
So, pigz/unpigz does indeed speed up compressing and decompressing files significantly with multiple CPUs or multiple CPU cores!
To use pigz together with tar, we specify --use-compress-program to compress with pigz:
tar cvf archive.tar.gz --use-compress-program=pigz *.csv
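As with gzip, we can pass options to pigz by quoting the whole command. The sketch below sets the compression level and uses pigz's -p option to cap the number of threads. The thread count of 2 and the sample file are arbitrary choices for illustration, and the script falls back to gzip where pigz is missing.

```shell
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a,b\n1,2\n' > one.csv

if command -v pigz >/dev/null 2>&1; then
  # -p limits the number of compression threads pigz uses.
  compressor='pigz -9 -p 2'
else
  compressor='gzip -9'
fi

tar cvf archive.tar.gz --use-compress-program="$compressor" *.csv

# The result is an ordinary gzip-compressed archive.
tar tzf archive.tar.gz
```

Capping the thread count can be useful on shared machines, where we don't want a single compression job to occupy every core.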
GNU tar also accepts --use-compress-program when extracting and runs the given program with the -d option, so tar xvf archive.tar.gz --use-compress-program=pigz decompresses with pigz in a single step. Alternatively, we can call unpigz ourselves and perform two separate steps:
unpigz -v archive.tar.gz
tar xvf archive.tar
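The two commands above can also be joined with a pipe, which has the side effect of leaving archive.tar.gz in place. This is a sketch with a made-up sample archive. It falls back to gunzip, which takes the same options, where unpigz isn't installed.

```shell
# Hypothetical sample archive in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a,b\n1,2\n' > one.csv
tar czf archive.tar.gz one.csv
rm one.csv

# Prefer unpigz; fall back to gunzip, which accepts the same options.
if command -v unpigz >/dev/null 2>&1; then
  decompress=unpigz
else
  decompress=gunzip
fi

# -c writes the decompressed stream to stdout; "-" makes tar read it from stdin.
"$decompress" -c archive.tar.gz | tar xvf -
```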
In this short article, we first saw when we might choose gzip over Zip. We then learned how to compress and decompress single files with gzip/gunzip.
Next, we looked at how we can use tar with gzip to compress and decompress multiple files and directories.
And finally, we discovered how pigz speeds up the compression and decompression on modern computers.