2. Why Use xz?
We know that Zip is the standard cross-platform archiving tool. gzip with tar is the standard archiving tool in Linux. So, why use xz at all?
xz creates much smaller archives than gzip while accepting largely the same command-line options. Therefore, we can consider xz a near drop-in replacement for gzip that compresses better. We’ll test the claim of smaller archives in a later section.
The disadvantage of xz is that it doesn’t ship with all Linux distributions. But we can install it with either yum or apt.
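As a quick check before going further, the sketch below reports whether xz is already on the system. The package names in the printed hints (xz-utils for apt, xz for yum) are the usual ones on Debian-based and Red Hat-based systems, but they are an assumption here and may differ on other distributions. The variable xz_status is only an illustrative name.

```shell
# Report whether xz is available. The install commands are only printed, never run.
if command -v xz >/dev/null 2>&1; then
    xz_status="installed"
    xz --version
else
    xz_status="missing"
    echo "xz not found. Install it with one of:"
    echo "  sudo apt install xz-utils   # Debian, Ubuntu (assumed package name)"
    echo "  sudo yum install xz         # RHEL, CentOS, Fedora (assumed package name)"
fi
```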
3. Using xz for Single Files
Let’s use xz to compress a single file. Except for the program name, the usage is identical to gzip:
xz -v data.csv
This command compresses the file data.csv and replaces it with the file data.csv.xz. The -v option lets xz display progress information.
xz offers compression levels 0-9, one more than the levels 1-9 of gzip. The default compression level is 6. But unlike gzip, that default compression level may not be a good compromise between speed and compression ratio, as we’ll see in a later section. So here’s how we compress a file with the fast, low compression level 1:
xz -v1 data.csv
Just as gzip comes with gunzip, xz comes with a separate decompression program, unxz, which is equivalent to xz -d. Here, we use the -d option to decompress a single file:
xz -dv data.csv.xz
This decompresses the file data.csv.xz and replaces it with data.csv. The -v option also displays progress information here.
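The following sketch runs a complete round trip on a throwaway file. It goes slightly beyond the commands above by using three more xz options: -k keeps the input file, -t tests the integrity of the compressed file, and -l lists its sizes. The sample data and the file name original.csv are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate a small sample CSV (made-up data) and keep a reference copy.
seq 1 20000 | sed 's/$/,sample,row/' > data.csv
cp data.csv original.csv

# -k keeps data.csv instead of replacing it; -1 selects a fast compression level.
xz -k -1 data.csv

# -t tests the integrity of the compressed file without writing any output.
xz -t data.csv.xz && echo "archive OK"

# -l lists the compressed size, uncompressed size, and ratio.
xz -l data.csv.xz

# Decompress; -f overwrites the data.csv that -k left in place.
xz -d -f data.csv.xz

# The round trip should reproduce the original byte for byte.
cmp data.csv original.csv && echo "round trip OK"
```

Without -k, the first xz call would remove data.csv, and the -f on decompression wouldn’t be necessary.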
4. Using tar With xz for Multiple Files and Directories
Just like gzip, xz compresses only single files. That’s why we also have to use xz together with the tar archiving utility to compress multiple files or entire directories:
tar cJvf archive.tar.xz *.csv
- We compress all files with a csv extension in the current directory into the compressed archive, archive.tar.xz
- The J option enables compression with xz
- Because of the v option, tar shows which files are added to the archive
- Unlike xz and gzip, tar doesn’t delete the input files after it creates the archive
Which xz compression level does tar pick? It depends on our version of tar, but since tar normally doesn’t pass a level to xz, we most likely get the default compression level 6. How can we change that?
tar allows setting the compression program through the --use-compress-program option. We use this option to set the compression level, too. Here, we specify the low compression level 1:
tar cvf archive.tar.xz --use-compress-program='xz -1' *.csv
Please note that we removed the J option here because --use-compress-program already sets the compression program.
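As an alternative sketch that keeps the J option, we can pass options to xz through the XZ_OPT environment variable, which xz itself reads. This works when tar starts the external xz program, as GNU tar does. Other tar implementations may compress internally and ignore the variable, so treat this as an assumption about the environment. The file names are made up.

```shell
# Work in a throwaway directory with three small sample files (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for name in a b c; do
    seq 1 5000 | sed "s/\$/,$name/" > "$name.csv"
done

# XZ_OPT hands -1 to the xz process that tar starts for the J option.
XZ_OPT=-1 tar cJf archive.tar.xz *.csv

# List the archive members without extracting them.
members=$(tar tJf archive.tar.xz)
echo "$members"
```

Listing the archive with the t option is a cheap way to confirm that it contains what we expect before we delete any input files.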
Decompressing a tar archive with xz is also a single step and identical to gzip (except for the different file extension):
tar xvf archive.tar.xz
- We decompress the file archive.tar.xz and extract its content into the current directory
- We don’t have to tell tar to decompress with xz. tar does this automatically by inspecting the file and detecting the xz compression
- Because of the v option, tar shows which files are extracted from the archive
- Unlike xz, tar doesn’t delete the archive file after the extraction is complete
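To illustrate the points above, this sketch creates an archive, extracts it into a separate directory with the -C option of tar, and compares the result with the original files. The directory name extracted and the sample files are hypothetical.

```shell
# Work in a throwaway directory with two small sample files (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 5000 | sed 's/$/,first/' > first.csv
seq 1 5000 | sed 's/$/,second/' > second.csv
tar cJf archive.tar.xz first.csv second.csv

# -C makes tar change into the target directory before extracting.
mkdir extracted
tar xJf archive.tar.xz -C extracted

# Compare the extracted files with the originals.
cmp first.csv extracted/first.csv &&
    cmp second.csv extracted/second.csv &&
    echo "extraction OK"

# The archive itself is still present after the extraction.
ls -l archive.tar.xz
```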
5. Faster Compression With Multithreading
Unlike gzip, xz supports multithreading directly, which speeds up compression.
By default, older xz versions use just a single thread, while versions from 5.6 onward enable multithreading by default. Either way, we can specify the number of threads with the -T option. A value of 0 tells xz to use one thread for every available CPU core. That’s generally a good value to use:
xz -vT0 data.csv
And here, we’ll use three threads:
xz -vT3 data.csv
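Decompression is a different matter. xz versions before 5.4 always decompress on a single thread. Newer versions can decompress in parallel, but only files that were compressed in multithreaded mode, because only those consist of independent blocks.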
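A small sketch for comparing single-threaded and multithreaded compression follows. It uses -T1 to force one thread and -c to write to standard output, so data.csv stays in place for both runs. The sample data is made up and far smaller than a realistic workload, so any speed difference will be tiny here; prefixing the two xz commands with time on a large file gives a more meaningful comparison.

```shell
# Work in a throwaway directory with a sample CSV of several megabytes (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 500000 | sed 's/$/,sample,row/' > data.csv

# -T1 forces a single thread; -T0 uses one thread per available CPU core.
# -c writes the compressed data to stdout and leaves data.csv untouched.
xz -1 -T1 -c data.csv > single.xz
xz -1 -T0 -c data.csv > multi.xz

# The multithreaded file may be slightly larger because xz splits the input into blocks.
ls -l single.xz multi.xz

# Both files must decompress to exactly the original content.
xz -dc single.xz | cmp - data.csv && echo "single-threaded archive OK"
xz -dc multi.xz | cmp - data.csv && echo "multithreaded archive OK"
```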
6. Using Multithreading With tar
Previously, we specified the compression level with the --use-compress-program option. Now, we enable multithreading through the same --use-compress-program option by setting the number of threads. Here, we use one thread for every CPU core again:
tar cvf archive.tar.xz --use-compress-program='xz -1T0' *.csv
We don’t need to set particular options for decompressing xz archives with tar. The command tar xvf archive.tar.xz works as before, whether or not we created the archive with multiple threads.
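If our tar version lacks the option for setting the compression program, a pipe achieves the same result. This is a sketch of that alternative, not the approach used above: tar writes the uncompressed archive to standard output, and xz compresses the stream with the options we choose. The directory names reports and restored are made up.

```shell
# Work in a throwaway directory with a small sample directory tree (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir reports
seq 1 5000 | sed 's/$/,jan/' > reports/jan.csv
seq 1 5000 | sed 's/$/,feb/' > reports/feb.csv

# "f -" makes tar write the archive to stdout; xz compresses it with level 1 on all cores.
tar cf - reports | xz -1 -T0 > reports.tar.xz

# The reverse direction: xz decompresses to stdout, and tar reads the archive from stdin.
mkdir restored
xz -dc reports.tar.xz | tar xf - -C restored

# The restored tree should match the original directory.
diff -r reports restored/reports && echo "directory restored"
```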
7. Testing Archive Sizes With xz
Previously, we stated that xz creates smaller archives than gzip. To test this claim, we used the same 818 MB CSV file, and the same computer with six CPU cores and hyperthreading, as we used to test gzip in Linux.
We compared xz to pigz, a gzip implementation that uses multithreading for faster compression and decompression.
- Both archiving tools saturated the CPU in our tests. pigz does this by default, xz because of the -T0 option
- At compression level 7 out of 9, pigz compressed the 818 MB CSV file down to 95 MB in 4 seconds. Higher compression levels didn’t produce meaningfully smaller archives
- In the same 4 seconds, xz compressed the file to just 48 MB, which was 49% smaller than the pigz archive. xz used compression level 1 out of 9 for this
- With compression level 5, xz produced the smallest archive at 29 MB, which was 69% smaller than the pigz archive. But at 70 seconds, xz also took nearly 18 times as long as pigz! Compression levels six and beyond hugely increased the compression time for a negligible 1% reduction in archive size
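The percentages in the list follow directly from the reported sizes and times. The short sketch below redoes the arithmetic with awk; the variable names are only illustrative.

```shell
# Sizes in MB and times in seconds, taken from the results above.
pigz_mb=95; xz1_mb=48; xz5_mb=29
pigz_s=4; xz5_s=70

# Relative size reduction of each xz archive compared with the pigz archive, in percent.
saving_l1=$(awk -v a="$pigz_mb" -v b="$xz1_mb" 'BEGIN { printf "%.2f", (a - b) / a * 100 }')
saving_l5=$(awk -v a="$pigz_mb" -v b="$xz5_mb" 'BEGIN { printf "%.2f", (a - b) / a * 100 }')

# How many times longer xz at level 5 ran compared with pigz.
slowdown=$(awk -v a="$xz5_s" -v b="$pigz_s" 'BEGIN { printf "%.2f", a / b }')

echo "xz -1 vs pigz: $saving_l1% smaller"   # prints 49.47
echo "xz -5 vs pigz: $saving_l5% smaller"   # prints 69.47
echo "xz -5 vs pigz: $slowdown times as long"   # prints 17.50
```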
So we’ve demonstrated that xz does indeed create much smaller archives than gzip.
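To repeat this kind of comparison on our own data, a loop over a few compression levels is enough. The sketch below is a simplified stand-in for the test described above, not the procedure that produced those numbers. It uses a small, made-up, highly repetitive CSV, so the sizes and times it prints aren’t comparable to the 818 MB results. date +%s only resolves whole seconds, which is fine for large files but will mostly show 0 here.

```shell
# Work in a throwaway directory with a sample CSV (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 300000 | sed 's/$/,sample,row/' > data.csv
echo "original: $(wc -c < data.csv | tr -d ' ') bytes"

# Compress the same input at several levels and report size and elapsed time.
for level in 1 5 6; do
    start=$(date +%s)
    xz -"$level" -T0 -c data.csv > "data.$level.xz"
    end=$(date +%s)
    size=$(wc -c < "data.$level.xz" | tr -d ' ')
    echo "level $level: $size bytes, $((end - start)) s"
done
```

To test other levels, we extend the list in the for loop. Higher levels need considerably more memory and time.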
8. Conclusion
In this short article, we first saw when we might choose xz over Zip and gzip. We then learned how to compress and decompress single files with xz.
Next, we looked at how we can use tar with xz to compress and decompress multiple files and directories.
And finally, we discovered how multithreading speeds up the compression on modern computers.