Last updated: November 7, 2024
It’s good practice to back up important files regularly. However, this simple task becomes challenging and time-consuming as the data grows over time. One of the easiest and most effective ways to address this is to compress the data to reduce its size.
Linux provides many utilities for compressing data. In this short tutorial, we’ll combine the tar and xz commands to achieve maximum compression. So, let’s get started.
To begin, let’s compress a directory using the default compression level.
First, create a directory using the mkdir command and use a dd command to create a file of 1.1 GB:
$ mkdir compression-demo
$ dd if=/dev/zero of=compression-demo/file-1.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.01976 s, 1.1 GB/s
$ du -sh compression-demo
1.1G compression-demo
Here, the du command shows that the directory size is 1.1 GB.
Next, let’s compress this directory using the tar command with xz compression:
$ tar cvfJ compression-demo.tar.xz compression-demo
compression-demo/
compression-demo/file-1.img
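In this example, the c option creates a new archive, the v option lists the files as they are processed, the f option specifies the name of the archive file, and the J option filters the archive through the xz command.
Now, let’s see the size of the compression-demo.tar.xz file: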
$ du -sh compression-demo.tar.xz
156K compression-demo.tar.xz
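Here, we can see that the default compression level has reduced the size from 1.1 GB to 156 KB. This is a best-case result because the file contains only zeros; real-world data typically compresses far less.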
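The xz command’s default compression level is 6, which offers a good balance between compression ratio, speed, and memory usage. According to the xz manual, it’s usually a good choice when the files must remain decompressible on systems with little RAM.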
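Before relying on an archive as a backup, it can be worth checking that it's intact. The following is a minimal sketch, assuming the tar and xz commands are installed; the sample-dir directory and its file are hypothetical names created only for this illustration:

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so the example doesn't touch real data
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create a small sample directory (hypothetical names)
mkdir sample-dir
seq 1 20000 > sample-dir/numbers.txt

# Compress it with the default xz level, as in the example above
tar cfJ sample-dir.tar.xz sample-dir

# Test the integrity of the compressed stream; exits non-zero on corruption
xz -t sample-dir.tar.xz

# List the archive members without extracting them
listing=$(tar tJf sample-dir.tar.xz)
printf '%s\n' "$listing"
```

Here, the xz -t command checks only the compressed stream, whereas the t option of the tar command also reads through the archive structure.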
In the next section, we’ll understand more about the xz command’s compression levels.
In the previous section, we compressed a directory using the default compression level. However, the xz command allows us to adjust the compression level as needed. So let’s understand more about these levels.
In the xz command, the compression level is specified by a single digit from 0 to 9, where 0 represents the least compression and 9 represents the maximum compression.
An important point to remember is that the compression levels represent a trade-off between compression speed, memory usage, and the size of the compressed output. So, a lower compression level gives us faster compression but a larger compressed file, and vice versa.
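To get a feel for this trade-off, we can compress the same archive at a few levels and compare the results. This is only a rough sketch on a small, hypothetical sample directory; the actual sizes depend heavily on the input data:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Text-like sample data (hypothetical); real-world ratios will differ
mkdir sample-dir
seq 1 200000 > sample-dir/numbers.txt

# Build the uncompressed archive once, then compress it at each level
tar cf sample.tar sample-dir

for level in 0 6 9; do
    # -c writes the compressed stream to stdout and leaves sample.tar in place
    xz -"$level" -c sample.tar > "sample-level-$level.tar.xz"
    size=$(wc -c < "sample-level-$level.tar.xz")
    echo "level $level: $size bytes"
done
```

To compare the speed as well, we can prefix the xz invocation with the time command on a larger input.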
Now, let’s understand how to specify the compression levels.
The -I option of the tar command allows us to select the compression engine. We can use this option with the xz command to use the appropriate compression level.
To understand this, let’s compress the directory using the compression level 8:
$ tar cvf compression-demo-level-8.tar.xz -I 'xz -8' compression-demo
It’s important to note that in this example, the tar command doesn’t use the J option as the compression engine is specified by the -I option.
Alternatively, we can also set the compression level using the XZ_OPT environment variable. For example, we can achieve the same result as the previous example using:
$ XZ_OPT=-8 tar cvfJ compression-demo-env-var.tar.xz compression-demo
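On systems where the tar command doesn't support the -I option, a plain pipeline is another way to pass a compression level to xz. The sketch below assumes a small, hypothetical sample-dir directory and isn't part of the original examples:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Small sample directory (hypothetical names)
mkdir sample-dir
seq 1 20000 > sample-dir/numbers.txt

# "f -" makes tar write the archive to stdout; xz reads it from stdin
tar cf - sample-dir | xz -8 > sample-pipe.tar.xz

# Confirm that the result is a valid xz stream containing our file
xz -t sample-pipe.tar.xz
members=$(xz -dc sample-pipe.tar.xz | tar tf -)
printf '%s\n' "$members"
```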
In the previous section, we discussed higher compression levels to achieve a better compression ratio. However, one of the limitations of the higher compression levels is that they take more time to compress the data.
So, let’s discuss a few methods that allow us to perform the compression efficiently.
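Nowadays, most computers have multiple cores, and we can utilize these cores to speed up the compression. In multi-threaded mode, the xz command splits the input into blocks and compresses them in parallel, which usually costs only a little in compression ratio.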
To understand this, let’s use the -T option of the xz command to enable the multi-threaded compression:
$ tar cvf compression-demo-mt.tar.xz -I 'xz -9 -T2' compression-demo
In this example, the -T2 option instructs the xz command to use up to two worker threads.
Additionally, the xz command allows us to enable the multi-threaded mode using the XZ_DEFAULTS environment variable.
For example, we can instruct the xz command to use two threads using:
$ XZ_DEFAULTS='-T2' tar cvfJ compression-demo-mt.tar.xz compression-demo
Similarly, we can also specify the special option -T0 to instruct the xz command to use all available CPU cores of the system.
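As a minimal sketch of the -T0 variant, the script below sets it through the XZ_DEFAULTS variable and then checks the result. It assumes a GNU tar that runs the xz program for the J option, plus a hypothetical sample-dir directory:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Small sample directory (hypothetical names)
mkdir sample-dir
seq 1 200000 > sample-dir/numbers.txt

# -T0 lets xz choose the thread count from the available CPU cores
XZ_DEFAULTS='-T0' tar cfJ sample-mt.tar.xz sample-dir

# Verify that the archive is intact
xz -t sample-mt.tar.xz
echo "archive size: $(wc -c < sample-mt.tar.xz) bytes"
```

It's worth noting that a small input like this one fits into a single block, so the extra threads only pay off on larger data.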
Yet another method to improve compression efficiency is using a larger dictionary size.
At the default compression level of 6, the xz command uses a dictionary of size 8 MiB, whereas level 9 uses 64 MiB. However, we can modify it as per our needs.
So, let’s see how to use a non-default dictionary size using a simple example:
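$ tar cvf compression-demo-dict.tar.xz -I 'xz --lzma2=preset=9,dict=256MiB' compression-demo
In this example, we have used the --lzma2=preset=9,dict=256MiB option to start from the level 9 settings and set the dictionary size to 256 MiB. We specify the preset inside the --lzma2 option because a custom filter chain makes the xz command ignore a preset such as -9 given earlier on the command line.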
It’s worth noting that although a larger dictionary can improve the compression ratio, it also increases memory usage, both when compressing and when decompressing, because the decompressor has to hold the whole dictionary in memory.
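To see which dictionary size an archive was actually written with, we can ask the xz command for a detailed listing. This is a sketch under a few assumptions: a small, hypothetical sample-dir directory, a modest 16 MiB dictionary to keep the memory usage low, and an xz version whose --list mode supports the -vv verbosity:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Small sample directory (hypothetical names)
mkdir sample-dir
seq 1 20000 > sample-dir/numbers.txt

# Start from the level 6 settings and override only the dictionary size
tar cf - sample-dir | xz --lzma2=preset=6,dict=16MiB > sample-dict.tar.xz

# -l lists information about the file; -vv adds per-block details,
# including the filter chain and the memory needed for decompression
info=$(LC_ALL=C xz -lvv sample-dict.tar.xz)
printf '%s\n' "$info"
```

Additionally, the xz --info-memory command shows the amount of RAM and the memory usage limits that the xz command sees on the current system.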
In addition to the above two methods, we can enable the extreme mode by appending the e modifier to a compression level, such as the -9e option of the xz command:
$ tar cvf compression-demo-extreme.tar.xz -I 'xz -9e' compression-demo
The good thing about this method is that it spends more CPU time trying to improve the compression ratio without increasing the memory needed for decompression. However, the gain is often small, and the compression can take noticeably longer, especially on large data.
In this short article, we discussed compressing a directory using tar and xz commands.
First, we compressed a directory using the default compression level.
Then, we discussed two methods to modify the default compression level.
Finally, we discussed using multi-threading, a larger dictionary size, and an extreme compression level to perform compression efficiently.