1. Overview

It’s good practice to back up important files regularly. However, this simple task becomes challenging and time-consuming as the data grows over time. One of the easiest and most effective ways to address this is to compress the data and thereby reduce its size.

Linux provides many utilities for compressing data. In this short tutorial, we’ll combine the tar and xz commands to achieve maximum compression. So, let’s get started.

2. Compress a Directory With the Default Compression Level

To begin, let’s compress a directory using the default compression level.

First, let’s create a directory using the mkdir command and then use the dd command to create a 1 GiB file (about 1.1 GB) filled with zeros:

$ mkdir compression-demo

$ dd if=/dev/zero of=compression-demo/file-1.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.01976 s, 1.1 GB/s

$ du -sh compression-demo 
1.1G	compression-demo

Here, the du command shows that the directory size is 1.1 GB.

Next, let’s compress this directory using the tar command together with the xz compression algorithm:

$ tar cvfJ compression-demo.tar.xz compression-demo 
compression-demo/
compression-demo/file-1.img

In this example:

  • c: instructs the tar command to create a new archive
  • v: enables the verbose mode
  • f: specifies the name of the archive file to write, and
  • J: filters the archive through the xz compression program
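The same invocation can also be written with long options, which may be easier to read in scripts. The following is a minimal sketch, assuming GNU tar and xz are installed; the directory and file names (long-options-demo, sample.img) are hypothetical and chosen only for this illustration:

```shell
#!/bin/sh
# Sketch: the short flags "cvfJ" spelled out as GNU tar long options.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir long-options-demo
# A small, highly compressible sample file (1 MiB of zeros).
dd if=/dev/zero of=long-options-demo/sample.img bs=1024 count=1024 2>/dev/null

# --create = c, --verbose = v, --file = f, --xz = J
tar --create --verbose --file=long-options-demo.tar.xz --xz long-options-demo

# List the archive contents without extracting them.
listing=$(tar --list --xz --file=long-options-demo.tar.xz)
printf '%s\n' "$listing"
```

Listing the archive afterwards is a quick way to confirm that it contains the expected entries before relying on it as a backup.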

Now, let’s see the size of the compression-demo.tar.xz file:

$ du -sh compression-demo.tar.xz 
156K	compression-demo.tar.xz

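Here, we can see that the default compression level has reduced the size from 1.1 GB to 156 KB. This ratio is unusually high because the test file contains only zeros, so real-world data will typically compress far less.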

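The xz command’s default compression level is 6, which provides a good compression ratio with moderate memory usage. According to the xz manual, archives created at this level need only about 9 MiB of memory to decompress, so they remain usable even on systems with little RAM.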

In the next section, we’ll understand more about the xz command’s compression levels.

3. Understanding the Compression Levels

In the previous section, we compressed a directory using the default compression level. However, the xz command allows us to adjust the compression level as needed. So, let’s take a closer look at these levels.

In the xz command, the compression level is specified by a single digit from 0 to 9, where 0 represents the least compression and 9 represents the maximum compression.

An important point to remember is that the compression level represents a trade-off between compression speed, memory usage, and the size of the compressed output. So, a lower level compresses faster but produces a larger file, and vice versa.
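To make this trade-off concrete, we can compress the same data at several levels and compare the resulting sizes. This is only a sketch, assuming xz and seq are available; the sample data (a list of numbers) and the file names are hypothetical, and the exact sizes will differ from system to system and from data set to data set:

```shell
#!/bin/sh
# Sketch: compare the output size of xz at levels 0, 6, and 9.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir levels-demo
# Text data with some structure, so that the level can make a difference.
seq 1 300000 > levels-demo/numbers.txt

# Create one uncompressed archive and compress it at each level.
tar cf levels-demo.tar levels-demo
original_size=$(wc -c < levels-demo.tar)
echo "uncompressed: $original_size bytes"

for level in 0 6 9; do
    # -c writes the compressed stream to stdout and keeps the input file.
    xz -"$level" -c levels-demo.tar > "levels-demo-$level.tar.xz"
    size=$(wc -c < "levels-demo-$level.tar.xz")
    echo "level $level: $size bytes"
done
```

On larger inputs, wrapping each xz call in the shell's time command would show the speed side of the trade-off as well.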

Now, let’s understand how to specify the compression levels.

3.1. Using the -I Option of the tar Command With xz

The -I option of the tar command allows us to select the compression program and pass arguments to it. We can use this option with the xz command to set the desired compression level.

To understand this, let’s compress the directory using the compression level 8:

$ tar cvf compression-demo-level-8.tar.xz -I 'xz -8' compression-demo

It’s important to note that in this example, the tar command doesn’t use the J option as the compression engine is specified by the -I option.

3.2. Using the XZ_OPT Environment Variable

Alternatively, we can set the compression level using the XZ_OPT environment variable, which the xz command reads when the tar command invokes it. For example, we can achieve the same result as the previous example using:

$ XZ_OPT=-8 tar cvfJ compression-demo-env-var.tar.xz compression-demo
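To check that both approaches produce usable archives, we can test their integrity and compare their listings. The following is a minimal sketch, assuming GNU tar (for the -I option with arguments) and xz; the names level-demo, via-I.tar.xz, and via-env.tar.xz are hypothetical:

```shell
#!/bin/sh
# Sketch: set level 8 in two ways and verify both archives.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir level-demo
seq 1 100000 > level-demo/numbers.txt

# Method 1: pass the level through tar's -I option.
tar cf via-I.tar.xz -I 'xz -8' level-demo

# Method 2: pass the level through the XZ_OPT environment variable.
XZ_OPT=-8 tar cJf via-env.tar.xz level-demo

# xz -t checks the integrity of the compressed files without extracting.
xz -t via-I.tar.xz via-env.tar.xz

# Both archives should list exactly the same entries.
list_i=$(tar tJf via-I.tar.xz)
list_env=$(tar tJf via-env.tar.xz)
if [ "$list_i" = "$list_env" ]; then
    echo "both archives list the same entries"
fi
```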

4. Doing Compression Efficiently

In the previous section, we discussed higher compression levels to achieve a better compression ratio. However, one limitation of the higher compression levels is that they take more time and memory to compress the data.

So, let’s discuss a few methods that allow us to perform the compression efficiently.

4.1. Using Multiple Threads

Nowadays, most computers have multiple cores, and we can utilize these cores to speed up the compression with little or no loss in compression ratio.

To understand this, let’s use the -T option of the xz command to enable multi-threaded compression:

$ tar cvf compression-demo-mt.tar.xz -I 'xz -9 -T2' compression-demo

In this example, the -T2 option instructs the xz command to use two worker threads.

Additionally, the xz command allows us to enable the multi-threaded mode using the XZ_DEFAULTS environment variable.

For example, we can instruct the xz command to use two threads using:

$ XZ_DEFAULTS='-T2' tar cvfJ compression-demo-mt.tar.xz compression-demo

Similarly, we can specify the special value -T0 to let the xz command use as many threads as there are CPU cores on the system.
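The following sketch shows both ways of requesting one thread per core. It assumes GNU tar, an xz version with threading support (5.2 or later), and the nproc command from GNU coreutils; the names threads-demo and threads-demo-env are hypothetical:

```shell
#!/bin/sh
# Sketch: let xz choose the thread count from the number of CPU cores.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir threads-demo
seq 1 500000 > threads-demo/numbers.txt

# nproc reports how many processing units are available to this process.
cores=$(nproc)
echo "available cores: $cores"

# Variant 1: -T0 passed through tar's -I option.
tar cf threads-demo.tar.xz -I 'xz -6 -T0' threads-demo

# Variant 2: the long form --threads=0 passed through XZ_DEFAULTS.
XZ_DEFAULTS='--threads=0' tar cJf threads-demo-env.tar.xz threads-demo

# Verify that both results are valid xz files.
xz -t threads-demo.tar.xz threads-demo-env.tar.xz
echo "both archives passed the integrity test"
```

It's worth keeping in mind that xz parallelizes by splitting the input into blocks, so a small input like the one in this sketch is unlikely to keep more than one thread busy. The speed-up becomes visible only with inputs that span many blocks.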

4.2. Using a Larger Dictionary Size

Yet another method to improve the compression ratio is using a larger dictionary size.

At the default level 6, the xz command uses an 8 MiB dictionary, and level 9 raises this to 64 MiB. However, we can set the size explicitly as per our needs.

So, let’s see how to use a non-default dictionary size using a simple example:

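$ tar cvf compression-demo-dict.tar.xz -I 'xz --lzma2=preset=9,dict=256M' compression-demo

In this example, we have used the --lzma2=preset=9,dict=256M option to start from the level 9 settings and raise the dictionary size to 256 MB. We pass the preset inside the --lzma2 option because the xz command discards any preset option, such as -9, that appears before a custom filter option on the command line.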

It’s worth noting that though large dictionary sizes can improve the compression ratio, they also increase memory usage, both on the system that compresses the data and on the system that later decompresses it.
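The effect of the dictionary size is easiest to see with data that repeats at a distance larger than the dictionary. The sketch below uses deliberately small sizes (1 MiB versus 8 MiB) so that it runs quickly; it assumes GNU tar and xz, and the names dict-demo, block.bin, and repeated.bin are hypothetical:

```shell
#!/bin/sh
# Sketch: show how the dictionary size limits how far back xz can find matches.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir dict-demo
# 2 MiB of random data, which is incompressible on its own...
head -c 2097152 /dev/urandom > block.bin
# ...stored twice in a row, so the second copy repeats the first one
# at a distance of 2 MiB.
cat block.bin block.bin > dict-demo/repeated.bin

# 1 MiB dictionary: the first copy is out of reach when xz sees the second.
tar cf dict-small.tar.xz -I 'xz --lzma2=preset=6,dict=1MiB' dict-demo

# 8 MiB dictionary: the first copy is still in the dictionary.
tar cf dict-large.tar.xz -I 'xz --lzma2=preset=6,dict=8MiB' dict-demo

size_small=$(wc -c < dict-small.tar.xz)
size_large=$(wc -c < dict-large.tar.xz)
echo "dict=1MiB: $size_small bytes"
echo "dict=8MiB: $size_large bytes"
```

With the larger dictionary, the second copy can be encoded as a reference to the first, so the archive should come out at roughly half the size. The same reasoning suggests that a dictionary larger than the input itself brings no further benefit.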

4.3. Using Extreme Compression Level

In addition to the above two methods, we can enable the extreme variant of a compression level by appending e to it, as in the -9e option of the xz command:

$ tar cvf compression-demo-extreme.tar.xz -I 'xz -9e' compression-demo

The good thing about this method is that it tries to provide the best compression ratio for the chosen level. However, it typically takes noticeably longer than the plain level, and the gain in size is often small, so it’s worth measuring both on our own data.
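A simple way to decide whether the extreme variant pays off is to compress the same data with and without it and compare size and elapsed time. This is a rough sketch, assuming GNU tar and xz; the sample data and the extreme-demo names are hypothetical, and the one-second timer resolution is too coarse for small inputs like this one:

```shell
#!/bin/sh
# Sketch: compare level 9 with its extreme variant 9e.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir extreme-demo
seq 1 300000 > extreme-demo/numbers.txt

for preset in 9 9e; do
    start=$(date +%s)
    tar cf "extreme-demo-$preset.tar.xz" -I "xz -$preset" extreme-demo
    end=$(date +%s)
    size=$(wc -c < "extreme-demo-$preset.tar.xz")
    echo "preset -$preset: $size bytes in $((end - start)) s"
done

# Confirm that both archives are intact.
xz -t extreme-demo-9.tar.xz extreme-demo-9e.tar.xz
```

Running the same loop against a representative sample of our real backup data gives a more meaningful comparison than synthetic input.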

5. Conclusion

In this short article, we discussed compressing a directory using the tar and xz commands.

First, we compressed a directory using the default compression level.

Then, we discussed two methods to modify the default compression level.

Finally, we discussed using multi-threading, a larger dictionary size, and an extreme compression level to perform compression efficiently.