Last updated: December 3, 2023
In this article, we’ll discuss parallel file archiving and compression on Linux symmetric multiprocessing systems. A working knowledge of files and filesystems will make the material easier to follow.
In computer multiprocessing, a symmetric multiprocessing (SMP) system has multiple processors that share the same memory and run under a single operating system. Each processor therefore has access to the same resources.
The advantage of this architecture is that workloads can be balanced across processors. Any running process can be scheduled on any processor, and each processor can reach the shared memory and devices. Most modern operating systems support SMP. However, there is no point in using SMP unless the applications we choose to run are optimized for multi-threading.
Alternatively, other architectures exist. These options include:
Popular SMP applications for servers include SQL databases, FTP storage, Plex streaming, and other workloads that benefit from software multi-threading.
Because multiprocessing (MP) systems can leverage multi-threading, compression and archiving work can be split over many disks. The bottleneck then shifts: the compression or archiving tool we’re using can end up far slower than the bus that the disks are connected to.
Data can also be fragmented across multiple devices. Therefore, we must employ different mechanisms to store and compress files in MP systems.
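To make the idea of splitting compression work across cores concrete, here is a minimal sketch, assuming the input fits in memory and using Python's standard `bz2` module. The function names, the chunk size, and the worker count are illustrative choices, not part of any tool mentioned in this article. Each block is compressed as an independent bzip2 stream, and the streams are concatenated in order. This mirrors, in spirit, how block-based parallel compressors operate, but it isn't the implementation of any specific one.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the input into fixed-size blocks that can be compressed independently."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_bz2_compress(data: bytes, chunk_size: int = 256 * 1024, workers: int = 4) -> bytes:
    """Compress each block as its own bzip2 stream and concatenate the streams in order."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams line up with the original data.
        compressed_chunks = list(pool.map(bz2.compress, chunks))
    return b"".join(compressed_chunks)


if __name__ == "__main__":
    sample = b"parallel compression on SMP systems\n" * 50_000
    packed = parallel_bz2_compress(sample)
    # bz2.decompress() accepts a concatenation of several streams.
    assert bz2.decompress(packed) == sample
    print(f"{len(sample)} bytes -> {len(packed)} bytes")
```

Threads are used here on the assumption that the interpreter lock is released during compression, as CPython's `bz2` module is expected to do. A process pool would be the alternative if that assumption doesn't hold. Smaller chunks spread the work more evenly but usually cost some compression ratio.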
It is important to note that the Linux kernel natively supports SMP. Therefore, operating systems like Ubuntu can leverage a variety of file systems and packages that use multithreading.
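Before choosing a thread count for any multi-threaded tool, it helps to know how many logical CPUs a process can actually use. The following is a small sketch using only Python's standard library. The helper name `usable_cpu_count` is hypothetical.

```python
import os


def usable_cpu_count() -> int:
    """Return the number of logical CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects the CPU affinity mask, e.g. one set with taskset.
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity; cpu_count() may return None.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical CPUs available to this process: {usable_cpu_count()}")
```

On an SMP machine, this value is a reasonable upper bound for the number of compression threads to request.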
Reddit’s data hoarding community discusses many file server configurations for archiving. For parallel archiving and network-accessible storage on Linux file systems in particular, many different solutions exist. We can review some of the most popular options:
These are only a few of the available options for setting up a network file system (NFS).
For compression that leverages all the processing cores in multi-threaded systems, we can use the following applications:
A detailed analysis of the different compression algorithms is outside the scope of this article.
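As an illustration of driving one of these tools from a script, the sketch below builds and runs a `pigz` command. It relies on two `pigz` options: `-p` to set the number of compression threads and `-k` to keep the input file. The helper names and the file name in the usage note are hypothetical, and the script assumes `pigz` may not be installed, so it checks first.

```python
import shutil
import subprocess


def build_pigz_command(path: str, threads: int) -> list[str]:
    """Build a pigz invocation: -k keeps the input file, -p sets the thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["pigz", "-k", "-p", str(threads), path]


def compress_with_pigz(path: str, threads: int) -> bool:
    """Run pigz if it is installed; return False when the binary is missing."""
    if shutil.which("pigz") is None:
        return False
    # check=True raises CalledProcessError if pigz exits with an error.
    subprocess.run(build_pigz_command(path, threads), check=True)
    return True
```

For example, `compress_with_pigz("backup.tar", 4)` would be expected to produce `backup.tar.gz` next to the original file when `pigz` is available.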
For comparison, a benchmark of some of these options on a 70 KB text file produced the following results:
| Method | Compressed Size | % of Original |
|---|---|---|
| PBZIP2 | 16.1 KB | 23% |
| PXZ | 15.4 KB | 22% |
| PLZIP | 15.5 KB | 22.1% |
| LRZIP | 15.3 KB | 21.8% |
| PIGZ | 17.4 KB | 24.8% |
For a speed comparison of these algorithms, we can consult other published benchmark tables.
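The "% of Original" column is simply the compressed size divided by the original size. The sketch below applies that formula to Python's built-in single-threaded codecs (`gzip`, `bz2`, `lzma`), which stand in for the parallel tools benchmarked above. They aren't the same programs, and the sample text is made up, so the numbers won't match the table.

```python
import bz2
import gzip
import lzma


def percent_of_original(original: bytes, compressed: bytes) -> float:
    """Compressed size expressed as a percentage of the original size."""
    if not original:
        raise ValueError("original data must not be empty")
    return 100.0 * len(compressed) / len(original)


def compare_codecs(data: bytes) -> dict[str, float]:
    """Compress the same input with three stdlib codecs and report relative sizes."""
    codecs = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "xz": lzma.compress,
    }
    return {name: percent_of_original(data, fn(data)) for name, fn in codecs.items()}


if __name__ == "__main__":
    text = ("The quick brown fox jumps over the lazy dog. " * 1600).encode()
    for name, pct in compare_codecs(text).items():
        print(f"{name:>6}: {pct:.1f}% of original")
```

Applying the same formula to the first row of the table gives 16.1 / 70 ≈ 23%.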
In this article, we looked at SMP systems, options for parallel archiving, and multi-threaded compression tools in Linux.