1. Introduction

In this article, we’ll discuss parallel file archiving and compression on Linux symmetric multiprocessing systems. A basic understanding of files and filesystems will help in following along.

2. What Is Symmetric Multiprocessing (SMP)

In the domain of computer multiprocessing, symmetric multiprocessing (SMP) systems are built around multiple processors that share the same main memory and run under a single operating system. This means that each processor has equal access to the same resources.

The advantage of this architecture is that workloads can be balanced across processors. Any processor can run any process and reach the shared data and resources without going through a designated master processor. Most modern operating systems support SMP. However, SMP brings little benefit unless the applications we run are written to use multiple threads or processes.
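To see how many processors the kernel exposes to a program, we can query the Python standard library. This is only a minimal sketch: the helper name usable_cpus is ours, and os.sched_getaffinity is available on Linux but not on every platform, hence the fallback.

```python
import os


def usable_cpus():
    """Return the number of CPUs the current process is allowed to run on."""
    try:
        # Linux-specific: honours affinity masks such as those set by taskset.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: use the logical CPU count.
        return os.cpu_count() or 1


print(f"Logical CPUs: {os.cpu_count()}, usable by this process: {usable_cpus()}")
```

A multi-threaded tool would typically size its worker pool from a number like this one.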

Alternatively, other architectures exist. These options include:

  • Massively Parallel Processing (MPP) systems, whose processors don’t share resources. Each processor has its own OS and memory and processes data in parallel, which can provide broader scalability than SMP systems.
  • Asymmetric Multiprocessing (AMP) systems, which don’t treat all processors equally and could, in theory, rely on only one processor to run the operating system.

[Figure: Symmetric MultiProcessing]

Popular SMP applications for servers include SQL databases, FTP storage, Plex streaming, and other workloads that benefit from software multi-threading.

3. Parallel File Operations on File Systems

In multiprocessing (MP) systems, the files we want to compress or archive can be spread over many disks. The combined throughput of those disks can then exceed what a single-threaded tool can handle, so the compression or archiving tool becomes the bottleneck, running far slower than the bus that the disks are connected to.

Data can also be fragmented across multiple devices. Therefore, we need different mechanisms to store and compress files in MP systems.

3.1. Archiving

It is important to note that the Linux kernel natively supports SMP. Therefore, operating systems like Ubuntu can leverage a variety of file systems and packages that use multithreading.

Reddit’s data hoarding community discusses the different file server configurations used for archiving. For parallel archiving and network-accessible storage on Linux file systems in particular, many solutions exist. We can review some of the most popular options:

  • Btrfs, a filesystem and logical volume manager originally developed at Oracle
  • ZFS, available for Ubuntu as well as for other distributions, although Linus Torvalds has advised against it because of licensing issues
  • Unraid, a proprietary alternative, offers game server hosting, server data monitoring, container support, and more
  • FreeNAS, a network storage system based on FreeBSD

These are only a few of the available options for setting up network-accessible storage.

3.2. Compressing

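For compression that leverages all the processing cores in multi-threaded systems, we can use the following applications:

  • PBZIP2, a parallel implementation of bzip2
  • PXZ, a parallel implementation of xz
  • PLZIP, a parallel implementation of lzip
  • LRZIP, a multi-threaded compressor aimed at large files
  • PIGZ, a parallel implementation of gzip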

A detailed analysis of the different compression algorithms is outside the scope of this article.
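Still, the general idea behind block-parallel compressors can be sketched briefly: split the input into blocks, compress each block independently on its own worker, and concatenate the results. The Python sketch below is an illustration only, not the implementation of any of the tools above. The names parallel_bz2 and BLOCK_SIZE are ours, and the block size is an arbitrary assumption. It relies on two properties of CPython’s bz2 module: compression releases the global interpreter lock, so threads can run on separate cores, and bz2.decompress accepts several concatenated streams.

```python
import bz2
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 900 * 1024  # assumed block size, chosen only for illustration


def split_blocks(data, size=BLOCK_SIZE):
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_bz2(data, workers=None):
    """Compress each block on its own worker and concatenate the streams."""
    blocks = split_blocks(data) or [b""]  # still emit one stream for empty input
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # pool.map preserves input order, so the streams stay in sequence.
        return b"".join(pool.map(bz2.compress, blocks))


if __name__ == "__main__":
    payload = b"parallel compression on SMP systems\n" * 200_000
    packed = parallel_bz2(payload)
    # The concatenated streams decompress back to the original bytes.
    assert bz2.decompress(packed) == payload
    print(f"{len(payload)} bytes -> {len(packed)} bytes")
```

Compressing blocks independently usually costs a little compression ratio compared to a single stream, which is the trade-off for using every core.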

For comparison, a benchmark of some of these options on a 70 KB text file produced the following results:

Method    Compressed Size    % of Original
PBZIP2    16.1 KB            23%
PXZ       15.4 KB            22%
PLZIP     15.5 KB            22.1%
LRZIP     15.3 KB            21.8%
PIGZ      17.4 KB            24.8%
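
The last column is the compressed size divided by the 70 KB original. As a quick check, a minimal Python sketch can recompute it from the sizes in the table. The function name percent_of_original is ours, and small differences in the last digit are expected because the listed sizes are already rounded.

```python
ORIGINAL_KB = 70.0  # size of the text file used in the benchmark

# Compressed sizes in KB, copied from the table above.
compressed_kb = {
    "PBZIP2": 16.1,
    "PXZ": 15.4,
    "PLZIP": 15.5,
    "LRZIP": 15.3,
    "PIGZ": 17.4,
}


def percent_of_original(compressed, original=ORIGINAL_KB):
    """Return the compressed size as a percentage of the original size."""
    return compressed / original * 100


# List the methods from smallest to largest output.
for method, size in sorted(compressed_kb.items(), key=lambda kv: kv[1]):
    print(f"{method:<7}{size:>6.1f} KB{percent_of_original(size):>7.1f}%")
```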

For a speed comparison of these tools, we can consult other published benchmark tables.

4. Conclusion

In this article, we looked at symmetric multiprocessing and the options it opens up for parallel archiving and compression of files in Linux.
