Linux treats almost everything as a file. Therefore, any system administrator must understand how to work with files, including compressed files.
There are several reasons why it’s important to know how to distinguish a compressed file from an uncompressed one. For one, a compressed file occupies less disk space, which also makes it faster to transfer across the network than its uncompressed counterpart.
Also, there are instances where we can’t send a file that exceeds a certain size limit. In this case, we can compress the file ourselves or use an already compressed version. To use an existing version, we need a reliable way to tell whether the file is compressed.
In this tutorial, we’ll discuss different ways to check whether a file is compressed. First, we’ll identify the file extension used, then reinforce this knowledge with the file command, and finally, we’ll use the gzip and bzip2 command-line utilities to check our files to see if they’re compressed.
2. Examining the File Extension
Normally, we categorize files based on their extension. This applies to compressed files, too.
Before we proceed, let’s note that there are several compression methods, and each uses a different file extension. We can use this knowledge to identify compressed files and the compression method used:
- .zip – represents a file compressed using the zip compression method
- .gz – represents a file compressed using the gzip utility
- .bz2 – an extension identifying a file compressed with the bzip2 tool
- .tar.gz – represents a tar archive that has been compressed with gzip (tar only bundles files together; gzip does the compression)
This is not an exhaustive list of compression file extensions, but what we’ve covered is enough to give us an idea of how to use this approach.
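The mapping above can be sketched as a small shell function. This is only an illustration: the name guess_by_extension is made up for this example, it covers just the extensions listed here, and it looks at the file name alone, so a misleading name produces a misleading answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the compression method from the file name alone.
# It never opens the file, so the result is only as trustworthy as the name.
guess_by_extension() {
  case "$1" in
    *.tar.gz) echo "tar archive compressed with gzip" ;;  # must come before *.gz
    *.gz)     echo "gzip" ;;
    *.bz2)    echo "bzip2" ;;
    *.zip)    echo "zip" ;;
    *)        echo "unknown" ;;
  esac
}

# Print a guess for every file name passed on the command line.
for name in "$@"; do
  printf '%s: %s\n' "$name" "$(guess_by_extension "$name")"
done
```

The more specific .tar.gz pattern is listed first because case stops at the first match, and *.gz would otherwise catch it.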
However, there are situations where we might encounter files that don’t have any extension or whose extension doesn’t reflect their actual format. What can we do to address this problem? Let’s look at the next strategy.
3. Using the file Command
In Linux, a file’s name doesn’t have to reflect its actual format. For example, we can give a file the .zip extension even though its contents are in gzip format. With this in mind, we can’t confidently recognize the type of a file from its name alone. This is where the file command comes into play.
The file command determines the actual file type by examining the file’s contents rather than its extension, and prints a brief description of the file. That description shows whether the file is compressed and, if so, which compression method was used.
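To make the idea of ignoring the extension concrete, here is a simplified sketch that looks only at the first few bytes of a file. The function name guess_by_magic is hypothetical, and the sketch knows just three signatures (gzip, bzip2, and the usual zip header); the real file command relies on a far larger signature database and additional tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: identify a few compression formats by their leading
# signature bytes. This is an illustration, not a replacement for file.
guess_by_magic() {
  local sig
  # Read the first four bytes as lowercase hex, for example "1f8b0800".
  sig=$(od -An -tx1 -N4 < "$1" | tr -d ' \n')
  case "$sig" in
    1f8b*)    echo "gzip" ;;    # gzip streams start with 0x1f 0x8b
    425a68*)  echo "bzip2" ;;   # ASCII "BZh"
    504b0304) echo "zip" ;;     # ASCII "PK" followed by 0x03 0x04
    *)        echo "unknown" ;;
  esac
}

# Report the guess for every file passed on the command line.
for f in "$@"; do
  printf '%s: %s\n' "$f" "$(guess_by_magic "$f")"
done
```

Because the check never consults the name, a gzip file that has been renamed with a .zip extension is still reported as gzip.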
Let’s look at the general syntax of the file command:
$ file [option] [filename]
Now, let’s put it to use. First, let’s work with an uncompressed file:
$ file testing.txt
testing.txt: ASCII text
The output shows that testing.txt is a plain text file that uses ASCII encoding.
Next, let’s work with a compressed file:
$ file testing.txt.bz2
testing.txt.bz2: bzip2 compressed data, block size = 900k
In the example above, the output shows that the compression method used was bzip2.
Further, we can work with another compressed file:
$ file testing.txt.gz
testing.txt.gz: gzip compressed data, was "testing.txt", last modified: Thu Feb 16 12:36:21 2023
Here, the output shows that the gzip compression method was used.
This approach also works for files compressed with the other compression methods available.
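When the check has to run inside a script, parsing the human-readable description is fragile. One possible approach, sketched below, is to ask file for a MIME type with the --brief and --mime-type options and compare it against a short list. The function name is_compressed is made up for this example, and the MIME strings are assumptions about what the installed version of file prints; for instance, some versions report application/x-gzip rather than application/gzip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if file reports one of a few compressed MIME types.
# The list is an assumption and is deliberately short; extend it as needed.
is_compressed() {
  case "$(file --brief --mime-type -- "$1")" in
    application/gzip|application/x-gzip|application/x-bzip2|application/zip)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Classify every file passed on the command line.
for f in "$@"; do
  if is_compressed "$f"; then
    echo "$f: compressed"
  else
    echo "$f: not compressed"
  fi
done
```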
4. Using the gzip and bzip2 Commands
As we’ve seen, compressed files have different file extensions depending on the compression method used, but the extension alone isn’t conclusive. In this section, we’ll use the gzip tool to verify whether a file was compressed with the gzip method. Similarly, we’ll use the bzip2 tool to verify whether a file was compressed with the bzip2 method.
4.1. The gzip Command
gzip is a command-line tool that allows us to compress and decompress files. We use it to compress files to save space, reduce download time, and speed up data transfer.
We’ll use gzip with the -l option:
$ gzip -l [filename]
The -l option instructs gzip to display information about the compressed file.
So, let’s check out the testing.txt.gz file:
$ gzip -l testing.txt.gz
         compressed        uncompressed  ratio uncompressed_name
                389                 692  47.7% testing.txt
Above, we see that the gzip command tells us the compressed size, uncompressed size, compression ratio, and name of the original file, respectively. This output indicates that gzip recognizes the file as gzip-compressed data.
On the other hand, if the file was compressed with a different tool, or wasn’t compressed at all, the gzip command would display an error message, such as a complaint that the file is not in gzip format.
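In a script, the exit status is usually more convenient than the error message. The sketch below uses gzip’s -t option (test integrity) instead of -l, since only success or failure matters here. The function name is_gzip is hypothetical. Note also that GNU gzip can read a few other formats, such as files produced by compress, so success means "gzip can decode this" rather than strictly "gzip created this".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if gzip can decode the file.
is_gzip() {
  # -t tests integrity without writing any output; messages are discarded
  # so that only the exit status is used.
  gzip -t -- "$1" >/dev/null 2>&1
}

# Report the result for every file passed on the command line.
for f in "$@"; do
  if is_gzip "$f"; then
    echo "$f: gzip accepts this file"
  else
    echo "$f: not accepted by gzip"
  fi
done
```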
4.2. The bzip2 Command
bzip2 is a free and open-source compression utility used to compress and decompress files in Linux.
We’ll use the bzip2 command with a combination of the -t and -v options:
$ bzip2 -tv [filename]
The -t option instructs bzip2 to test the integrity of the compressed file, while the -v option instructs this utility to enable verbose output.
So, let’s find out whether testing.txt.bz2 is a compressed file:
$ bzip2 -tv testing.txt.bz2
  testing.txt.bz2: ok
The ok result means the integrity test passed, so testing.txt.bz2 contains valid bzip2-compressed data. Similar to the gzip command, we’ll get an error if the file wasn’t compressed with bzip2.
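The same integrity test can drive a script through its exit status. The following is a minimal sketch, with is_bzip2 as a made-up function name; it assumes bzip2 is installed and discards the messages so that only success or failure is used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if bzip2's integrity test passes.
is_bzip2() {
  # -t decompresses in memory and discards the result; a non-bzip2 file
  # makes the command exit with a non-zero status.
  bzip2 -t -- "$1" >/dev/null 2>&1
}

# Report the result for every file passed on the command line.
for f in "$@"; do
  if is_bzip2 "$f"; then
    echo "$f: valid bzip2 data"
  else
    echo "$f: not bzip2 data"
  fi
done
```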
In this article, we discussed different approaches to checking whether a file on our Linux system is compressed. First, we took a close look at the file extension, then we used the file command, and finally, we looked at the gzip and bzip2 commands and some of the options these commands provide that can help us with the task at hand.