Getting the CRC of a File in Linux

1. Overview

Cyclic Redundancy Check (CRC) is a widely used error-detection mechanism. For example, the frame check sequence field of an IEEE 802.3 Ethernet frame is a 32-bit CRC.

Besides detecting errors during data transmission, we can also use CRCs to check for data corruption in storage systems. For example, we can verify the integrity of a file by checking its CRC.

In this tutorial, we’ll discuss how to get the CRC of a file. First, we’ll briefly introduce CRCs. Then, we’ll look at several ways to get the CRC of a file in Linux.

2. What Is a CRC?

Cyclic codes, which form a subclass of linear block codes, are widely used in error correction since they have a mathematical structure that leads to very efficient encoding and decoding schemes.

CRC codes are cyclic codes used only for error detection. An n-bit CRC code, namely CRC-n, adds n redundant bits to a message. These n bits form the CRC or parity bits and are calculated using the message and a binary generator polynomial specific to the CRC code. The coefficients of a binary generator polynomial are either 0 or 1.

A typical encoder first pads the k-bit message block with n zeros and then generates the n-bit CRC by dividing the padded message block by the generator polynomial. The remainder of this polynomial division is the CRC. The k-bit message block, together with the appended n-bit CRC, forms a code word of length n+k.

The decoder divides the received code word by the generator polynomial. If the n-bit remainder is nonzero, i.e., if any of its bits is 1, then the received code word is corrupted. With a suitable generator polynomial, CRCs can detect many error patterns, for example, all error bursts of length n or less.

Alternatively, the decoder might pad the k-bit message block in the received code word with n zeros and divide it by the generator polynomial. If the n-bit remainder of the division isn’t the same as the CRC in the received code word, then the received code word is corrupted.
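The encoding and decoding steps described above can be sketched in Python as plain polynomial long division over bits. This is only an illustration: the helper names poly_remainder() and encode() are ours, and the 3-bit generator 1011 is a toy example, not a generator used by any of the tools in this tutorial.

```python
def poly_remainder(bits: int, nbits: int, generator: int, n: int) -> int:
    """Remainder of dividing an nbits-long bit string by a degree-n generator."""
    # Walk from the most significant bit down to bit n; whenever the current
    # bit is 1, XOR in the generator aligned with that bit (subtraction in GF(2)).
    for shift in range(nbits - 1, n - 1, -1):
        if bits & (1 << shift):
            bits ^= generator << (shift - n)
    return bits  # only the lowest n bits can still be set


def encode(message: int, k: int, generator: int, n: int) -> int:
    """Append the n-bit CRC to a k-bit message, giving an (n+k)-bit code word."""
    padded = message << n  # pad the message with n zeros
    crc = poly_remainder(padded, k + n, generator, n)
    return padded | crc


# Hypothetical toy parameters: 14-bit message, generator x^3 + x + 1 (1011)
MESSAGE, K, GENERATOR, N = 0b11010011101100, 14, 0b1011, 3

codeword = encode(MESSAGE, K, GENERATOR, N)
print(f"{codeword:0{K + N}b}")  # 11010011101100100

# Decoder, first method: the remainder of an intact code word is zero
print(poly_remainder(codeword, K + N, GENERATOR, N))  # 0

# Decoder, second method: recompute the CRC from the message part and compare
received_msg = codeword >> N
received_crc = codeword & ((1 << N) - 1)
print(poly_remainder(received_msg << N, K + N, GENERATOR, N) == received_crc)  # True

# Flipping a single bit leaves a nonzero remainder, so the error is detected
corrupted = codeword ^ (1 << 5)
print(poly_remainder(corrupted, K + N, GENERATOR, N) != 0)  # True
```

Practical CRC-32 implementations refine this basic scheme, for example, with a nonzero initial value and table lookups, but the underlying division is the same.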

The message in our examples will be a file. The tools and modules we’ll inspect in the subsequent sections calculate 32-bit CRCs, i.e., CRC-32.

3. Generating the Input File

We generate the input file we’ll use in the subsequent sections using the echo command:

$ echo -n 123456789 > test_file

The name of the input file is test_file. The -n option prevents echo from adding a trailing newline character.

Let’s check the contents of the file using hexdump:

$ hexdump -c test_file 
0000000   1   2   3   4   5   6   7   8   9                        
0000009

test_file doesn’t have a newline character at the end, as expected.

4. Using crc32

We can use the crc32 utility to compute the CRC-32 of a file. crc32 is an executable Perl script that’s installed with the libarchive-zip-perl package.

Let’s compute the CRC-32 of test_file using crc32:

$ crc32 test_file
cbf43926

We just pass the file to crc32 as an argument. It’s also possible to pass multiple files at once. crc32 displays the result in hexadecimal format: cbf43926 consists of eight hexadecimal digits, i.e., 32 bits.
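To illustrate what such a tool computes, here’s a minimal bit-by-bit sketch of the CRC-32 variant implemented by zlib, which yields the same value for our input. We assume the usual parameters of that variant: the generator polynomial in its bit-reversed form 0xEDB88320, bits processed least significant bit first, an initial value of all ones, and a final inversion. The function name crc32_bitwise() is ours, and real implementations typically use lookup tables for speed:

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 (zlib variant); slow, for illustration only."""
    crc = 0xFFFFFFFF  # initial value: all ones
    for byte in data:
        crc ^= byte  # feed the next byte into the low 8 bits
        for _ in range(8):
            if crc & 1:
                # Low bit set: shift right and subtract (XOR) the generator
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion


data = b"123456789"  # same content as test_file
print(f"{crc32_bitwise(data):08x}")  # cbf43926
print(crc32_bitwise(data) == zlib.crc32(data))  # True
```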

5. Using Perl

We can write our own Perl script to compute the CRC-32 of a file using Perl’s String::CRC32 module. The libstring-crc32-perl package must be installed to use this module.

We’ll use the Perl script, cksum.pl, to compute the CRC-32:

$ cat cksum.pl
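#!/usr/bin/perl
use String::CRC32;    # provides crc32()

# Open the file named by the first command-line argument for reading
open my $fd, '<', $ARGV[0] or die $!;
# crc32() accepts a file handle and reads the file's content from it
$crc = crc32($fd);
# Print the CRC-32 in hexadecimal
printf "%x\n", $crc;

close $fd;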

We import the String::CRC32 module using the use String::CRC32 statement.

The script expects to get the file name from the command line using $ARGV[0]. We open the file using the open my $fd, '<', $ARGV[0] statement. If there’s a problem opening the file, the die $! statement terminates the script with the corresponding error message.

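Then, we compute the CRC-32 of the file by passing the file handle to crc32(), i.e., $crc = crc32($fd). The printf "%x\n", $crc statement prints the CRC-32 in hexadecimal format. Since %x doesn’t pad the value with leading zeros, we can use %08x instead if we always need eight digits. Finally, we close the file using close $fd.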

Let’s test the script:

$ ./cksum.pl test_file
cbf43926

The computed CRC is the same as the one computed by crc32, as expected.

6. Using Python

We can also use Python to compute the CRC-32 of a file. We’ll compute the CRC-32 twice using two Python modules, namely binascii and zlib, in cksum.py:

$ cat cksum.py
#!/usr/bin/python3
import sys, binascii, zlib

# Read the whole file into memory as bytes
filename = sys.argv[1]
file = open(filename, 'rb')
buf = file.read()

# CRC-32 computed by the binascii module
cksum_binascii = binascii.crc32(buf)
print(hex(cksum_binascii))

# CRC-32 computed by the zlib module
cksum_zlib = zlib.crc32(buf)
print(hex(cksum_zlib))

file.close()

We import the two modules together with the sys module using the import sys, binascii, zlib statement.

Then, we get the file name using filename = sys.argv[1]. The file = open(filename, 'rb') statement opens the file in binary mode. We read the content of the file using buf = file.read().

Having read the content of the file, we first compute the CRC-32 using the crc32() function of the binascii module, i.e., cksum_binascii = binascii.crc32(buf). Then, we compute the CRC-32 using the crc32() function of the zlib module, i.e., cksum_zlib = zlib.crc32(buf).

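The print(hex(cksum_binascii)) and print(hex(cksum_zlib)) statements print the results in hexadecimal format. Note that hex() doesn’t pad the value with leading zeros, so a CRC whose leading digits are zero is printed with fewer than eight digits.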

Finally, we close the file using file.close().

Let’s run the script:

$ ./cksum.py test_file 
0xcbf43926
0xcbf43926

The two CRCs are the same, as expected, and they’re also the same as the one computed by crc32.
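Since cksum.py reads the whole file into memory, it may not be suitable for very large files. As a possible alternative, the sketch below reads the file in chunks and passes the running CRC as the second argument of zlib.crc32(). The function name file_crc32() and the 64 KiB chunk size are our own choices:

```python
import sys
import zlib


def file_crc32(path: str, chunk_size: int = 65536) -> int:
    """Compute the CRC-32 of a file without loading it into memory at once."""
    crc = 0  # starting value for zlib.crc32()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # continue from the running CRC
    return crc


if __name__ == "__main__" and len(sys.argv) > 1:
    # %08x-style formatting always prints eight hexadecimal digits
    print(f"{file_crc32(sys.argv[1]):08x}")
```

Saved as a script and run with test_file as its argument, this should print cbf43926, the same value as before.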

7. Using cksum

Another alternative is to use the cksum command to compute a 32-bit CRC of a file. The cksum command prints not only the CRC but also the length of the file in bytes:

$ cksum test_file
930766865 9 test_file

The cksum command displays the result in decimal format. 930766865 in the output is the CRC-32 of test_file, and 9 is the length of the file in bytes.

Let’s print the CRC-32 in hexadecimal format using the printf command:

$ printf "%x\n" 930766865
377a6011

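The value of the CRC-32 in hexadecimal format is 377a6011, which is different from the results in the previous sections. cksum uses the same generator polynomial as the other tools but with different parameters: it processes the bits of each byte most significant bit first and starts from an initial value of zero, whereas crc32 and zlib process the bits least significant bit first and start from an initial value of all ones. In addition, cksum appends the length of the input to the input while calculating the CRC. The underlying CRC variant is known as CRC-32/CKSUM or CRC-32/POSIX.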
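To make the difference concrete, here’s a bit-by-bit Python sketch of the algorithm that POSIX specifies for cksum. It’s an illustration, not the source of any particular cksum implementation, and the names posix_cksum() and _update() are ours. The length of the input is fed into the CRC after the data, least significant byte first, using as few bytes as needed:

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial, most significant bit first


def _update(crc: int, byte: int) -> int:
    """Feed one byte into the CRC register, most significant bit first."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """CRC as printed by cksum: the data followed by its length, then inverted."""
    crc = 0  # initial value is zero
    for byte in data:
        crc = _update(crc, byte)
    # Append the length, least significant byte first, in as few bytes as needed
    length = len(data)
    while length:
        crc = _update(crc, length & 0xFF)
        length >>= 8
    return crc ^ 0xFFFFFFFF  # final inversion


print(posix_cksum(b"123456789"))  # 930766865
```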

8. Conclusion

In this article, we discussed how to get the CRC of a file. First, we learned that CRC is a special cyclic code used for error detection.

Then, we saw how to calculate the CRC of a file using crc32, Perl, Python, and cksum. We learned that the CRC calculated by cksum is different from the others since cksum uses a different CRC-32 variant and includes the data length in the calculation.
