1. Overview
Linux has many tools that we can use to search binary files. In this tutorial, we'll explore several tools, both text-based and GUI-based, for searching for a hexadecimal pattern in binary files, and we'll compare their performance.
2. Test File Setup
Firstly, let’s create a test file to use throughout this article by using Perl, and name it test.bin:
$ perl -e 'print 0.0.0.1.0.0.0.2.0.1.0.2.0.3.0.4.0.5.0.6.0.7.0.8.0.9.0.10.0.11.0.1' > test.bin
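Perl treats a bare number with two or more dots, such as 0.0.0.1, as a version string (v-string), in which each dot-separated number becomes one character with that code point. So, the one-liner writes 32 bytes with the values 0, 0, 0, 1, 0, 0, 0, 2, and so on. We could use any other method to create a test file, but Perl makes this task straightforward.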
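If Perl isn't installed, the shell's printf can produce the same 32 bytes. It understands three-digit octal escapes, so \000 is 0x00, \012 is 0x0a, and \013 is 0x0b. Here's a minimal sketch:

```bash
# Write the same 32 bytes as the Perl one-liner, using octal escapes:
# \000 = 0x00, \001 = 0x01, ..., \010 = 0x08, \011 = 0x09, \012 = 0x0a, \013 = 0x0b
printf '\000\000\000\001\000\000\000\002\000\001\000\002\000\003\000\004\000\005\000\006\000\007\000\010\000\011\000\012\000\013\000\001' > test.bin

# Sanity check: the file should be exactly 32 bytes long
stat -c %s test.bin
```

We use octal rather than \xHH escapes because POSIX only requires printf to support the octal form. Bash's builtin accepts both, but other shells, such as dash, may not.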
Next, let’s ensure the file was created correctly by dumping its content using hexdump:
$ hexdump -C test.bin
00000000  00 00 00 01 00 00 00 02  00 01 00 02 00 03 00 04  |................|
00000010  00 05 00 06 00 07 00 08  00 09 00 0a 00 0b 00 01  |................|
00000020
We use hexdump to print the content of test.bin in hexadecimal format because all 32 bytes in the file are non-printable characters.
Since the file was created successfully, we’ll use it to test against all the tools we’re going to explore.
3. Using Linux Basic Tools
Linux provides several basic tools that can search for hexadecimal patterns in binary files, such as grep and bbe. grep comes with virtually every default installation. bbe usually has to be installed separately, for example, with sudo apt install bbe on Debian-based systems, and it's available from many official Linux repositories.
3.1. Using grep
grep is a tool to search and print the lines that match a pattern. Although grep is commonly used to search for printable characters in a file or an input stream, it can also be used to search for hexadecimal patterns in binary files.
Now, let's say that we want to find a two-byte binary sequence in test.bin, for example, a null character (0x00) followed by 0x01:
$ grep -obUaP "\x00\x01" test.bin | cat --show-nonprinting
2:^@^A
8:^@^A
30:^@^A
The grep command found three occurrences. For each one, it printed the byte offset followed by the matched bytes, which cat --show-nonprinting displays as ^@ (0x00) and ^A (0x01).
Without cat, the matched bytes are non-printable, so the terminal shows only the offsets:
$ grep -obUaP "\x00\x01" test.bin
2:
8:
30:
To sum up, let’s review the options that we used for grep:
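- -o, --only-matching: print only the matched part, not the whole line
- -b, --byte-offset: print the 0-based byte offset of each match
- -U, --binary: treat the file as binary (on Linux, this option has no effect; it matters only on platforms that distinguish text from binary I/O, such as MS-Windows)
- -a, --text: process a binary file as if it were text
- -P, --perl-regexp: interpret the pattern as a Perl-compatible regular expression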
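Two caveats are worth knowing. First, grep still works line by line, so a pattern that contains the byte 0x0a (newline) can't match. Second, in a UTF-8 locale, grep -P reads an escape such as \xC6 as the Unicode character U+00C6 rather than the single byte 0xC6. Forcing the C locale is therefore safer for bytes above 0x7F.

Here's a minimal sketch of a wrapper that applies the C locale and takes the pattern as a plain hex string such as 0001. The name hexgrep is hypothetical; it isn't an existing tool:

```bash
# hexgrep (hypothetical helper): hexgrep HEXSTRING FILE
# Prints the 0-based decimal offset of each match, one per line.
hexgrep() {
  local hex=$1 file=$2 pattern
  # Accept only a non-empty string of hex digits with an even length
  case $hex in
    '' | *[!0-9A-Fa-f]*) echo "hexgrep: invalid hex string: $hex" >&2; return 2 ;;
  esac
  if [ $(( ${#hex} % 2 )) -ne 0 ]; then
    echo "hexgrep: odd number of hex digits: $hex" >&2
    return 2
  fi
  # Turn "0001" into the PCRE escape sequence \x00\x01
  pattern=$(printf '%s' "$hex" | sed 's/../\\x&/g')
  # LC_ALL=C: match raw bytes instead of UTF-8 characters (matters for bytes >= 0x80)
  # -o -b prints "offset:match" for each match; cut keeps only the offset
  LC_ALL=C grep -obUaP -- "$pattern" "$file" | cut -d: -f1
}
```

For example, hexgrep 0001 test.bin prints 2, 8, and 30 on separate lines. This requires a grep built with PCRE support, as on Debian and Ubuntu.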
3.2. Using bbe
bbe stands for binary block editor. It works like the sed command, but for binary files.
For example, let’s find a two-byte binary sequence (0x00 0x01) in the test.bin file:
$ bbe -b "/\x00\x01/:2" -s -e "F d" -e "p h" -e "A \n" test.bin
2:x00 x01
8:x00 x01
30:x00 x01
As we can see, bbe found the same three offsets as grep.
Let’s review the options that we used:
- -b, --block=BLOCK: define the blocks to process; here, each block starts at a match of the pattern between the forward slashes (/\x00\x01/) and is 2 bytes long (:2)
- -s, --suppress: print only the matched blocks, similar to 'grep -o'
- -e, --expression=COMMAND: commands to execute, similar to 'sed -e'
- -e 'F d': print the decimal offset before each result (2:…, 8:…, 30:…)
- -e 'p h': print results in hexadecimal notation
- -e 'A \n': append a newline to each result
4. Using bgrep
bgrep is a simple open-source binary grep written in C. It prints the offset of each match, optionally along with a given number of bytes before and after the match position.
Since the whole tool is a single C file, we can download the source with curl and compile it with gcc straight into $HOME/.local/bin, assuming that directory exists and is in our PATH:
$ curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o $HOME/.local/bin/bgrep -
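If ~/.local/bin doesn't exist yet, gcc can't write the binary there. If it isn't on PATH, the shell won't find bgrep afterward. This minimal sketch creates the directory and prepends it to PATH. It should run in the current shell before the curl | gcc command, and the PATH change lasts only for that session. To make it permanent, we'd add the export line to a startup file such as ~/.bashrc:

```bash
# Create the install directory if it's missing (no error if it already exists)
mkdir -p "$HOME/.local/bin"

# Prepend it to PATH unless it's already there (current session only)
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;
  *) export PATH="$HOME/.local/bin:$PATH" ;;
esac
```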
bgrep provides three options for printing context:
- -A N: print N bytes after the occurrence
- -B N: print N bytes before the occurrence
- -C N: print N bytes before and after the occurrence
Let’s see bgrep in action:
$ bgrep 0001 test.bin
test.bin: 00000002
test.bin: 00000008
test.bin: 0000001e
Searching test.bin for the two-byte binary sequence (0x00 0x01) with bgrep gives the same results as grep and bbe. The only difference is that bgrep prints offsets in hexadecimal, so 0000001e corresponds to offset 30.
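Since bgrep reports hexadecimal offsets while grep and bbe (with F d) report decimal ones, converting them makes comparisons easier. Here's a minimal sketch with a hypothetical helper named bgrep_dec. It assumes that file names contain no whitespace:

```bash
# bgrep_dec (hypothetical helper): reads bgrep output lines such as
# "test.bin: 0000001e" from stdin and prints "test.bin 30".
# Extra fields, such as the context bytes printed with -C, are ignored.
# Assumes file names contain no whitespace.
bgrep_dec() {
  local name offset _rest
  while read -r name offset _rest; do
    # %d accepts a 0x-prefixed hexadecimal number
    printf '%s %d\n' "${name%:}" "0x$offset"
  done
}
```

For example, bgrep 0001 test.bin | bgrep_dec prints test.bin 2, test.bin 8, and test.bin 30.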
Furthermore, we can specify the number of bytes to print before and after the matched position by passing the option -C 2:
$ bgrep -C 2 0001 test.bin
test.bin: 00000002 \x00\x00\x00\x01
test.bin: 00000008 \x00\x02\x00\x01
test.bin: 0000001e \x00\x0b\x00\x01
Each line now shows the two bytes before the match followed by the two matched bytes. Having this extra data printed around the matched position could be useful when we debug programs or analyze log files.
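If bgrep isn't available, od from coreutils can display a similar window around an offset reported by any of these tools. Here's a minimal sketch with a hypothetical show_context helper. It takes a decimal offset and prints N bytes before it plus N bytes starting at it, which matches what bgrep -C 2 showed for our two-byte pattern:

```bash
# show_context (hypothetical helper): show_context FILE OFFSET N
# Dumps N bytes before the decimal OFFSET and N bytes starting at it;
# od prints the address of the first byte in decimal (-A d).
show_context() {
  local file=$1 offset=$2 n=$3 start
  # Don't try to start before the beginning of the file
  start=$(( offset > n ? offset - n : 0 ))
  od -A d -t x1 -v -j "$start" -N "$(( offset - start + n ))" -- "$file"
}
```

For instance, show_context test.bin 30 2 dumps bytes 28 to 31 (00 0b 00 01), the same bytes bgrep printed for offset 0x1e.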
5. Using GHex
GHex is a GUI-based binary file editor. It’s available for download from many Linux official repositories.
Let’s install GHex on Debian:
$ sudo apt install ghex
The command above installs the GHex package.
The GUI of GHex is pretty intuitive. To open the test.bin file, we click the menu File > Open or press Ctrl + O, select test.bin, and then click Open.
Similarly, to search for the two-byte binary sequence (0x00 0x01), we click the menu Edit > Find or press Ctrl + F, enter the pattern, and click Find Next.
All occurrences are highlighted in red, with the offset displayed at the bottom left.
GHex loads the entire file into memory. Consequently, opening a file larger than the available memory might cause system performance issues or out-of-memory errors.
6. Using Bless
Bless is a GUI-based binary file editor. It’s available for download from many Linux official repositories.
Here’s how we can install Bless on Debian and its derivatives:
$ sudo apt install bless
The command above installs the Bless package.
Bless has a GUI that is similar to GHex, but it has more advanced features.
Let's open the test.bin file by clicking the menu File > Open or pressing Ctrl + O, selecting test.bin, and then clicking Open.
Then, let's search for the two-byte binary sequence (0x00 0x01) by clicking the menu Search > Find or pressing Ctrl + F, selecting the format of the pattern (hexadecimal, in our case), entering the pattern, and clicking Find Next.
All occurrences are highlighted in blue, with the offset displayed at the bottom.
Unlike GHex, Bless doesn't load the entire file into memory, which makes it efficient at handling large data files. It can also perform fast, multi-threaded find operations.
7. Performance Test
Now that we've explored several tools for searching hexadecimal patterns in binary files, let's run a simple performance test to see which one finds the pattern the fastest.
7.1. Testing Setup
We can run the test on any system. However, some tools, like GHex, load the entire file into memory, so we need to make sure that the system has enough memory before running the test. Otherwise, we could get an out-of-memory error, causing the OS to kill the process or, worse, causing the system to hang or crash.
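A rough pre-flight check compares the file size with the MemAvailable value that Linux reports in /proc/meminfo. Here's a minimal sketch with a hypothetical fits_in_memory helper. An editor needs memory beyond the raw file data, so passing this check doesn't guarantee that loading will succeed:

```bash
# fits_in_memory (hypothetical helper): fits_in_memory FILE
# Returns 0 if FILE is smaller than the currently available memory,
# 1 if it isn't, and 2 if either value can't be read.
fits_in_memory() {
  local size avail_kb
  size=$(stat -c %s -- "$1") || return 2
  # MemAvailable is reported in kB (kibibytes) on Linux 3.14 and later
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  [ -n "$avail_kb" ] || return 2
  [ "$size" -lt $(( avail_kb * 1024 )) ]
}
```

For example, fits_in_memory /media/baeldung/8gb_binary_file.vdi || echo "probably too large to open in GHex" warns before we open the file.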
For this test, we at Baeldung used the following hardware and data:
- Dell Latitude Intel Core i7-6600U CPU @ 2.60GHz × 4, 16GB RAM
- Hard disk: Seagate external HDD, 5TB, 7200rpm, SATA III, ext4
- File: 8.1GB binary file (a VirtualBox Disk Image)
- Pattern to search: 0x03 0xC6 0x42 0x07
- Number of occurrences in the file: 6 occurrences
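Readers don't have access to the VDI image, but one way to reproduce a similar test is to generate a file of random data and write the pattern at known offsets. Here's a minimal sketch; make_test_file is a hypothetical helper, and the size and offsets are arbitrary:

```bash
# make_test_file (hypothetical helper): make_test_file FILE SIZE_MIB OFFSET...
# Fills FILE with SIZE_MIB mebibytes of random bytes, then overwrites
# 4 bytes with the pattern 0x03 0xC6 0x42 0x07 at each given decimal offset.
make_test_file() {
  local file=$1 size_mib=$2 off
  shift 2
  head -c "$(( size_mib * 1024 * 1024 ))" /dev/urandom > "$file"
  for off in "$@"; do
    # \003 \306 \102 \007 are the octal forms of 0x03 0xC6 0x42 0x07;
    # conv=notrunc keeps the rest of the file intact
    printf '\003\306\102\007' |
      dd of="$file" bs=1 seek="$off" conv=notrunc status=none
  done
}
```

A 4-byte pattern appears by chance about once per 4GiB of random data. So, on a multi-gigabyte file, the tools may report a few matches beyond the ones we planted.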
7.2. Executing the Command
Using the same options as in the previous sections, with two changes for bbe, let's time the grep, bbe, and bgrep commands. bbe now prints each offset in hexadecimal (F H), and its block length matches the four-byte pattern (:4):
$ time grep -obUaP "\x03\xC6\x42\x07" /media/baeldung/8gb_binary_file.vdi | cat --show-nonprinting
real    1m34.240s
user    1m15.694s
sys     0m2.873s
$ time bbe -b "/\x03\xC6\x42\x07/:4" -s -e "F H" -e "p h" -e "A \n" /media/baeldung/8gb_binary_file.vdi
x1ae9a3ad:x03 xc6 x42 x07
x1ee0d869:x03 xc6 x42 x07
xaaf3c5b7:x03 xc6 x42 x07
x1545235b7:x03 xc6 x42 x07
x1717983dd:x03 xc6 x42 x07
x176bdf869:x03 xc6 x42 x07
real    0m59.665s
user    0m20.603s
sys     0m7.378s
$ time bgrep 03C64207 /media/baeldung/8gb_binary_file.vdi
/media/baeldung/8gb_binary_file.vdi: 1ae9a3ad
/media/baeldung/8gb_binary_file.vdi: 1ee0d869
/media/baeldung/8gb_binary_file.vdi: aaf3c5b7
/media/baeldung/8gb_binary_file.vdi: 1545235b7
/media/baeldung/8gb_binary_file.vdi: 1717983dd
/media/baeldung/8gb_binary_file.vdi: 176bdf869
real    0m58.054s
user    0m23.646s
sys     0m10.434s
For GHex and Bless, since both are GUI-based tools, we need to do the search manually.
After running all the commands and searching for the pattern manually in GHex and Bless, we have the following data to analyze:
| Tool  | Found All 6 Occurrences | Loaded Entire File into Memory | Elapsed Time     |
|-------|-------------------------|--------------------------------|------------------|
| grep  | No (0 found)            | No                             | 1m34.240s        |
| bbe   | Yes                     | No                             | 0m59.665s        |
| bgrep | Yes                     | No                             | 0m58.054s        |
| GHex  | No (2 found)            | Yes                            | -                |
| Bless | Yes                     | No                             | ~4m (manual)     |
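The grep command didn't find any occurrences at all, even though it worked on our small test file. One likely cause, which this test didn't verify, is the locale rather than the file size. In a UTF-8 locale, grep -P treats \xC6 as the Unicode character U+00C6 instead of the single byte 0xC6, whereas our earlier pattern contained only bytes below 0x80.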
On the other hand, both bbe and bgrep found all six occurrences. Both finished the search in about a minute without loading the whole file into memory.
Meanwhile, the GUI-based GHex could only display the first 0x80000000 bytes (about 2.1GB) of the 8.1GB file in its interface. As a result, it found only the first two occurrences.
However, Bless, the other GUI-based tool, displayed the entire file and found all six occurrences, although the search took around four minutes in total. Unlike GHex, Bless doesn't load the entire file into memory. Instead, it uses a combination of memory and disk cache.
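To put the elapsed times in perspective, we can divide the file size by each tool's elapsed time. This back-of-the-envelope sketch makes two assumptions: 8.1GB means 8,100MB, and Bless's roughly four-minute manual measurement is 240 seconds. GHex is left out because it didn't search the whole file:

```bash
# Approximate scan throughput = file size / elapsed (real) time.
# Assumptions: 8.1GB = 8100MB; Bless's "around four minutes" = 240s.
throughput() {
  awk 'BEGIN {
    size_mb = 8100
    n = split("grep:94.240 bbe:59.665 bgrep:58.054 Bless:240", runs, " ")
    for (i = 1; i <= n; i++) {
      split(runs[i], kv, ":")   # kv[1] = tool name, kv[2] = seconds
      printf "%-6s %4.0f MB/s\n", kv[1], size_mb / kv[2]
    }
  }'
}
throughput
```

This prints roughly 86MB/s for grep, 136MB/s for bbe, 140MB/s for bgrep, and 34MB/s for Bless. In other words, bbe and bgrep scanned about 1.6 times faster than grep and about four times faster than Bless. For bbe and bgrep, user plus sys time (about 28 and 34 seconds) is also well below the elapsed time. That hints, although we didn't measure it, that they spent a good part of the run waiting for the disk.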
8. Conclusion
In this article, we explored several text-based and GUI-based tools to search for hexadecimal patterns in binary files.
We also ran a simple performance test, and the tools gave mixed results. The text-based tools bbe and bgrep searched an 8.1GB file in about a minute without loading it into memory, while grep didn't find any occurrence of our pattern. Among the GUI-based tools, Bless handled the large file and found all occurrences, although about four times slower than bbe and bgrep, whereas GHex could only search the first 2.1GB of the file.