1. Overview

We use the grep command to search a file for lines that match a particular pattern of characters. This search pattern is called a regular expression. grep is one of the most commonly used Linux commands, and by default it displays every line that contains the pattern we’re searching for.

When we’re using grep, we may wish to skip binary files to save time. As we’ll see, this can affect certain text files as well as real binary files, because grep sometimes classifies text files as binary.

In this short tutorial, we’re going to look at how grep handles binary files and how to exclude them from our searches.

2. Why Binary Files Can Be a Problem With grep

There are two cases in which grep might treat a file as binary: encoding errors and NUL bytes. Let’s explore each of them.

2.1. Encoding Errors

The grep tool considers a file to be binary if it contains an encoding error according to the C99 mbrlen function, whose result depends on the current locale. We can see this with an example. Let’s create a file with a UTF-8 encoding error, since \x80 cannot be the first byte of a UTF-8 encoded character:

$ printf 'Encoding\x80' > encoding.txt

If we now grep for the word “Encoding”:

$ grep "Encoding" encoding.txt
Binary file encoding.txt matches

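We see that grep treats encoding.txt as a binary file, even though it’s only a text file with an encoding error. (Starting with GNU grep 3.5, this message is printed to standard error and reads “grep: encoding.txt: binary file matches”.)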
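Since the encoding check depends on the locale, one way to confirm that the \x80 byte is what triggers the binary classification is to pin the locale explicitly. This is a minimal sketch assuming GNU grep and an installed C.UTF-8 locale. It uses the octal escape \200, which is the same byte as \x80 but also works with printf implementations (such as dash’s) that don’t understand \x. The last command uses the -axv '.*' combination to print only the lines that contain encoding errors.

```shell
#!/bin/sh
# Recreate the example file; '>' overwrites instead of appending.
# \200 is octal for the byte 0x80, which can't start a UTF-8 sequence.
printf 'Encoding\200' > encoding.txt

# UTF-8 locale: 0x80 is an encoding error, so grep reports a binary match
LC_ALL=C.UTF-8 grep "Encoding" encoding.txt

# C locale: GNU grep treats every byte as a valid character,
# so the line itself is printed
LC_ALL=C grep "Encoding" encoding.txt

# Print only the lines containing encoding errors: -a reads the file as
# text, -x makes '.*' match the whole line, -v inverts the match
# ('.' never matches an encoding error)
LC_ALL=C.UTF-8 grep -axv '.*' encoding.txt
```

If C.UTF-8 isn’t installed, any other UTF-8 locale listed by locale -a, such as en_US.UTF-8, should behave the same way.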

2.2. NUL Bytes

The grep tool scans its input buffers for NUL bytes, and it also tries to determine whether a file must contain NULs in the data it hasn’t read yet. Holes are unwritten regions of a file, and Unix mandates that they read back as NUL bytes. So if a file contains a hole, it contains a NUL, and grep will consider the file binary. Let’s see a very simple example where a text file contains a NUL byte:

$ printf "File with NUL byte\0" > nul.txt

Let’s now run grep on this file:

$ grep "NUL" nul.txt
Binary file nul.txt matches

Here too, grep treats nul.txt as a binary file rather than as a plain text file that happens to contain a NUL byte.
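The hole case can be reproduced with a sparse file. This sketch is an illustration rather than part of the original example: it assumes GNU coreutils for truncate, and sparse.txt is a hypothetical file name. Whether or not the filesystem actually stores the extended region as a hole, that region reads back as NUL bytes, so grep classifies the file as binary.

```shell
#!/bin/sh
# A short text line followed by roughly 1 MiB of unwritten data:
# truncate extends the file without writing anything, which creates a
# hole on filesystems that support sparse files
printf 'hello\n' > sparse.txt
truncate -s 1M sparse.txt

# The extension reads back as NUL bytes, so grep reports a binary match
grep "hello" sparse.txt

# Count the NUL bytes: tr deletes every byte except \000, wc counts the rest
tr -cd '\000' < sparse.txt | wc -c
```

The last command also works as a quick check for NUL bytes in any file, such as nul.txt above.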

3. The grep Command With Binary Files

When we search for all files that contain a certain string, checking binary files we don’t care about can be costly. Binary files can be very large, and scanning through them wastes time and resources. Let’s look at an example where we wouldn’t want grep to look inside a binary file.

3.1. Using grep Without Suppressing Binary Files

Let’s suppose we want to search for the text “printHello” among all our files. It’s the name of a C function, void printHello, that’s called in several places in our project, and we’d like to know where and how it’s used. First, let’s create the source file hello.c:

$ cat <<'EOF' > hello.c
#include <stdio.h>
#include <stdlib.h>
void printHello(){printf ("Hello World\n");}
int main() {
    printHello();
    return 0;
}
EOF

Let’s now compile hello.c and generate the binary file (out.x):

$ gcc hello.c -o out.x

Here we use GCC, the C compiler available on most Linux distributions, to build out.x. Now let’s grep for “printHello” across all the files in the current directory:

$ grep "printHello" *
hello.c:void printHello(){printf ("Hello World\n");}
hello.c:    printHello();
Binary file out.x matches

The grep output shows that “printHello” was found in hello.c, as expected. However, grep also reports a match in the binary file out.x.
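If we want to see what grep actually matched inside a binary file, we can force it to read the file as text with -a (the short form of --binary-files=text) and print only the matching part with -o, so that raw bytes aren’t dumped to the terminal. To keep this sketch self-contained without requiring gcc, it uses a hypothetical stand-in file, fake.x, with embedded NUL bytes. In an unstripped binary like out.x, the match would typically come from the symbol table, which stores the function name.

```shell
#!/bin/sh
# Hypothetical stand-in for a compiled binary: non-text bytes around the name
printf '\000\001printHello\000\002' > fake.x

# Without -a, grep only reports that the binary file matches
grep "printHello" fake.x

# -a reads the file as text; -o prints only the matched part instead of
# the whole "line" full of raw bytes
grep -ao "printHello" fake.x
```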

3.2. Using grep While Suppressing Binary Files

We’d prefer to see only the text files that contain code, so let’s use grep to skip binary files:

$ grep -I "printHello" *
hello.c:void printHello(){printf ("Hello World\n");}
hello.c:    printHello();

Here we used the -I option; its equivalent long form is --binary-files=without-match. Both tell grep to treat binary files as if they contain no matching data. This is exactly what we were looking for: we get all the matches from the text file and none from the binary file.
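The same search can be written with the long option, and -I can be combined with -r to skip binaries across a whole project tree. This is a minimal sketch assuming GNU grep. It runs in a temporary directory and uses a hypothetical stand-in file, fake.bin, whose single NUL byte is enough for grep to classify it as binary, so no compiler is needed.

```shell
#!/bin/sh
# Work in a scratch directory so the example doesn't touch real files
dir=$(mktemp -d)
cd "$dir" || exit 1

# A text file and a stand-in binary: the NUL byte makes grep treat
# fake.bin as binary
printf 'void printHello(){}\n' > hello.c
printf 'printHello\0' > fake.bin

# Long form of -I: binary files are treated as if they don't match
grep --binary-files=without-match "printHello" *

# -r searches the directory tree recursively; -I still skips binaries
grep -rI "printHello" .
```

With -r and the starting directory ., grep prefixes each match with its path, so the second search reports the match as ./hello.c.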

4. Conclusion

In this article, we saw how grep decides whether a file is binary. We also saw the two cases, encoding errors and NUL bytes, in which text files can still be classified as binary by grep.

Finally, we learned how to use the -I option to search through text files while skipping binary files.
