Last updated: March 18, 2024
Regular expressions (Regex) are widely used in the Linux command line. Many common commands support Regex, such as grep, sed, and awk.
Some of us may have run into a Regex, for instance, a pattern containing \d, that works well in Java or Python but doesn’t work with Linux commands. This can be confusing.
In this tutorial, let’s take a closer look at this sort of problem and explain why it can happen.
As usual, let’s understand the problem through an example. First, let’s create a text file as our input:
$ cat input.txt
Linux is awesome!
This server is running the Linux kernel 5.16.5-arch1-1.
It has many powerful commands.
The input.txt file contains three lines.
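To follow along, we need the file on disk. As a minimal sketch, one way to create it is with a here-document. Any text editor works just as well.

```shell
#!/bin/sh
# Write the three sample lines used in this tutorial to input.txt.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > input.txt <<'EOF'
Linux is awesome!
This server is running the Linux kernel 5.16.5-arch1-1.
It has many powerful commands.
EOF

# Count the lines; tr strips the padding that some wc versions add
wc -l < input.txt | tr -d ' '
```

The command prints 3, confirming the file matches the listing above.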
We know the Regex [0-9] matches a single digit. So, the command grep '[0-9]' input.txt should match the second line of the input.txt file:
$ grep '[0-9]' input.txt
This server is running the Linux kernel 5.16.5-arch1-1.
Further, we may have learned that “\d is the short form of [0-9].” So, let’s replace the Regex in the grep command with \d and try again:
$ grep '\d' input.txt
It has many powerful commands.
As the output above shows, grep doesn’t seem to recognize \d as [0-9]. Instead, it treats \d as the literal letter ‘d’. Therefore, only the last line, the only one containing a ‘d’, is matched.
If we test the same Regex with sed or awk, we can get the same result:
$ sed -n '/\d/p' input.txt
It has many powerful commands.
$ awk '/\d/' input.txt
awk: cmd. line:1: warning: regexp escape sequence `\d' is not a known regexp operator
It has many powerful commands.
Moreover, awk explicitly prints a warning saying that ‘\d’ is not a known regexp operator.
However, we can get the expected output if we test the same Regex and the input file in Java, Python, or PHP.
So, why isn’t \d supported by Linux commands? Next, let’s figure it out.
To answer the question, we should understand the different Regex flavors. There are three commonly used Regex syntaxes: BRE (Basic Regular Expressions), ERE (Extended Regular Expressions), and PCRE (Perl Compatible Regular Expressions).
BRE came earliest. It has limited features and expressiveness. Then, BRE was extended to ERE. Later, PCRE joined the Regex party with a rich set of powerful features.
We won’t dive into each Regex syntax and make this a complete Regex tutorial. Instead, we’ll discuss some differences between BRE, ERE, and PCRE through some examples.
As we’ve mentioned earlier, BRE is the oldest Regex syntax. As its name implies, it supports only basic features. For instance, standard POSIX BRE doesn’t support the alternation operator |, the quantifiers ? and +, or shorthand character classes such as \s.
Also, we need to escape the interval quantifier “{m,n}” and the grouping parentheses “(…)” to give them special meaning. For example, “[0-9]\{2,4\}” matches two, three, or four digits.
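We can check this escaping rule with a small sketch. The sample numbers are our own, and we add grep’s -x option so that only whole lines consisting of two to four digits count as matches:

```shell
#!/bin/sh
# BRE: escaped braces form an interval quantifier (2 to 4 digits).
# -x requires the whole line to match.
bre_result=$(printf '7\n42\n2024\n12345\n' | grep -x '[0-9]\{2,4\}')
echo "$bre_result"

# BRE: unescaped braces are ordinary characters, so this pattern
# looks for one digit followed by the literal text "{2,4}".
literal_result=$(printf '42\n5{2,4}\n' | grep -x '[0-9]{2,4}')
echo "$literal_result"
```

The first command prints 42 and 2024, while the second matches only the literal line 5{2,4}.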
After ERE was introduced, many BRE implementations, such as GNU BRE, added support for some shorthand classes such as ‘\s’. Further, GNU BRE supports |, ?, and + as well. However, we need to escape them to give them special meaning. For example, the BRE “a\|b” matches a or b.
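Here’s a minimal sketch of these extensions with grep in its default BRE mode, using sample input of our own. Note that \|, \?, \+, and \s are GNU behavior rather than standard POSIX BRE, so other implementations may reject or misread them:

```shell
#!/bin/sh
# GNU BRE: \| is alternation, so lines containing "cat" or "dog" match
alt=$(printf 'cat\ndog\nbird\n' | grep 'cat\|dog')
echo "$alt"

# GNU BRE: \? makes the preceding "u" optional (-x: whole-line match)
opt=$(printf 'color\ncolour\ncolouur\n' | grep -x 'colou\?r')
echo "$opt"

# GNU BRE: \+ means one or more of the preceding digit class
plus=$(printf 'v1\nv12\nv\n' | grep -x 'v[0-9]\+')
echo "$plus"

# GNU shorthand: \s matches a whitespace character
space=$(printf 'a b\nab\n' | grep 'a\sb')
echo "$space"
```

Dropping the backslashes in any of these patterns would make grep treat |, ?, or + as literal characters instead.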
ERE has extended BRE. With ERE, we don’t need to escape |, ?, +, ( ), and { } to give them special meaning. For example, “a|b” matches a or b, and “[0-9]{2,4}” matches two, three, or four digits.
However, if we want to match those characters literally, we need to escape them. For instance, “a\|b” matches the literal string “a|b”.
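The following sketch, again with sample input of our own, runs similar patterns through grep -E to show the difference between unescaped operators and escaped, literal characters:

```shell
#!/bin/sh
# ERE: | is alternation without a backslash. -x matches whole lines,
# so the line "a|b" itself doesn't match.
alt=$(printf 'a\nb\na|b\n' | grep -xE 'a|b')
echo "$alt"

# ERE: unescaped braces form an interval quantifier
interval=$(printf '7\n42\n2024\n12345\n' | grep -xE '[0-9]{2,4}')
echo "$interval"

# ERE: \| matches the literal "|" character
literal=$(printf 'a\nb\na|b\n' | grep -E 'a\|b')
echo "$literal"
```

Compared with BRE, the roles of the escaped and unescaped forms are simply swapped.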
PCRE started as a library implementing Regex with Perl-compatible syntax. Since Perl did much to popularize Regex, this syntax became a widely used Regex flavor. Many other utilities and programming languages, for instance, Java, Python, and PHP, have Regex engines compatible with PCRE.
PCRE’s syntax is much more powerful and flexible than BRE and ERE. Features available only in PCRE include the shorthand \d (equivalent to [0-9]), lookahead and lookbehind assertions, and non-greedy quantifiers such as *?.
Now we know that \d is PCRE syntax, and only PCRE-compatible Regex engines interpret it correctly.
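To see a couple of PCRE-only features in action, a lookbehind assertion and a non-greedy quantifier, here’s a minimal sketch. It assumes GNU grep built with PCRE support (its -P option); the HTML-like sample string is our own:

```shell
#!/bin/sh
# Assumes GNU grep compiled with PCRE support (-P).
line='This server is running the Linux kernel 5.16.5-arch1-1.'

# (?<=kernel ) is a lookbehind: "kernel " must precede the match,
# but -o prints only the matched digits and dots.
version=$(printf '%s\n' "$line" | grep -oP '(?<=kernel )[\d.]+')
echo "$version"

# Non-greedy .*? stops at the first ">", so each tag is a separate match
lazy=$(printf '%s\n' '<b>bold</b> and <i>it</i>' | grep -oP '<.*?>')
echo "$lazy"

# For comparison, ERE only has greedy .*, which runs to the last ">"
greedy=$(printf '%s\n' '<b>bold</b> and <i>it</i>' | grep -oE '<.*>')
echo "$greedy"
```

The lookbehind extracts 5.16.5, the non-greedy pattern prints the four tags on separate lines, and the greedy ERE pattern prints the whole string as one match.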
Next, let’s take a look at the Linux commands and which Regex flavors they support.
In this section, we’ll take the widely used GNU grep, GNU sed, and GNU awk as examples.
By default, grep interprets patterns as GNU BRE. That is to say, unless we set an option, it supports only BRE syntax. For example, we can match a line containing either “awesome” or “powerful”:
$ grep 'awesome\|powerful' input.txt
Linux is awesome!
It has many powerful commands.
As we’ve seen in the command above, we’ve escaped the ‘|’ character to give it special meaning.
grep allows us to use the -E option to interpret patterns as ERE. Let’s do the same test with the -E option:
$ grep -E 'awesome|powerful' input.txt
Linux is awesome!
It has many powerful commands.
Note that we shouldn’t escape the ‘|’ when we pass the -E option to grep. Otherwise, grep will search the literal ‘|’ character.
GNU grep supports the -P option to interpret patterns as PCRE. Therefore, if we want grep to understand PCRE syntax such as “\d”, we should use the -P option:
$ grep -P '\d' input.txt
This server is running the Linux kernel 5.16.5-arch1-1.
As we can see, grep supports “\d”, but only when we use the right option.
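Combined with -c or -o, the -P option also lets us count matching lines or extract the digits themselves. Here’s a minimal sketch that recreates the sample file so it runs on its own, again assuming GNU grep with PCRE support:

```shell
#!/bin/sh
# Assumes GNU grep compiled with PCRE support (-P).
printf '%s\n' 'Linux is awesome!' \
  'This server is running the Linux kernel 5.16.5-arch1-1.' \
  'It has many powerful commands.' > input.txt

# -c counts matching lines: only the kernel line contains a digit
count=$(grep -cP '\d' input.txt)
echo "$count"

# -o prints every run of digits (\d+) on its own line
numbers=$(grep -oP '\d+' input.txt)
echo "$numbers"
```

Here, count is 1, and the extracted numbers are 5, 16, 5, 1, and 1, taken from 5.16.5-arch1-1.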
As is the case with grep, sed uses BRE by default. Additionally, we can pass the -r option (or its POSIX-compatible equivalent, -E) to tell sed to use GNU ERE for pattern matching:
$ sed -n '/awesome\|powerful/p' input.txt
Linux is awesome!
It has many powerful commands.
$ sed -nr '/awesome|powerful/p' input.txt
Linux is awesome!
It has many powerful commands.
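The difference matters most with grouping. As a sketch, here’s the same substitution, which pulls the kernel version out of our sample line, written once in BRE and once in ERE:

```shell
#!/bin/sh
line='This server is running the Linux kernel 5.16.5-arch1-1.'

# BRE: the group \( \) and the interval \{1,\} need backslashes
bre=$(printf '%s\n' "$line" | sed -n 's/.*kernel \([0-9.]\{1,\}\).*/\1/p')
echo "$bre"

# ERE (-E, or -r in GNU sed): ( ) and + work without backslashes
ere=$(printf '%s\n' "$line" | sed -nE 's/.*kernel ([0-9.]+).*/\1/p')
echo "$ere"
```

Both commands print 5.16.5. Note that [0-9] does the job of \d here, since sed can’t interpret the PCRE shorthand.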
However, sed doesn’t support PCRE. Therefore, sed cannot interpret “\d”.
GNU awk, on the other hand, uses GNU ERE. Like sed, it doesn’t support PCRE.
Consequently, we cannot use PCRE-unique features with sed and awk.
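Since sed and awk can’t use \d, a practical workaround is to spell the digit class out with [0-9] or the POSIX bracket expression [[:digit:]], both of which are valid in BRE and ERE. Here’s a minimal sketch against our sample file:

```shell
#!/bin/sh
printf '%s\n' 'Linux is awesome!' \
  'This server is running the Linux kernel 5.16.5-arch1-1.' \
  'It has many powerful commands.' > input.txt

# sed (BRE): [[:digit:]] matches any digit, just like [0-9]
sed_out=$(sed -n '/[[:digit:]]/p' input.txt)
echo "$sed_out"

# awk (ERE): [0-9] matches a digit, and | needs no backslash
awk_digit=$(awk '/[0-9]/' input.txt)
echo "$awk_digit"
awk_alt=$(awk '/awesome|powerful/' input.txt)
echo "$awk_alt"
```

Both digit patterns select only the kernel line, while the awk alternation selects the first and last lines.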
In this article, we first used an example to introduce a confusing question: why isn’t the Regex \d supported by Linux commands such as grep and sed?
Then, on the journey of seeking the answer to the question, we’ve discussed the three Regex flavors: BRE, ERE, and PCRE.
Further, we’ve talked about the Regex compatibility of common Linux commands such as grep, sed, and awk. Along the way, we found the answer: \d is PCRE syntax, so only tools running in PCRE mode, such as grep -P, interpret it as a digit.