1. Introduction

The AWK programming language, implemented by the awk interpreter, is a versatile way to parse and manipulate data. In particular, we can integrate awk commands within shell scripts, but we can also use shell features like variables within AWK scripts.

In this tutorial, we’ll talk about ways to employ shell variables as one of the main parts of an AWK script or one-liner. First, we explore the general mechanism that awk follows when processing data. After that, we delve into the main subject of employing shell variables as patterns. Finally, we discuss potential pitfalls.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments unless otherwise specified.

2. How AWK Processes Data

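In short, awk statements comprise two parts:

  • pattern: selects the records that the statement applies to
  • action: defines what to do with each selected record

By default, records are lines, so a pattern works per line. In fact, a pattern can also be a condition or even a general expression.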

Let’s see a basic example of a pattern:

$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/[23]/'
Line 2.
Line 3.

Here, we pipe the three-line output from printf to awk. Within AWK, we use a regular expression (regex) bracket expression as the pattern between // slashes. Since we only match lines that contain either 2 or 3, the final output excludes Line 1.

In fact, we see output only because print is the implicit action for any pattern that evaluates to true but has no explicit action. Because of this, we can rewrite the above example:

$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/[23]/ { print; }'
Line 2.
Line 3.

In this case, the action is the print within braces. For our purposes, we mostly skip the explicit statement and stick with the first example above.
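As noted earlier, a pattern doesn’t have to be a regex. The following is a minimal sketch of a condition and a general expression used as patterns. The data variable is only an illustrative helper that holds the same three sample lines:

```shell
# Sample input reused by both commands (illustrative helper, not part of awk)
data='Line 1.
Line 2.
Line 3.'

# Condition on the built-in record number NR: skip the first record
printf '%s\n' "$data" | awk 'NR > 1'

# General expression: any non-zero result counts as true,
# so this selects the odd-numbered records
printf '%s\n' "$data" | awk 'NR % 2'
```

The first command prints Line 2. and Line 3., while the second prints Line 1. and Line 3., both through the implicit print action.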

Now, let’s see how we can construct a pattern that includes shell variables.

3. Using a Shell Variable as a Pattern

While there are several ways to pass shell variable values to awk, only some of them work within patterns.

3.1. Embedding

As usual, we can employ quotes in a specific manner to ensure a given shell variable is interpreted directly within the text of an AWK script:

$ CHARS=23
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/['"$CHARS"']/'
Line 2.
Line 3.

In this case, to achieve the results from earlier, we first define the $CHARS shell variable with the value 23. After that, we terminate the single quotes that surround the AWK expression to insert an interpolation of the external $CHARS variable at the given location.

Moreover, we can replace the whole pattern this way:

$ PATTERN=[23]
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/ { print; }'
Line 2.
Line 3.
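Because the shell pastes the value into the program text before awk ever parses it, the value can change the script itself. As a rough sketch, assuming a hypothetical hostile value that closes the regex early:

```shell
# Hypothetical value: it ends the first regex, adds an action, and opens a second regex
PATTERN='1/ { print "injected" } /2'

# After interpolation, awk receives this program:
#   /1/ { print "injected" } /2/
printf 'Line 1.\nLine 2.\nLine 3.\n' | awk '/'"$PATTERN"'/'
```

Instead of matching the value as text, awk runs the smuggled action for Line 1. and prints injected, then prints Line 2. because of the second regex.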

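Embedding makes the variable’s value part of the AWK program text itself, so a value that contains a / slash or AWK code can break or alter the script. Because of this risk, let’s explore other ways as well.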

3.2. Internal Variable

There are several methods to assign an external variable value to an internal variable:

  • embedding: break the quote around the script, insert an interpolated shell variable, and resume the quote
  • direct input: redirect via here-string or similar to pass shell variable values directly with the regular data
  • ARGV: provide shell variable values as arguments to the AWK script
  • ENVIRON: directly access exported shell variables within AWK
  • predefine: leverage switches and options of awk to make shell variable values available

Whichever method we choose, after performing the assignment, we can employ any internal variable as a pattern.

Let’s use predefined variables:

$ PATTERN=[23]
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk -v pat="$PATTERN" '$0 ~ pat'
Line 2.
Line 3.

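Here, we add three concepts to our earlier examples:

  • -v: the awk option that assigns a value to an internal variable before the script runs
  • $0: the whole current record
  • ~: the operator that matches a value against a regular expression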

In other words, we perform the same matching but with ~ tilde against $0 instead of just // forward slashes and with the regular expression stored in the internal pat variable, initialized via the external $PATTERN shell variable.
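The ENVIRON method from the list above works in a similar way. Here is a minimal sketch, assuming we’re free to export the variable into the environment of the awk process:

```shell
# Export the variable so that child processes, including awk, can see it
export PATTERN='[23]'

# ENVIRON is the awk array of environment variables;
# the value is used as a dynamic regex on the right side of ~
printf 'Line 1.\nLine 2.\nLine 3.\n' | awk '$0 ~ ENVIRON["PATTERN"]'
```

With this approach, the value never becomes part of the program text, and awk reads it from the environment rather than from its command line.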

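Importantly, some methods of passing external shell variable values have limitations. For example, an assignment supplied as a var=value operand after the script takes effect only when awk reaches it in the argument list, so the variable isn’t yet set within the BEGIN block, while a variable assigned with -v is.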
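To illustrate the BEGIN limitation, the following sketch compares a -v assignment with an assignment passed as an operand after the script. The pat name and the message text are only illustrative:

```shell
PATTERN='[23]'

# -v assignment: the value is already visible inside BEGIN
awk -v pat="$PATTERN" 'BEGIN { print "BEGIN sees: [" pat "]" }'

# Operand assignment after the script: BEGIN runs first, so pat is still empty there,
# but it is set by the time awk starts reading standard input
printf 'Line 1.\nLine 2.\nLine 3.\n' |
  awk 'BEGIN { print "BEGIN sees: [" pat "]" } $0 ~ pat' pat="$PATTERN"
```

The first command prints BEGIN sees: [[23]]. The second prints BEGIN sees: [] and then still matches Line 2. and Line 3., since the assignment happens before the input is processed.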

4. Pitfalls

Naturally, since patterns are often regular expressions, any value we use as a pattern should have the proper syntax and escaping.

This is especially important when it comes to embedded values:

$ PATTERN='2|'
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/'
awk: line 1: regular expression compile failed (missing operand)
2|

Here, the trailing | pipe symbol, an alternation operator with nothing after it, trips up the interpreter. To avoid such situations, we might need to sanitize user input or generated variables to comply with the regex syntax rules.

In particular, we can either complete the regular expression or escape specific characters and sequences with a backslash:

$ PATTERN='2|3'
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/'
Line 2.
Line 3.
$ PATTERN='2\|'
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/'
$

Of course, the second regex produces no output, since the literal string 2| doesn’t appear anywhere in the data.

In general, it’s vital to account for variable interpolation and potential conflicts with special regex characters.
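When the value should match as plain text rather than as a regex, one option is to avoid regex matching altogether. Below is a minimal sketch that uses the awk index() function as the pattern. The NEEDLE name and the sample input are assumptions made for illustration:

```shell
# Value with a regex metacharacter that we want to match literally
export NEEDLE='2|'

# index() returns the position of the substring, or 0 when it is absent,
# so the expression is true only for records that contain the literal text
printf 'a 2| b\nLine 2.\n' | awk 'index($0, ENVIRON["NEEDLE"])'
```

Only the a 2| b line is printed, and no escaping is needed since the value is never compiled as a regular expression.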

5. Conclusion

In this article, we talked about using shell variables within and as AWK patterns.

In conclusion, due to the flexibility of most shells and the syntax of awk, we can choose one of several ways to place a shell variable value directly within or as a pattern.
