1. Overview

AWK is widely used as a data extraction and reporting utility on Linux systems. In this tutorial, we’ll learn how to write dynamic awk scripts by passing parameters to them.

2. Scenario Setup

Before we learn how to pass parameters to an awk script, we need some sample data. For this, let’s create the employees.db file that contains comma-separated values:

$ cat employees.db
Name,Salary
Alice,25000
Alex,35000
Raymond,15000
Leo,7900

We can see that the first record contains the field names, namely Name and Salary.

Next, we must understand that awk scripts differ from Bash scripts in how they receive parameters. Unlike Bash scripts, which use $1, $2, and so on as positional arguments, awk scripts interpret these as built-in field variables.

Further, let’s see this in action with the help of a one-line awk command based on the pattern-action paradigm:

$ awk -F',' 'NR>1 {print $1,$2}' employees.db
Alice 25000
Alex 35000
Raymond 15000
Leo 7900

We can see that the $1 and $2 variables gave us the employee’s name and salary fields, and that the comma in the print statement separated them with a space, which is awk’s default output field separator. Moreover, we could show records from the second line onward using the built-in variable NR, which denotes the current record number.
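To contrast the two meanings of $1 mentioned earlier, here is a minimal sketch that wraps awk in a shell function. The function name first_field and the file sample.db are hypothetical and used only for illustration. The shell expands the double-quoted "$1", while the $1 inside the single-quoted program reaches awk untouched:

```shell
# Sample data with a header row, laid out like employees.db.
printf 'Name,Salary\nAlice,25000\nAlex,35000\n' > sample.db

# first_field is a hypothetical helper used only for illustration.
first_field() {
    # "$1" in double quotes: expanded by the shell to the function's first argument.
    # $1 inside the single-quoted program: left alone by the shell, so awk
    # reads it as the first field of the current record.
    awk -F',' 'NR>1 {print $1}' "$1"
}

first_field sample.db
```

With the sample data above, this should print the two names, Alice and Alex, one per line.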

In the following sections, we’ll use the employees.db file as input data to learn how to pass parameters to an awk script.

3. Using Command-Line Named Arguments

Let’s say we want to write an awk script to show the names and salaries of employees in a space-separated format. Additionally, we want our script to accept a parameter to exclude an employee by name from the original list.

First, let’s look at the awk usage with the -v option to pass parameters as variables:

$ awk -v <variable_name>=<value> -f <awk_script> <file>

Next, let’s assume we’ll pass a variable named exclude specifying the employee name we need to exclude from the report. Based on this assumption, let’s write the filter_employees.awk script:

$ cat filter_employees.awk
BEGIN {
    FS=","
}

NR>1 {
    if ($1 != exclude) {
        print $1,$2
    }
}

Notice that we’ve added a condition that compares the first field ($1) with the value of the exclude variable and prints the record only when the two differ.

Finally, let’s execute the script and see it in action:

$ awk -v exclude="Leo" -f filter_employees.awk employees.db
Alice 25000
Alex 35000
Raymond 15000
$ awk -v exclude="Alice" -f filter_employees.awk employees.db
Alex 35000
Raymond 15000
Leo 7900

It looks like we’ve got this right, as the output doesn’t contain the employee whose name is specified with the exclude variable while executing the script.
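The -v option can be repeated to pass more than one named parameter. The following is only a sketch: min_salary is a hypothetical second parameter that isn’t part of filter_employees.awk, and the program is written inline to keep the example self-contained:

```shell
# Recreate the sample data used throughout the tutorial.
printf 'Name,Salary\nAlice,25000\nAlex,35000\nRaymond,15000\nLeo,7900\n' > employees.db

# Each -v assigns one awk variable before the program starts running.
# min_salary is a hypothetical parameter added for illustration;
# adding 0 forces a numeric comparison.
awk -F',' -v exclude="Leo" -v min_salary=20000 '
    NR>1 && $1 != exclude && $2+0 >= min_salary+0 { print $1, $2 }
' employees.db
```

With this data, only Alice and Alex pass both conditions, so the output should contain their two records.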

4. Using Command-Line Positional Arguments

Let’s say we get a new requirement to extend our awk script to exclude multiple employees. Earlier, we excluded a single employee using a named command-line argument, but filter_employees.awk compares $1 against only one value, so it can’t handle several names as written. So, in this section, we’ll solve this use case using the built-in variable ARGC and the positional argument array ARGV.

First, let’s write a one-line awk command to understand the meaning of the ARGC and ARGV variables. The -e option used here isn’t part of POSIX awk; GNU awk supports it:

$ awk -e 'BEGIN{ for(i=0;i<ARGC;i++) print "ARGV["i"]="ARGV[i]}' \
exclude_1=Leo exclude_2=Alice exclude_3=Raymond \
employees.db
ARGV[0]=awk
ARGV[1]=exclude_1=Leo
ARGV[2]=exclude_2=Alice
ARGV[3]=exclude_3=Raymond
ARGV[4]=employees.db

Notice that our awk program has only a BEGIN block, which iterates over the ARGV array within the bounds defined by the ARGC variable. While ARGC stores the total count of arguments, ARGV stores the actual values.

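Furthermore, we can see that the awk program text and options such as -e are excluded from ARGV. As a result, “awk” goes in ARGV[0], the exclude-specific parameters go in ARGV[1], ARGV[2], and ARGV[3], and the filename goes in ARGV[4]. Since the exclude-specific parameters have the form name=value, awk treats them as variable assignments rather than as input files when it processes the argument list, so employees.db is the only file it would read.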
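Arguments of the form name=value that appear among the operands are handled by awk as variable assignments, and they take effect only when awk reaches them in the argument list, which is after the BEGIN block has run. The short sketch below illustrates this timing with a reduced two-line data file:

```shell
# A reduced data file: a header plus one record.
printf 'Name,Salary\nAlice,25000\n' > employees.db

awk -F',' '
    BEGIN { print "BEGIN:[" exclude_1 "]" }   # operand not processed yet, so empty
    NR>1  { print "main:[" exclude_1 "]" }    # assigned before employees.db is read
' exclude_1=Leo employees.db
```

The first line of output should show an empty value and the second should show Leo. This is why a script that needs such parameters inside BEGIN has to read them from ARGV, or receive them through -v instead.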

Next, let’s write a new script called filter_employees_v2.awk to exclude multiple employees by extending the filter_employees.awk script:

$ cat filter_employees_v2.awk
BEGIN {
    FS=","
    exclude_index=0
    for (i=1; i < ARGC; i++) {
        if (ARGV[i] ~ /^exclude_[0-9]+=/) {
            split(ARGV[i], excludeArr, "=")
            EXCLUDE[++exclude_index]=excludeArr[2]
        }
    }
}

NR>1 {
    show="true"
    for (i in EXCLUDE) {
        exclude=EXCLUDE[i]
        if($1 ==  exclude) {
            show="false"
        }
    }
    if(show == "true") {
        print $1,$2
    }
}

Notice that we used the BEGIN block to collect all the exclude-specific parameters into the EXCLUDE array. Additionally, we modified the main block to check whether the employee name in $1 matches any of the values in EXCLUDE.

Finally, let’s run the filter_employees_v2.awk script by passing multiple employee names for exclusion:

$ awk -f filter_employees_v2.awk exclude_1=Leo exclude_2=Alice exclude_3=Raymond employees.db
Alex 35000

Perfect! The result meets our expectations.
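As an alternative to looping over the array for every record, the names can be stored as array keys and tested with awk’s in operator. The following is a sketch rather than a replacement for the script above. The file name filter_employees_in.awk and the array name excluded are hypothetical:

```shell
# Recreate the sample data used throughout the tutorial.
printf 'Name,Salary\nAlice,25000\nAlex,35000\nRaymond,15000\nLeo,7900\n' > employees.db

# filter_employees_in.awk is a hypothetical script name used for illustration.
cat > filter_employees_in.awk <<'EOF'
BEGIN {
    FS = ","
    for (i = 1; i < ARGC; i++) {
        if (ARGV[i] ~ /^exclude_[0-9]+=/) {
            # Everything after the first "=" is the name to exclude.
            name = substr(ARGV[i], index(ARGV[i], "=") + 1)
            excluded[name] = 1    # store the name as an array key
        }
    }
}

# Print the record only when its name is not one of the stored keys.
NR>1 && !($1 in excluded) { print $1, $2 }
EOF

awk -f filter_employees_in.awk exclude_1=Leo exclude_2=Alice exclude_3=Raymond employees.db
```

With the same three exclusions as before, this should again leave only the record for Alex.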

5. Using Environment Variables

Yet another way to pass parameters to an awk script is through environment variables.

First, let’s write a one-line awk command to understand how to pass and access the environment variables:

$ export EXCLUDE_EMPLOYEES="Leo,Alice" && awk -e 'BEGIN {print ENVIRON["EXCLUDE_EMPLOYEES"]}' employees.db
Leo,Alice

Note that we need to export the EXCLUDE_EMPLOYEES variable so that it becomes part of the environment that the awk process inherits. Additionally, we access it through the built-in associative array called ENVIRON.
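When the variable is needed for a single invocation only, a POSIX shell also lets us prefix the command with the assignment. This places the variable in the environment of that one awk process without exporting it in the current shell. A minimal sketch:

```shell
# The assignment prefix applies to this awk invocation only.
# A BEGIN-only program reads no input, so no data file is needed here.
EXCLUDE_EMPLOYEES="Leo,Alice" awk 'BEGIN { print ENVIRON["EXCLUDE_EMPLOYEES"] }'
```

This should print Leo,Alice, just like the exported version.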

Next, let’s write a new awk script named filter_employees_v3.awk to accept an environment variable called EXCLUDE_EMPLOYEES containing comma-separated names:

$ cat filter_employees_v3.awk
BEGIN {
    FS=","
    split(ENVIRON["EXCLUDE_EMPLOYEES"], EXCLUDE, ",")
}

NR>1 {
    show="true"
    for (i in EXCLUDE) {
        exclude=EXCLUDE[i]
        if($1 ==  exclude) {
            show="false"
        }
    }
    if(show == "true") {
        print $1,$2
    }
}

Notice that we used the BEGIN block to populate the EXCLUDE array by splitting the comma-separated values available through ENVIRON["EXCLUDE_EMPLOYEES"]. Moreover, the main block remains unchanged.

Finally, let’s execute the filter_employees_v3.awk script by passing multiple employee names for exclusion:

$ export EXCLUDE_EMPLOYEES="Leo,Alice" && awk -f filter_employees_v3.awk employees.db
Alex 35000
Raymond 15000

Great! It looks correct.

6. Conclusion

In this tutorial, we learned how to make dynamic awk scripts by passing parameters to them. We also learned about a few options, such as -v, and some built-in variables, such as ARGC, ARGV, and ENVIRON, that are available in AWK.
