1. Overview

Awk is a powerful text-processing tool in its own right. However, we can extend its scripting capabilities by calling shell functions from an Awk script.

In this tutorial, we’ll explore multiple ways to use shell functions within Awk scripts.

2. Understanding the Scenario

Let’s imagine that we’ve got a sample log.txt file containing epoch timestamps in the first column:

$ cat log.txt
1701460800 event-1
1701547200 event-2
1701633600 event-3
1701720000 event-4
1701806400 event-5
1701892800 event-6
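To follow along, we can recreate log.txt. The timestamps are exactly one day (86400 seconds) apart, so a short loop can generate them. This is just one convenient way to produce the sample file:

```shell
#!/bin/bash
# Recreate the sample log.txt: six events, one day (86400 s) apart,
# starting at epoch 1701460800.
start=1701460800
for i in 1 2 3 4 5 6; do
    echo "$((start + (i - 1) * 86400)) event-$i"
done > log.txt
cat log.txt
```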

Additionally, we’ve defined a shell function, epoch_to_date(), in functions.sh. It uses GNU date to convert an epoch timestamp into a human-readable date and time:

$ cat functions.sh
epoch_to_date() {
    date -d "@$1" "+%Y-%m-%d %H:%M:%S"
}
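Before involving Awk, it helps to check the function directly in the shell. This sketch assumes GNU date, since the -d "@EPOCH" form isn't available in BSD or macOS date. It also sets TZ=UTC because the sample outputs in this article correspond to UTC, whereas date otherwise prints local time:

```shell
#!/bin/bash
# Assumption: GNU date (coreutils). BSD/macOS date does not accept -d "@EPOCH".
# Pin the time zone so the output matches the article's UTC timestamps.
export TZ=UTC

# Write the helper script exactly as shown above.
cat > functions.sh <<'EOF'
epoch_to_date() {
    date -d "@$1" "+%Y-%m-%d %H:%M:%S"
}
EOF

# Load the function into the current shell and call it directly.
source ./functions.sh
epoch_to_date 1701460800   # 2023-12-01 20:00:00
```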

Our goal is to use the epoch_to_date() function within an Awk script to show human-readable timestamps from the log.txt file.

3. Sourcing Script

In this section, we’ll learn how we can source the functions.sh script and use the epoch_to_date() function to solve our use case.

3.1. Sourcing

Let’s write the caller.sh script to source the functions.sh script and call the epoch_to_date() function:

$ cat caller.sh
#!/bin/bash
source ./functions.sh
epoch_to_date "$1"

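We must note that the script passes its first positional command-line argument ($1) to the epoch_to_date() function. Also, since we’ll run it as ./caller.sh, the script needs execute permission (chmod +x caller.sh).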
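As a quick sanity check, we can make caller.sh executable and run it by hand. The sketch below recreates both scripts in a scratch directory and again assumes GNU date and UTC:

```shell
#!/bin/bash
export TZ=UTC                  # assumption: UTC output, GNU date
cd "$(mktemp -d)" || exit 1    # scratch directory for the sample files

cat > functions.sh <<'EOF'
epoch_to_date() {
    date -d "@$1" "+%Y-%m-%d %H:%M:%S"
}
EOF

cat > caller.sh <<'EOF'
#!/bin/bash
source ./functions.sh
epoch_to_date "$1"
EOF

# ./caller.sh needs execute permission before we (or awk) can run it.
chmod +x caller.sh
./caller.sh 1701460800         # 2023-12-01 20:00:00
```

Since caller.sh sources ./functions.sh relative to the current working directory, we need to run it (and the Awk commands that call it) from the directory that contains functions.sh.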

3.2. With the system Function

Now, we can write an Awk one-liner that uses the system() function to execute the caller.sh script for each line:

$ awk '{system("./caller.sh " $1);}' log.txt
2023-12-01 20:00:00
2023-12-02 20:00:00
2023-12-03 20:00:00
2023-12-04 20:00:00
2023-12-05 20:00:00
2023-12-06 20:00:00

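It looks like we’ve got the desired result. We must note that, inside the Awk program, $1 is the first field of each line (the epoch timestamp), which we pass as an argument to the caller.sh script. Also, the timestamps shown are in UTC; since date prints local time, the exact output depends on the TZ setting.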
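Because system() hands its whole argument to /bin/sh, any field we splice into the command string is interpreted by the shell. A cautious variant, sketched below under the same GNU date and UTC assumptions, only passes purely numeric first fields and checks the return value of system(), which reflects the command's exit status:

```shell
#!/bin/bash
export TZ=UTC                  # assumption: UTC output, GNU date
cd "$(mktemp -d)" || exit 1    # scratch directory for the sample files

cat > functions.sh <<'EOF'
epoch_to_date() {
    date -d "@$1" "+%Y-%m-%d %H:%M:%S"
}
EOF
cat > caller.sh <<'EOF'
#!/bin/bash
source ./functions.sh
epoch_to_date "$1"
EOF
chmod +x caller.sh
printf '1701460800 event-1\nnot-a-number event-2\n' > log.txt

# Only hand digit-only fields to the shell, and report failed commands.
awk '{
    if ($1 !~ /^[0-9]+$/) {
        print "skipping non-numeric field: " $1 > "/dev/stderr"
        next
    }
    status = system("./caller.sh " $1)
    if (status != 0)
        print "caller.sh failed for " $1 > "/dev/stderr"
}' log.txt
```

The first line prints its timestamp as before, while the second is skipped with a message on standard error instead of reaching the shell.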

3.3. With the getline Command

Alternatively, we can use Awk’s getline command to run the caller.sh script and read its output:

$ awk '{
    cmd = "./caller.sh \"" $1 "\""; 
    cmd | getline result; 
    close(cmd); 
    print result;
}' log.txt
2023-12-01 20:00:00
2023-12-02 20:00:00
2023-12-03 20:00:00
2023-12-04 20:00:00
2023-12-05 20:00:00
2023-12-06 20:00:00

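As before, invoking caller.sh runs the epoch_to_date() function. We also call close(cmd) after reading. Otherwise, Awk keeps the pipe open, so a repeated timestamp would read from the already exhausted pipe instead of rerunning the command, and every distinct command would hold on to a file descriptor.

Additionally, getline stores the command’s output in the result variable. So, if our use case requires it, we can also print it together with the second field ($2):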

$ awk '{
    cmd = "./caller.sh \"" $1 "\""; 
    cmd | getline result; close(cmd); 
    print result, $2;
}' log.txt
2023-12-01 20:00:00 event-1
2023-12-02 20:00:00 event-2
2023-12-03 20:00:00 event-3
2023-12-04 20:00:00 event-4
2023-12-05 20:00:00 event-5
2023-12-06 20:00:00 event-6

Great! We’ve solved an extended use case with this approach.
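One subtlety with this pattern is that getline leaves result untouched when the command prints nothing, so a failed conversion would silently repeat the previous line’s timestamp. A sketch that resets result and checks the return value of getline (1 when a line is read, 0 at end of input, -1 on error) might look like this, with a deliberately invalid timestamp on the second line and the same GNU date and UTC assumptions:

```shell
#!/bin/bash
export TZ=UTC                  # assumption: UTC output, GNU date
cd "$(mktemp -d)" || exit 1    # scratch directory for the sample files

cat > functions.sh <<'EOF'
epoch_to_date() {
    date -d "@$1" "+%Y-%m-%d %H:%M:%S"
}
EOF
cat > caller.sh <<'EOF'
#!/bin/bash
source ./functions.sh
epoch_to_date "$1"
EOF
chmod +x caller.sh
printf '1701460800 event-1\nbogus event-2\n' > log.txt

awk '{
    cmd = "./caller.sh \"" $1 "\""
    result = ""                       # do not reuse the previous record value
    if ((cmd | getline result) > 0)   # > 0 means a line was actually read
        print result, $2
    else
        print "no output for " $1 > "/dev/stderr"
    close(cmd)
}' log.txt
```

With the check in place, the invalid line is reported on standard error (alongside the error from date itself). Without it, the second record would have printed 2023-12-01 20:00:00 event-2.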

4. Inline Functions

Another way to use shell functions inside an Awk script is to define them inline in a command string, which we can then execute with the system() function or the getline command.

4.1. Inline Function as Command String

Let’s see how we can initialize the cmd variable with the epoch_to_date() function as a command string:

'{cmd="bash -c '\''epoch_to_date() { date -d \"@\"$1 \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"" $1 "\"'\''";}'

Although the quoting in this string may look daunting, it contains the familiar epoch_to_date() function from the functions.sh script. So, let’s break down the quoting and overall command.

First, we’ve got the outermost single quotes for the overall Awk program within which we’re defining the cmd variable:

'{cmd=...}'

Then, a double-quoted Awk string literal holds the value of the cmd variable:

'{cmd="bash -c ...";}'

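We must note that double quotes need no escaping inside a single-quoted shell string, so this part is straightforward.

Next, the Bash function and commands must themselves be wrapped in single quotes for bash -c. Since the whole Awk program is already single-quoted, we use the sequence '\'' to embed a single quote: it ends the current single-quoted string, adds an escaped literal single quote, and starts a new single-quoted string: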

'{cmd="bash -c '\''...'\''";}'
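To see the '\'' idiom in isolation, we can try it first with echo and then in a tiny Awk program. In both cases, the shell joins the adjacent pieces into a single word:

```shell
#!/bin/bash
# 'It' + \' + 's quoted' -> the shell concatenates them into one argument.
echo 'It'\''s quoted'                               # It's quoted

# The same trick inside an Awk program that is itself single-quoted.
awk 'BEGIN { print "bash -c '\''echo hi'\''" }'     # bash -c 'echo hi'
```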

Lastly, we add the epoch_to_date() function, keeping in mind that we need to escape any single or double quotes accordingly:

'{cmd="bash -c '\''epoch_to_date() { date -d \"@\"$1 \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"" $1 "\"'\''";}'

We must note that we use the \" escape sequence to add double quotes within the outer double-quoted Awk string.
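When the quoting gets this dense, printing cmd instead of executing it is a handy way to check what /bin/sh will actually receive. This sketch uses only the first line of log.txt:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1    # scratch directory for the sample file
printf '1701460800 event-1\n' > log.txt

# Print the command string rather than running it.
awk '{
    cmd = "bash -c '\''epoch_to_date() { date -d \"@\"$1 \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"" $1 "\"'\''"
    print cmd
}' log.txt
# bash -c 'epoch_to_date() { date -d "@"$1 "+%Y-%m-%d %H:%M:%S"; }; epoch_to_date "1701460800"'
```

The printed line shows that the $1 inside the Awk string literal stays a literal $1 for Bash, while the $1 outside it has already been replaced by the field value.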

4.2. With the system Function

Now that we’ve got the epoch_to_date() as an inline function, let’s use the system() function to execute the command string:

$ awk '{
    cmd = "bash -c '\''epoch_to_date() { date -d \"@\"$1 \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"" $1 "\"'\''";
    system(cmd);
}' log.txt
2023-12-01 20:00:00
2023-12-02 20:00:00
2023-12-03 20:00:00
2023-12-04 20:00:00
2023-12-05 20:00:00
2023-12-06 20:00:00

Perfect! It looks like we nailed this one.

4.3. With the getline Command

Similarly, we can use getline to run the command string and read the output of the epoch_to_date() function:

$ awk '{
    cmd = "bash -c '\''epoch_to_date() { date -d \"@\"$1 \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"" $1 "\"'\''";
    cmd | getline result;
    close(cmd);
    print result
}' log.txt
2023-12-01 20:00:00
2023-12-02 20:00:00
2023-12-03 20:00:00
2023-12-04 20:00:00
2023-12-05 20:00:00
2023-12-06 20:00:00

We get the correct results.

Additionally, since getline stores the output in the result variable, we can print it together with the second field ($2) for the extended use case:

$ awk '{
    cmd = "bash -c '\''epoch_to_date() { date -d \"@\"$1 \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"" $1 "\"'\''"; 
    cmd | getline result; 
    close(cmd); 
    print result, $2 
}' log.txt
2023-12-01 20:00:00 event-1
2023-12-02 20:00:00 event-2
2023-12-03 20:00:00 event-3
2023-12-04 20:00:00 event-4
2023-12-05 20:00:00 event-5
2023-12-06 20:00:00 event-6

Fantastic! It works as expected.
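As a variation that isn’t part of the approach above, we can keep the Bash script constant and pass the timestamp as a positional argument. With bash -c SCRIPT NAME ARG, NAME becomes $0 and ARG becomes $1 inside SCRIPT. Here, the underscore used for $0 is an arbitrary placeholder, and the sketch again assumes GNU date and UTC:

```shell
#!/bin/bash
export TZ=UTC                  # assumption: UTC output, GNU date
cd "$(mktemp -d)" || exit 1    # scratch directory for the sample file
printf '1701460800 event-1\n1701547200 event-2\n' > log.txt

# The script between \047...\047 never changes; only the trailing argument does.
awk '{
    cmd = "bash -c \047epoch_to_date() { date -d \"@$1\" \"+%Y-%m-%d %H:%M:%S\"; }; epoch_to_date \"$1\"\047 _ " $1
    cmd | getline result
    close(cmd)
    print result, $2
}' log.txt
```

This removes the need to close and reopen the single quotes around the timestamp. However, the appended field still passes through /bin/sh, so validating it as numeric remains a good idea.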

5. Using a User-Defined Awk Function

We can also read the contents of functions.sh into a shell string and pass it to the Awk script as a variable. Then, we can write a user-defined Awk function that calls the shell function.

So, let’s write the Awk script with epoch_to_date_shell as the variable holding the shell function’s source and epoch_to_date_awk() as the user-defined Awk function:

$ awk -v epoch_to_date_shell="$(<functions.sh)" '
function epoch_to_date_awk(epoch,    cmd, result) {  # extra params act as local variables
    cmd = "bash -c \047" epoch_to_date_shell"; epoch_to_date " epoch "\047";
    cmd | getline result;
    close(cmd);
    return result;
}
{
    print epoch_to_date_awk($1), $2;
}' log.txt
2023-12-01 20:00:00 event-1
2023-12-02 20:00:00 event-2
2023-12-03 20:00:00 event-3
2023-12-04 20:00:00 event-4
2023-12-05 20:00:00 event-5
2023-12-06 20:00:00 event-6

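We must note that we used the -v option to assign the shell function’s source to the epoch_to_date_shell variable, and that $(<functions.sh) is a Bash shortcut for $(cat functions.sh). Further, we used the octal escape sequence \047 to represent single quotes, which makes the command string much more readable than the '\'' idiom.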
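A related variation passes the function text through the environment instead of -v. Since -v assignments undergo the same escape-sequence processing as Awk string literals, a shell function containing backslashes could be altered on the way in, whereas values read from ENVIRON are taken as-is. In this sketch, SHELL_FUNCS is a hypothetical variable name, cmd and result are declared as extra parameters so that they’re local to the Awk function, and we again assume GNU date and UTC:

```shell
#!/bin/bash
export TZ=UTC                  # assumption: UTC output, GNU date
cd "$(mktemp -d)" || exit 1    # scratch directory for the sample files

cat > functions.sh <<'EOF'
epoch_to_date() {
    date -d "@$1" "+%Y-%m-%d %H:%M:%S"
}
EOF
printf '1701460800 event-1\n1701547200 event-2\n' > log.txt

# SHELL_FUNCS (hypothetical name) is exported to awk for this one command only.
SHELL_FUNCS="$(cat functions.sh)" awk '
function epoch_to_date_awk(epoch,    cmd, result) {
    cmd = "bash -c \047" ENVIRON["SHELL_FUNCS"] "; epoch_to_date " epoch "\047"
    cmd | getline result       # result is local, so it starts empty on every call
    close(cmd)
    return result
}
{ print epoch_to_date_awk($1), $2 }' log.txt
```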

6. Conclusion

In this article, we learned how to use shell functions inside Awk scripts. Furthermore, we explored multiple approaches to our use case: sourcing a shell script, writing inline shell functions, and wrapping the call in a user-defined Awk function.

Lastly, we used the system() function and the getline command as the two core mechanisms for calling a shell function.
