1. Overview

Awk is a quintessential utility in Linux for processing text and generating reports for analytics. Sometimes, data is available in tabular format, and we need to perform arithmetic operations across columns to get the desired result.

In this tutorial, we’ll learn how to multiply columns using awk in different scenarios.

2. Two Columns From the Same File

Let’s say that we’ve got the length and breadth of rectangles in the rectangular_data.txt file:

$ cat rectangular_data.txt
 l      b
100    150
200    250
300    350
400    450
500    500

Further, our goal is to report the area of each rectangle as a third column. We can multiply each row’s first ($1) and second ($2) column values to compute the rectangular area. So, let’s go ahead and see this in action:

$ awk \
'BEGIN {printf "%6s%6s%6s\n", "l", "b", "a"}
NR>1{printf "%6d %6d %6d\n",$1,$2,$1*$2}' \
rectangular_data.txt
      l     b     a
   100    150  15000
   200    250  50000
   300    350 105000
   400    450 180000
   500    500 250000

We must note that we’ve used the printf function to print the output in a fixed-width format using placeholder strings such as %6s and %6d. Furthermore, we used the BEGIN block to print the header and the main block to print each rectangle’s length (l), breadth, (b), and area (a) values.

3. Two Columns From Different Files

Let’s imagine a scenario where we want to report rectangular areas, but the lengths and breadths of rectangles are available in different files.

3.1. Understanding the Scenario

Let’s inspect the lengths.txt and breadths.txt files that contain lengths and breadths of rectangles, respectively:

$ head lengths.txt breadths.txt
==> lengths.txt <==
 l
100
200
300
400
500

==> breadths.txt <==
 b
150
250
350
450
500

3.2. Using paste and awk

Although awk supports reading multiple files, it requires us to read the data sequentially. As a result, the reading and multiplication logic would become more complex than earlier.

To simplify the problem, we can use the paste command to merge the content from the lengths.txt and breadths.txt files and reuse the awk command as if both values are coming from a single file. So, let’s proceed to use the paste command and pass its output to awk:

$ paste lengths.txt breadths.txt | awk \
'BEGIN {printf "%6s%6s%6s\n", "l", "b", "a"}
NR>1{printf "%6d %6d %6d\n",$1,$2,$1*$2}'
     l     b     a
   100    150  15000
   200    250  50000
   300    350 105000
   400    450 180000
   500    500 250000

Perfect! The result is as we expected it.

3.3. Using NR and FNR With awk

If we want to solve the use case with awk without relying on the paste command, we need to read both files sequentially and write appropriate logic to produce multiplication results.

Let’s start by understanding the concept of NR and FNR to build the logic to multiply columns from two files. For this purpose, let’s read the two files and print the values of NR, FNR, and FILENAME variables for each record:

$ awk \
'BEGIN{print "NR FNR FILENAME"} 
{printf "%s %s %s\n", NR, FNR, FILENAME}' \
lengths.txt breadths.txt
NR FNR FILENAME
1 1 lengths.txt
2 2 lengths.txt
3 3 lengths.txt
4 4 lengths.txt
5 5 lengths.txt
6 6 lengths.txt
7 1 breadths.txt
8 2 breadths.txt
9 3 breadths.txt
10 4 breadths.txt
11 5 breadths.txt
12 6 breadths.txt

We must note that while NR represents the absolute value of the record number irrespective of the files read so far, FNR is the record number relative to the current file. Additionally, we can interpret that NR and FNR are the same for the first file.

Now, let’s apply our understanding of NR and FNR variables to multiply the respective columns from lengths.txt and breadths.txt, respectively:

$ awk \
'BEGIN {printf "%6s%6s%6s\n", "l", "b", "a"} 
{
    if (NR>1 && NR==FNR) {
        mem[NR]=$1; 
        next
    }
    if (FNR>1 && NR>FNR) {
        printf "%6d %6d %6d\n", mem[FNR], $1, mem[FNR] * $1;
    }
}' \
lengths.txt breadths.txt
# same output as earlier

Let’s break this down to understand the underlying logic. First, we used the BEGIN block to print the headers in a fixed-width layout. Then, we used the condition NR==FNR to read the content only from the first file into the mem array while skipping the second file using the next function. Lastly, we used the condition NR>FNR to multiply the first column ($1) from the second file with the corresponding record from the mem array.

4. N Columns From the Same File

Let’s take another scenario where we’ve got N columns in the same file and want to multiply all the columns for each record individually.

For simplicity, let’s take a look at n_dimensional_data.txt, which contains eight columns:

$ cat n_dimensional_data.txt
d1 d2 d3 d4 d5 d6 d7 d8
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 10

We must note that although the file contains only eight columns, we’ll write a generic logic to handle N columns.

Now, let’s go ahead and see how to use the for-loop construct in awk to multiply all the columns for each record:

$ awk \
'BEGIN {print "product"}
NR>1 {
    product = 1; 
    for (i=1; i<=NF; i++) {
        product *= $i;
    } 
    print product;
}' \
n_dimensional_data.txt
product
40320
362880
1814400

The multiplication logic is self-explanatory, wherein we set the variable product to 1 before each iteration. Further, we access the last column number using the in-built variable, NF, and ith field values as $i.

That’s it! We’ve successfully solved our use case to multiply columns for one more scenario.

5. N Columns From Different Files

In this section, we’ll explore another scenario where we need to multiply columns, one from each file.

5.1. Scenario

Let’s check out the input data from three files, namely lengths.txt, breadths.txt, and heights.txt:

$ head lengths.txt breadths.txt heights.txt
==> lengths.txt <==
 l
100
200
300
400
500

==> breadths.txt <==
 b
150
250
350
450
500

==> heights.txt <==
 h
1
2
3
4
5

We must note that although we’re considering only three files, we’ll write a generic solution for N files.

5.2. ARGC and ARGV

We’d need to use ARGC and ARGV variables to solve our use case. So, let’s understand the concepts around ARGC and ARGV while passing multiple files to awk:

$ awk \
'BEGIN {
    print ARGC; 
    for (i=0;i<ARGC;i++) 
        print ARGV[i];}
' lengths.txt breadths.txt heights.txt
4
awk
lengths.txt
breadths.txt
heights.txt

We can notice that while ARGV[0] contains the awk string, filenames go into ARGV[1], ARGV[2], and ARGV[3].

5.3. awk Script

Let’s use the concepts around ARGC and ARGV variables to write the multiply_n_files.awk script in its entirety:

$ cat multiply_n_files.awk
NR==FNR {
    if(NR==1) {
        print "product"
        for(i = 2; i < ARGC; i++) {
            getline col < ARGV[i]
        }
    } else {
        product = $1;
        for (i = 2; i < ARGC; i++) {
            getline col < ARGV[i];
            product *= col;
        }
        print product
    }
}

Let’s break down the logic to understand it clearly. First, we’re executing the entire block only for the first file by checking if NR==FNR. Next, we’re reading one column each from the remaining files in the variable, col, using the getline function while skipping the headers. Lastly, we print the multiplication result in the product variable.

Lastly, let’s execute the multiply_n_files.awk script to solve our use case:

$ awk -f multiply_n_files.awk lengths.txt breadths.txt heights.txt
product
15000
100000
315000
720000
1250000

6. Conclusion

In this article, we learned how to multiply columns using awk. Furthermore, we solved the use case for multiple scenarios where columns are from the same file or different files.

Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.