Last updated: September 25, 2024
awk is a handy tool for manipulating and analyzing structured text data efficiently. One of awk's features is its ability to work with columns and perform conditional processing.
In this tutorial, we’ll explore how to use awk with column value conditions to extract, transform, and filter data.
To show the functionality of each snippet, we use the same file.
So, let’s look at the content of data.txt:
$ cat data.txt
Chinedu,35,Lagos
Amaka,28,Abuja
Olumide,42,Ibadan
Chinedu,51,Kano
Amaka,63,Lagos
Olumide,19,Port Harcourt
Amaka,47,Benin City
Chinedu,72,Enugu
Now, we can continue with specific examples.
Before we can work with columns in awk, we need to define how the data splits into columns in the first place. For that, awk uses a field separator to know where each column starts and ends.
By default, awk assumes columns are separated by whitespace. This means it splits each line of input into columns wherever it encounters runs of spaces or tabs. However, if the data uses a different delimiter, such as a comma, we specify it with the -F option.
For example, we can tell awk that columns are separated by commas instead of spaces:
$ awk -F ',' '{ ... }' input.csv
In this command, -F ',' sets the field separator to a comma, '{ ... }' represents the action we want awk to perform on each line, and input.csv is the input file containing comma-separated values.
Once awk knows how the columns are separated, we can work with each column via its position. For that, we use the $ sign followed by the column number.
For instance, $1 means the first column, $2 means the second column, and so on.
As an example, let’s print the second column from the example file:
$ awk -F ',' '{ print $2 }' data.txt
35
28
42
51
63
19
47
72
As a result, awk prints the second column value from each line of the data.txt file.
We can also get values from multiple columns at once by listing the column numbers we want.
For example, we can print the first and third columns from the same file:
$ awk -F ',' '{ print $1, $3 }' data.txt
Chinedu Lagos
Amaka Abuja
Olumide Ibadan
Chinedu Kano
Amaka Lagos
Olumide Port Harcourt
Amaka Benin City
Chinedu Enugu
Thus, awk prints the first and third column values from each line, separated by a space.
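Notably, awk joins the printed fields with the output field separator (OFS), which defaults to a single space. If we'd rather keep the commas in the output, we can set OFS with the -v option; here's a quick sketch against the same data.txt:

```shell
$ awk -F ',' -v OFS=',' '{ print $1, $3 }' data.txt
Chinedu,Lagos
Amaka,Abuja
Olumide,Ibadan
Chinedu,Kano
Amaka,Lagos
Olumide,Port Harcourt
Amaka,Benin City
Chinedu,Enugu
```

Setting OFS only affects how the printed fields are joined; input parsing is still controlled by -F.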
When manipulating data with awk, conditional processing makes it possible to perform actions based on specific criteria within column values.
We can use comparison operators to check if a column value matches a certain condition. The most commonly used operators are == (equal), != (not equal), < (less than), <= (less than or equal), > (greater than), and >= (greater than or equal).
For example, we can display only the lines where the value in the second column exceeds 40:
$ awk -F ',' '$2 > 40' data.txt
Olumide,42,Ibadan
Chinedu,51,Kano
Amaka,63,Lagos
Amaka,47,Benin City
Chinedu,72,Enugu
In this command, awk examines each line in data.txt, checks if the second field is greater than 40, and prints the line if the condition is true.
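The other comparison operators work the same way. For instance, <= selects the complement of the previous example, keeping only the lines where the second column is at most 40:

```shell
$ awk -F ',' '$2 <= 40' data.txt
Chinedu,35,Lagos
Amaka,28,Abuja
Olumide,19,Port Harcourt
```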
To refine the data selection further, we can combine multiple conditions using the logical operators && (AND), || (OR), and ! (NOT).
Suppose we want to print lines where the first column equals Amaka and the second column is greater than 40:
$ awk -F ',' '$1 == "Amaka" && $2 > 40' data.txt
Amaka,63,Lagos
Amaka,47,Benin City
Here, awk processes each line and checks whether both conditions are met: the first column equals Amaka and the second column exceeds 40. It then prints only the lines satisfying both criteria.
Alternatively, if we’re interested in lines where the first column is Amaka or the second column exceeds 40, we can use the logical OR operator:
$ awk -F ',' '$1 == "Amaka" || $2 > 40' data.txt
Amaka,28,Abuja
Olumide,42,Ibadan
Chinedu,51,Kano
Amaka,63,Lagos
Amaka,47,Benin City
Chinedu,72,Enugu
Thus, this command prints lines that meet at least one of the specified conditions.
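The logical NOT operator ! inverts a condition. For example, wrapping the previous OR condition in !( ... ) keeps exactly the lines that the OR query filtered out:

```shell
$ awk -F ',' '!($1 == "Amaka" || $2 > 40)' data.txt
Chinedu,35,Lagos
Olumide,19,Port Harcourt
```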
awk can also use regular expressions to match patterns in column values. To leverage this, we employ the ~ operator:
$ awk '$3 ~ /ERROR/' log.txt
2023-04-01 10:32:15 ERROR Unable to connect to database
2023-04-01 10:50:10 ERROR File not found
This syntax tells awk to check whether the third column value of each line contains the word ERROR and, if so, print that line. Notably, we needn't set the field separator here, since log.txt uses the default whitespace separator.
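The same regular expression matching works with any field separator. For instance, we can match the rows of our comma-separated data.txt whose third column contains Lagos:

```shell
$ awk -F ',' '$3 ~ /Lagos/' data.txt
Chinedu,35,Lagos
Amaka,63,Lagos
```

Conversely, the negated operator !~ selects the lines whose field doesn't match the pattern.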
Now that we know how to work with columns and conditions in awk, we can combine them to perform even more useful and specific manipulations.
We can use conditions to pick out certain rows based on their column values.
For example, we can print the second and third columns, but only for rows where the first column is Amaka:
$ awk -F ',' '$1 == "Amaka" { print $2, $3 }' data.txt
28 Abuja
63 Lagos
47 Benin City
This command checks the first column value for each row, and if it’s Amaka, it prints the second and third column values for that row.
awk also enables the changing of column values based on conditions.
For example, we can make the third column uppercase if the first column is Chinedu:
$ awk -F ',' '$1 == "Chinedu" { $3 = toupper($3) }; { print }' data.txt
Chinedu 35 LAGOS
Amaka,28,Abuja
Olumide,42,Ibadan
Chinedu 51 KANO
Amaka,63,Lagos
Olumide,19,Port Harcourt
Amaka,47,Benin City
Chinedu 72 ENUGU
As a result, this basic script checks if the first column is Chinedu for each line. If so, it converts the third column to uppercase using the toupper() function. Finally, the { print } part at the end tells awk to print every line, including the changed ones. Notably, when awk modifies a field, it rebuilds the whole line using the output field separator, which defaults to a single space. That's why the changed lines appear with spaces instead of commas.
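If we want the modified lines to keep their commas, we can set the output field separator to a comma as well, so awk rebuilds changed lines with the same delimiter:

```shell
$ awk -F ',' -v OFS=',' '$1 == "Chinedu" { $3 = toupper($3) } { print }' data.txt
Chinedu,35,LAGOS
Amaka,28,Abuja
Olumide,42,Ibadan
Chinedu,51,KANO
Amaka,63,Lagos
Olumide,19,Port Harcourt
Amaka,47,Benin City
Chinedu,72,ENUGU
```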
We can use awk to calculate totals or other aggregate values based on conditions, too.
For instance, we can add up the values in the second column, but only for lines where the first column is Chinedu:
$ awk -F ',' '$1 == "Chinedu" { sum += $2 } END { print sum }' data.txt
158
Thus, we tell awk to keep a running total in the sum variable, adding the second column value each time the first column is Chinedu. The END part runs after all the lines are processed and prints the final total.
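Building on the same idea, we can track a second variable to compute an average instead of a plain total; for example, the mean of the second column over the Chinedu lines:

```shell
$ awk -F ',' '$1 == "Chinedu" { sum += $2; n++ } END { print sum / n }' data.txt
52.6667
```

awk formats the non-integral result with its default output format OFMT, %.6g, hence the six significant digits.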
We can use variables and arrays in awk to store and work with column values.
To illustrate, let’s keep track of all the different values in the second column:
$ awk -F ',' '{ unique[$2] = 1 } END { for (val in unique) print val }' data.txt
42
72
63
35
28
51
47
19
This code uses an array called unique to remember each value it sees in the second column. The END part prints out all the unique values at the end.
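A slight variation counts how often each value occurs, by using the array entry as a counter rather than a flag. Note that for (val in array) visits keys in an unspecified order, so the lines may come out in a different order depending on the awk implementation:

```shell
$ awk -F ',' '{ count[$1]++ } END { for (name in count) print name, count[name] }' data.txt
Amaka 3
Chinedu 3
Olumide 2
```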
In this article, we explored how to use awk to process text files based on column value conditions.
In conclusion, awk provides a flexible way to access, filter, and modify columns in structured data.