1. Overview

For a Linux administrator, manipulating and managing files is a common task. In this tutorial, we’ll discuss how to delete a specific column from a file. To demonstrate, we’ll use the awk and cut commands.

2. Understanding the File Structure

Before we begin deleting columns in a file, we first need to understand its structure. For instance, we need to know whether the file columns are separated by a space, comma, or tab delimiter. Knowing this helps to correctly remove a column.

To illustrate, we’ll use two files that contain similar information but use different delimiters. The first is a space-delimited file named cars.txt, which we’ll view with the cat command:

$ cat cars.txt 
Make Model Year Color
Toyota Camry 2022 Blue
Honda Accord 2021 Silver
Ford Mustang 2023 Red

Meanwhile, the second file is a comma-delimited file named cars.csv:

$ cat cars.csv 
Make,Model,Year,Color
Toyota,Camry,2022,Blue
Honda,Accord,2021,Red
Ford,Mustang,2023,Silver

In the upcoming sections, we’ll cover how to delete a column from both files.
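
Before removing anything, a quick way to confirm a delimiter guess is to count the fields each candidate delimiter produces. The following is a minimal sketch that recreates the comma-delimited sample so it can run on its own; a single, consistent count greater than one suggests the guess is right:

```shell
# Recreate the comma-delimited sample so the example is self-contained
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,2022,Blue' \
    'Honda,Accord,2021,Red' 'Ford,Mustang,2023,Silver' > cars.csv

# Split on commas and list the distinct field counts
awk -F',' '{ print NF }' cars.csv | sort -u

# Split on tabs for comparison
awk -F'\t' '{ print NF }' cars.csv | sort -u
```

The first command prints 4 because every line has four comma-separated fields, while the second prints 1 because the file contains no tabs.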

3. Using the awk Command

awk is a command line tool that we can use for manipulating and analyzing files. In particular, it processes input files based on the rules we provide. This makes it flexible and customizable:

$ awk 'pattern { action }' input_file

This is the general syntax of awk. It consists of three parts:

  • pattern – represents a condition that is checked against each line in the input file
  • {action} – represents a set of commands or operations to be performed when the pattern is true
  • input_file – represents the name of the file that awk will process
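
To make the pattern and action parts concrete, here is a small sketch that recreates the space-delimited sample file and prints the make and model of every car newer than 2021. The condition NR > 1 is our own addition to skip the header line:

```shell
# Recreate the space-delimited sample so the example is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
    'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt

# pattern: skip the header (NR > 1) and keep rows whose third field exceeds 2021
# action:  print the first and second fields
awk 'NR > 1 && $3 > 2021 { print $1, $2 }' cars.txt
```

This prints Toyota Camry and Ford Mustang, each on its own line.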

3.1. Deleting a Column From a Space-Delimited File

First, let’s delete the third column in the cars.txt file:

$ awk '{OFS=" "; $3=""; gsub(/[[:space:]]+/, " "); print $0}' cars.txt > updated_cars.txt

Let’s break down this command:

  • OFS=" " – sets the Output Field Separator to a space
  • $3="" – deletes the content of the third column by modifying it to an empty string
  • gsub(/[[:space:]]+/, " ") – replaces one or more whitespace characters (spaces, tabs) with a single space, thus removing extra spaces caused by deleting a column
  • print $0 – prints the entire line after its modification
  • cars.txt – represents the input file
  • > updated_cars.txt – redirects the modified output of the awk command to a new file named updated_cars.txt

Above, we delete the third column in the cars.txt file. Then, we save the modified information in the updated_cars.txt file to prevent overwriting the original file.

Next, let’s check the updated_cars.txt file:

$ cat updated_cars.txt 
Make Model Color
Toyota Camry Blue
Honda Accord Silver
Ford Mustang Red

The output above shows that we’ve successfully deleted the third column.
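
As an alternative sketch, we can rebuild each line from the fields we want to keep instead of blanking a field and cleaning up the whitespace afterwards. The variable names col, out, and sep are our own choices for illustration:

```shell
# Recreate the space-delimited sample so the example is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
    'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt

# col holds the number of the column to drop
awk -v col=3 '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == col) continue   # skip the unwanted column
        out = out sep $i         # append the field to the output line
        sep = OFS                # later fields are preceded by the separator
    }
    print out
}' cars.txt > updated_cars.txt

cat updated_cars.txt
```

For the sample file, this produces the same four lines as the command above. Changing the value passed with -v col= removes a different column without touching the rest of the program.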

3.2. Deleting a Column From a Comma-Delimited File

Here, let’s delete the first column in the cars.csv file:

$ awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}' cars.csv > updated_cars.csv

Let’s break down this command:

  • -F',' – sets the field separator to a comma
  • OFS="," – defines the Output Field Separator as a comma
  • $1="" – sets the value of the first column to an empty string
  • sub("^,","") – ensures there is no comma at the beginning of a line
  • gsub(",,",",") – removes any occurrence of multiple commas and replaces them with a single comma
  • sub(",$","") – ensures there is no comma at the end of a line
  • print $0 – prints the entire line after it has been modified
  • cars.csv – represents the input file
  • > updated_cars.csv – redirects the output of the awk command to a file named updated_cars.csv

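In summary, awk reads the content of the cars.csv file, defines the Output Field Separator as a comma, empties the first column, and then cleans up stray commas at the start, middle, and end of each line. One caveat: because gsub(",,",",") collapses every pair of adjacent commas, it also removes fields that were already empty in the input, so this command is best suited to files without empty fields.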
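
To see how empty fields affect the result, the sketch below runs the command on a hypothetical file, cars_gaps.csv, that has an empty Year field. It then shows one possible alternative that deletes the first column by stripping everything up to and including the first comma. Neither approach handles quoted fields that contain commas:

```shell
# Hypothetical sample with an empty Year field (not one of the article's files)
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,,Blue' > cars_gaps.csv

# The command from this section: the gsub() step also collapses the empty field
awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}' cars_gaps.csv

# Alternative: remove the first field and its trailing comma, leaving the rest intact
awk '{ sub(/^[^,]*,/, ""); print }' cars_gaps.csv
```

The first command prints Camry,Blue for the data row and loses the empty field, whereas the second prints Camry,,Blue and keeps the remaining three columns aligned with the header.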

4. Using the cut Command

The cut command extracts specific columns or fields from each line of a file. For this reason, it’s useful when working with files organized into columns or fields separated by a single-character delimiter such as a space, tab, or comma:

$ cut OPTION... [FILE]...

Considering the general syntax above, OPTION determines the behavior of the cut command whereas [FILE] represents the file from which data is extracted.
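
As a brief illustration of this syntax, the following sketch recreates the comma-delimited sample and keeps only the first two columns, using the -d option to set the delimiter and -f to list the fields:

```shell
# Recreate the comma-delimited sample so the example is self-contained
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,2022,Blue' \
    'Honda,Accord,2021,Red' 'Ford,Mustang,2023,Silver' > cars.csv

# Keep fields 1 and 2 of each comma-separated line
cut -d',' -f1,2 cars.csv
```

The output contains only the Make and Model columns, starting with the line Make,Model.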

4.1. Deleting a Column From a Space-Delimited File

To demonstrate, let’s delete the second column in the cars.txt file:

$ cut -d' ' --complement -f2 cars.txt > updated_cars.txt

Let’s break down this command:

  • -d' ' – specifies a space as the delimiter used in this file
  • --complement – selects all the fields in the file except the specified one; this option is a GNU coreutils extension, so other cut implementations may not support it
  • -f2 – specifies the column to remove, which in this case is the second one
  • cars.txt – represents the input file
  • > updated_cars.txt – redirects the output of the above command to a new file named updated_cars.txt

Afterward, we can check whether the updated_cars.txt file contains the desired content.
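
On systems where cut lacks the --complement option, one possible workaround is to list the fields to keep instead. The sketch below recreates the sample file, keeps field 1 and every field from 3 onward, and then displays the result:

```shell
# Recreate the space-delimited sample so the example is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
    'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt

# Keep field 1 and fields 3 through the last one, which drops field 2
cut -d' ' -f1,3- cars.txt > updated_cars.txt

cat updated_cars.txt
```

The file now begins with the line Make Year Color, and the Model column is gone from every row.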

4.2. Deleting a Column From a Comma-Delimited File

Similarly, let’s use the cut command to delete the third column in the cars.csv file:

$ cut -d',' --complement -f3 cars.csv > updated_cars.csv

Let’s break down this command:

  • -d',' – specifies the delimiter used, which in this case is a comma
  • --complement – tells cut to select all the columns except the specified ones
  • -f3 – specifies the column to be excluded from the output, which in this case is the third column
  • cars.csv – represents the input file
  • > updated_cars.csv – redirects the output of the cut command to a new file named updated_cars.csv

At this point, the third column is absent from the new updated_cars.csv file.
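
The same approach extends to several columns, since -f accepts a comma-separated list. The following sketch, which assumes GNU cut because it relies on --complement, recreates the sample file and removes the second and third columns in one pass:

```shell
# Recreate the comma-delimited sample so the example is self-contained
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,2022,Blue' \
    'Honda,Accord,2021,Red' 'Ford,Mustang,2023,Silver' > cars.csv

# Drop fields 2 and 3, keeping everything else (requires GNU cut)
cut -d',' --complement -f2,3 cars.csv
```

Only the Make and Color columns remain, so the first line of the output is Make,Color.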

5. Conclusion

In this article, we discussed how to delete a certain column from a file using the Linux command line. To achieve this, we utilized both the awk and cut commands.
