How to Determine Column Numbers and Names in a File

1. Overview

The content of a file may sometimes look like a spreadsheet consisting of columns and rows. Additionally, the first row might consist of column names.

In this tutorial, we’ll discuss how to list the column names in a file together with the column numbers.

2. Sample Task

We’ll use the following file, countries, in our examples:

$ cat countries
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara

The file has four columns, namely Number, Country, Continent, and Capital City, which are separated by vertical bars (|).
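We can also let awk count the columns instead of reading them off the header. The sketch below assumes a shell with mktemp and recreates the sample data in a temporary file; the variable name sample is only illustrative. It prints the number of fields on each line, and sort -u collapses identical counts, so a single 4 means every line has the same four columns.

```shell
# Recreate the sample data in a temporary file (illustrative setup)
sample=$(mktemp)
cat > "$sample" <<'EOF'
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara
EOF

# Print the field count of every line; sort -u keeps each distinct count once
awk -F'|' '{ print NF }' "$sample" | sort -u
```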

Our objective is to list each column name preceded by its column number.

3. Using awk

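We can use awk to accomplish the task. In particular, we’ll discuss two methods under this approach. The first one works with any POSIX-compliant awk, whereas the second one relies on the gensub function, which is a GNU awk (gawk) extension.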

3.1. First Method Using awk

The first method we’ll discuss uses the awk command to display the column names and numbers in a file:

$ awk -F'|' 'NR==1 {for (i=1; i <= NF; i++) print i, $i}' countries
1 Number
2 Country
3 Continent
4 Capital City

The output contains the column numbers and names, as expected. The -F option of awk sets the field separator, so -F'|' tells awk that the columns are separated by vertical bars.

The NR built-in variable of awk holds the number of input records read so far, i.e., the current line number when records are separated by newlines. Therefore, the NR==1 pattern selects only the first line of countries, which contains the column names.

Then, we display the column numbers and names using the for loop, for (i=1; i <= NF; i++) print i, $i. The NF built-in variable of awk holds the number of fields in the current record. Therefore, it’s the number of columns in our case.

The for loop iterates over all the columns and displays the column number and name using the print i, $i statement.

This method is robust: since it doesn’t hard-code any column names, adding, deleting, or renaming columns doesn’t require any change to the command.
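A small variant, assuming only a POSIX awk: adding exit after the loop stops awk as soon as the header is printed, so it doesn’t read the rest of a potentially large file. The sketch wraps the command in a hypothetical shell function, list_columns, taking the separator and the file name; it assumes a single-character separator, which awk treats literally. The sample data is recreated in a temporary file for illustration.

```shell
# Recreate the sample data in a temporary file (illustrative setup)
sample=$(mktemp)
cat > "$sample" <<'EOF'
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara
EOF

# Hypothetical helper: list_columns SEPARATOR FILE
# Prints "number name" for each field of the first line, then exits.
list_columns() {
  awk -F"$1" 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i; exit }' "$2"
}

list_columns '|' "$sample"
```

Since awk reads standard input when the file name is -, the same function also works at the end of a pipeline, e.g., some_command | list_columns ',' -.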

3.2. Second Method Using awk

Now, let’s see another method that uses awk:

$ awk 'BEGIN {RS="|"} /^Number|^Country|^Continent|^Capital City/ {column=gensub(/([^\n]+).*/,"\\1",1); print NR, column}' countries
1 Number
2 Country
3 Continent
4 Capital City

First, in the BEGIN rule, we set the built-in RS variable of awk to the vertical bar, |. RS is the record separator, and its default value is the newline character. So, awk now treats each piece of text ending with | as a separate record, and newlines no longer separate records, which means a record can span two lines. The BEGIN rule is executed only once, before awk reads any input.

Then, we check whether each input record matches the pattern /^Number|^Country|^Continent|^Capital City/. A record matches this pattern if it starts with Number, Country, Continent, or Capital City, which are exactly the column names in the first line of the countries file.

If a matching record contains a newline, we keep only the part preceding the newline. This is necessary for the record holding the last column name: it contains not only Capital City but also a newline character and the number 1 at the beginning of the second line.
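To see why the last column name needs this extra step, we can print the first few records in brackets with RS set to |. The sketch recreates the sample data in a temporary file (the variable name sample is illustrative); the fourth record visibly spans two lines.

```shell
# Recreate the sample data in a temporary file (illustrative setup)
sample=$(mktemp)
cat > "$sample" <<'EOF'
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara
EOF

# Use | as the record separator and show the first five records in brackets
awk 'BEGIN { RS = "|" } NR <= 5 { printf "%d [%s]\n", NR, $0 }' "$sample"
# Output:
# 1 [Number]
# 2 [Country]
# 3 [Continent]
# 4 [Capital City
# 1]
# 5 [Portugal]
```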

We implement the substitution with the gensub function of awk using column=gensub(/([^\n]+).*/,"\\1",1).

The first argument of gensub, /([^\n]+).*/, is a pattern consisting of one or more non-newline characters captured in a group, indicated by the parentheses, followed by zero or more arbitrary characters, including newlines. The second argument, "\\1", replaces the whole match with the first captured group, \1, i.e., the part before the first newline. The third argument, 1, means that only the first match is replaced rather than all matches. Since we don’t pass a fourth argument, gensub operates on the current record, $0, and returns the result without modifying $0.
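Because gensub is GNU-specific, a portable alternative (a swapped-in technique, not the command used above) is the POSIX sub function, which deletes the first newline and everything after it in place. This sketch applies it to a string literal standing in for the fourth record.

```shell
awk 'BEGIN {
  rec = "Capital City\n1"   # the last header record as awk sees it with RS = "|"
  sub(/\n.*/, "", rec)      # remove the first newline and everything after it
  print rec                 # prints: Capital City
}'
```

Unlike gensub, sub modifies its target variable directly and returns the number of substitutions made, so we print rec itself afterward.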

The other column names are unaffected: they contain no newline, so the captured group spans the entire record, and the substitution returns it unchanged.

We assign the substitution’s result to column. Finally, we print the column number and name using print NR, column.

Notably, the output is the same as before. However, since this method depends on column names explicitly, we need to update it if we rename columns or add new columns.
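If we want to keep the RS-based idea without hard-coding names, one option (our own variant, sketched under the assumption of a POSIX awk) is to print the first line of every record and stop at the first record that contains a newline, since that record holds the last column name. The sample data is recreated in a temporary file for illustration.

```shell
# Recreate the sample data in a temporary file (illustrative setup)
sample=$(mktemp)
cat > "$sample" <<'EOF'
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara
EOF

awk 'BEGIN { RS = "|" }
{
  split($0, part, "\n")       # part[1] is the text before any newline
  print NR, part[1]
  if (index($0, "\n")) exit   # a newline means we reached the end of the header
}' "$sample"
```

This variant also avoids gensub, so it doesn’t require GNU awk.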

4. Using sed and nl

Another option for listing the column names and numbers is using the sed and nl commands.

First, let’s list only the column names using sed:

$ sed 's/|/\n/g;q' countries
Number
Country
Continent
Capital City

The substitution command s/|/\n/g replaces every vertical bar with a newline character, \n. The quit command q after it, i.e., s/|/\n/g;q, makes sed print the modified first line and exit without processing the rest of countries.
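Interpreting \n in the replacement as a newline is a GNU sed feature; other implementations, such as BSD sed, may not support it. POSIX instead specifies a backslash followed by an actual newline in the replacement, which the following sketch uses. The sample data is recreated in a temporary file for illustration.

```shell
# Recreate the sample data in a temporary file (illustrative setup)
sample=$(mktemp)
cat > "$sample" <<'EOF'
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara
EOF

# A backslash followed by a real newline inserts a newline (POSIX form)
sed 's/|/\
/g;q' "$sample"
```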

The output consists of the column names, as expected. Additionally, we can use the nl command to add the column numbers:

$ sed 's/|/\n/g;q' countries | nl
     1	Number
     2	Country
     3	Continent
     4	Capital City

The nl command numbers lines, and since each line now holds a single column name, the line numbers correspond to the column numbers. So, the combination of sed and nl displays both the column numbers and names.
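One caveat: by default, nl numbers only non-empty lines, so an empty column name in the header would shift the number of every column after it. The -b a option numbers all lines. The sketch below uses a made-up header, A||C, whose second column has no name.

```shell
# Default numbering skips the empty line, so C is numbered 2
printf 'A||C\n' | tr '|' '\n' | nl

# -b a numbers every line, so C gets its real column number, 3
printf 'A||C\n' | tr '|' '\n' | nl -b a
```

The first awk method doesn’t have this problem, because NF counts empty fields as well.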

5. Using head, tr, and nl

We can also use the tr command to display column names and numbers.

First, let’s list only the column names:

$ head -1 countries | tr '|' '\n'
Number
Country
Continent
Capital City

The head command displays the beginning of a file, by default its first ten lines. With head -1 countries, we print only the first line, which contains the column names. Note that head -1 is an older shorthand for head -n 1, the form specified by POSIX.

Then, we pipe the output of head -1 countries to the tr command, which replaces each vertical bar with a newline character.
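Since this pipeline turns the header into one column name per line, the line number of a name is also its column number. As a small extension, grep -n -x can look up the number of a single column by its exact name; the sketch uses Continent as an example and recreates the sample data in a temporary file.

```shell
# Recreate the sample data in a temporary file (illustrative setup)
sample=$(mktemp)
cat > "$sample" <<'EOF'
Number|Country|Continent|Capital City
1|Portugal|Europe|Lisbon
2|India|Asia|New Delhi
3|Egypt|Africa|Cairo
4|Canada|America|Ottawa
5|Turkey|Asia&Europe|Ankara
EOF

# -x matches the whole line, -n prefixes the line (i.e., column) number
head -n 1 "$sample" | tr '|' '\n' | grep -n -x 'Continent'
# Output: 3:Continent
```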

We can add the column numbers using the nl command as before:

$ head -1 countries | tr '|' '\n' | nl
     1	Number
     2	Country
     3	Continent
     4	Capital City

So, the head, tr, and nl commands together also display the column numbers and names.

6. Using head, perl, and nl

The Perl programming language has powerful string-processing capabilities. As an alternative to the tr command from the previous section, we can use the command-line Perl interpreter, perl:

$ head -1 countries | perl -pe "s/\|/\n/g" | nl
     1	Number
     2	Country
     3	Continent
     4	Capital City

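The -e option of perl passes the Perl program, here the substitution s/\|/\n/g, on the command line. The -p option wraps this program in a loop that reads each input line, applies the substitution, and prints the result. The substitution replaces every column separator with a newline character. We escape the vertical bar as \| because an unescaped | means alternation in a Perl regular expression.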

The output contains the column names together with the column numbers, as expected.

7. Conclusion

In this article, we discussed how to list the column names in a file together with the column numbers.

First, we explored two methods using awk. We saw that the first method using awk is more general than the second one as it doesn’t depend on specifying column names.

Then, we learned how to list column names and numbers using primarily sed, tr, and perl. We used the head command to get the first line containing the column names when using tr and perl. We also saw that nl is useful for numbering the columns.
