1. Overview

Transposing a file is a common operation in data processing and analysis, where the rows and columns of a table are swapped. This can be useful for reformatting data for a different tool or visualizing it differently.

In Linux, there are several ways to transpose a file from the command line. In this tutorial, we’ll explore some of these methods.

2. Transposing a File

One of the simplest ways to transpose a file in Linux is to combine standard GNU utilities in a Bash script. Another approach is to use Python’s pandas library. We’ll explore both approaches.

However, first, let’s inspect the input file that we’ll be transposing:

$ cat input
name age
Cal 21
Tom 30
Val 37
Lin 45

The file shows space-delimited tabular data consisting of two columns and five rows. Therefore, the transposed output should have five columns and two rows. We assume that each row in the file has the same number of columns so that the data is tabular.
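To make the row and column swap concrete, here is a minimal sketch in plain Python of what transposition does to the sample table. It is an illustration only, not one of the methods covered below. The function name transpose_rows is hypothetical, and the check for equal row lengths reflects the assumption stated above:

```python
def transpose_rows(rows):
    """Swap rows and columns of a rectangular table given as a list of lists."""
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        # The tabular assumption is violated: rows have different lengths.
        raise ValueError(f"rows have differing column counts: {sorted(widths)}")
    # zip(*rows) pairs up the i-th field of every row, yielding one tuple per column.
    return [list(column) for column in zip(*rows)]


# The sample file, already split into fields (five rows, two columns).
table = [
    ["name", "age"],
    ["Cal", "21"],
    ["Tom", "30"],
    ["Val", "37"],
    ["Lin", "45"],
]

transposed = transpose_rows(table)
for row in transposed:
    print(" ".join(row))
```

The five rows of two fields become two rows of five fields, which is the shape we expect from every method in this tutorial.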

2.1. Using GNU Utilities

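The GNU coreutils package includes a set of powerful text processing tools that can be used to manipulate data in various ways. In particular, we’ll need the cut command, together with either echo, which is also available as a Bash builtin, or xargs, which ships in the separate GNU findutils package.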

Let’s write a short script called transpose.sh that will accept an input file and perform the transpose operation:

$ cat transpose.sh
#!/usr/bin/env bash
input_file="$1"
n_cols=$(head -1 "$input_file" | wc -w)
for i in $(seq 1 "$n_cols"); do
    echo $(cut -d ' ' -f "$i" "$input_file")
done

The first line of the script is a shebang directive indicating that the script should be executed as a Bash script. We save the first argument of transpose.sh in a variable called input_file. Then, we obtain the number of columns, n_cols, by extracting the first line of the input file via head -1 and counting the number of space-delimited words via wc -w.

Once we have the number of columns, we iterate over them in a for loop and extract each column with the cut command before echoing its items as a row. The -d option of the cut command specifies the delimiter to use, which is a space in this case, while the -f option specifies the field or column to extract. The cut command runs inside a command substitution, $(...). Because the substitution is unquoted, the shell splits its output on whitespace, so echo receives each item as a separate argument and prints them all on one line, separated by single spaces.

Let’s give the script execute permission and run it over our input file:

$ chmod u+x transpose.sh
$ ./transpose.sh input
name Cal Tom Val Lin
age 21 30 37 45

The script accepts a file as an argument and returns the transposed content.
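Each column comes out on a single line because the shell splits the unquoted output of the command substitution into words, and echo joins its arguments with single spaces. The following Python sketch approximates that behavior for illustration. The function name emulate_unquoted_echo is hypothetical, and the sketch ignores the filename expansion that the shell also performs on unquoted words:

```python
def emulate_unquoted_echo(command_output):
    """Approximate `echo $(...)`: split on whitespace, rejoin with single spaces."""
    # str.split() with no arguments splits on runs of spaces, tabs, and newlines,
    # which is close to the shell's default word splitting.
    return " ".join(command_output.split())


# What `cut -d ' ' -f 1 input` emits for the sample file: one field per line.
first_column = "name\nCal\nTom\nVal\nLin\n"
print(emulate_unquoted_echo(first_column))
```

One consequence of this mechanism is that runs of whitespace collapse into a single space, so the approach fits data like ours, where no field is empty or contains spaces.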

It’s worth noting that instead of running the cut command in a command substitution and echoing its output, we may pipe the output of cut into xargs. By default, xargs runs echo on the input it receives if no other command is specified after it. Since xargs treats quotes and backslashes in its input specially, this variant is best suited to plain data like ours.

Let’s modify our script to show these changes:

$ cat transpose.sh
#!/usr/bin/env bash
input_file="$1"
n_cols=$(head -1 "$input_file" | wc -w)
for i in $(seq 1 "$n_cols"); do
    cut -d ' ' -f "$i" "$input_file" | xargs
done

Now, let’s run the script over the input file:

$ chmod u+x transpose.sh
$ ./transpose.sh input
name Cal Tom Val Lin
age 21 30 37 45

We obtain the same result as before.

2.2. Using Python and Saving the Output

Another simple way to transpose data is to use the pandas library for Python, whose data frames provide a transpose() method and its shorthand, the T attribute. Python comes preinstalled on most Linux distributions. We can call a Python script directly or embed Python code within a Bash script.

Let’s first install pandas using the pip3 package manager for Python:

$ pip3 install pandas

Now, let’s write a Python script called transpose.py that will accept an input file and perform the transposition:

$ cat transpose.py
#!/usr/bin/env python3
import pandas as pd
import sys
input_file = sys.argv[1]
df = pd.read_csv(input_file, sep=' ')
df.T.to_csv('./output', sep=' ', header=False)

The first line of the script is a shebang header indicating that the script should be executed as a Python script. Then, we import the pandas library to later read the data into a data frame object before we can transpose it.

We also import the sys module to access the arguments passed to the script. In particular, sys.argv[1] is the first argument the script receives, which will be the input file.

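The data is read into the variable df, which is a pandas data frame. Since read_csv treats the first line as column names by default, pandas labels the remaining rows with an integer index. Then, we transpose the data frame via its T attribute and save the result in a CSV file named output, with the separator set to a space instead of the default comma. After the transposition, the integer index becomes the column labels, so we set the header parameter to False to keep that line out of the output file.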

Let’s grant the script execute permission and run it:

$ chmod u+x transpose.py
$ ./transpose.py input
$ cat ./output
name Cal Tom Val Lin
age 21 30 37 45

Reading the output file via cat, we see that the result is the transpose of the input file.
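On systems where installing pandas isn’t an option, the same read, transpose, and write steps can be sketched with Python’s standard csv module. This is an alternative illustration rather than the script above. The function name transpose_file and its parameters are hypothetical, and the sketch assumes every row has the same number of fields:

```python
import csv


def transpose_file(input_path, output_path, delimiter=" "):
    """Read a delimited table, swap rows and columns, and write the result."""
    with open(input_path, newline="") as src:
        # Skip blank lines, which csv.reader reports as empty rows.
        rows = [row for row in csv.reader(src, delimiter=delimiter) if row]
    # zip(*rows) groups the i-th field of every row into one output row.
    # If rows had differing lengths, zip() would silently truncate to the
    # shortest one, hence the equal-length assumption.
    with open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        writer.writerows(zip(*rows))
```

Calling transpose_file("input", "output") on our sample file writes the same two lines that the pandas script produces.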

Now, instead of writing a standalone script, we may embed Python code within Bash by using a here-document:

$ python3 << EOF
> import pandas as pd
> df = pd.read_csv('./input', sep=' ')
> df.T.to_csv('./output', sep=' ', header=False)
> EOF
$ cat ./output
name Cal Tom Val Lin
age 21 30 37 45

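This feeds the lines between the two EOF delimiters to Python on standard input, and Python executes them as a script. The delimiter word is arbitrary, and EOF (End of File) is just a common choice.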

We may also rewrite the script transpose.py compactly as a one-liner:

$ python3 -c 'import pandas as pd; pd.read_csv("./input", sep=" ").T.to_csv("./output", sep=" ", header=False)'
$ cat ./output
name Cal Tom Val Lin
age 21 30 37 45

Here, we’ve separated the Python commands with semicolons to allow writing them on a single line. We’ve also skipped storing the data frame in the intermediate variable, df.

2.3. Using Python Without Saving the Output

So far, we’ve had to save the result in some output file before displaying its content via cat. We can avoid this step by directly printing the result to stdout without generating any file.

Let’s modify the script transpose.py to print out the transposed result instead of saving it to a file:

$ cat transpose.py
#!/usr/bin/env python3
import pandas as pd
import sys
input_file = sys.argv[1]
df = pd.read_csv(input_file, sep=' ')
print(df.T)

The only problem is that pandas labels the rows of the data frame in df with an integer index. After the transposition, these labels become the column labels, so they show up as a header line in the printed result. To avoid this, we can pipe the result into sed to remove the header:

$ chmod u+x transpose.py
$ ./transpose.py input | sed '1d'
name  Cal  Tom  Val  Lin
age    21   30   37   45

We pipe the result into sed '1d', which deletes the first line of the result, thus removing the unwanted header.

As before, we can embed the code within Bash via a here-document instead:

$ python3 << EOF | sed '1d'
> import pandas as pd
> df = pd.read_csv('./input', sep=' ')
> print(df.T)
> EOF
name  Cal  Tom  Val  Lin
age    21   30   37   45

It’s worth noting that the correct place for piping the result into sed is on the first line and not the last. The second EOF marker should appear alone on a line and without any preceding space in this case.

Finally, we may write the same commands compactly in a single line:

$ python3 -c 'import pandas as pd; df=pd.read_csv("./input", sep=" "); print(df.T)' | sed '1d'
name  Cal  Tom  Val  Lin
age    21   30   37   45

The result printed by pandas is neatly aligned in columns. We can get a similar effect with the Bash scripts by piping their result into tr ' ' '\t', which converts spaces into tabs.
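For comparison, the padded layout can also be reproduced without pandas. The sketch below left-aligns the first column, right-aligns the others, and separates columns with two spaces. This is an approximation of what we observed in the output above, not pandas’ actual formatting code, and the function name format_aligned is hypothetical:

```python
def format_aligned(rows, gap="  "):
    """Pad each column to its widest cell: first column left, the rest right."""
    # One width per column, taken from the longest cell in that column.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        cells = [
            cell.ljust(width) if index == 0 else cell.rjust(width)
            for index, (cell, width) in enumerate(zip(row, widths))
        ]
        # Strip trailing padding so lines don't end in spaces.
        lines.append(gap.join(cells).rstrip())
    return "\n".join(lines)


transposed = [
    ["name", "Cal", "Tom", "Val", "Lin"],
    ["age", "21", "30", "37", "45"],
]
print(format_aligned(transposed))
```

For our sample data, this prints the same two aligned lines shown in the pandas examples above.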

3. Conclusion

In this article, we’ve presented two approaches for transposing a file from the command line in Linux. The first uses GNU tools, which offer a powerful set of commands for text processing that can be combined to transpose the rows and columns of a file. The second uses Python’s pandas library to load data into a data frame and then transpose it. The Python code can be either a standalone script or embedded within Bash via a here-document.
