Sometimes we have a text file that needs reversing. Perhaps it’s a log file where we want to see the most recent entries first. Similarly, we may also wish to reverse the output of a command.
In this tutorial, we’ll look at three different methods for reversing a text file or text stream in Linux. We’ll also compare their advantages and capabilities.
One of the core philosophies of Linux is “everything is a file”. So it’s no surprise that most Linux distributions come pre-equipped to help us.
cat is one of the most popular and well-known ways to output the contents of a file. tac, on the other hand, is a much lesser-known command. Its name is cat spelled backward, and it functions as a reverse cat.
Both commands belong to the coreutils package, which comes preinstalled in almost all Linux distributions.
Let’s first look at our test file with cat:
$ cat /tmp/test
line_one
line_two
line_three
Now, let’s look at the same file with tac:
$ tac /tmp/test
line_three
line_two
line_one
As we can see, tac reversed the output.
We can also use tac in pipes to reverse the output of a command:
$ cat /tmp/test | tac
line_three
line_two
line_one
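As a small illustration of the log-file case from the introduction, the sketch below combines tac with head to show only the newest entries, newest first. The log contents and the temporary file are made up for the example.

```shell
# Hypothetical three-entry log, written to a temporary file
log=$(mktemp)
printf '%s\n' '09:00 start' '09:05 warning' '09:10 stop' > "$log"

# Reverse the file, then keep the first two lines (the two newest entries)
latest=$(tac "$log" | head -n 2)
echo "$latest"

rm -f "$log"
```

Here the pipe order matters: reversing first and trimming second yields the most recent entries at the top.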
tac is the most straightforward and efficient way of reversing a file. It runs on a single core and does its one job well.
However, it offers few configuration options beyond the separator it uses and the list of files it outputs.
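The separator option mentioned above is -s. A minimal sketch, assuming GNU tac and a made-up list of comma-terminated records:

```shell
# Records end with "," instead of a newline.
# tac keeps the separator attached to the end of each record,
# so the input should finish with a trailing separator.
reversed=$(printf 'one,two,three,' | tac -s ,)
echo "$reversed"   # prints: three,two,one,
```

Without the trailing comma, the last record would have no separator and would run into the next one in the output.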
3. nl/sort/cut Commands
Like cat and tac, these commands come with the coreutils package, which comes preinstalled in all of the common distributions. The sort, nl, and cut commands need to be chained together to reverse a file.
Let’s build up the chain a piece at a time, so we can understand how each command contributes.
To be able to order the file in reverse order, we need an index for each row. So, we use the nl command to put line numbers at the beginning of each line:
$ nl /tmp/test
     1  line_one
     2  line_two
     3  line_three
Now we want to sort these indexed lines into reverse order, for which we should use sort:
$ nl /tmp/test | sort -nr
     3  line_three
     2  line_two
     1  line_one
By default, sort orders lines lexically and arranges them from smallest to largest. We’re using a couple of parameters to change that here:
- -n – numerical sort
- -r – reverse order
We could also add some other parameters for extra performance:
- --parallel – number of sorts to run at the same time
- --batch-size – max number of inputs to merge at once (the benchmarks below use the abbreviation --batch)
- -S – max memory sort can use
The optimal values for these parameters will depend on our system hardware and operating system limits.
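One way to derive these values from the machine instead of hard-coding them is sketched below. It assumes GNU coreutils: nproc reports the number of available cores, and -S accepts a percentage of physical memory. The 10% figure is an arbitrary placeholder, not a recommendation.

```shell
# Number of processing units available to this process
cores=$(nproc)

# Numeric reverse sort, using every core and at most 10% of RAM as buffer
sorted=$(printf '%s\n' 2 10 1 | sort -nr --parallel="$cores" -S 10%)
echo "$sorted"
```

On an input this small the extra options make no measurable difference; they only start to matter once the data no longer fits comfortably in the default buffer.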
As we added a numeric index at the start of the process, we need to remove it to get our lines back to their original form:
$ nl /tmp/test | sort -nr | cut -f 2-
line_three
line_two
line_one
The -f 2- parameter tells cut to print everything from the second field onward, where fields are separated by tabs by default. Since nl puts a tab between the line number and the text, this strips the number and leaves the original line.
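The whole chain can be wrapped in a small function. This is a sketch, not a drop-in tool: the name reverse_lines is made up, and it adds -b a to nl, which goes beyond the command above. By default nl leaves empty lines unnumbered, so they would lose their position after sorting.

```shell
# Hypothetical helper: reverse the lines read from standard input
reverse_lines() {
  # -b a numbers every line, including empty ones
  nl -b a | sort -nr | cut -f 2-
}

# Sample input with an empty line in the middle
sample=$(printf 'line_one\n\nline_three\n' | reverse_lines)
echo "$sample"
```

The empty line stays between the two text lines in the reversed output, which would not be the case with plain nl.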
sed is a stream editor for filtering and transforming text. It can handle even the most complex text processing tasks.
sed is not a part of coreutils, yet it comes preinstalled in most major Linux distributions. We can also install it ourselves.
To install sed with the apt package manager:
$ sudo apt-get install sed
To install sed with the yum package manager:
$ sudo yum install sed
4.2. sed Script to Reverse a File
To reverse the given text file with sed:
$ sed '1!G;h;$!d' /tmp/test
line_three
line_two
line_one
sed scripts are hard to understand, so let’s unpack this one.
4.3. Subcommands of sed
The one-liner above applies three sed commands to every line. The commands, separated by semicolons, are:
- 1!G – G appends a newline and the contents of the hold space to the pattern space; the 1! address makes the command skip the first line
- h – copies the pattern space to the hold space
- $!d – d deletes the pattern space without printing it; the $! address makes the command skip the last line, so the accumulated text is printed only once, at the end
For more, check out our in-depth guide on sed and its different spaces.
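To make the interplay of the two spaces concrete, here is a trace of the script on a made-up three-line input (a, b, c), written as comments above a runnable command:

```shell
# Line 1 "a": G is skipped, h stores "a" in the hold space, d discards the line
# Line 2 "b": G builds "b\na", h stores "b\na", d discards the pattern space
# Line 3 "c": G builds "c\nb\na", h stores it, d is skipped on the last line,
#             so sed prints the pattern space: c, b, a
out=$(printf '%s\n' a b c | sed '1!G;h;$!d')
echo "$out"
```

The trace also shows why the hold space grows with every line: by the end it contains the entire file.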
5. Comparison Between Methods
Let’s compare the methods we’ve learned.
5.1. Daily Performance
For testing, we’ll use a file with 100,000 lines and size of 6.6 megabytes:
$ wc -l test
100000 test
$ du -h test
6,6M test
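The article does not say how the test file was produced. For anyone who wants to repeat the measurements, a file of the same length can be generated as sketched below; the line text and the temporary path are assumptions, so the resulting size will differ from the 6.6 megabytes above.

```shell
# Hypothetical stand-in for the test file: 100,000 lines of filler text
sample=$(mktemp)
seq 1 100000 | sed 's/^/sample log entry number /' > "$sample"

# Confirm the line count
lines=$(wc -l < "$sample")
echo "$lines"

# In bash, a run can be timed without terminal output skewing the result:
#   time tac "$sample" > /dev/null

rm -f "$sample"
```

Redirecting to /dev/null keeps the time spent drawing text in the terminal out of the measurement.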
Let’s reverse this file with tac while keeping track of time:
$ time tac test
...
...
...
real 0m0,571s
user 0m0,004s
sys 0m0,086s
tac completes the task in about half a second.
Now, let’s do the same test with nl, sort, and cut:
$ time nl test | sort -nr | cut -f 2-
...
...
...
real 0m1,063s
user 0m0,122s
sys 0m0,236s
This command takes a little more than a second to complete. Now, let’s run it on multiple cores, with 1 GB of memory and a batch size of 1021:
$ time nl test | sort -nr -S 1G --parallel=7 --batch=1021 | cut -f 2-
...
...
...
real 0m0,882s
user 0m0,130s
sys 0m0,194s
Now it takes less than a second. This is not a big improvement.
Now, let’s time the sed command:
$ time sed '1!G;h;$!d' test
...
...
...
real 0m54,336s
user 0m53,802s
sys 0m0,104s
This is by far the worst result. We wouldn’t use this for speed. However, the point of using sed is its advanced text processing capabilities.
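As an illustration of that flexibility, the sketch below reverses the input and edits each line in the same pass. The "> " prefix is an arbitrary transformation chosen for the example.

```shell
# s/^/> / runs first on every incoming line, then the reversal commands
# from the one-liner above accumulate the already-prefixed lines
tagged=$(printf '%s\n' line_one line_two line_three | sed 's/^/> /;1!G;h;$!d')
echo "$tagged"
```

Because the substitution runs before G appends the hold space, each line is prefixed exactly once.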
As the results show, tac is the clear winner in terms of performance for daily tasks. So, is there any benefit to the sort approach?
5.2. Big Data Performance
sort can use multiple cores at once, and its real power comes into play when we deal with huge files on powerful workstations.
Let’s work on the following file:
$ du -h megafile
54G megafile
$ wc -l megafile
1000000000 megafile
This file has 1,000,000,000 lines and is 54 gigabytes. Let’s reverse it with tac, discarding the output:
$ time tac megafile >> /dev/null
real 13m5.686s
user 0m59.028s
sys 0m47.556s
tac completes the task in 13 minutes 5 seconds.
Let’s try sort with 23 cores and 200 gigabytes of RAM:
$ time nl megafile | sort -S 200G -nr --parallel=23 --batch=1021 | cut -f 2- >> /dev/null
real 6m34.510s
user 9m47.677s
sys 3m1.545s
sort completes the task in 6 minutes 34 seconds, roughly half the time tac needed. tac clearly uses fewer resources overall, though: under 2 minutes of combined user and sys CPU time, against almost 13 minutes for the sort chain.
However, if we need to get things done as quickly as possible on a huge dataset, sort is the better choice, and the gap in performance will grow as the files get larger.
In this article, we covered some basic methods for reversing a text file in Linux.
We used only tools that ship with most distributions: tac, the nl/sort/cut chain, and sed.
Then, we analyzed the advantages and disadvantages of the different approaches. We learned that tac is the fastest method for daily use, but sort takes the lead when it comes to dealing with huge data on powerful workstations.
sed, on the other hand, provides a huge level of flexibility and could reverse our file while doing other processing on it.