1. Overview
Sorting lines of text is a common task in Linux. In this tutorial, we’ll learn the sort command through examples.
2. Introduction to the sort Command
The sort command rearranges lines from standard input (stdin) or from text files and writes the sorted result to standard output (stdout). It’s part of the GNU coreutils package, so it’s available on virtually every Linux distribution.
The syntax of using the sort command is straightforward:
sort [OPTION]... [FILE]...
The sort utility will sort lines alphabetically by default:
$ cat cities.txt
New York City
Paris
Beijing
Hamburg
Los Angeles
Amsterdam
$ sort cities.txt
Amsterdam
Beijing
Hamburg
Los Angeles
New York City
Paris
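Since sort also reads from stdin, we can feed it the output of another command. The following is a minimal sketch with made-up input lines; the variable name from_stdin is only for illustration:

```shell
# With no FILE argument, sort reads its input from standard input.
from_stdin=$(printf '%s\n' Paris Beijing Amsterdam | sort)
printf '%s\n' "$from_stdin"
```

This should print the three city names in alphabetical order, just as if they had been stored in a file.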
However, if we pass different options, the sort command can do more than that, such as sorting lines numerically, in reverse order, by field, and so on.
We’ll go through some examples to learn how to sort lines in various ways using the sort command.
3. Sort by Number
Very often, we need to sort lines numerically. We can pass the option -n to sort to do that.
Let’s create a new file, cities2.txt, and add a new column: population in millions. We’ll sort the lines in the new file by population:
$ cat cities2.txt
8.18 New York City
2.15 Paris
21.45 Beijing
1.82 Hamburg
3.90 Los Angeles
1.38 Amsterdam
$ sort -n cities2.txt
1.38 Amsterdam
1.82 Hamburg
2.15 Paris
3.90 Los Angeles
8.18 New York City
21.45 Beijing
Well, the lines are sorted by population in the output above.
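sort -n compares decimal numbers, including an optional leading minus sign and a fractional part. However, it reads every value as decimal, so it can’t correctly sort numbers written in other notations, such as signed binary or hexadecimal.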
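To make the difference concrete, here is a small sketch that compares the default text comparison with -n, and then shows what happens to hexadecimal input. The sample values are invented for illustration, LC_ALL=C is set only to pin down the comparison rules, and the hexadecimal behavior described in the comments is what we'd expect from GNU sort:

```shell
# Hypothetical sample values, not taken from the files in this tutorial.
# Without -n, lines are compared as text, so "100" sorts before "9".
as_text=$(printf '%s\n' 10 9 100 | LC_ALL=C sort)

# With -n, lines are compared by their numeric value.
as_numbers=$(printf '%s\n' 10 9 100 | LC_ALL=C sort -n)

# -n stops parsing at the "x", so "0x1F" and "0xA" both count as 0
# and end up before 7, although both are larger than 7.
hex_attempt=$(printf '%s\n' 0x1F 7 0xA | LC_ALL=C sort -n)

printf '%s\n' "$as_text" "--" "$as_numbers" "--" "$hex_attempt"
```

If we need to sort hexadecimal values, one option is to convert them to decimal first and sort the converted values.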
4. Sort in Reverse Order
To sort lines in reverse order, we use the option -r.
Now let’s sort the file cities2.txt by population in descending order:
$ sort -nr cities2.txt
21.45 Beijing
8.18 New York City
3.90 Los Angeles
2.15 Paris
1.82 Hamburg
1.38 Amsterdam
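A common use of descending order is a "top N" query. As a sketch, we can pipe the result into head to keep only the three largest cities; the data is copied from cities2.txt, and the variable name top3 is just an illustrative choice:

```shell
# Population figures (in millions) copied from cities2.txt.
top3=$(printf '%s\n' \
  '8.18 New York City' \
  '2.15 Paris' \
  '21.45 Beijing' \
  '1.82 Hamburg' \
  '3.90 Los Angeles' \
  '1.38 Amsterdam' | LC_ALL=C sort -nr | head -n 3)
printf '%s\n' "$top3"
```

The output should list Beijing, New York City, and Los Angeles, in that order.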
5. Sort by Month
Sometimes there are months in our text, such as “Nov” or “August”. The sort command supports the convenient -M option to sort lines by month:
$ cat months.txt
October
January
December
November
August
$ sort -M months.txt
January
August
October
November
December
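The -M option also works with three-letter abbreviations, and lines that don't start with a month name sort before January. The sketch below uses invented input and assumes English month names, which is why it sets LC_ALL=C:

```shell
# "n/a" is not a month name, so -M treats it as "unknown",
# which sorts before every real month.
by_month=$(printf '%s\n' Mar Jan n/a Dec | LC_ALL=C sort -M)
printf '%s\n' "$by_month"
```

Here we'd expect n/a first, followed by Jan, Mar, and Dec.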
6. Sort by ASCII Character Code
From time to time, we want to sort lines by the ASCII character code in the text.
Let’s see a text file:
$ cat ascii.txt
C
B
b
c
A
a
If we sort it with the sort command without any options, we’ll get:
$ sort ascii.txt
a
A
b
B
c
C
The result is sorted alphabetically, following the collation rules of the current locale.
However, it is not in ASCII code order. For example, an upper-case “A” has ASCII code 65, while a lower-case “a” has ASCII code 97.
LC_ALL is the environment variable that overrides all other localization settings. To sort lines by ASCII code instead, we can set LC_ALL=C, which forces byte-wise comparison.
Let’s see how this environment variable changes the default behavior of the sort command:
$ LC_ALL=C sort ascii.txt
A
B
C
a
b
c
In the command above, we set LC_ALL=C only for the duration of the sort command. It won’t change the LC_ALL value in the current shell.
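We can verify both points, the byte-wise order and the unchanged shell environment, with a short sketch. The variable names before, after, and bytewise are only for illustration:

```shell
# Remember the current LC_ALL value ("unset" if the variable isn't set).
before=${LC_ALL-unset}

# The VAR=value prefix applies to this single sort invocation only.
bytewise=$(printf '%s\n' C B b c A a | LC_ALL=C sort)

# The shell's own LC_ALL is the same as before.
after=${LC_ALL-unset}

printf '%s\n' "$bytewise"
printf 'LC_ALL before: %s, after: %s\n' "$before" "$after"
```

The upper-case letters should come first, and the last line should report the same value twice.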
7. Write the Sorted Output to a File
By default, the sort command writes the result to stdout. Sometimes we want to save the output in a file. We can pass the -o FILE option to the sort command to write the result to a file instead of stdout:
$ sort -o ascii_result.txt ascii.txt
$ cat ascii_result.txt
a
A
b
B
c
C
In addition to using the -o option, we can also redirect stdout to our output file:
$ sort ascii.txt > ascii_result.txt
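However, we can’t redirect the sorted result straight back into the input file, because the shell truncates the file before sort gets a chance to read it. With redirection, we need to go through a temp file. (The -o option doesn’t have this limitation, since sort reads all of its input before it opens the output file.)
$ sort ascii.txt > sorted.tmp && mv sorted.tmp ascii.txt
$ cat ascii.txt
a
A
b
B
c
C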
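As an alternative to the temp file, the -o option may name the input file itself, because sort reads all of its input before opening the output file. The sketch below contrasts this with a direct redirection; the file names letters.txt and clobbered.txt and the temporary directory are hypothetical:

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Sorting in place with -o: the same file is both input and output.
printf '%s\n' C B A > "$workdir/letters.txt"
LC_ALL=C sort -o "$workdir/letters.txt" "$workdir/letters.txt"
in_place=$(cat "$workdir/letters.txt")

# Redirecting onto the input file: the shell empties the file
# before sort starts, so there is nothing left to sort.
printf '%s\n' C B A > "$workdir/clobbered.txt"
sort "$workdir/clobbered.txt" > "$workdir/clobbered.txt"
if [ -s "$workdir/clobbered.txt" ]; then clobbered=no; else clobbered=yes; fi

printf '%s\n' "$in_place"
printf 'input file emptied by redirection: %s\n' "$clobbered"
rm -r "$workdir"
```

The first file should end up sorted, while the second one should be empty.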
8. Sort and Remove Duplicates
If we pass the -u option to the sort command, it will generate a “unique” result, outputting sorted lines and removing duplicates:
$ cat dup.txt
New York City
Paris
Beijing
Paris
New York City
Hamburg
New York City
Hamburg
$ sort -u dup.txt
Beijing
Hamburg
New York City
Paris
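When whole lines are compared, as here, sort -u should give the same result as piping the sorted output through uniq. A quick sketch with invented input:

```shell
# Both pipelines receive the same hypothetical input lines.
unique=$(printf '%s\n' Paris Beijing Paris Hamburg Beijing | LC_ALL=C sort -u)
via_uniq=$(printf '%s\n' Paris Beijing Paris Hamburg Beijing | LC_ALL=C sort | uniq)

printf '%s\n' "$unique"
```

Using -u saves one process in the pipeline. The uniq variant is still handy when we need its extra options.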
9. Sort by Keys
So far, we’ve always sorted by the items at the beginning of the lines. We can also sort lines by keys. To do that, we pass the -k option to the sort command.
It’s pretty handy if we need to sort some field-based data, such as CSV files.
Let’s learn it through a working hours report in a CSV file (Name, Month, Working Hours, Comments):
$ cat working_hours.csv
Dr.Schmidt,Jan,123,some comments...
Ms.Green,Feb,20,some comments...
Dr.Schmidt,Mar,25,some comments...
Mr.Adams,Jan,77,some comments...
Ms.Green,Jan,45,some comments...
Mr.Adams,Feb,150,some comments...
Mr.Adams,Mar,80,some comments...
Ms.Green,Mar,107,some comments...
Dr.Schmidt,Feb,87,some comments...
Next, we’ll see how to sort the CSV file by fields and a part of a field.
9.1. Sort by a Field
Let’s say we want to sort the lines by the 3rd field: Working Hours.
Let’s first look at the sort command that solves the problem:
$ sort -t, -k 3n,3 working_hours.csv
Ms.Green,Feb,20,some comments...
Dr.Schmidt,Mar,25,some comments...
Ms.Green,Jan,45,some comments...
Mr.Adams,Jan,77,some comments...
Mr.Adams,Mar,80,some comments...
Dr.Schmidt,Feb,87,some comments...
Ms.Green,Mar,107,some comments...
Dr.Schmidt,Jan,123,some comments...
Mr.Adams,Feb,150,some comments...
Now, let’s understand each part of the command.
The default field separator for the sort command is whitespace. We can also define a custom field separator using option -t. Since fields in our CSV file are separated by commas, we passed “-t,”.
Next, we defined a sorting key, 3n,3, for the -k option. The definition of a key in the sort command is:
-k POS1[,POS2]
POS1 indicates the starting position of the key, while POS2 is the ending position. If we don’t give a POS2, the key extends to the end of the line.
Our goal is to sort by the 3rd field only, and to compare it numerically (the n option). Therefore, we have -k 3n,3.
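Other ordering options can be attached to a key in the same way as n. As a sketch, the command below sorts a shortened, comment-free excerpt of the report by month (M on the 2nd field) and then by working hours; LC_ALL=C is set because the month names are English:

```shell
# A four-line excerpt of the report without the comments column.
report=$(printf '%s\n' \
  'Mr.Adams,Feb,150' \
  'Ms.Green,Jan,45' \
  'Dr.Schmidt,Jan,123' \
  'Mr.Adams,Mar,80' | LC_ALL=C sort -t, -k2M,2 -k3n,3)
printf '%s\n' "$report"
```

The two January lines should come first, ordered by hours (45, then 123), followed by the February and March lines.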
9.2. Sort by a Part of a Field
Sorting by fields can help us a lot in sorting field-based data. However, sometimes we want a part of a field to be the sorting key.
Now let’s extend the requirement from the previous section: we’d like to sort our working_hours.csv first by the persons’ names and then by working hours.
Sorting by working hours is not a problem for us. However, to sort by the persons’ names, we need to exclude the titles (Ms., Mr., Dr.) from the 1st field.
Let’s take a look at the solution first and then understand how to sort by a part of a field:
$ sort -t, -k 1.4,1 -k 3n,3 working_hours.csv
Mr.Adams,Jan,77,some comments...
Mr.Adams,Mar,80,some comments...
Mr.Adams,Feb,150,some comments...
Ms.Green,Feb,20,some comments...
Ms.Green,Jan,45,some comments...
Ms.Green,Mar,107,some comments...
Dr.Schmidt,Mar,25,some comments...
Dr.Schmidt,Feb,87,some comments...
Dr.Schmidt,Jan,123,some comments...
The sorting key 1.4,1 did the trick. Let’s understand its meaning.
We’ve learned that a sorting key is defined as POS1[,POS2]. Moreover, each POS is defined as F[.C], so a complete sorting key definition is:
-k F1[.C1][,F2[.C2]]
The F1 and F2 stand for the field indexes. In this case, they are 1 for the 1st field.
C1 is the character index within field F1 to begin the sort comparison. If we don’t define a C1, the comparison starts from the 1st character of the field F1.
C2 is the character index within field F2 to end the sort comparison. If we omit C2, the sorting comparison ends at the last character of the field F2.
In our example, to exclude the titles from the Name field, the sorting comparison should start from the 4th character. Therefore, we have “-k 1.4,1”.
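Both C1 and C2 can be used together to cut a key out of the middle of a field. The sketch below uses invented log-style lines and sorts them by the two month digits of the date, that is, characters 6 to 7 of the 1st field, compared numerically:

```shell
# Hypothetical lines: an ISO date (YYYY-MM-DD) followed by a label.
# In "2023-03-01", characters 6 and 7 are the month digits "03".
by_month_digits=$(printf '%s\n' \
  '2023-03-01 deploy' \
  '2021-12-15 backup' \
  '2022-01-20 audit' | LC_ALL=C sort -k1.6n,1.7)
printf '%s\n' "$by_month_digits"
```

The lines should come out in the order 01, 03, 12, regardless of the year in front of the month.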
10. Sort a TSV File
So far, we’ve learned how to sort a file by field and we have been using CSV input files as examples.
In practice, TSV (Tab Separated Values) is another commonly used data format. In this section, let’s sort a TSV file and review the sort by field technique.
Let’s say some famous movie actors come together for a weightlifting competition, and the match result (Name, Bodyweight <KG>, Score <KG>) is recorded in a TSV file:
$ cat match_result.tsv
Brad Pitt	78.50	150.00
Michael Caine	77.60	149.50
Tom Hanks	79.00	148.00
Cary Grant	78.80	149.50
Spike Lee	80.00	149.50
Vin Diesel	77.89	150.00
David Tennant	79.50	147.50
Jackie Chan	78.77	151.00
Will Smith	80.50	148.00
Now, our task is to calculate their ranks for the match. According to the weightlifting rule:
- Two players have different scores: the player with the higher score wins.
- Two players have the same score: the player with lower body weight wins.
Therefore, we need to first sort by the third field descending and then sort by the second field ascending.
The difficult part of this problem is sorting by two fields, but it isn’t a challenge for us now. We can build the command to sort by fields: sort -k3nr,3 -k2n,2 match_result.tsv.
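However, we still need to specify Tab as the field separator using the -t option. Otherwise, sort splits fields on whitespace, and the spaces inside the names would shift the field numbers.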
Let’s give it a try:
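$ sort -t'\t' -k3nr,3 -k2n,2 match_result.tsv
sort: multi-character tab ‘\\t’
Oops, sort treats ‘\t’ as a multi-character separator! Inside single quotes, the shell passes the backslash and the letter t to sort as two literal characters. Next, let’s see how to pass Tab as the field separator correctly.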
10.1. Passing Tab as the Field Separator
We have two ways to pass Tab as the field separator to the sort command:
- Passing a literal Tab
- Escaping the Tab character
Usually, when we press the TAB key on the command line, it triggers command completion instead of inserting a literal Tab into the command.
However, we can add a literal Tab in the command line by first typing CTRL-V then TAB:
$ sort -t'	' -k3nr,3 -k2n,2 match_result.tsv
Jackie Chan	78.77	151.00
Vin Diesel	77.89	150.00
Brad Pitt	78.50	150.00
Michael Caine	77.60	149.50
Cary Grant	78.80	149.50
Spike Lee	80.00	149.50
Tom Hanks	79.00	148.00
Will Smith	80.50	148.00
David Tennant	79.50	147.50
We should note that the character between the quotes in the command above is a literal Tab, not a space: -t'<TAB>'.
Another way to pass Tab to the -t option is to escape the Tab using ANSI-C Quoting:
$ sort -t$'\t' -k3nr,3 -k2n,2 match_result.tsv
Jackie Chan	78.77	151.00
Vin Diesel	77.89	150.00
Brad Pitt	78.50	150.00
Michael Caine	77.60	149.50
Cary Grant	78.80	149.50
Spike Lee	80.00	149.50
Tom Hanks	79.00	148.00
Will Smith	80.50	148.00
David Tennant	79.50	147.50
Great! We’ve solved the problem. Jackie Chan won the first prize!
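ANSI-C Quoting is a feature of shells such as Bash. In a plain POSIX shell script, one possible alternative is to generate the Tab with printf and pass it through a variable. The sketch below does this on a four-line excerpt of the match result and also shows what happens when -t is left out; the variable names tab, results, ranked, and no_tab are illustrative choices:

```shell
# Produce a literal Tab character without relying on $'\t'.
tab=$(printf '\t')

# Four rows from match_result.tsv; printf reuses the format for each row.
results=$(printf '%s\t%s\t%s\n' \
  'Brad Pitt' 78.50 150.00 \
  'Tom Hanks' 79.00 148.00 \
  'Vin Diesel' 77.89 150.00 \
  'Jackie Chan' 78.77 151.00)

# Tab as the separator: score descending, then body weight ascending.
ranked=$(printf '%s\n' "$results" | LC_ALL=C sort -t "$tab" -k3nr,3 -k2n,2 | cut -f1)

# No -t: the space inside each name starts a new field, so "field 3"
# is now the body weight and the ranking comes out wrong.
no_tab=$(printf '%s\n' "$results" | LC_ALL=C sort -k3nr,3 -k2n,2 | cut -f1)

printf '%s\n' "$ranked" "--" "$no_tab"
```

With the Tab separator, Jackie Chan should lead the list. Without it, the heaviest athlete in the excerpt, Tom Hanks, ends up first instead.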
11. Conclusion
sort is a useful and straightforward command-line utility. In this article, we’ve learned some typical usage of the sort command by examples.
Sorting by keys using the -k option is not as straightforward as other sorting options. But it allows us to sort field-based data more flexibly.
When we sort TSV files, we must take care to pass Tab as the field separator correctly.
With this handy tool in our command line arsenal, we can solve most sorting problems easily.