1. Overview

Sorting lines of text is a common task in Linux. In this tutorial, we’ll learn the sort command through examples.

2. Introduction to the sort Command

The sort command can help us to rearrange lines from standard input (stdin) or from a text file. The sort command will write the sorted result to standard output (stdout). It’s available in all Linux distros since it’s part of the GNU coreutils package.

The syntax of using the sort command is straightforward:

sort [OPTION]... [FILE]...

The sort utility will sort lines alphabetically by default:

$ cat cities.txt
New York City
Paris
Beijing
Hamburg
Los Angeles
Amsterdam

$ sort cities.txt
Amsterdam
Beijing
Hamburg
Los Angeles
New York City
Paris

However, with different options, the sort command can do more than that, such as sorting lines numerically, in reverse order, by field, and so on.

We’ll go through some examples to learn how to sort lines in various ways using the sort command.

3. Sort by Number

Very often, we need to sort lines numerically. We can pass the option -n to sort to do that.

Let’s create a new file, cities2.txt, and add a new column: population in millions. We’ll sort the lines in the new file by population:

$ cat cities2.txt
8.18 New York City
2.15 Paris
21.45 Beijing
1.82 Hamburg
3.90 Los Angeles
1.38 Amsterdam

$ sort -n cities2.txt
1.38 Amsterdam
1.82 Hamburg
2.15 Paris
3.90 Los Angeles
8.18 New York City
21.45 Beijing

In the output above, the lines are sorted by population in ascending order.

sort -n compares decimal numbers, including negative values and fractions. However, it doesn’t recognize a leading “+”, exponent notation such as 1e3, or hexadecimal and binary notation.
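The limit of -n can be seen with a value in exponent notation. The snippet below is a minimal sketch with made-up input. It also uses -g (--general-numeric-sort), an option of GNU sort that this article doesn’t otherwise cover, only for comparison:

```shell
# Hypothetical input: 1e3 means 1000, but -n reads only its leading "1".
plain=$(printf '1e3\n20\n3\n' | LC_ALL=C sort -n)
# -g parses the whole value as a floating-point number.
general=$(printf '1e3\n20\n3\n' | LC_ALL=C sort -g)

echo "sort -n:"; printf '%s\n' "$plain"     # 1e3, 3, 20
echo "sort -g:"; printf '%s\n' "$general"   # 3, 20, 1e3
```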

4. Sort in Reverse Order

To sort lines in reverse order, we use the option -r.

Now let’s sort the file cities2.txt by population in descending order:

$ sort -nr cities2.txt 
21.45 Beijing
8.18 New York City
3.90 Los Angeles
2.15 Paris
1.82 Hamburg
1.38 Amsterdam

5. Sort by Month

Sometimes there are months in our text, such as “Nov” or “August”. The sort command supports the convenient -M option to sort lines by month:

$ cat months.txt 
October
January
December
November
August

$ sort -M months.txt
January
August
October
November
December
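Abbreviated and full month names can be mixed, since -M matches on the three-letter abbreviation and ignores case. A small sketch with made-up input, assuming the C locale so the English month names apply:

```shell
# Mixed spellings: abbreviation, full name, lower case.
by_month=$(printf 'Nov\nAugust\njan\n' | LC_ALL=C sort -M)
printf '%s\n' "$by_month"   # jan, August, Nov
```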

6. Sort by ASCII Character Code

From time to time, we want to sort lines by the ASCII character code in the text.

Let’s see a text file:

$ cat ascii.txt
C
B
b
c
A
a

If we sort it with no options, the result depends on the current locale. In a typical UTF-8 locale such as en_US.UTF-8, we’ll get:

$ sort ascii.txt
a
A
b
B
c
C

The result is sorted alphabetically.

However, it is not in ASCII code order. For example, an upper-case “A” has ASCII code 65, while the ASCII code of a lower-case “a” is 97.
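We can check these codes from the shell itself. This is a quick sketch that relies on the standard printf behavior of treating an argument with a leading quote as a character code:

```shell
# A leading quote in the argument makes printf output the character's code.
upper=$(printf '%d' "'A")
lower=$(printf '%d' "'a")
echo "A=$upper a=$lower"   # A=65 a=97
```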

LC_ALL is the environment variable that overrides all other localization settings. To sort lines by ASCII code instead, we can set LC_ALL=C, which forces sort to compare lines byte by byte.

Let’s see how this environment variable changes the default behavior of the sort command:

$ LC_ALL=C sort ascii.txt
A
B
C
a
b
c

In the command above, we set LC_ALL=C only for the duration of the sort command. It doesn’t change the LC_ALL value in the current shell.
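The scoping can be verified with a short sketch. It unsets LC_ALL first, purely so the demonstration starts from a known state, and uses inline sample data instead of ascii.txt:

```shell
unset LC_ALL                                      # known starting state for the demo
sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)   # LC_ALL=C applies to this sort only
printf '%s\n' "$sorted"                           # A, B, a, b
echo "LC_ALL afterwards: ${LC_ALL:-<unset>}"      # still unset in the shell
```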

7. Write the Sorted Output to a File

By default, the sort command writes the result to stdout. Sometimes we want to save the output in a file. We can pass the -o FILE option to the sort command to write the result in a file instead of stdout:

$ sort -o ascii_result.txt ascii.txt
$ cat ascii_result.txt
a
A
b
B
c
C

In addition to using the -o option, we can also redirect stdout to our output file:

$ sort ascii.txt > ascii_result.txt

However, redirection can’t write the sorted result back into the input file, because the shell truncates the file before sort gets to read it. One way around this is a temp file:

$ sort ascii.txt > sorted.tmp && mv sorted.tmp ascii.txt
$ cat ascii.txt
a
A
b
B
c
C
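As an alternative to the temp file, the -o option may name the input file itself, since sort reads all of its input before it opens the output file. A minimal sketch, using a scratch directory and sample data that are only assumptions for the demonstration:

```shell
workdir=$(mktemp -d)                                 # scratch directory for the demo
printf 'C\nB\nA\n' > "$workdir/ascii.txt"

# Input and output are the same file; no redirection is involved.
sort -o "$workdir/ascii.txt" "$workdir/ascii.txt"

result=$(cat "$workdir/ascii.txt")
printf '%s\n' "$result"                              # A, B, C
rm -r "$workdir"
```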

8. Sort and Remove Duplicates

If we pass the -u option to the sort command, it will generate a “unique” result, outputting sorted lines and removing duplicates:

$ cat dup.txt 
New York City
Paris
Beijing
Paris
New York City
Hamburg
New York City
Hamburg

$ sort -u dup.txt 
Beijing
Hamburg
New York City
Paris
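It’s worth noting that -u decides what counts as a duplicate using the comparison in effect, not the raw text. The sketch below, with made-up input, combines it with -n, so lines that are numerically equal collapse into one:

```shell
# 1, 01 and 1.0 are different strings but the same number under -n.
deduped=$(printf '1\n01\n1.0\n2\n' | LC_ALL=C sort -n -u)
printf '%s\n' "$deduped"   # two lines: one of the "1" variants, then 2
```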

9. Sort by Keys

So far, we’ve always sorted by the items at the beginning of the lines. We can also sort lines by keys. To do that, we pass the -k option to the sort command. 

It’s pretty handy if we need to sort some field-based data, such as CSV files.

Let’s learn it through a working hours report in a CSV file (Name, Month, Working Hours, Comments):

$ cat working_hours.csv
Dr.Schmidt,Jan,123,some comments...
Ms.Green,Feb,20,some comments...
Dr.Schmidt,Mar,25,some comments...
Mr.Adams,Jan,77,some comments...
Ms.Green,Jan,45,some comments...
Mr.Adams,Feb,150,some comments...
Mr.Adams,Mar,80,some comments...
Ms.Green,Mar,107,some comments...
Dr.Schmidt,Feb,87,some comments...

Next, we’ll see how to sort the CSV file by fields and a part of a field.

9.1. Sort by a Field

Let’s say we want to sort the lines by the 3rd field, Working Hours.

Let’s look at the command that solves the problem first:

$ sort -t, -k 3n,3 working_hours.csv  
Ms.Green,Feb,20,some comments...
Dr.Schmidt,Mar,25,some comments...
Ms.Green,Jan,45,some comments...
Mr.Adams,Jan,77,some comments...
Mr.Adams,Mar,80,some comments...
Dr.Schmidt,Feb,87,some comments...
Ms.Green,Mar,107,some comments...
Dr.Schmidt,Jan,123,some comments...
Mr.Adams,Feb,150,some comments...

Now, let’s understand each part of the command.

By default, the sort command separates fields at whitespace (more precisely, at the transition from a non-blank to a blank character). We can also define a custom field separator using the option -t. Since fields in our CSV file are separated by commas, we passed “-t,”.

Next, we defined a sorting key, 3n,3 for the -k option. The definition of a key in the sort command is:

POS1[sorting options],POS2

The POS1 indicates the starting key position, while the POS2 is the ending key position. If we don’t give a POS2, the end of the line will be taken as the POS2. 

Our goal is to sort by the 3rd field numerically (the n option), so the key is -k 3n,3.
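The effect of leaving out POS2 can be seen in a small sketch. The three rows below are made-up sample data, and LC_ALL=C is set only to make the text comparison predictable:

```shell
rows='Ms.Green,Feb,20
Mr.Adams,Feb,150
Dr.Schmidt,Feb,87'

# No POS2: the key runs from field 2 to the end of the line,
# so "Feb,150" is compared as text and sorts before "Feb,20".
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k 2)

# POS2 given: field 2 alone is the first key, field 3 a numeric second key.
closed_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k 2,2 -k 3n,3)

echo "-k 2:";            printf '%s\n' "$open_key"     # 150, 20, 87
echo "-k 2,2 -k 3n,3:";  printf '%s\n' "$closed_key"   # 20, 87, 150
```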

9.2. Sort by a Part of a Field

Sorting by fields can help us a lot in sorting field-based data. However, sometimes we want a part of a field to be the sorting key.

Now let’s extend the requirement from the previous section: we would like to sort our working_hours.csv first by the persons’ names and then by working hours.

Sorting by working hours is not a problem for us. However, to sort by the persons’ names, we need to exclude the titles (Ms., Mr., Dr.) from the 1st field.

Let’s take a look at the solution first and then understand how to sort by a part of a field:

$ sort -t, -k 1.4,1 -k 3n,3 working_hours.csv  
Mr.Adams,Jan,77,some comments...
Mr.Adams,Mar,80,some comments...
Mr.Adams,Feb,150,some comments...
Ms.Green,Feb,20,some comments...
Ms.Green,Jan,45,some comments...
Ms.Green,Mar,107,some comments...
Dr.Schmidt,Mar,25,some comments...
Dr.Schmidt,Feb,87,some comments...
Dr.Schmidt,Jan,123,some comments...

The sorting key 1.4,1 did the trick. Let’s understand its meaning.

We’ve learned that a sorting key is defined as POS1,POS2. Moreover, each POS is defined as F.C, so a complete sorting key definition is:

F1.C1[sorting options],F2.C2

The F1 and F2 stand for the field indexes. In this case, they are 1 for the 1st field.

C1 is the character index within field F1 to begin the sort comparison. If we don’t define a C1, the comparison starts from the 1st character of the field F1.

C2 is the character index within field F2 to end the sort comparison. If we omit C2, the sorting comparison ends at the last character of the field F2.

In our example, to exclude the titles from the Name field, the sorting comparison should start from the 4th character. Therefore, we have “-k 1.4,1”.
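Sorting options can be attached to any key, so a partial-field key combines freely with the options from earlier sections. As a sketch under assumed sample data, the command below sorts by name without the title and then by month, using M as the second key’s option:

```shell
rows='Mr.Adams,Mar,80
Dr.Schmidt,Feb,87
Mr.Adams,Jan,77
Ms.Green,Feb,20
Mr.Adams,Feb,150'

# Key 1: field 1 from its 4th character. Key 2: field 2 compared as a month.
by_name_month=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k 1.4,1 -k 2M,2)
printf '%s\n' "$by_name_month"
# Adams (Jan, Feb, Mar), then Green, then Schmidt
```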

10. Conclusion

sort is a useful and straightforward command-line utility. In this article, we’ve learned some typical usage of the sort command by examples.

Sorting by keys using the -k option is not as straightforward as other sorting options. But it allows us to sort field-based data more flexibly.

With this handy tool in our command line arsenal, we can solve most sorting problems easily.
