1. Overview

In this tutorial, we'll do a quick comparison of the Linux commands sort | uniq and sort -u. Both use sort to remove duplicate entries from a list, but they operate in slightly different manners.

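Note that sort and uniq are standard on Linux, macOS, and other Unix-like systems, so the commands below work across platforms. The exact sort order and the padding of numeric output can still vary with the locale and the implementation.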

2. Basic Usage

Let's start with a list of colors in a file named color:

% cat color
Black
green
red
red
yellow
Green
red

If we want to remove duplicates, uniq on its own works only when the duplicates are adjacent. The man page for uniq says:

Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first.

For our list, the result would not be a list of unique entries because our list has duplicated, non-adjacent entries of “red”:

% uniq color
Black
green
red
yellow
Green
red

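The -u argument of uniq can look like a way around this, but it does something different: it prints only the lines that are not repeated in adjacent input. Here, the adjacent pair of “red” is dropped entirely, and the final “red” survives only because it stands alone: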

% uniq -u color
Black
green
yellow
Green
red
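
A smaller input makes the behavior of -u easier to see. The sketch below uses a made-up three-line list (not the color file) and compares uniq -u with plain uniq; the variable names are only for illustration:

```shell
# Hypothetical three-line input: "red" appears twice in a row, "blue" once.
only_unrepeated=$(printf '%s\n' red red blue | uniq -u)
deduplicated=$(printf '%s\n' red red blue | uniq)

printf 'uniq -u:\n%s\n' "$only_unrepeated"   # prints only "blue"
printf 'uniq:\n%s\n' "$deduplicated"         # prints "red", then "blue"
```

With -u, "red" disappears from the output altogether, so it isn't a general way to build a de-duplicated list.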

The reliable approach is the one the man page suggests: sorting the list first makes every duplicate adjacent, so uniq can then remove all of them.

Sorting the list is easy:

% sort color
Black
Green
green
red
red
red
yellow
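
The order shown here, with uppercase entries ahead of lowercase ones, matches a byte-wise comparison. Under other locales, sort may order mixed-case entries differently. A minimal sketch, assuming a scratch file created with mktemp stands in for the color file, that pins the locale with LC_ALL=C to get this ordering:

```shell
# Recreate the sample list in a scratch file (the path comes from mktemp).
color_file=$(mktemp)
printf '%s\n' Black green red red yellow Green red > "$color_file"

# LC_ALL=C forces byte-wise comparison, so uppercase sorts before lowercase.
sorted_c=$(LC_ALL=C sort "$color_file")
printf '%s\n' "$sorted_c"

rm -f "$color_file"
```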

Piping this to uniq yields:

% sort color | uniq
Black
Green
green
red
yellow

Now, checking the man page for sort, we can see that the -u flag will provide the same output:

% sort -u color
Black
Green
green
red
yellow
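
One way to confirm that the two approaches agree for a given file is to compare their output directly. The sketch below assumes a scratch file created with mktemp and pins the locale to C so that the result doesn't depend on the environment:

```shell
# Recreate the sample list in a scratch file (the path comes from mktemp).
color_file=$(mktemp)
printf '%s\n' Black green red red yellow Green red > "$color_file"

# Capture the output of both approaches under the same locale.
piped=$(LC_ALL=C sort "$color_file" | uniq)
flagged=$(LC_ALL=C sort -u "$color_file")

if [ "$piped" = "$flagged" ]; then
    same=yes
else
    same=no
fi
echo "identical output: $same"

rm -f "$color_file"
```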

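So, for whole-line comparisons like this one, sort | uniq and sort -u do the same thing. But there are some differences.

For example, sort has other options, like sorting on a delimited field with -t and -k. We can use these with either approach, but the results can differ: sort -u treats lines with equal sort keys as duplicates, whereas uniq compares whole lines.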
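
To illustrate how a sort key interacts with de-duplication, here is a small sketch using made-up comma-separated records (color,id), which are not part of the color file. It only counts the resulting lines, because which of the equal-key lines sort -u keeps may depend on the implementation:

```shell
# Hypothetical comma-separated records: color,id
records=$(printf '%s\n' red,1 red,2 blue,1)

# sort -u compares only the key (field 1), so the two "red" records collapse.
by_key=$(printf '%s\n' "$records" | sort -t, -k1,1 -u | wc -l | tr -d ' ')

# uniq compares whole lines, so all three records are kept.
by_line=$(printf '%s\n' "$records" | sort -t, -k1,1 | uniq | wc -l | tr -d ' ')

echo "sort -u with a key: $by_key lines; sort | uniq: $by_line lines"
```

The tr -d ' ' step strips the padding that some wc implementations add in front of the count.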

3. Counting Unique Entries

After finding a unique list of items, we'll often also want to know how many times each item occurs. The -c option for uniq prefixes each output line with the number of consecutive occurrences of that line:

% uniq -c color
   1 Black
   1 green
   2 red
   1 yellow
   1 Green
   1 red

Kind of useful, but it again hits the issue of ignoring non-adjacent duplicates. To avoid that, we could sort the list first, then pipe the output to uniq:

% sort color | uniq -c
   1 Black
   1 Green
   1 green
   3 red
   1 yellow

Now each entry appears once, with its total count, regardless of adjacency.
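
A common follow-up is to rank the entries by how often they occur. The sketch below assumes a scratch file created with mktemp holding the same color list, and pipes the counts through a second, numeric sort; the variable names are only for illustration:

```shell
# Recreate the sample list in a scratch file (the path comes from mktemp).
color_file=$(mktemp)
printf '%s\n' Black green red red yellow Green red > "$color_file"

# Count occurrences, then sort numerically in reverse so the largest count comes first.
top=$(sort "$color_file" | uniq -c | sort -rn | head -n 1)

# uniq -c pads the count with spaces; awk splits on whitespace regardless of padding.
top_count=$(printf '%s\n' "$top" | awk '{print $1}')
top_color=$(printf '%s\n' "$top" | awk '{print $2}')
echo "$top_color appears $top_count times"   # red appears 3 times

rm -f "$color_file"
```

This is only possible with the sort | uniq form, since sort -u has no equivalent of -c.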

Taking it a step further, let's say we want a count of unique items in the list. We can pipe to wc:

% sort color | uniq | wc -l
       5

Or with sort -u instead of uniq:

% sort -u color | wc -l
       5

And we get a count of our unique list items.

4. Summary

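In this short article, we compared sort | uniq and sort -u. Both produce the same de-duplicated list, while piping to uniq also gives us access to options such as -c for counting occurrences.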
