1. Overview

There are situations when we have several files with the same content on our Linux machine. Sometimes, it’s tricky to find all such files.

In this tutorial, we’ll learn how to compare all files in a directory using the Linux CLI.

Firstly, we’ll create the sample files to test with. Then, we’ll use the diff command with a for loop. Finally, we’ll combine the find command with the awk program.

2. Initial Setup

Before we learn the commands, let’s list the four sample files we’ll work with using the ls command:

$ ls
file_1  file_2  file_3  file_4

Now, let’s look at each file’s content using the cat command:

$ cat file_1
hello world
$ cat file_2
hello
$ cat file_3
hello world
$ cat file_4
world

We can see that file_1 and file_3 have identical content, the string “hello world”, while the remaining files differ from one another.
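The listings above don’t show how the files were made. A minimal sketch for reproducing them in an empty directory uses printf, assuming each file ends with a single newline (cat’s output doesn’t reveal this detail):

```shell
#!/bin/sh
# Create the four sample files in the current directory.
# Assumption: each file holds one line terminated by a newline.
printf 'hello world\n' > file_1
printf 'hello\n' > file_2
printf 'hello world\n' > file_3
printf 'world\n' > file_4

# List the directory to confirm the files exist.
ls
```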

3. Using the diff Command

Now that we have the sample files, let’s compare them using the diff command.

Typically, diff is used to compare two files. However, since we need to compare all of them, we’ll iterate over all files in the directory using a for loop:

$ for i in ./*; do diff -q "$i" file_1; done
Files ./file_2 and file_1 differ
Files ./file_4 and file_1 differ

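As we can see, the command prints a line only for the files that differ from file_1. In our case, file_2 and file_4 differ from file_1. There’s no line for file_3, which means it’s identical to file_1, and none for file_1 itself, since comparing a file with itself finds no differences.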

Let’s have a closer look at the above command. It consists of the following parts:

  • for i in ./* iterates over all entries in the current directory (hidden files excluded), storing each path in the loop variable i
  • do diff -q "$i" file_1 compares each file with file_1; the -q (--brief) option makes diff report only whether the files differ instead of printing the differences
  • done is the keyword that ends the for loop

Note that semicolons (;) terminate the word list and the loop body, which is what lets us write the whole for loop on a single line.
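Because diff exits with status 0 when its inputs are identical and 1 when they differ, we can also turn the loop around and list the files that match file_1. In this sketch, the loop spans several lines, so newlines take the place of the semicolons. The helper name same_as and the throwaway directory created with mktemp -d are our own choices, not part of the original setup:

```shell
#!/bin/sh
# Work in a throwaway directory holding the four sample files.
cd "$(mktemp -d)" || exit 1
printf 'hello world\n' > file_1
printf 'hello\n' > file_2
printf 'hello world\n' > file_3
printf 'world\n' > file_4

# same_as is a hypothetical helper: it prints every entry in the
# current directory whose content is identical to the file given as $1.
same_as() {
  for i in ./*; do
    # diff exits 0 for identical files, 1 if they differ, 2 on trouble;
    # we discard its output and test only the exit status.
    if diff -q "$i" "$1" > /dev/null; then
      echo "$i"
    fi
  done
}

same_as file_1
```

With the sample files, this prints ./file_1 and ./file_3.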

4. Using the find Command With AWK

Although the diff command does the job, sometimes we want to compare all files with one another, not just with file_1.

For this situation, we can use the find command to run cksum on every file and pipe the result to the awk program:

$ find . -type f -exec cksum {} + | awk '{ ck[$1$2] = ck[$1$2] ? ck[$1$2] OFS $3 : $3 } END { for (i in ck) print ck[i] }'
./file_4
./file_2
./file_1 ./file_3

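Each output line is a group of files with the same content. Here, file_1 and file_3 share a line, while file_2 and file_4 each appear on their own. Note that the order of the lines may vary, since awk’s for (i in ck) loop visits array elements in an unspecified order.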

Let’s now have a closer look at the command:

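  • find . -type f finds all regular files in the current directory and, recursively, in its subdirectories
  • -exec cksum {} + runs cksum on the found files, passing many file names per invocation; for each file, cksum prints a CRC checksum, the size in bytes, and the file name
  • the main awk block { ck[$1$2] = ck[$1$2] ? ck[$1$2] OFS $3 : $3 } uses the checksum and size together as the array key and appends the file name ($3) to that key’s entry, separated by the output field separator OFS (a space by default)
  • the END block END { for (i in ck) print ck[i] } runs after all input is read and prints each group of file names on its own line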
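Spreading the awk program over several lines makes each step of the pipeline easier to follow. This sketch also keeps a per-key counter so that it prints only groups with more than one file, that is, actual duplicates. It joins checksum and size with FS instead of concatenating $1$2 directly, so that, for example, checksum 12 with size 345 can’t share a key with checksum 123 with size 45. Like the original command, it assumes file names contain no whitespace, because $3 holds only the first word of a name. The helper name duplicate_groups is hypothetical:

```shell
#!/bin/sh
# Work in a throwaway directory holding the four sample files.
cd "$(mktemp -d)" || exit 1
printf 'hello world\n' > file_1
printf 'hello\n' > file_2
printf 'hello world\n' > file_3
printf 'world\n' > file_4

# duplicate_groups (hypothetical name) prints one line per group of files
# sharing a checksum and size, skipping files that are unique.
duplicate_groups() {
  find . -type f -exec cksum {} + |
    awk '{
           # checksum and size, kept apart by the field separator
           key = $1 FS $2
           # count how many files share this key
           count[key]++
           # append the file name to the group, separated by OFS
           group[key] = (count[key] > 1) ? group[key] OFS $3 : $3
         }
         END {
           # iteration order is unspecified
           for (k in group)
             if (count[k] > 1) print group[k]
         }'
}

duplicate_groups
```

With the sample files, it prints a single line containing ./file_1 and ./file_3, in an order that depends on how find traverses the directory.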

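Overall, combining the find command with awk lets us group all files with identical content within a directory tree. Strictly speaking, files in the same group share a CRC checksum and size, which makes identical content very likely but doesn’t guarantee it.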
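Since cksum computes a CRC rather than a cryptographic hash, two different files could in principle end up in the same group. When certainty matters, one option is to re-check a group byte for byte with diff -q, the same tool from the previous section. The helper name confirm_group is hypothetical, and the group is passed in by hand here:

```shell
#!/bin/sh
# Work in a throwaway directory holding the four sample files.
cd "$(mktemp -d)" || exit 1
printf 'hello world\n' > file_1
printf 'hello\n' > file_2
printf 'hello world\n' > file_3
printf 'world\n' > file_4

# confirm_group (hypothetical name) takes the file names of one group
# and compares each of them with the first one using diff -q.
confirm_group() {
  first=$1
  shift
  for f in "$@"; do
    # diff exits non-zero as soon as a file differs from the first one
    if ! diff -q "$first" "$f" > /dev/null; then
      echo "$f differs from $first"
      return 1
    fi
  done
  echo "all files match $first"
}

confirm_group ./file_1 ./file_3
```

For the group found above, this prints “all files match ./file_1” and returns status 0.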

5. Conclusion

In this tutorial, we’ve learned how to compare all files in a directory.

Firstly, we created the sample files. Then, we used the diff command to compare all files to a single file in the directory. Finally, we looked at the find command with AWK to find and group all files with the same content.
