
1. Overview

Sometimes we end up with several files that have the same content on our Linux machine, and finding all of them can be tricky.

In this tutorial, we’ll learn how to compare all files in a directory using the Linux CLI.

First, we'll create sample files to test with. Then, we'll use the diff command in a for loop. Finally, we'll combine the find command with awk.

2. Initial Setup

Before we learn the commands, let's list our four sample files with the ls command:

$ ls
file_1  file_2  file_3  file_4

Now, let’s look at each file’s content using the cat command:

$ cat file_1
hello world
$ cat file_2
hello
$ cat file_3
hello world
$ cat file_4
world

We can see that file_1 and file_3 have the identical content “hello world”, while the other files differ from each other.
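To follow along, we can recreate these sample files. This is a minimal sketch that assumes we run it in an empty working directory; echo adds a trailing newline, which matches the cat output above:

```shell
# Create the four sample files in the current directory
echo 'hello world' > file_1
echo 'hello' > file_2
echo 'hello world' > file_3   # same content as file_1
echo 'world' > file_4
```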

3. Using the diff Command

Now that we have the sample files, let’s compare them using the diff command.

Typically, diff compares two files. Since we want to compare every file, we'll iterate over all files in the directory with a for loop and compare each one with file_1:

$ for i in ./*; do diff -q "$i" file_1; done
Files ./file_2 and file_1 differ
Files ./file_4 and file_1 differ

As we can see, the command reports the files that differ from file_1: in our case, file_2 and file_4. There's no line for file_3, which is identical to file_1, and none for file_1 itself, because diff -q stays silent when the files match.

Let’s have a closer look at the above command. It consists of the following parts:

  • for i in ./* iterates over all entries in the current directory, storing each path in the variable i
  • do diff -q "$i" file_1 compares each file with file_1; the -q option makes diff report only whether the files differ, without printing the differences
  • done closes the for loop

Note that the semicolons (;) separate the parts of the loop, which is the standard syntax for writing a for loop on a single line.
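We can also turn the loop around and print the files that match the reference file instead of the ones that differ. The sketch below relies on diff exiting with status 0 when the files are identical; the function name same_as is hypothetical, and we assume the reference is passed as a plain file name in the current directory:

```shell
# Hypothetical helper: print the files in the current directory whose
# content is identical to the reference file given as $1 (e.g. file_1).
same_as() {
  local ref=$1 f
  for f in ./*; do
    [ -f "$f" ] || continue          # skip directories and other non-regular files
    [ "$f" = "./$ref" ] && continue  # don't compare the reference file with itself
    # diff -q exits with status 0 when the two files are identical
    if diff -q "$f" "$ref" > /dev/null; then
      echo "$f"
    fi
  done
}
```

With our sample files, running same_as file_1 should print ./file_3.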

4. Using the find Command With AWK

Although the diff loop does the job, sometimes we want to compare all files with one another, not just with file_1.

In this case, we can run cksum on every file through the -exec action of find and pipe the results to awk:

$ find . -type f -exec cksum {} + | awk '{ ck[$1$2] = ck[$1$2] ? ck[$1$2] OFS $3 : $3 } END { for (i in ck) print ck[i] }'
./file_4
./file_2
./file_1 ./file_3

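Each line of the output is a group of files with the same checksum and size. For example, file_1 and file_3 are grouped together, while file_2 and file_4 each appear on their own line. Since awk doesn't guarantee the iteration order of for (i in ck), the lines may come out in a different order.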

Let’s now have a closer look at the command:

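  • find . -type f finds all regular files in the current directory and its subdirectories
  • -exec cksum {} + runs cksum on the found files, passing as many of them as possible to each invocation; for every file, cksum prints its CRC checksum, its size in bytes, and its name
  • { ck[$1$2] = ck[$1$2] ? ck[$1$2] OFS $3 : $3 } is the main awk action: it uses the checksum and size ($1 and $2) as the array key and appends the file name ($3) to that key's entry, separated by OFS (a space by default), so files with the same checksum and size end up in the same entry
  • END { for (i in ck) print ck[i] } runs after all input has been read and prints each group on its own line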
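The one-liner has two limitations worth knowing about: the key $1$2 glues the checksum and size together without a separator, so two different pairs could in principle produce the same key, and $3 holds only the first word of a file name that contains spaces. Here's a minimal sketch of a more defensive variant; the function name dup_groups is hypothetical, and it also prints only groups with at least two files:

```shell
# Hypothetical helper: group files under directory $1 (default: .) by the
# checksum and size reported by cksum, printing only groups of 2+ files.
dup_groups() {
  find "${1:-.}" -type f -exec cksum {} + | awk '
    {
      # Join checksum and size with SUBSEP so distinct pairs cannot collide
      key = $1 SUBSEP $2
      # Strip the first two fields so names containing spaces stay intact
      name = $0
      sub(/^[^ ]+ [^ ]+ /, "", name)
      count[key]++
      files[key] = (key in files) ? files[key] OFS name : name
    }
    END {
      # for (k in files) visits the groups in an unspecified order
      for (k in files)
        if (count[k] > 1) print files[k]
    }'
}
```

With our sample files, dup_groups should print a single line containing ./file_1 and ./file_3, in whatever order find lists them.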

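Overall, combining find with awk lets us group all files with the same content within a directory tree. Strictly speaking, a matching CRC checksum and size strongly suggests, but doesn't prove, identical content, so we can confirm a match with diff when it matters.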
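For comparison, we can also compare every file with every other file using diff alone. This brute-force sketch runs one diff per pair, so it suits small directories; the function name identical_pairs is hypothetical, and it only looks at the current directory, not subdirectories:

```shell
# Hypothetical helper: compare every pair of regular files in the current
# directory exactly once and print the pairs whose contents are identical.
identical_pairs() {
  local a b
  set -- ./*            # positional parameters now hold the sorted file list
  while [ "$#" -gt 1 ]; do
    a=$1
    shift               # only compare $a with the files that come after it
    for b in "$@"; do
      # diff -q exits with status 0 when the two files are identical
      if [ -f "$a" ] && [ -f "$b" ] && diff -q "$a" "$b" > /dev/null; then
        echo "$a $b"
      fi
    done
  done
}
```

With our sample files, identical_pairs should print ./file_1 ./file_3. Because it reads the files byte by byte, this approach doesn't depend on checksums, but the number of comparisons grows quadratically with the number of files.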

5. Conclusion

In this tutorial, we’ve learned how to diff all files in a directory.

First, we created the sample files. Then, we used a diff loop to compare all files with a single file in the directory. Finally, we combined find and awk to find and group all files with the same content.