1. Introduction

From time to time, we may need to count the number of files in each directory in a Linux system.

While there’s no single command to accomplish this, we can find solutions by combining a few basic commands that are available by default on most Linux distributions.

2. The Problem

For this example specifically, let’s look at a directory containing three subdirectories:

$ ls
Assignments Conference Projects

Each of these subdirectories contains several files, each named with a number:

$ ls *
Assignments:
1  2  3  4  5

Conference:
1  2

Projects:
1  2  3  4  5  6  7

As we can see, there are five files in the Assignments directory, two files in the Conference directory, and seven files in the Projects directory.

Thus, we’d expect our output to look something like:

      5 Assignments
      2 Conference
      7 Projects

With this in mind, let’s explore which commands we’ll need to accomplish this in the following section.

3. Linux Command Descriptions

There are four commands that we’re going to use during this tutorial. First, let’s look at each one separately and find out what it does. Then, we’ll combine them to solve the problem.

Generally speaking, all of these commands should already be available on most Linux systems because they’re commonly used for a multitude of purposes.

To combine these commands, we use the pipe operator (|), which sends the standard output of one command to the standard input of the next. This allows us to string together several commands on a single line.
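As a minimal sketch of how a pipe works, we can feed a few lines of text into sort. The three names are simply taken from our example directories, and the variable name sorted is only used for illustration:

```shell
# printf writes three lines to standard output; the pipe hands them
# to sort, which reads them from standard input.
sorted=$(printf 'Projects\nAssignments\nConference\n' | sort)
echo "$sorted"
```

This should print the three names in alphabetical order, one per line.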

3.1. find

In short, the find command allows us to search for files in our file system. Furthermore, these can be filtered in many different ways, depending on our goals.

For the purpose of this tutorial, we want to limit the results to only regular files. This is in contrast to directories, symbolic links, or sockets.

Let’s invoke find to accomplish our first goal:

$ find . -type f
./Assignments/2
./Assignments/3
./Assignments/5
./Assignments/1
./Assignments/4
./Conference/2
./Conference/1
./Projects/7
./Projects/2
./Projects/3
./Projects/5
./Projects/1
./Projects/4
./Projects/6

As shown above, this lists all the files with their relative paths, starting from the directory where we invoked the command.
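To see the effect of the -type f filter in isolation, we can try it on a small throwaway directory. This is only a sketch: the temporary directory and the name link-to-1 are made up for the illustration and aren't part of our example tree:

```shell
# Create a scratch directory with one regular file, one subdirectory,
# and one symbolic link (all names here are hypothetical).
demo=$(mktemp -d)
mkdir "$demo/Projects"
touch "$demo/Projects/1"
ln -s 1 "$demo/Projects/link-to-1"

# -type f matches only the regular file, not the directory or the link.
files=$(cd "$demo" && find . -type f)
# -type d, by contrast, matches only directories.
dirs=$(cd "$demo" && find . -type d)

echo "regular files: $files"
echo "directories:"
echo "$dirs"
rm -r "$demo"
```

Here, only ./Projects/1 should appear as a regular file, because find doesn't follow symbolic links by default.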

3.2. cut

Because we only care about the number of files and not about the individual file names, we want to cut out only the directory names next. This will then later allow us to count the number of occurrences of each directory name.

To achieve this, we use the cut command, which allows us to extract selected sections from each line of text.

As an illustration, this command works similarly to cutting out a certain part from a piece of paper.

Firstly, we need to specify where we want to cut. We do this with the -d (delimiter) argument, which names the character that separates each line into fields. In our case, this should be the forward slash (/) because it acts as the delimiter in Linux directory paths.

Secondly, we need to select which one of those fields we want to keep. We accomplish this by using the -f (field) argument to select field number two, which is the directory name.
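To make the field numbering concrete, here is a small sketch that splits one path from the find output. The variable names are only for illustration:

```shell
path='./Assignments/2'

# Splitting on "/" yields three fields: ".", "Assignments", and "2".
first=$(printf '%s\n' "$path" | cut -d / -f 1)
second=$(printf '%s\n' "$path" | cut -d / -f 2)
third=$(printf '%s\n' "$path" | cut -d / -f 3)

echo "field 1: $first"
echo "field 2: $second"
echo "field 3: $third"
```

Field one is the leading dot that find puts in front of every path, which is why the directory name ends up in field two.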

Given these points, we can pipe the two commands together now:

$ find . -type f | cut -d / -f 2
Assignments
Assignments
Assignments
Assignments
Assignments
Conference
Conference
Projects
Projects
Projects
Projects
Projects
Projects
Projects
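One thing to keep in mind is that a regular file located directly in the starting directory has its own name in field two, so it would later be counted as if it were a directory. The following sketch assumes a hypothetical top-level file called notes.txt and uses the -mindepth option, which is available in GNU and BSD find but isn't part of POSIX:

```shell
# Scratch directory with one nested file and one top-level file
# (notes.txt is a made-up name for this illustration).
demo=$(mktemp -d)
mkdir "$demo/Conference"
touch "$demo/Conference/1" "$demo/notes.txt"

# Without a depth limit, notes.txt appears next to the directory name.
all=$(cd "$demo" && find . -type f | cut -d / -f 2)

# -mindepth 2 skips everything located directly in the starting directory.
nested=$(cd "$demo" && find . -mindepth 2 -type f | cut -d / -f 2)

echo "without -mindepth:"
echo "$all"
echo "with -mindepth 2:"
echo "$nested"
rm -r "$demo"
```

In our example tree, all files live inside subdirectories, so both variants give the same result there.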

3.3. sort

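While the output happens to be grouped alphabetically here, find doesn’t guarantee any particular order. This matters because the uniq command, which we’ll use in the next step, only merges identical lines that are adjacent to each other.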

We can easily solve this by calling the sort command:

$ find . -type f | cut -d / -f 2 | sort
Assignments
Assignments
Assignments
Assignments
Assignments
Conference
Conference
Projects
Projects
Projects
Projects
Projects
Projects
Projects

In our specific example, sorting doesn’t change the output at all, but we should still include this step so that the result is reliable for any input.

3.4. uniq

Lastly, we use the uniq command to collapse this list down to just one occurrence of every directory name.

Moreover, the -c argument counts the number of times each directory name came up in our search, which is exactly the result we were looking for from the beginning:

$ find . -type f | cut -d / -f 2 | sort | uniq -c
      5 Assignments
      2 Conference
      7 Projects
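The sort step matters because uniq only compares each line with the one directly before it. As a rough sketch, using a made-up list in which the two Projects lines aren't adjacent, we can compare the result with and without sorting:

```shell
names='Projects
Assignments
Projects'

# Unsorted input: the two "Projects" lines aren't adjacent,
# so uniq reports them as two separate groups.
unsorted=$(printf '%s\n' "$names" | uniq -c)

# Sorted input: identical lines are adjacent and merge into one count.
sorted=$(printf '%s\n' "$names" | sort | uniq -c)

echo "$unsorted"
echo "---"
echo "$sorted"
```

Without sort, Projects should show up twice with a count of 1 each. With sort, it should appear once with a count of 2.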

4. Using the awk Command

We’ve seen a straightforward approach combining four commands to solve the problem.

However, this pipeline starts four processes, and the output of the find command passes through three further commands. This may lead to poor performance if our target directory contains a large number of subdirectories and files.

Next, we’ll look at another solution using the awk command:

$ find . -type f | awk -F '/' '{dir[$2]++} END{for(i in dir)print dir[i], i}' 
2 Conference
7 Projects
5 Assignments

We already know what the find command does. Now, let’s understand how the short awk one-liner works:

  • -F '/': We set the field separator (FS) to the slash (/), which makes it easy to extract the directory name
  • { dir[$2]++ }: We use an associative array to count how often each directory occurs. The key is the directory name taken from the second field of each input line
  • END{ for(i in dir) print dir[i], i }: After awk has read all input lines, that is, the complete output of the find command, the END block prints every element of the array

The output of the find command is processed only once by the awk command, and no expensive sorting operation is involved. Therefore, we can expect the awk solution to perform better than the approach using find, cut, sort, and uniq, particularly on large directory trees.
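The directories in the awk output appear in a different order than before because awk doesn’t define the order in which for (i in dir) visits the array elements. If we want a stable order, one option is to sort the short summary instead of the full file list. The sketch below rebuilds our example tree in a temporary directory, which is only an assumption made for the demonstration:

```shell
# Recreate the example tree in a scratch location.
demo=$(mktemp -d)
mkdir "$demo/Assignments" "$demo/Conference" "$demo/Projects"
for n in 1 2 3 4 5; do touch "$demo/Assignments/$n"; done
for n in 1 2; do touch "$demo/Conference/$n"; done
for n in 1 2 3 4 5 6 7; do touch "$demo/Projects/$n"; done

# Count with awk, then sort only the summary lines by directory name.
counts=$(cd "$demo" && find . -type f |
  awk -F '/' '{dir[$2]++} END{for (i in dir) print dir[i], i}' |
  sort -k 2)

echo "$counts"
rm -r "$demo"
```

Since sort now only sees one line per directory rather than one line per file, the extra cost should stay small even for large trees.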

5. Conclusion

By combining the four Linux commands find, cut, sort, and uniq, we were able to count the number of files in each directory on our Linux system with a single line of code.

If we find ourselves needing a longer command like this regularly, we should consider creating an alias for it so that it’s always conveniently available.
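As one possible way to do this, we could wrap the pipeline in a small shell function, for example in our ~/.bashrc file. We use a function rather than an alias here because it can take the target directory as an argument. The name count_files_per_dir is just a hypothetical choice:

```shell
# Hypothetical helper: counts regular files per top-level subdirectory.
# Takes an optional directory argument and defaults to the current directory.
count_files_per_dir() {
  # The subshell keeps the cd from affecting the calling shell.
  (cd "${1:-.}" && find . -type f | cut -d / -f 2 | sort | uniq -c)
}
```

After reloading the shell configuration, we could then call count_files_per_dir without arguments, or pass it a path such as ~/Documents.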

Moreover, we’ve also seen another way to solve the problem using the find and awk commands. Since this approach generally performs better, we should consider using it if our target directory contains a large number of files.
