1. Overview

In this article, we’re going to see how to count the number of lines in the files contained in one directory, as well as in multiple directories and subdirectories. In addition, we’ll discuss the case of multiple file types, and we’ll see how to count millions of lines of code.

By way of illustration, we’ve taken projects like wget, WordPress, HAProxy, and the Linux kernel to demonstrate different use cases. We’re going to take a look at the wc, find, and xargs commands, and we’ll pay special attention to how we can combine them to deal with complex file layouts.

Furthermore, we’ll see how to use the tools in the correct way to obtain an accurate result.

2. Counting Lines in a File

First, notice that without any option, the wc command counts and prints the number of lines, words, and bytes of each file passed as an argument:

$ wc /usr/include/stdio.h
870  4202 29660 /usr/include/stdio.h

Secondly, to count and display only the number of lines, we use the -l option or its longer form, --lines:

$ wc -l /usr/include/stdio.h
870 /usr/include/stdio.h
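
When only the number is needed, for example inside a script, one option is to feed the file to wc on standard input, in which case no file name is printed. The following is a minimal sketch; the temporary file and its three lines are made up for illustration and are not part of the original examples:

```shell
# Create a throwaway three-line file (hypothetical sample data)
sample=$(mktemp)
printf 'one\ntwo\nthree\n' > "$sample"

# Reading from standard input makes wc print the count without a file name
line_count=$(wc -l < "$sample")
echo "$line_count"

rm -f "$sample"
```

Note that wc -l counts newline characters, so a last line without a trailing newline isn't counted.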

3. Counting Lines in Files Inside a Directory

To illustrate how to count the number of lines in multiple files contained in one directory, we’ll take the source code of the wget tool. It’s a network downloader utility available on most GNU/Linux distributions. All the source code files are in the src directory, including the header files.

First, let’s start by counting only C language files *.c:

$ wc -l wget-1.21.1/src/*.c
    101 wget-1.21.1/src/build_info.c
   1084 wget-1.21.1/src/connect.c
   1230 wget-1.21.1/src/convert.c
    ...
     95 wget-1.21.1/src/xattr.c
  54300 total

Secondly, because wc prints the line count of every .c file before the total, we can use the tail -1 command to keep only the last line of the output:

$ wc -l wget-1.21.1/src/*.c | tail -1
54300 total

In addition to the C language files, we’re going to count the lines of the header files (*.h). To count both types of files, we use a bracket pattern that matches either extension:

$ wc -l wget-1.21.1/src/*.[ch] | tail -1
57241 total
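
If we want a bare number instead of the "total" line, one possible variation is to concatenate the files and count once. This is a sketch under assumptions, not the method used above: the directory, file names, and contents below are invented for the demonstration.

```shell
# Build a small throwaway directory (hypothetical file names and contents)
demo_dir=$(mktemp -d)
printf 'int main(void)\n{\n    return 0;\n}\n' > "$demo_dir/main.c"
printf '#pragma once\nint helper(void);\n' > "$demo_dir/helper.h"

# Concatenate all matching files and count once: the result is a bare number
total_lines=$(cat "$demo_dir"/*.[ch] | wc -l)
echo "$total_lines"

rm -rf "$demo_dir"
```

This form also behaves the same when the pattern matches a single file, a case in which wc -l prints no "total" line at all.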

4. Counting Lines of Files in Multiple Directories

To learn how to count the number of lines in multiple files spread over different directories, we’ve chosen the HAProxy project. Again, its C language files (*.c) can be found inside the src directory. The header files (*.h) are split between two subdirectories, include/import and include/haproxy.

So to count the number of lines of the HAproxy source code, we can do:

$ cd haproxy
$ find src/ include/ -name '*.[ch]' | xargs wc -l | tail -1
256475 total

First, the find command searches the src and include directories recursively and prints the path of every C source file and header file it finds.

Secondly, xargs reads those paths and passes them to the wc command as arguments, packing as many as possible into each invocation. So the wc command counts the lines of each file and then prints a total.

Finally, we select the last line of the output via the tail command. As we can see, it represents the total number of lines for all files.

A common error is to pipe the output of find straight into wc without the xargs command. In that case, wc counts the lines printed by find, that is, one line per file path, which is not what we want:

$ find src/ include/ -name '*.[ch]' | wc -l | tail -1
403

Here, 403 is the number of files found, not the number of lines inside those files.
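
The difference between the two pipelines can be reproduced on a tiny tree. The sketch below uses invented directory and file names; it also uses the -print0 and -0 options, an addition not used in the commands above, so that a file name containing a space is passed to wc intact:

```shell
# Throwaway tree with two files; one name contains a space (hypothetical data)
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src" "$demo_dir/include"
printf 'a\nb\nc\n' > "$demo_dir/src/main.c"
printf 'a\nb\n' > "$demo_dir/include/my header.h"

# Without xargs: wc counts the paths printed by find, i.e. the number of files
file_count=$(find "$demo_dir" -type f -name '*.[ch]' | wc -l)

# With xargs: wc receives the paths as arguments and counts the lines inside them
line_total=$(find "$demo_dir" -type f -name '*.[ch]' -print0 \
    | xargs -0 wc -l | tail -1 | awk '{ print $1 }')

echo "files: $file_count, lines: $line_total"
rm -rf "$demo_dir"
```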

4.1. Counting Lines of Files of Multiple Languages in Multiple Directories

In addition to counting the lines of files in multiple directories, we typically also need to deal with projects written in several languages. To demonstrate this, we can use the WordPress project, which has files in PHP, JavaScript, and CSS.

Similarly, because we have several types of files, we combine the -name tests with the -o operator of the find command (o for logical OR). We also wrap them in escaped parentheses so that -type f applies to every pattern and not only to the first one.

Let’s count the lines of PHP, JavaScript, and CSS files:

$ find WordPress-5.7.2/ -type f \( -name '*.php' -o -name '*.js' -o -name '*.css' \) | xargs wc -l | tail -1
1248529 total

Another way to do this is to use command substitution. Bash runs the find command in a subshell and replaces the command substitution with the output of find, so the file paths become the arguments of wc:

$ wc -l $(find WordPress-5.7.2/ -type f \( -name '*.php' -o -name '*.js' -o -name '*.css' \)) | tail -1
1248529 total
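
A detail worth keeping in mind when chaining -name tests with -o is operator precedence: the implicit AND between -type f and the first -name binds more tightly than -o, so without parentheses -type f only guards the first pattern. The sketch below illustrates this on an invented tree that contains a directory whose name ends in .js:

```shell
# Throwaway tree (hypothetical names): vendor.js is a directory, not a file
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/vendor.js"
printf '<?php\necho 1;\n' > "$demo_dir/index.php"
printf 'body {}\n' > "$demo_dir/vendor.js/style.css"

# Without grouping, -type f only applies to '*.php', so the directory matches too
ungrouped=$(find "$demo_dir" -type f -name '*.php' -o -name '*.js' -o -name '*.css' | wc -l)

# With grouping, -type f applies to all three patterns
grouped=$(find "$demo_dir" -type f \( -name '*.php' -o -name '*.js' -o -name '*.css' \) | wc -l)

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -rf "$demo_dir"
```

A directory passed to wc only produces an error message and a count of zero, so the total usually stays the same, but grouping keeps the output clean.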

5. Counting Millions of Lines of Files

So far, the methods we’ve used work well for small and medium-sized projects, but they may not give us the correct answer for a big project.

As an example, the Linux kernel has many directories, subdirectories, and thousands of files. In such a case, xargs splits the list of files into several batches and runs the wc command once per batch, so wc prints a total for each batch rather than a single total for the entire list. The tail -1 command then shows only the total of the last batch.
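
This batching effect can be simulated on a small scale by limiting xargs to two arguments per invocation with its -n option. The sketch below is only an illustration: the file names are invented, and LC_ALL=C is set as an assumption so that wc labels its summary line "total" regardless of the system locale.

```shell
# Four throwaway files with 1, 2, 3, and 4 lines: 10 lines overall (hypothetical data)
demo_dir=$(mktemp -d)
for n in 1 2 3 4; do
    seq "$n" > "$demo_dir/file$n.c"
done

# -n 2 forces xargs to start a new wc for every two files, as happens
# automatically when the argument list gets too large
batched_output=$(find "$demo_dir" -type f -name '*.c' | LC_ALL=C xargs -n 2 wc -l)
echo "$batched_output"

# One "total" line per batch, and tail -1 only sees the last of them
totals_printed=$(echo "$batched_output" | grep -c ' total$')
last_total=$(echo "$batched_output" | tail -1 | awk '{ print $1 }')

echo "total lines printed: $totals_printed, last total: $last_total"
rm -rf "$demo_dir"
```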

Furthermore, if we try to pass the output of the find command to the wc command through command substitution, the list of file names will be too long. We’ll exceed the system limit on the total size of the arguments that can be passed to a new process, so wc can’t even be started, and the shell reports an error:

$ wc -l $(find linux/ -type f -name '*.[ch]')
bash: /usr/bin/wc: Argument list too long

In this situation, a reliable way to perform this operation is to use the -exec option of the find command. Terminated with \;, it runs the wc command separately for each file. Since wc then never prints an overall total, we use the awk command to sum the first column of the output, which holds the line count of each file:

$ find linux/ -type f -name '*.[ch]' -exec wc -l {} \; | awk '{ total += $1 } END {print total}'
28085235

As we can see, the Linux kernel has more than 28 million lines of code!
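
Running one wc process per file can be slow on a tree of this size. A possible alternative, shown here as a sketch and not as the method used above, is to terminate -exec with + so that find passes the files in batches to cat, and then count the concatenated stream with a single wc. The file names below are invented for the demonstration:

```shell
# Four throwaway files with 1, 2, 3, and 4 lines (hypothetical data)
demo_dir=$(mktemp -d)
for n in 1 2 3 4; do
    seq "$n" > "$demo_dir/file$n.c"
done

# find runs cat on batches of files; a single wc counts everything,
# so there are no per-batch "total" lines to add up
true_total=$(find "$demo_dir" -type f -name '*.c' -exec cat {} + | wc -l)
echo "$true_total"

rm -rf "$demo_dir"
```

Because the file names never travel through a shell command line or a pipe, this form is also unaffected by spaces in file names.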

6. Conclusion

In this article, we’ve seen how to use standard Linux tools to count the number of lines inside files for projects organized in different ways. We’ve seen examples covering a single directory, multiple directories, multiple file types, and millions of lines.

Of course, we can learn more about the wc, find, and xargs tools by reading their manual pages.
