1. Overview

Locating files of a specific type can be useful to users and system administrators. Still, relying solely on the file extension to identify files can lead to incorrect results when extensions are erroneous or don’t reflect the file type. Moreover, in Linux, file extensions aren’t obligatory, and files can exist and be read without having one.

In this tutorial, we’ll explore how to find files of a specific type in Linux, regardless of whether the files have an extension or not.

2. Sample Task

Let’s suppose there are three files located in our current directory. We can list them using ls:

$ ls
file1.pdf  file2.png  file3

We see that the files are named file1.pdf, file2.png, and file3. Based on the file extension, file1.pdf seems to be a PDF document and file2.png looks like a PNG image. However, the type of file3 isn’t immediately obvious since it lacks an extension altogether.

Our objective is to find all files in the directory that are PNG images, regardless of the file extension.

Thus, let’s first explore the common approach to finding files by extension and then modify it to accomplish our stated objective.

3. Finding Files by Extension

A common way to locate files based on their extension is by using the find command:

$ find . -maxdepth 1 -type f -name '*.png'
./file2.png

In this case, we locate all regular files in the current directory having names that end with the .png extension.

The -maxdepth flag specifies the maximum level of directories to descend into when searching recursively. Although it isn’t needed since there’s only one level to explore and no subdirectories, we specify the flag for clarity and completeness. Next, the -type flag indicates the type of files to look for, which are regular files in this case, as indicated by the f letter. Finally, the -name flag specifies a pattern that the filename must match.

One problem with this approach is that the file extension might not reflect the file type. Another issue is that files lacking an extension, such as file3, are excluded from the result, when in fact they might be PNG files. Therefore, finding files based on their extension isn’t always a reliable approach.

4. Finding Files by MIME Type

Since filename extensions aren’t required in Linux, tools can often identify a file’s type from its content rather than from its name.

To do this, we can compare the file’s content, typically its leading bytes, against a magic database containing patterns and signatures that identify various file types. On Debian, a compiled version of the database is located at /usr/lib/file/magic.mgc. The file command uses this database to identify file types.
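
To make the idea of a signature concrete, here's a minimal sketch that checks for the 8-byte PNG signature by hand. The has_png_signature function is a hypothetical helper of our own, not part of the file utility, and it covers only this one signature, whereas the magic database describes many formats with much richer rules:

```shell
# has_png_signature: hypothetical helper, not part of the file utility.
# Succeeds if the file in $1 starts with the 8-byte PNG signature
# 89 50 4e 47 0d 0a 1a 0a.
has_png_signature() {
    # Read the first 8 bytes, dump them as hex, and strip spaces and newlines
    sig=$(head -c 8 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "89504e470d0a1a0a" ]
}
```

Since the function reports its result through the exit status, we can use it directly in a condition such as if has_png_signature file3; then echo PNG; fi.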

4.1. Multipurpose Internet Mail Extensions (MIME)

We can obtain the Multipurpose Internet Mail Extensions (MIME) type of a file using the file command in Linux. A MIME type identifies the format of a file in a clear and standardized way.

MIME types are widely used on the Internet, including for labeling email attachments:

  • application/pdf for PDF files
  • image/png for PNG files
  • text/plain for text files

These are just some examples of MIME types.

4.2. Detecting the MIME Type

To detect the MIME type of a file, we can use the --mime-type flag with the file command:

$ file --mime-type file1.pdf
file1.pdf: application/pdf

In this case, we see that file1.pdf is indeed a PDF file.
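
When only the MIME type itself is needed, for instance in a script, the -b (--brief) option of file drops the filename prefix from the output. As a small sketch, we can wrap it in a function. The name mime_of is our own choice for illustration:

```shell
# mime_of: hypothetical helper name used only for this example.
# -b (--brief) omits the "filename: " prefix, so only the MIME type is printed.
# The -- marks the end of options in case the filename starts with a dash.
mime_of() {
    file -b --mime-type -- "$1"
}
```

In our sample directory, running mime_of file1.pdf would print just application/pdf.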

Next, we can use the file command as part of find to identify the MIME type of all regular files in the current directory:

$ find . -maxdepth 1 -type f -exec file --mime-type {} +
./file1.pdf: application/pdf
./file2.png: image/png
./file3:     image/png

We use the -exec flag to specify a command to execute for the found files. The command, in this case, is file --mime-type. The plus (+) sign at the end makes find pass the found files as multiple arguments to a single invocation of file. This way, we typically call the file command only once instead of once per file.

Consequently, we see that file1.pdf is a PDF file, while file2.png and file3 are PNG files.
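
To see the effect of the + terminator compared to \;, we can count how many times find starts the command in each case. The sketch below swaps file for a tiny sh -c script that prints one line per invocation. Both function names are hypothetical and exist only for this demonstration:

```shell
# Both helpers are hypothetical and exist only to count command invocations.

# With \; find starts the command once for every file it finds.
count_runs_per_file() {
    find "$1" -maxdepth 1 -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' '
}

# With + find passes many files to one invocation of the command.
count_runs_batched() {
    find "$1" -maxdepth 1 -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' '
}
```

For a directory holding three regular files, the first function reports 3 runs and the second reports 1. With a very large number of files, find may still split the batch into several invocations to respect the system's argument length limit.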

4.3. Filtering by MIME Type

Now that we know how to obtain the MIME type of files, we can use that information to filter filenames by the file type we’re interested in. To that end, we can use grep to select only the files of a particular MIME type:

$ find . -maxdepth 1 -type f -exec file --mime-type {} + | grep 'image/png$'
./file2.png: image/png
./file3:     image/png

This way, we pipe the result to grep and select only the lines that end with the image/png pattern.

If we want to return only the filenames, we can further pipe the result to the cut command:

$ find . -maxdepth 1 -type f -exec file --mime-type {} + | grep 'image/png$' | cut -d ':' -f 1
./file2.png
./file3

When it comes to cut, the -d option specifies the delimiter, while the -f flag is for specifying the field we want to extract based on its number. In this case, we extract the first field, which corresponds to the filename preceding the colon delimiter in each line.
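
One caveat is that cut splits at the first colon, so a filename that itself contains a colon would be truncated. As a hedged alternative, the sketch below avoids parsing the output of file altogether by comparing the brief MIME type of each file directly. The function find_by_mime and its two parameters are our own naming, not an existing tool:

```shell
# find_by_mime: hypothetical helper, not an existing command.
#   $1 = directory to search (not recursive), $2 = MIME type to match
find_by_mime() {
    find "$1" -maxdepth 1 -type f -exec sh -c '
        want=$1   # first argument is the MIME type we are looking for
        shift     # the remaining arguments are the files found
        for f in "$@"; do
            # -b prints only the MIME type, so no colon parsing is needed
            if [ "$(file -b --mime-type -- "$f")" = "$want" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$2" {} +
}
```

In our sample directory, find_by_mime . image/png would list ./file2.png and ./file3. The trade-off is that file now runs once per file instead of once per batch, which is slower on large directories.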

5. MIME Type vs. File Extension

As previously mentioned, finding files solely based on their extension might not be a reliable approach. To illustrate this, we can append a wrong extension to file3, indicating it’s a PDF file:

$ mv file3 file3.pdf

Here, we’ve used the mv command to change the filename from file3 to file3.pdf.

However, this shouldn’t pose a problem if we apply our previous approach to filter by MIME type:

$ find . -maxdepth 1 -type f -exec file --mime-type {} + | grep 'image/png$' | cut -d ':' -f 1
./file3.pdf
./file2.png

The output confirms that file3.pdf is still identified as a PNG file despite its .pdf extension.
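
Building on the same idea, we can also look for such mismatches deliberately. The following sketch lists files that carry a .pdf extension but aren't detected as application/pdf. The function name list_fake_pdfs is hypothetical, and the check relies entirely on what file reports:

```shell
# list_fake_pdfs: hypothetical helper for illustration only.
#   $1 = directory to search (not recursive)
# Prints "path: detected-type" for every *.pdf file that file does not
# identify as application/pdf.
list_fake_pdfs() {
    find "$1" -maxdepth 1 -type f -name '*.pdf' -exec sh -c '
        for f in "$@"; do
            mime=$(file -b --mime-type -- "$f")
            if [ "$mime" != "application/pdf" ]; then
                printf "%s: %s\n" "$f" "$mime"
            fi
        done
    ' sh {} +
}
```

After the rename above, list_fake_pdfs . would report ./file3.pdf together with its detected image/png type, while file1.pdf wouldn't appear.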

6. Conclusion

In this article, we explored how to find files based on a specific file type.

Our technique relies on the file command to identify the MIME type of files. This makes the approach much more reliable when compared to finding files by extension, especially when file extensions are erroneous or missing.
