A file extension is a suffix at the end of a filename indicating the type of file and its format. It usually consists of three or four characters, separated from the filename by a period (dot). For example, in example.txt, “.txt” is the extension indicating this is a plain-text file.
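As a minimal sketch of this definition, the extension can be isolated with shell parameter expansion. The helper name get_ext is hypothetical and used only for illustration. Treating a leading dot as a hidden-file marker rather than an extension is an assumption of this sketch.

```shell
# get_ext is a hypothetical helper name, used only for this illustration.
# It prints the text after the last dot of a filename, or an empty line
# when there is no extension.
get_ext() {
    local name="${1##*/}"   # drop any leading directory
    name="${name#.}"        # a leading dot marks a hidden file, not an extension
    case "$name" in
        *.*) printf '%s\n' "${name##*.}" ;;
        *)   printf '\n' ;;
    esac
}

get_ext example.txt      # prints: txt
get_ext test             # prints an empty line
```

This only looks at the name. It says nothing about what the file actually contains, which is the distinction the rest of the tutorial explores.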
In general, Linux and other Unix-like operating systems rely less heavily on file extensions than Windows does. On Linux, we can search for files by extension or by content type, whatever the extension says.
In this tutorial, we’ll see how Linux-based systems and programs use a mix of techniques to determine file types, relying exclusively on neither extensions nor contents. Since each distribution and graphical environment (KDE, GNOME, Xfce, Cinnamon, and others) may manage files differently, we’ll focus on a fresh installation of Linux Mint 21.1 Cinnamon edition to test everyday use cases.
2. Test Cases
Let’s explore various situations. In general, for each test case, we can imagine five kinds of extension:
- in accordance with the content type
- incorrect, but belonging to the same category (audio, video, image, or other), such as a JPEG image with a .png extension, or a Word document with a LibreOffice Writer extension
- associated with a different type of content, e.g., a spreadsheet with a video extension
- random and not associated with any known content
- absent, i.e., a filename with no extension at all
Before starting, we can go deeper into the meaning of each extension by keeping the File Extension Library as a reference. It contains information on 39879 known extensions as of February 2023.
2.1. Web Servers
Let’s install Apache, together with the rest of the LAMP stack:
$ sudo apt-get install lamp-server^
By default, the /var/www/html folder contains a sample HTML file:
$ cd /var/www/html
$ ls -l
total 12
-rw-r--r-- 1 root root 10671 Feb 21 15:41 index.html
This file is the “Apache2 Default Page” that Apache serves at http://localhost/.
We can reasonably expect Apache to place much importance on file extensions. One of the reasons is its default DirectoryIndex directive:
$ cd /etc/apache2/mods-enabled/
$ cat dir.conf
<IfModule mod_dir.c>
        DirectoryIndex index.html index.cgi index.pl index.php index.xhtml index.htm
</IfModule>
This directive lists, in priority order, the files Apache looks for when a client requests a directory. In this case, if we had two files with the same name but different extensions, e.g., index.html and index.php, Apache would use the one with the .html extension.
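The priority rule can be sketched as a first-match loop over the same list of names. This is only an illustration of the lookup order, not Apache’s actual implementation, and pick_index is a hypothetical helper name.

```shell
# pick_index is a hypothetical helper, not part of Apache. It mimics the
# lookup order of the DirectoryIndex line above by returning the first
# candidate that exists in the given directory.
pick_index() {
    local dir="$1" name
    for name in index.html index.cgi index.pl index.php index.xhtml index.htm; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1    # no candidate exists in this directory
}

demo=$(mktemp -d)
touch "$demo/index.php" "$demo/index.html"
pick_index "$demo"      # prints: index.html
rm -rf "$demo"
```

A file named just index, with no extension, never matches any entry in the list, so the loop falls through.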
Let’s try removing the extension:
$ cd /var/www/html
$ sudo mv index.html index
The result is that Apache no longer finds an index file and shows a directory listing instead.
We might be surprised that the browser displays the page correctly if we give the file the .php extension. The reason is that PHP passes through, unchanged, everything outside the <?php and ?> tags, and here the whole file is plain HTML.
Finally, if we use the .pl extension, which is wrong for this content even though it’s included in DirectoryIndex, the browser downloads the file instead of displaying it.
In conclusion, extensions are significant for web servers.
2.2. Package Managers
dpkg is a low-level tool for installing, building, removing, and managing software packages on Debian-based Linux distros, like the one we use for this testing. Higher-level software management tools, such as apt, aptitude, gdebi, synaptic, and mint-install, are built on dpkg.
Let’s start this test by downloading the Google Chrome deb package and making copies of it with different extensions:
$ wget -O test.deb 'https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb'
$ cp test.deb test.rpm
$ cp test.deb test.mp4
$ cp test.deb test.xyb
$ cp test.deb test
This way, we have five copies of the same file, with the various possible extension test cases:
- Valid → .deb
- Invalid but belonging to the same category → .rpm
- Associated with a different type of content → .mp4
- Random and not associated with any known content → .xyb
- Without extension
The Nemo file manager presents files with the following icons, each of which indicates both the type of content and its association with a program:
- Files with extension absent, valid (.deb), and unknown (.xyb) are correctly recognized as deb packages and associated with GDebi Package Installer, which can open them and run the Google Chrome installation.
- The .mp4 file is incorrectly recognized as video content and associated with Celluloid.
- The .rpm file is incorrectly recognized as an rpm package and associated with Archive Manager.
Let’s now inspect each copy with the file command and try to install it with gdebi from the terminal:
$ file test
test: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test
[...]
Do you want to install the software package? [y/N]:y
[...]
$ file test.deb
test.deb: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.deb
[...]
Do you want to install the software package? [y/N]:y
[...]
$ file test.mp4
test.mp4: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.mp4
[...]
Do you want to install the software package? [y/N]:y
[...]
$ file test.rpm
test.rpm: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.rpm
[...]
Do you want to install the software package? [y/N]:y
[...]
$ file test.xyb
test.xyb: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.xyb
[...]
Do you want to install the software package? [y/N]:y
[...]
We’ve previously seen that invalid extensions can fool the file manager. In contrast, this result clearly shows the extension is meaningless for these command-line tools.
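One way to see why the name doesn’t matter here is to look at the leading bytes ourselves. A deb package is an ar archive, and every ar archive starts with the signature "!<arch>". The following is a minimal sketch of such a content check. The helper name looks_like_ar is hypothetical, and the file command performs a far more thorough analysis than this.

```shell
# looks_like_ar is a hypothetical helper name. It succeeds when the file
# starts with the ar archive signature "!<arch>", which is how a .deb file
# begins. The filename plays no part in the check.
looks_like_ar() {
    [ "$(head -c 7 "$1" 2>/dev/null)" = '!<arch>' ]
}

# Run against the five copies created earlier, if they exist in this directory
for f in test test.deb test.rpm test.mp4 test.xyb; do
    if looks_like_ar "$f"; then
        echo "$f: ar archive signature found"
    else
        echo "$f: no ar signature (or file missing)"
    fi
done
```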
2.3. Scripts
Let’s create a test Bash (Bourne Again SHell) script with a portable shebang:
$ echo '#!/usr/bin/env bash' > test.sh
$ echo 'echo "Baeldung is Awesome!"' >> test.sh
$ cat ./test.sh
#!/usr/bin/env bash
echo "Baeldung is Awesome!"
$ chmod +x ./test.sh
$ ./test.sh
Baeldung is Awesome!
$ file ./test.sh
./test.sh: Bourne-Again shell script, ASCII text executable
Our test.sh file is correctly recognized and executed as a Bash script.
Trying the usual test cases, let’s discover if the extension is meaningful. In this case, an invalid extension associated with a scripting language other than Bash could be .py, usually used for Python:
$ cp test.sh test.py
$ file ./test.py
./test.py: Bourne-Again shell script, ASCII text executable
$ ./test.py
Baeldung is Awesome!
$ cp test.sh test.mp4
$ file ./test.mp4
./test.mp4: Bourne-Again shell script, ASCII text executable
$ ./test.mp4
Baeldung is Awesome!
$ cp test.sh test.xyb
$ file ./test.xyb
./test.xyb: Bourne-Again shell script, ASCII text executable
$ ./test.xyb
Baeldung is Awesome!
$ cp test.sh test
$ file ./test
./test: Bourne-Again shell script, ASCII text executable
$ ./test
Baeldung is Awesome!
All this shows that the extension is meaningless for scripts, provided they start with an appropriate shebang. Without it, the issue is more complex, but we won’t elaborate on it.
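The interpreter is chosen from the first line of the file, not from its name. As a rough sketch, we can read that line ourselves. The helper name shebang_of is hypothetical, and the sample file is created only for this illustration.

```shell
# shebang_of is a hypothetical helper name. It prints the interpreter
# line of a script (without the leading "#!") and fails if there is none.
shebang_of() {
    local first
    # Read the first line; tolerate a file whose only line lacks a newline
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case "$first" in
        '#!'*) printf '%s\n' "${first#??}" ;;   # strip the two "#!" characters
        *)     return 1 ;;
    esac
}

demo=$(mktemp -d)
printf '#!/usr/bin/env bash\necho "Baeldung is Awesome!"\n' > "$demo/test.mp4"
shebang_of "$demo/test.mp4"     # prints: /usr/bin/env bash
rm -rf "$demo"
```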
2.4. Documents
Let’s create a test file with LibreOffice Writer and copy it, as in the previous cases, with different extensions:
$ echo "Sample file" > test.txt
$ libreoffice --headless --convert-to odt test.txt
convert /media/[...]/temp/test.txt -> /media/[...]/temp/test.odt using filter : writer8
$ rm test.txt
$ file test.odt
test.odt: OpenDocument Text
$ cp test.odt test.docx
$ cp test.odt test.png
$ cp test.odt test.xyb
$ cp test.odt test
This time the file manager is deceived only by the .png extension.
2.5. Image Files
Let’s download a JPEG file and copy it with different extensions:
$ wget -O test.jpg 'https://upload.wikimedia.org/wikipedia/commons/e/ed/Sacred_lotus_Nelumbo_nucifera.jpg'
$ file test.jpg
test.jpg: JPEG image data, Exif standard: [...], 2312x2119, [...]
$ cp test.jpg test.png
$ cp test.jpg test.docx
$ cp test.jpg test.xyb
$ cp test.jpg test
This time the file manager is cleverer, correctly generating the thumbnails for all images regardless of the wrong extensions.
2.6. Audio Files
Let’s download an OGG file and copy it with different extensions:
$ wget -O test.ogg 'https://upload.wikimedia.org/wikipedia/commons/e/e1/It-Roma.ogg'
$ file test.ogg
test.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, [...]
$ cp test.ogg test.mp3
$ cp test.ogg test.odt
$ cp test.ogg test.xyb
$ cp test.ogg test
In this case, only the .odt file fools the file manager.
At this point, a recurring behavior seems evident. That is, the file manager trusts known extensions, relying on the file content only if the extension is unknown or absent. In contrast, individual programs rely only on the file content, ignoring its extension.
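The pattern observed so far can be modeled in a few lines. This is only a sketch of the observed behavior, not how Nemo is actually implemented. The function names, the list of known extensions, and the set of signatures are all assumptions limited to the formats used in this article.

```shell
# Identify a few formats from their leading bytes, printed as hex.
# sniff_content is a hypothetical helper name.
sniff_content() {
    local hex
    hex=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    case "$hex" in
        ffd8ff*)          echo "JPEG image" ;;
        89504e47*)        echo "PNG image" ;;
        4f676753*)        echo "Ogg media" ;;
        25504446*)        echo "PDF document" ;;
        504b0304*)        echo "ZIP container" ;;
        213c617263683e0a) echo "ar archive" ;;
        *)                echo "unknown" ;;
    esac
}

# Trust a known extension; fall back to the content otherwise.
# guess_type is a hypothetical helper name.
guess_type() {
    case "${1##*/}" in
        *.jpg|*.jpeg) echo "by extension: JPEG image" ;;
        *.png)        echo "by extension: PNG image" ;;
        *.mp4)        echo "by extension: MP4 video" ;;
        *.odt)        echo "by extension: OpenDocument text" ;;
        *.pdf)        echo "by extension: PDF document" ;;
        *)            echo "by content: $(sniff_content "$1")" ;;
    esac
}

demo=$(mktemp -d)
printf '\377\330\377\340' > "$demo/test.png"   # JPEG signature, wrong extension
cp "$demo/test.png" "$demo/test.xyb"
guess_type "$demo/test.png"    # prints: by extension: PNG image
guess_type "$demo/test.xyb"    # prints: by content: JPEG image
rm -rf "$demo"
```

A program that ignores extensions corresponds to calling sniff_content directly on every file.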
2.7. Video Files
Let’s proceed with a video, noting that the .ogg extension, already used in the previous test case, is valid for both audio and video content:
$ wget -O test.ogg 'https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg'
$ file test.ogg
test.ogg: Ogg data, Theora video
$ cp test.ogg test.mp4
$ cp test.ogg test.odt
$ cp test.ogg test.xyb
$ cp test.ogg test
The result in the file manager is consistent with the previous ones.
2.8. Email Attachments
In general, spam filters don’t welcome invalid extensions. In addition, popular web services, such as Gmail, block emails containing attachments with banned extensions.
In this test, we’ll check how Thunderbird, one of the most popular mail clients for Linux, handles receiving an email with the same PDF attached with various extensions.
Let’s create a PDF file and copy it with different extensions:
$ echo "Sample PDF file" > test.txt
$ libreoffice --headless --convert-to pdf test.txt
$ rm test.txt
$ file test.pdf
test.pdf: PDF document, version 1.6, 1 pages (zip deflate encoded)
$ cp test.pdf test.docx
$ cp test.pdf test.png
$ cp test.pdf test.xyb
$ cp test.pdf test
Let’s attach the five files thus created to an email, which we then receive with Thunderbird. The result is that this email client relies exclusively on extensions. In fact, in the center of the email, there is an icon indicating that Thunderbird cannot display the attached image test.png.
2.9. Other Considerations
Some scenarios fall outside the types of tests done so far. E.g., it makes no sense to ask whether the extension is meaningful in the case of global Linux configuration files, hidden configuration files in the user’s home, logs, device files, temporary files, installed programs, and other types of special files. In all these cases, each file’s role depends on its full path, including its name and extension, if any. We cannot change the file name or extension in these cases.
Moreover, when we create symbolic and hard links to our files (images, documents, audio, etc.), we can choose the extension we prefer. However, the file manager and programs will handle such links as in the test cases above.
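As a small illustration, a link under a different extension still yields exactly the same bytes as its target, so any content-based check gives the same answer for both. The helper name demo_links and the filenames are hypothetical.

```shell
# demo_links is a hypothetical helper for this illustration. It creates a
# small file in the given directory, then a symbolic link and a hard link
# to it under different extensions.
demo_links() {
    local dir="$1"
    printf 'sample data\n' > "$dir/test.txt"
    ln -s test.txt "$dir/alias.png"        # symbolic link, relative target
    ln "$dir/test.txt" "$dir/alias.xyb"    # hard link to the same inode
}

work=$(mktemp -d)
demo_links "$work"
# Both links yield exactly the same bytes as the original file
cmp -s "$work/test.txt" "$work/alias.png" && echo "alias.png: same content"
cmp -s "$work/test.txt" "$work/alias.xyb" && echo "alias.xyb: same content"
rm -rf "$work"
```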
Compressed files deserve a final note because, by convention, they may have a double extension, such as .tar.gz. Replacing the double extension with something else makes it more difficult to manage these files.
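Shell parameter expansion shows why double extensions need extra care: which part counts as “the extension” depends on whether we cut at the first or the last dot. The filename below is a hypothetical example.

```shell
name="backup.tar.gz"

echo "${name##*.}"    # prints: gz          (last extension only)
echo "${name#*.}"     # prints: tar.gz      (everything after the first dot)
echo "${name%.*}"     # prints: backup.tar  (last extension removed)
echo "${name%%.*}"    # prints: backup      (everything from the first dot removed)
```

Cutting at the first dot works only when the base name itself contains no dots, which is one more reason to keep conventional names for compressed archives.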
3. Conclusion
In this article, we’ve seen how Linux can manage files according to their content, regardless of their extension. However, this isn’t always possible, or at least it’s challenging.
In general, meaningful filenames with correct cross-platform extensions are the easiest and tidiest way to manage and share our data. We shouldn’t abuse Linux’s ability to handle files with the wrong extensions. That ability is an aid, not an invitation.