1. Overview

A file extension is a suffix at the end of a filename indicating the type of file and its format. It usually consists of three or four characters, separated from the filename by a period (dot). For example, in example.txt, “.txt” is the extension indicating this is a plain-text file.
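Because the extension is just the part of the name after the last dot, plain string manipulation is enough to extract it. Here's a minimal POSIX shell sketch. get_ext is a hypothetical helper name, and treating a leading dot as a hidden-file marker rather than an extension is our own assumption:

```shell
# Print the extension of a filename, i.e., whatever follows its last dot.
# get_ext is a hypothetical helper written for illustration.
get_ext() {  # usage: get_ext FILENAME
  base=${1##*/}   # drop any leading directory components
  base=${base#.}  # assumption: a leading dot marks a hidden file, not an extension
  case $base in
    *.*) printf '%s\n' "${base##*.}" ;;  # keep only the text after the last dot
    *)   ;;                              # no dot at all: no extension, print nothing
  esac
}
```

For example, get_ext example.txt prints txt, get_ext archive.tar.gz prints only gz, and get_ext index prints nothing. None of this looks at what the file actually contains.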

In general, Linux and other Unix-based operating systems rely less heavily on file extensions than Windows does. For instance, we can search for files either by extension or by content type, regardless of their extension.
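As a sketch of both approaches, assuming GNU find and the standard file(1) utility are installed, the two hypothetical functions below list the files under a directory either by extension or by the MIME type that file(1) detects from their content:

```shell
# Two ways to search a directory tree (a sketch; both function names are hypothetical).

# By extension: match only the end of the filename, ignoring case.
find_by_ext() {  # usage: find_by_ext DIR EXT
  find "$1" -type f -iname "*.$2"
}

# By content type: ask file(1) for each file's MIME type, whatever the file is called.
# (Filenames containing newlines would confuse this simple loop.)
find_by_mime() {  # usage: find_by_mime DIR MIME-TYPE
  find "$1" -type f | while IFS= read -r path; do
    if [ "$(file -b --mime-type "$path")" = "$2" ]; then
      printf '%s\n' "$path"
    fi
  done
}
```

For example, find_by_mime ~/Pictures image/jpeg would also list JPEG photos saved as photo.png or with no extension at all, which find_by_ext ~/Pictures jpg would miss.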

In this tutorial, we’ll see how Linux-based systems and programs use a mix of techniques to determine file types, relying neither exclusively on extensions nor exclusively on contents. Since each distribution and graphical environment (KDE, GNOME, Xfce, Cinnamon, and others) may handle files differently, we’ll test everyday use cases on a fresh installation of Linux Mint 21.1 Cinnamon edition.

2. Test Cases

Let’s explore various situations. In general, for each test case, we’ll try five extension scenarios:

  • matching the content type
  • incorrect, but belonging to the same category (audio, video, image, or other), such as a JPEG image with a .png extension, or a Word document with a LibreOffice Writer extension
  • associated with a different type of content, e.g., a spreadsheet with a video content extension
  • random and not associated with any known content
  • absent

Before starting, we can look up the meaning of each extension in the File Extension Library, which contains information on 39879 known extensions as of February 2023.

2.1. Web Servers

Let’s assume we have a basic LAMP (Linux + Apache + MySQL + PHP) installation, obtained via the following tasksel task:

$ sudo apt-get install lamp-server^

By default, the /var/www/html folder contains a sample HTML file:

$ cd /var/www/html
$ ls -l
total 12
-rw-r--r-- 1 root root 10671 Feb 21 15:41 index.html

This file is the “Apache2 Default Page”, accessible via http://localhost/.

We can reasonably expect Apache to place much importance on file extensions. One reason is its default DirectoryIndex directive:

$ cd /etc/apache2/mods-enabled/
$ cat dir.conf
<IfModule mod_dir.c>
	DirectoryIndex index.html index.cgi index.pl index.php index.xhtml index.htm
</IfModule>

This directive tells Apache which files to look for, and in which order, when a request targets a directory. In this case, if we had two files with the same name but different extensions, e.g., index.html and index.php, Apache would serve the one with the .html extension because it comes first in the list.
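The lookup that DirectoryIndex performs can be modeled as "return the first listed name that exists". The POSIX shell function below (resolve_index, a hypothetical name) is a simplified sketch that ignores Apache details such as URL-path entries and access checks:

```shell
# Emulate DirectoryIndex resolution (a simplified sketch, not Apache's code):
# return the first candidate name, in the configured order, that exists in DIR.
resolve_index() {  # usage: resolve_index DIR NAME...
  dir=$1
  shift
  for name in "$@"; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  # Nothing matched: with Options Indexes enabled, Apache falls back to a directory listing.
  return 1
}
```

With the file list from dir.conf, a directory containing both index.html and index.php resolves to index.html, while a directory containing only a file named index matches nothing.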

Let’s try removing the extension:

$ cd /var/www/html
$ sudo mv index.html index

The result is that Apache shows a directory listing instead of the page.

We get the same directory listing if we use any other extension that doesn’t produce one of the names in the DirectoryIndex directive.

We might be surprised that the browser displays the page correctly if we rename it to index.php. The reason is that PHP outputs everything outside the <?php and ?> tags unchanged, and this file contains only HTML.

Finally, if we use the .pl extension, which DirectoryIndex also includes but which doesn’t match the content, the browser downloads the file instead of displaying it.

In conclusion, extensions are significant for web servers.

2.2. Package Managers

dpkg is a low-level tool for installing, building, removing, and managing software packages on Debian-based Linux distros, like the one we use for these tests. Other high-level software management tools, such as apt, aptitude, gdebi, synaptic, mint-install, and others, are built on dpkg.

Let’s start this test by downloading the Google Chrome deb package and making copies of it with different extensions:

$ wget -O test.deb 'https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb'
$ cp test.deb test.rpm
$ cp test.deb test.mp4
$ cp test.deb test.xyb
$ cp test.deb test

This way, we have five copies of the same file, with the various possible extension test cases:

  • Valid → .deb
  • Invalid but belonging to the same category → .rpm
  • Associated with a different type of content → .mp4
  • Random and not associated with any known content → .xyb
  • Without extension

The Nemo file manager shows each file with an icon that indicates both the type of content and the program associated with it.

From these icons, we can deduce the following:

  • The files with no extension, the valid .deb extension, or the unknown .xyb extension are correctly recognized as deb packages and associated with GDebi Package Installer, which can open them and run the Google Chrome installation.
  • The .mp4 file is incorrectly recognized as video content and associated with Celluloid.
  • The .rpm file is incorrectly recognized as an rpm package and associated with Archive Manager.

Now, let’s check whether the command-line utilities file and gdebi ignore the extension:

$ file test
test: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test
[...]
Do you want to install the software package? [y/N]:y
[...]

$ file test.deb
test.deb: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.deb
[...]
Do you want to install the software package? [y/N]:y
[...]

$ file test.mp4
test.mp4: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.mp4
[...]
Do you want to install the software package? [y/N]:y
[...]

$ file test.rpm
test.rpm: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.rpm
[...]
Do you want to install the software package? [y/N]:y
[...]

$ file test.xyb
test.xyb: Debian binary package (format 2.0), with control.tar.xz, data compression xz
$ sudo gdebi test.xyb
[...]
Do you want to install the software package? [y/N]:y
[...]

We’ve previously seen that invalid extensions can fool the file manager. In contrast, this result clearly shows the extension is meaningless for these command-line tools.
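Content-based detection like this relies on the format’s internal structure. A .deb package is an ar archive: it begins with the 8-byte signature !<arch> followed by a newline, and its first member is named debian-binary. The sketch below (is_deb is a hypothetical helper) checks exactly those two facts and never looks at the filename:

```shell
# Content-based check for a Debian package (is_deb is a hypothetical helper).
# A .deb is an ar(1) archive: 8-byte magic "!<arch>\n", then a member header
# whose name field starts with "debian-binary".
is_deb() {  # usage: is_deb FILE; exit status 0 if FILE looks like a .deb
  magic=$(od -An -tx1 -N 8 "$1" 2>/dev/null | tr -d ' \n')
  member=$(dd if="$1" bs=1 skip=8 count=13 2>/dev/null)
  [ "$magic" = "213c617263683e0a" ] && [ "$member" = "debian-binary" ]
}
```

Since only bytes are inspected, is_deb test.mp4 and is_deb test.xyb succeed just like is_deb test.deb.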

2.3. Scripts

Let’s create a test Bash (Bourne Again SHell) script with a portable shebang:

$ echo '#!/usr/bin/env bash' > test.sh
$ echo 'echo "Baeldung is Awesome!"' >> test.sh
$ cat ./test.sh
#!/usr/bin/env bash
echo "Baeldung is Awesome!"
$ chmod +x ./test.sh 
$ ./test.sh 
Baeldung is Awesome!
$ file ./test.sh 
./test.sh: Bourne-Again shell script, ASCII text executable

Our test.sh file is correctly recognized and executed as a Bash script.

Let’s apply the usual test cases to see whether the extension matters. Here, the wrong but related extension could be .py, which is usually used for Python scripts:

$ cp test.sh test.py
$ file ./test.py
./test.py: Bourne-Again shell script, ASCII text executable
$ ./test.py 
Baeldung is Awesome!

$ cp test.sh test.mp4
$ file ./test.mp4
./test.mp4: Bourne-Again shell script, ASCII text executable
$ ./test.mp4
Baeldung is Awesome!

$ cp test.sh test.xyb
$ file ./test.xyb
./test.xyb: Bourne-Again shell script, ASCII text executable
$ ./test.xyb 
Baeldung is Awesome!

$ cp test.sh test
$ file ./test
./test: Bourne-Again shell script, ASCII text executable
$ ./test 
Baeldung is Awesome!

All this shows that the extension is meaningless for scripts, provided they start with an appropriate shebang. Without it, the issue is more complex, but we won’t elaborate on it.
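The kernel performs this check itself at execution time: if a file’s first two bytes are #!, it runs the interpreter named on that line and passes it the script’s path. The minimal sketch below (shebang_of is a hypothetical helper) extracts that interpreter the same way, showing that only the first line counts, not the name:

```shell
# Print the interpreter named in a file's shebang line (hypothetical helper).
# Returns 1 when the first line doesn't start with "#!".
shebang_of() {  # usage: shebang_of FILE
  first_line=$(head -n 1 "$1")
  case $first_line in
    '#!'*) printf '%s\n' "${first_line#??}" ;;  # drop the leading "#!"
    *)     return 1 ;;                          # no shebang
  esac
}
```

For any of our copies, such as test.mp4, it prints /usr/bin/env bash.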

2.4. Documents

Let’s create a test file with LibreOffice Writer and copy it, as in the previous cases, with different extensions:

$ echo "Sample file" > test.txt
$ libreoffice --headless --convert-to odt test.txt
convert /media/[...]/temp/test.txt -> /media/[...]/temp/test.odt using filter : writer8
$ rm test.txt

$ file test.odt
test.odt: OpenDocument Text

$ cp test.odt test.docx
$ cp test.odt test.png
$ cp test.odt test.xyb
$ cp test.odt test

This time, the file manager is deceived only by the .png extension.

To our surprise, LibreOffice Writer completely ignores the extension, as it opens all five files correctly, including the .png one.

However, to open the .png file with LibreOffice Writer, we used the “Open with” menu available by right-clicking on the file.
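One reason OpenDocument files are easy to recognize is a packaging rule of the ODF specification: the file is a ZIP archive whose first entry is named mimetype, stored uncompressed and without an extra field, and containing the document’s media type. With a standard ZIP local file header, that name starts at byte 30 and the media type at byte 38. The sketch below reads it directly. odf_mimetype is a hypothetical helper, and it assumes the entry’s size is recorded in the local header:

```shell
# Read the media type from an OpenDocument package (odf_mimetype is hypothetical).
# Assumes the ODF rule that "mimetype" is the first ZIP entry, stored uncompressed,
# with no extra field, and with its size recorded in the local file header.
odf_mimetype() {  # usage: odf_mimetype FILE
  f=$1
  sig=$(od -An -tx1 -N 4 "$f" 2>/dev/null | tr -d ' \n')  # ZIP local header: "PK\3\4"
  name=$(dd if="$f" bs=1 skip=30 count=8 2>/dev/null)       # first entry's filename
  if [ "$sig" != "504b0304" ] || [ "$name" != "mimetype" ]; then
    return 1
  fi
  # Uncompressed size of the entry: 4 little-endian bytes at offset 22.
  set -- $(od -An -tu1 -j 22 -N 4 "$f")
  size=$(( $1 + 256 * $2 + 65536 * $3 + 16777216 * $4 ))
  dd if="$f" bs=1 skip=38 count="$size" 2>/dev/null  # the media type itself
  echo
}
```

For the test.odt file above, we’d expect it to print application/vnd.oasis.opendocument.text whichever of the five names we pass.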

2.5. Image Files

Let’s download a JPEG file and copy it with different extensions:

$ wget -O test.jpg 'https://upload.wikimedia.org/wikipedia/commons/e/ed/Sacred_lotus_Nelumbo_nucifera.jpg'
$ file test.jpg
test.jpg: JPEG image data, Exif standard: [...], 2312x2119, [...]
$ cp test.jpg test.png
$ cp test.jpg test.docx
$ cp test.jpg test.xyb
$ cp test.jpg test

This time, the file manager does better, correctly generating thumbnails for all the images regardless of their wrong extensions.

Even XViewer, the default viewer, correctly recognizes that all the files are images. When it opens the copy with the .docx extension, the indicator at the bottom of its window shows that this is the second of five available images, confirming that XViewer recognizes all the photos regardless of extension.
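Image formats are easy to tell apart by their first few bytes, the so-called magic numbers: JPEG files start with FF D8 FF, PNG files with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A, and GIF files with GIF87a or GIF89a. Here’s a minimal sketch of such a check. image_type is a hypothetical helper and covers only these three formats:

```shell
# Guess an image format from its leading magic bytes (hypothetical helper,
# covering only JPEG, PNG, and GIF).
image_type() {  # usage: image_type FILE
  head8=$(od -An -tx1 -N 8 "$1" 2>/dev/null | tr -d ' \n')
  case $head8 in
    ffd8ff*)                     echo jpeg ;;
    89504e470d0a1a0a)            echo png ;;
    474946383761*|474946383961*) echo gif ;;   # "GIF87a" or "GIF89a"
    *)                           return 1 ;;
  esac
}
```

Called on our copies, it would report jpeg for test.jpg, test.png, test.docx, test.xyb, and test alike.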

2.6. Audio Files

Let’s download an OGG file and copy it with different extensions:

$ wget -O test.ogg 'https://upload.wikimedia.org/wikipedia/commons/e/e1/It-Roma.ogg'
$ file test.ogg
test.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, [...]
$ cp test.ogg test.mp3
$ cp test.ogg test.odt
$ cp test.ogg test.xyb
$ cp test.ogg test

In this case, only the .odt file fools the file manager.

On the other hand, Celluloid, the default player, plays all five files correctly.

At this point, a recurring pattern emerges: the file manager trusts known extensions and relies on the file content only if the extension is unknown or absent. In contrast, individual programs rely only on the file content and ignore the extension.
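The file manager behavior we’ve observed can be summarized as a two-step decision. The sketch below models it under our own assumptions: guess_type and its tiny extension table are hypothetical, it isn’t Nemo’s actual code, and the content fallback delegates to file(1), which must be installed:

```shell
# A simplified model of the file manager behavior observed so far (not Nemo's code):
# a known extension wins; the content is checked only when the extension is
# unknown or missing. guess_type and its extension table are hypothetical.
guess_type() {  # usage: guess_type FILE
  base=${1##*/}
  case $base in
    *.*) ext=${base##*.} ;;
    *)   ext= ;;
  esac
  # Step 1: trust a known extension, whatever the file actually contains.
  case $ext in
    jpg|jpeg) echo image/jpeg; return ;;
    png)      echo image/png; return ;;
    mp4)      echo video/mp4; return ;;
    odt)      echo application/vnd.oasis.opendocument.text; return ;;
  esac
  # Step 2: unknown or missing extension, so ask file(1) to inspect the content.
  file -b --mime-type "$1"
}
```

With this model, a JPEG named test.odt is reported as an OpenDocument text, the same kind of mistake we’ve seen Nemo make, whereas test.xyb or test falls through to the content check.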

2.7. Video Files

Let’s proceed with a video, noting that the .ogg extension, already used in the previous test case, is valid for both audio and video content:

$ wget -O test.ogg 'https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg'
$ file test.ogg
test.ogg: Ogg data, Theora video
$ cp test.ogg test.mp4
$ cp test.ogg test.odt
$ cp test.ogg test.xyb
$ cp test.ogg test

The result in the file manager is consistent with the previous ones.

Celluloid again proves to be extension-independent.

These results confirm the handling of extensions we’ve seen so far.
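Because .ogg is a container for both audio and video, a content-based tool has to look inside the stream to tell them apart. In the Ogg format, each page starts with OggS, byte 26 of the page header holds the number of segments, and the first packet begins right after the segment table. A Vorbis stream’s first packet starts with 0x01 followed by vorbis, and a Theora stream’s with 0x80 followed by theora. The hypothetical ogg_codec function below sketches this check on the first page only, so for a file that multiplexes audio and video, it reports whichever stream comes first:

```shell
# Tell Vorbis audio from Theora video inside an Ogg file (hypothetical helper).
# Only the first page is inspected.
ogg_codec() {  # usage: ogg_codec FILE
  f=$1
  # Every Ogg page starts with the capture pattern "OggS".
  [ "$(od -An -tx1 -N 4 "$f" 2>/dev/null | tr -d ' \n')" = "4f676753" ] || return 1
  # Byte 26 holds the number of segments; the segment table follows it,
  # and the first packet's data starts right after the table.
  nseg=$(od -An -tu1 -j 26 -N 1 "$f" | tr -d ' \n')
  [ -n "$nseg" ] || return 1
  start=$((27 + nseg))
  # The first packet of a logical stream identifies its codec.
  id=$(od -An -tx1 -j "$start" -N 7 "$f" | tr -d ' \n')
  case $id in
    01766f72626973) echo "audio (Vorbis)" ;;  # 0x01 followed by "vorbis"
    807468656f7261) echo "video (Theora)" ;;  # 0x80 followed by "theora"
    *)              echo "unknown codec" ;;
  esac
}
```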

2.8. Email Attachments

In general, spam filters don’t welcome invalid extensions. In addition, popular web services, such as Gmail, block emails containing attachments with banned extensions.

In this test, we’ll check how Thunderbird, one of the most popular mail clients for Linux, handles receiving an email with the same PDF attached with various extensions.

Let’s create a PDF file and copy it with different extensions:

$ echo "Sample PDF file" > test.txt
$ libreoffice --headless --convert-to pdf test.txt
$ rm test.txt
$ file test.pdf
test.pdf: PDF document, version 1.6, 1 pages (zip deflate encoded)

$ cp test.pdf test.docx
$ cp test.pdf test.png
$ cp test.pdf test.xyb
$ cp test.pdf test

Let’s attach the five files thus created to an email, which we then receive with Thunderbird. The result is that this email client relies exclusively on extensions. For example, in the body of the email, there’s an icon indicating that Thunderbird cannot display the attached image test.png, although that file is actually a PDF.

When we double-click on the attachments, Thunderbird displays only test.pdf. In all other cases, it suggests downloading the file or opening it with another program.

2.9. Other Considerations

Some scenarios fall outside the kind of tests we’ve done so far. For example, it makes no sense to ask whether the extension is meaningful for global Linux configuration files, hidden configuration files in the user’s home, logs, device files, temporary files, installed programs, and other types of special files. In all these cases, each file’s role depends on its full path, including its name and extension, if any, so we can’t change the filename or extension.

Moreover, when we create symbolic and hard links to our files (images, documents, audio, etc.), we can choose the extension we prefer. However, the file manager and programs will handle such links as in the test cases above.

Compressed files deserve a final note because, by convention, they may have a double extension, such as .tar.gz. Replacing the double extension with something else makes it more difficult to manage these files.
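Double extensions are a case where naive string handling goes wrong: removing the last suffix of backup.tar.gz with the shell expansion ${name%.*} leaves backup.tar, not backup. The sketch below handles a few common compound suffixes explicitly. stem_of and its list of suffixes are our own illustrative choices:

```shell
# Strip the extension from a filename, treating a few compound tar
# suffixes as a single unit (stem_of is a hypothetical helper).
stem_of() {  # usage: stem_of FILENAME
  case $1 in
    *.tar.gz|*.tar.bz2|*.tar.xz|*.tar.zst)
      printf '%s\n' "${1%.tar.*}" ;;  # drop both parts of the double extension
    *.*)
      printf '%s\n' "${1%.*}" ;;      # drop only the last suffix
    *)
      printf '%s\n' "$1" ;;           # no extension: nothing to drop
  esac
}
```

For instance, stem_of my.photos.tar.gz prints my.photos, while a generic "text before the first dot" rule would wrongly return just my.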

3. Conclusion

In this article, we’ve seen how Linux can manage files according to their content, regardless of their extension. However, this isn’t always possible, or at least it’s challenging.

In general, meaningful filenames with correct cross-platform extensions are the easiest and tidiest way to manage and share our data. We shouldn’t abuse Linux’s ability to handle files with the wrong extensions. That ability is an aid, not an invitation.
