For most users, files are the gateways to data on a system. There are many different types of files. Thus, knowing how to recognize, handle, and modify them properly can be vital.
In this tutorial, we explore whether, why, and when regular files should end with a newline. First, we delve into the types of regular files in terms of content and structure. Next, we discuss the theoretical reasoning behind text files terminating on a newline. After this, we understand that, even in theory, the standard is not necessarily applicable to all files. Finally, we use real-life cases to discuss when it matters and why.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.
2. Regular File Types
Regular files are simply piles of data, hidden behind an inode inside the filesystem.
Regardless of the file’s contents, reading a file completes with an EOF (End-Of-File) condition, indicating no more data is available to read. Importantly, EOF is not a character stored as part of the contents.
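We can see this with a quick sketch (the /tmp path is an arbitrary choice for illustration): writing exactly three bytes produces a file of exactly three bytes, with no extra EOF byte stored on disk:

```shell
# Write exactly three bytes (no trailing newline) to a scratch file
printf 'abc' > /tmp/eof_demo

# The reported size is exactly 3: EOF is signaled by the kernel
# when a read reaches the end of the data, not stored as a byte
wc -c < /tmp/eof_demo
```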
2.1. Text and Binary Files
Of course, looking at the file contents alone, the two fundamental formats are text and binary. Even these are not always easy to distinguish:
$ cat /file
$ file --mime /file
/file: inode/x-empty; charset=binary
The file utility has the --mime switch, which attempts to detect the MIME format. As a bonus, it also shows the detected charset, whether binary or not.
Notice that an empty file is considered binary. However, if we insert some printable ASCII data via a here document, that changes:
$ cat > /file <<EOI
!Xfile
CgTPK3EAYj
EOI
$ file --mime /file
/file: text/plain; charset=us-ascii
Usually, a single non-printable byte added via echo with its -e switch is enough for a binary verdict:
$ echo -e '\x05' >> /file
$ file --mime /file
/file: application/octet-stream; charset=binary
Importantly, there isn’t a separate definition for binary files in the POSIX standard. However, POSIX tools often specify “text files” as their input.
2.2. File Type Identification
While there are certain starting bytes, also called “magic numbers”, which can point to one format over another, expecting such identifiers is rarely reliable on its own. Similarly, complex file signatures can show false positives.
On the other hand, in Microsoft Windows environments, file types have become synonymous with file extensions. Part of the idea is applicable in Linux as well. However, even that concept is misleading, not least because of overlapping extension names.
As there are many ways to detect file types, free tools like the TrID File Identifier combine multiple methods into a heuristic:
$ trid /file.ogg -v -r:2
TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Collecting data from file: /file.ogg
Definitions found: 5702
Analyzing...
77.8% (.OGG) OGG Vorbis audio (14014/3)
[...]
22.2% (.OGG) OGG stream (generic) (4000/1)
[...]
Here, /file.ogg is detected as an OGG Vorbis audio file.
Why do we care about file formats and types? Depending on whether the file is binary or text and its type’s consumers, there’s a trivial but important requirement that Linux may warn about or outright enforce.
3. Newlines at the End of the File
POSIX defines a line as a possibly empty sequence of non-newline characters, terminating in a newline, also called EOL (End-Of-Line), ASCII code 0x0A. Meanwhile, a text file is defined as consisting of lines.
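To illustrate the definition, we can inspect the last byte of a properly terminated line (od serves here as a portable stand-in for xxd):

```shell
# A complete POSIX line: zero or more non-newline bytes plus 0x0a;
# the last byte of this output is 0a
printf 'line1\n' | tail -c 1 | od -An -tx1
```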
Thus, the last line of a file, by definition and standard, should conclude that file with EOL. Of course, that’s definitely not always the case:
$ xxd -s -1 /etc/shadow
000004d2: 0a
$ xxd -s -1 /etc/X11/dummy-xorg.conf
00000229: 6e
Here, we use xxd to see the last character (-s -1) code for a couple of common Linux text configuration files. One ends with an EOL, but the other one doesn’t.
As humans either write the programs that generate file data or input data in files directly, such inconsistencies are not uncommon. Of course, a terminating EOL is really only expected of text files.
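Based on the definition above, a minimal sketch of an EOL check for non-empty files can rely on tail and wc; the function name and /tmp paths here are our own choices:

```shell
# Return success if the last byte of a non-empty file is a newline:
# wc -l counts newline bytes, so the last byte contributes 1 only if it's 0x0a
ends_with_eol() {
  [ -s "$1" ] && [ "$(tail -c 1 "$1" | wc -l)" -eq 1 ]
}

printf 'line1\n' > /tmp/with_eol
printf 'line1'   > /tmp/without_eol

ends_with_eol /tmp/with_eol    && echo 'with_eol: OK'
ends_with_eol /tmp/without_eol || echo 'without_eol: missing EOL'
```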
4. EOL in Binary
Since binary files do not consist of lines in the traditional sense, the definition of a line shouldn’t apply to them:
$ xxd -s -1 /etc/ld.so.cache
00006660: 00
Here, as is common, the /etc/ld.so.cache binary file ends with a NUL character.
Due to the way binary files are organized, any EOL, be it at the end or not, can even be a coincidence. That’s because 0x0A is simply one of 256 possible byte values.
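As a quick illustration, we can count the 0x0A bytes in an arbitrary binary (here, /bin/ls) via tr and wc; any occurrences are plain data, not line terminators:

```shell
# Delete everything except newline (0x0a) bytes, then count what's left;
# the count varies by binary and means nothing about "lines"
tr -dc '\n' < /bin/ls | wc -c
```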
5. When It Matters and Why
Earlier, we looked at file types because they determine the file consumers we should consider. In fact, it’s up to the consumer to check, report, or even reject a certain format.
We’ll use the same two-line contents in one file with (/file) and one without (/badfile) a newline at the end.
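A minimal way to create the two files is printf, which, unlike echo, appends nothing we don’t explicitly request:

```shell
# Two lines with a terminating newline
printf 'line1\nline2\n' > /file

# The same two lines, but without a newline after the last one
printf 'line1\nline2' > /badfile
```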
Let’s look at some typical cases where that difference matters.
5.1. cat Terminal Output and Concatenation
Using cat, we can quickly see some potential problems for files that don’t terminate on an EOL:
$ cat /file
line1
line2
$ cat /badfile
line1
line2$
Note how the last prompt appears right next to the contents of /badfile. While this is a purely cosmetic issue, concatenation could make it a technical problem.
Using the same two files, concatenating one way produces a possibly problematic output:
$ cat /badfile /file
line1
line2line1
line2
Of course, the result is expected, but leaving it as the default behavior can corrupt data processing, especially for structured text formats, such as CSV.
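One way to sidestep the problem is awk: it re-emits every input record followed by the output record separator (a newline by default), effectively repairing a missing final EOL. Here’s a sketch using throwaway copies under /tmp:

```shell
# Throwaway copies of the article's two files
printf 'line1\nline2'   > /tmp/badfile
printf 'line1\nline2\n' > /tmp/file

# awk prints each record plus ORS, so the missing EOL is restored on output
awk '1' /tmp/badfile /tmp/file
```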
5.2. wc Line Counting
One of the easiest ways to see the value of files ending in EOL is the wc command.
For example, let’s count the lines of our two files:
$ cat /file | wc --lines
2
$ cat /badfile | wc --lines
1
As expected, /file looks to be two lines long. However, /badfile seems to consist of only one line, although we can visually see two. Indeed, that’s due to the missing EOL, which wc relies on to count lines.
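If we need a count that also includes an unterminated last line, grep with an empty pattern matches every line, complete or not (behavior verified with GNU grep; a throwaway /tmp copy is used here):

```shell
# Throwaway copy without a final newline
printf 'line1\nline2' > /tmp/badfile

# wc -l counts newline bytes only
wc -l < /tmp/badfile

# grep -c '' counts lines, including the incomplete last one
grep -c '' /tmp/badfile
```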
5.3. read Line Processing
Next, we see the power of a complete line via the read command:
$ cat /file | while read line; do echo $line; done
line1
line2
$ cat /badfile | while read line; do echo $line; done
line1
The issue is obvious: when the last line is incomplete, read still captures the partial data, but returns a non-zero status at EOF, so the loop body never runs for it. Thus, we’re left with one less line than expected.
While there are ways to remedy this, having to do so is, at the very least, an inconvenience.
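One common remedy is to test whether read captured data even though it returned non-zero at EOF (a sketch using a throwaway /tmp copy):

```shell
# Throwaway copy without a final newline
printf 'line1\nline2' > /tmp/badfile

# read fails at EOF, but $line may still hold the unterminated fragment;
# the extra test keeps the loop body running for that final piece
while read -r line || [ -n "$line" ]; do
  echo "$line"
done < /tmp/badfile
```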
5.4. Reliance on EOL Before EOF
Many editors automatically append EOL if it’s missing. For example, saving text files with vi would do that.
However, since we can create files in many ways, even outside a POSIX environment, we should not really depend on a terminating EOL. Actually, that can be a vulnerability in lower-level code.
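When consumers can’t tolerate a missing EOL, a small sketch can normalize a non-empty file by appending a newline only when it’s absent (shown on a throwaway /tmp copy):

```shell
# Throwaway copy without a final newline
printf 'line1\nline2' > /tmp/badfile

# Command substitution strips trailing newlines, so the result is
# non-empty only when the last byte is NOT a newline
[ -n "$(tail -c 1 /tmp/badfile)" ] && echo >> /tmp/badfile

# The last byte is now a newline; rerunning the check appends nothing
tail -c 1 /tmp/badfile | wc -l
```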
Finally, keeping the above in mind, we can use the final newline as a sort of primitive checksum: if transmitted text file data doesn’t end in EOL, we know the transfer is incomplete.
6. Conclusion
In this article, we discussed whether and why regular files should always end in a newline.
In conclusion, only text files are realistically expected to end in a newline. Even when they don’t, that shouldn’t crash a system, but some POSIX tools do rely on this convention to work properly.