Files and File Systems | Baeldung on Computer Science

1. Overview

We use many different file types in our daily lives, such as text files, image files, and music files.

In this tutorial, we’re going to look at how files work, and how computers deal with their organization by using file systems.

2. What Is a File and File Format?

A file is a series of bits, bytes, or records whose meaning is defined by the author and the user of the file. Every file has a logical location where it is stored and from which it is retrieved. The data inside a file is organized in a particular way, and we call this organization the file format. We can create our own format, but it is usually easiest and best to use an existing standard such as JPEG, PNG, or TXT. Typically, a file contains metadata and a payload. In the next section, let’s look at the bitmap (BMP) format and how its metadata works.

2.1. Bitmap Format

Before we can read the image data accurately, we need to know several details, such as the image’s width, height, and color depth. This is data about data, and we call it metadata. Let’s take a look at the metadata of the bitmap format below:

As we can see, a bitmap’s metadata includes important values such as the total file size, image width, image height, and color depth. Like the bitmap format, most other file formats, such as PNG, PPT, and ZIP, carry metadata of their own. Underneath, all files are alike: they are long lists of numbers, stored in binary form on a storage device. The file format is what makes it possible to read and understand the data inside.
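
To make the idea of metadata concrete, here is a minimal Python sketch that reads those values from the start of a BMP file. It assumes the common layout of a 14-byte file header followed by a 40-byte BITMAPINFOHEADER; other header variants exist and are not handled. The names parse_bmp_header and make_sample_bmp are hypothetical helpers, and the tiny 2x2 image is made-up test data built in memory:

```python
import struct

# Assumed layout: the 14-byte file header followed by the first 16 bytes
# of a 40-byte BITMAPINFOHEADER. "<" means little-endian without padding.
HEADER_FORMAT = "<2sIHHIIiiHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 30 bytes


def parse_bmp_header(data: bytes) -> dict:
    """Extract basic metadata fields from the start of a BMP file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough bytes for a BMP header")
    (signature, file_size, _reserved1, _reserved2, pixel_offset,
     _info_size, width, height, _planes, bits_per_pixel) = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE])
    if signature != b"BM":
        raise ValueError("missing 'BM' signature")
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,
        "bits_per_pixel": bits_per_pixel,
    }


def make_sample_bmp() -> bytes:
    """Build a tiny 2x2, 24-bit BMP in memory (made-up test data)."""
    row = bytes(6) + bytes(2)  # 2 pixels * 3 bytes, padded to a multiple of 4
    pixels = row * 2
    info = struct.pack("<IiiHHIIiiII",
                       40, 2, 2, 1, 24, 0, len(pixels), 2835, 2835, 0, 0)
    offset = 14 + len(info)  # the payload starts right after both headers
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    return file_header + info + pixels


header = parse_bmp_header(make_sample_bmp())
print(header)
```

Everything before pixel_offset is metadata, and everything from that offset onward is the payload. Without the width, height, and color depth, the pixel bytes would just be an undifferentiated run of numbers.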

3. How to Store Files?

We’ve looked at how files work, and now we’ll continue with how computers store them. Regardless of whether the underlying storage unit is a strip of tape, a drum, or a disk, hardware and software abstractions allow us to think of storage as one long line of containers that store values.

When computers performed only one computation at a time, the whole storage system functioned as a single large file. In early storage systems, data began right at the start of storage and filled it up in the order in which output was produced, until the storage unit was full. However, as storage and computation technology improved, it became practical and beneficial to store more than one file at a time.

The first idea is to store files back to back, and this can work. On its own, however, it’s like storing personal IDs consecutively in an integer array without recording how long each ID is: we can’t tell where one ID ends and the next begins, so the stored data is meaningless. That’s why the computer must know where files begin and end. Storage units have no built-in mechanism for this; they just store lots of bits.

3.1. Directory File, Fragmentation, and Defragmentation

To deal with this issue, we need a special file that records where the other files begin and end. It goes by many names, but the best known is the directory file.

The directory file keeps the names of all the other files in storage, along with where each file begins and ends. It also stores other metadata about these files, such as how long they are. Whenever we add or remove a file, we have to update the information in the directory file. This directory is a small part of the file system, which in turn is the part of an operating system that manages all stored files.
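
The following Python sketch illustrates the idea with files stored back to back in one byte array and a directory that records where each one starts and how long it is. The FlatStorage class and its method names are hypothetical, chosen only for this illustration; a real file system keeps the directory on the storage device itself rather than in a separate in-memory dictionary:

```python
class FlatStorage:
    """Files stored back to back, plus a directory of (start, length) entries."""

    def __init__(self):
        self.data = bytearray()  # the raw storage: just a long run of bytes
        self.directory = {}      # file name -> (start offset, length in bytes)

    def add_file(self, name, content):
        if name in self.directory:
            raise FileExistsError(name)
        start = len(self.data)   # a new file begins where the last one ended
        self.data += content
        self.directory[name] = (start, len(content))

    def read_file(self, name):
        start, length = self.directory[name]
        return bytes(self.data[start:start + length])

    def remove_file(self, name):
        # Only the directory entry is dropped; the bytes stay behind as a gap.
        del self.directory[name]


storage = FlatStorage()
storage.add_file("a.txt", b"hello")
storage.add_file("b.txt", b"world!")
print(storage.directory)
print(storage.read_file("b.txt"))
```

Without the directory, the storage holds only the bytes of "helloworld!", and nothing says that the first file is five bytes long.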

This one-level system becomes problematic when we try to append data to a file, because there is no room to do so without overwriting the next file in the storage unit. Modern file systems therefore use two techniques. First, they store files in blocks. Because a file rarely fills its last block completely, this leaves a little extra room for modifications, which is called slack space. It also means that all file data is aligned to a common size, which simplifies management. Second, the file system allows a file to be divided into pieces and stored across several blocks, which need not be next to each other.
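
Both techniques can be sketched in a few lines of Python. The BlockStorage class below is a hypothetical toy, not how any particular file system works: it uses an unrealistically small block size of 8 bytes, always hands out the lowest-numbered free blocks, and keeps its directory in memory:

```python
import math

BLOCK_SIZE = 8  # bytes per block; tiny on purpose (an assumption for the demo)


class BlockStorage:
    def __init__(self, block_count):
        self.blocks = [bytes(BLOCK_SIZE)] * block_count
        self.free = list(range(block_count))  # free block numbers, ascending
        self.directory = {}                   # name -> (block numbers, length)

    def write_file(self, name, content):
        needed = max(1, math.ceil(len(content) / BLOCK_SIZE))
        if needed > len(self.free):
            raise OSError("not enough free blocks")
        used = [self.free.pop(0) for _ in range(needed)]
        for i, block_no in enumerate(used):
            chunk = content[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.blocks[block_no] = chunk.ljust(BLOCK_SIZE, b"\0")
        self.directory[name] = (used, len(content))

    def delete_file(self, name):
        used, _length = self.directory.pop(name)
        self.free = sorted(self.free + used)

    def read_file(self, name):
        used, length = self.directory[name]
        return b"".join(self.blocks[n] for n in used)[:length]

    def slack(self, name):
        """Unused bytes at the end of the file's last block."""
        used, length = self.directory[name]
        return len(used) * BLOCK_SIZE - length


storage = BlockStorage(block_count=6)
storage.write_file("a", b"A" * 10)  # takes blocks 0 and 1
storage.write_file("b", b"B" * 4)   # takes block 2
storage.write_file("c", b"C" * 12)  # takes blocks 3 and 4
storage.delete_file("b")            # frees block 2
storage.write_file("d", b"D" * 12)  # must be split across blocks 2 and 5

print(storage.slack("a"))           # 6 bytes of slack space
print(storage.directory["d"][0])    # [2, 5]
```

File "a" can grow by up to 6 bytes without touching any other file, and file "d" still reads back correctly even though its two blocks are not adjacent.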

We call this fragmentation. It can be a headache for many storage technologies, such as magnetic tape, because reading a file whose pieces are scattered takes longer. To open large files quickly, defragmentation comes into play: the computer copies data around so that each file’s blocks are stored together and in the right order. After defragmentation, we can read our file from start to finish without jumping around the storage.
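
As a rough sketch of what defragmentation achieves, the hypothetical defragment function below takes a list of blocks and a directory that maps each file name to its block numbers, and builds a new layout in which every file’s blocks are contiguous and in order. Real defragmenters usually rearrange blocks in place on the device; copying into a fresh list is a simplification made here for clarity:

```python
def defragment(blocks, directory):
    """Return (new_blocks, new_directory) with each file's blocks contiguous."""
    new_blocks = []
    new_directory = {}
    for name, block_numbers in directory.items():
        start = len(new_blocks)
        # Copy this file's blocks in logical order, one after another.
        new_blocks.extend(blocks[n] for n in block_numbers)
        new_directory[name] = list(range(start, start + len(block_numbers)))
    # Whatever is left over becomes free space at the end.
    new_blocks.extend([None] * (len(blocks) - len(new_blocks)))
    return new_blocks, new_directory


# Two fragmented files: "a" lives in blocks 0, 2, 5 and "b" in blocks 1, 4.
blocks = ["a1", "b1", "a2", None, "b2", "a3"]
directory = {"a": [0, 2, 5], "b": [1, 4]}

new_blocks, new_directory = defragment(blocks, directory)
print(new_blocks)     # ['a1', 'a2', 'a3', 'b1', 'b2', None]
print(new_directory)  # {'a': [0, 1, 2], 'b': [3, 4]}
```

The file contents are unchanged; only their positions and the directory entries that point to them are updated.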

3.2. File Hierarchy System

Until this point, we’ve assumed that all files are in the same directory, but of course, it isn’t practical to keep every file at the same level. Just as with documents in the real world, it’s useful to store related files together in folders. We can then group related folders into other folders. This is called a file hierarchy system, and it’s what our computers use. We can see an example of it in the figure below:
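
The same idea can also be sketched in Python, with nested dictionaries standing in for folders and byte strings standing in for files. The folder names, file names, and the resolve and list_paths helpers are all made up for this illustration:

```python
# Folders are dicts, files are bytes (a made-up example tree).
tree = {
    "home": {
        "alice": {
            "photos": {"cat.bmp": b"BM"},
            "notes.txt": b"buy milk",
        },
    },
}


def resolve(root, path):
    """Walk the tree one path component at a time and return what we find."""
    node = root
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node


def list_paths(node, prefix=""):
    """Return the full path of every file below the given folder."""
    paths = []
    for name, child in sorted(node.items()):
        full = f"{prefix}/{name}"
        if isinstance(child, dict):
            paths.extend(list_paths(child, full))
        else:
            paths.append(full)
    return paths


print(resolve(tree, "/home/alice/notes.txt"))
print(list_paths(tree))
```

A path such as /home/alice/notes.txt is simply the list of folders to pass through on the way to the file.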

4. File Systems

A file system is the set of methods and data structures that the operating system uses to keep track of files on a disk or partition. With a file system, the operating system can interpret the data placed in a storage unit. Without one, the storage is just one large body of data, with no way to tell where one piece of information ends and the next begins.

There are many kinds of file systems. Let’s look at some of them.

4.1. Disk File Systems

A disk file system takes advantage of the ability of disk storage media to randomly access data in a short amount of time. Examples of disk file systems include FAT, NTFS, HFS, UFS, and ZFS.

File systems for optical discs also belong to this group. ISO 9660 and UDF are common formats for CDs, DVDs, and Blu-ray discs.

4.2. Flash File Systems

A flash file system takes into account the unique capabilities, performance, and limitations of flash memory devices. Even though a disk file system can use a flash memory device as its underlying storage medium, it is better to use a file system designed specifically for flash devices.

4.3. Database File Systems

Another type of file system is the database file system. Instead of managing files through a hierarchy of folders, it identifies them by their attributes, such as file type, author, and topic.
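
A small Python sketch can show what attribute-based lookup looks like. The FileRecord class, the find function, and the sample records are hypothetical and do not correspond to any specific database file system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    name: str
    file_type: str
    author: str
    topic: str


def find(records, **criteria):
    """Return the records whose attributes match every given criterion."""
    return [
        record for record in records
        if all(getattr(record, key) == value for key, value in criteria.items())
    ]


records = [
    FileRecord("budget.xls", "spreadsheet", "alice", "finance"),
    FileRecord("report.txt", "text", "alice", "finance"),
    FileRecord("cat.bmp", "image", "bob", "pets"),
]

matches = find(records, author="alice", topic="finance")
print([record.name for record in matches])  # ['budget.xls', 'report.txt']
```

Here a file is located by asking for "everything by alice about finance" rather than by knowing which folder it was saved in.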

4.4. Network File Systems

A network file system provides access to files on a server. Programs can create, manage, and access files and directories on remote, network-connected machines through the same interfaces they use for local files. Network file systems include file-system-like clients for FTP and WebDAV.

4.5. Tape File Systems

A tape file system is a file system and tape format designed to store files on tape, which in most cases means magnetic tape. Magnetic tapes are sequential storage media, so they take much longer than disks to access data at random positions. This makes it difficult to create and maintain a general-purpose file system on tape.

Despite these issues, IBM developed a file system for tape called the Linear Tape File System. IBM has released an implementation of it as the open-source IBM Linear Tape File System – Single Drive Edition (LTFS-SDE).

5. Conclusion

File systems hide the raw bits stored on disks and let us think of data as ordered, easily accessible files.

In this article, we’ve described what files and file formats are and how computers store files in storage units. We’ve also looked at file systems and several of their types.
