1. Introduction

The kernel is the core of an operating system (OS). It defines system calls, hardware support, and other features as the foundation on top of which userspace, i.e., non-kernel, applications run. Of course, the Linux kernel is also what identifies a given environment as Linux. Naturally, the kernel itself is a program, mostly written in C, although support for writing parts of it in Rust is being added. Consequently, it has source code and header files, the same as other C programs.

In this tutorial, we explore the Linux kernel and its source code headers to understand why they are a special, separate entity. First, we discuss the Linux kernel in general. After that, we look at the tight link between Linux and C. Finally, we go over the Linux kernel source code and, in particular, its header files.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. Unless otherwise specified, it should work in most POSIX-compliant environments.

2. The Linux Kernel

Linux is a free and open-source multitasking operating system that started life in 1991 with the Linux kernel, created by Linus Torvalds. Although it adheres to the UNIX standard in many regards, Linux is its own OS.

Since it's distributed for the most part under the GNU General Public License (GPL) version 2, the Linux kernel can be and is used by many software projects. Collectively, they are known as Linux, although distributions range from minimalistic embedded versions, through full-fledged server implementations, to GPU-centric desktop and gaming systems.

To achieve such flexibility, the kernel has several important characteristics.

2.1. Structure (Subsystems)

Although it has many components, from security modules and volume encryption to device mappers, packet filtering, and audio frameworks, the Linux kernel mainly comprises several subsystems:

  • Memory management
  • Process scheduling
  • Interprocess Communication (IPC)
  • Virtual filesystem (VFS)
  • Networking

They cover the bare essentials that the OS provides to its applications in terms of hardware and process management.

2.2. Primarily Monolithic

Instead of taking the microkernel approach, the Linux kernel is mainly monolithic. This means its core components are internally connected at compile time instead of runtime:

|          Hardware           |
|                             |
|  +-----------------------+  |
|  |       Userspace       |  |
|  +-----------------------+  |
|    v                   ^    |
|  +-----------------------+  |
|  |     Linux Kernel      |  |
|  |                       |  |
|  |   +---------------+   |  |
|  |   |     Core      |   |  |
|  |   +---------------+   |  |
|  |     v           ^     |  |
|  |  +-----------------+  |  |
|  |  |     Modules     |  |  |
|  |  +-----------------+  |  |
|  +-----------------------+  |

This leads to increased performance but exposes the system to potential issues concerning security and maintenance. Further, the kernel is not pageable, so it runs as a whole entity within working memory.

On the other hand, only a small part of recent Linux kernel versions is the actual system core, while the rest comprises hardware driver support and modules.

2.3. Partially Modular

Regardless of its monolithic heritage, Linux supports and works with kernel modules, some of which can be loaded on demand without recompiling.

Many contain features and functionality that the kernel developers don’t see as vital for the operation of the OS but still consider relevant enough to ship with the kernel package. Let’s see some examples of such:

  • nf_* netfilter modules for packet filtering
  • kvm kernel virtual machine support
  • nfsd Network File System (NFS) daemon
  • ext4 filesystem support
  • fuse userspace filesystem mounting

While the list is very long, only some modules are deemed important enough to be statically built into the kernel itself. Further, that list varies by distribution and use case. Even then, we can pick which modules we include when building.

On many systems, we can see a list of statically built modules via the modules.builtin file for the current kernel version as returned by uname:

$ cat /lib/modules/$(uname --kernel-release)/modules.builtin
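Since the list is long, we rarely read it in full. As a minimal sketch, assuming entries have the usual form of a relative path ending in the module file name, such as kernel/fs/ext4/ext4.ko, we can wrap the lookup in a small helper. The module_is_builtin name is our own and not part of any tool:

```shell
#!/bin/sh
# Report whether a module is statically built into a kernel.
# Usage: module_is_builtin NAME [LIST_FILE]
# NAME is a module name without the .ko suffix (for example, ext4).
# LIST_FILE defaults to modules.builtin of the running kernel.
# Limitation: NAME is used as a regular expression and must match the
# file name exactly, which may use hyphens where the module name uses
# underscores.
module_is_builtin() {
    name=$1
    list=${2:-/lib/modules/$(uname -r)/modules.builtin}
    # Match only the final path component of each entry.
    grep -q "/${name}\.ko\$" "$list" 2>/dev/null
}

if module_is_builtin ext4; then
    echo "ext4 is built in"
else
    echo "ext4 is loadable or not available"
fi
```

The result depends on how the distribution configured its kernel, so the same check can give different answers on different systems.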

Perhaps more importantly, we can develop custom loadable kernel modules.

2.4. Configurable

Notably, the flexibility of Linux is made even more evident by the fact that it exposes many of its internal structures and configuration via several pseudo-filesystems:

  • /proc: process-related information and kernel configuration
  • /sys: general system information and configuration
  • /dev: device and pseudo-device files for reading and writing data
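To confirm that these paths are backed by pseudo-filesystems rather than on-disk storage, we can look them up in the mount table. The sketch below assumes the standard /proc/mounts layout of device, mount point, type, and options. The fs_type_of helper is a hypothetical name of our own:

```shell
#!/bin/sh
# Print the filesystem type mounted at a given mount point.
# Usage: fs_type_of MOUNT_POINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts, whose fields are:
# device, mount point, type, options, dump, pass.
fs_type_of() {
    # Keep the last match, since later mounts shadow earlier ones.
    awk -v mp="$1" '$2 == mp { type = $3 } END { if (type != "") print type }' \
        "${2:-/proc/mounts}"
}

for mp in /proc /sys /dev; do
    printf '%s: %s\n' "$mp" "$(fs_type_of "$mp")"
done
```

On many systems, this reports proc, sysfs, and devtmpfs, although containers and minimal environments can differ.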

Historically, kernels before version 2.6.26 even provided a readable and writable /dev/kmem file with all internal kernel memory structures. Now, most of this relic has been organized under the /proc and /sys paths for more structured access and management. Still, controlling a fundamental function of Linux is often as simple as writing to a file.

This makes it possible to change kernel behavior at runtime without rebooting, which helps Linux sustain the long uptimes that contemporary systems require.
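As an illustration of the file-based interface, many tunable parameters live under /proc/sys, where a dotted name such as kernel.ostype corresponds to the path kernel/ostype. The following sketch assumes that simple mapping, which doesn't cover names that themselves contain dots. Both helper names are our own:

```shell
#!/bin/sh
# Map a dotted parameter name to its file under /proc/sys.
# Usage: tunable_path NAME [BASE_DIR]
tunable_path() {
    printf '%s/%s\n' "${2:-/proc/sys}" "$(printf '%s' "$1" | tr '.' '/')"
}

# Read the current value of a parameter.
# Usage: read_tunable NAME [BASE_DIR]
read_tunable() {
    cat "$(tunable_path "$1" "${2:-/proc/sys}")"
}

read_tunable kernel.ostype 2>/dev/null || echo "kernel.ostype is not readable here"

# Writing requires root privileges and takes effect immediately, e.g.:
#   echo 1 > "$(tunable_path net.ipv4.ip_forward)"
```

Values written this way usually don't survive a reboot, so permanent changes still belong in the system configuration.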

3. Linux Kernel Base

As already mentioned, the Linux kernel is mainly written in C.

Because of this, it integrates tightly with the C language through the standard C library, which sits directly atop the kernel:

|                Hardware                |
|                                        |
|  +----------------------------------+  |
|  |            Userspace             |  |
|  |                                  |  |
|  | +------------------------------+ |  |
|  | |      System Components       | |  |
|  | +------------------------------+ |  |
|  | +------------------------------+ |  |
|  | |      C Standard Library      | |  |
|  | +------------------------------+ |  |
|  +----------------------------------+  |
|    v                              ^    |
|  +----------------------------------+  |
|  |           Linux Kernel           |  |
|  |                                  |  |
|  |  +--------+     +-------------+  |  |
|  |  |  Core  |<--->|   Modules   |  |  |
|  |  +--------+     +-------------+  |  |
|  +----------------------------------+  |

Effectively, the base of the user mode is the standard C library and its functions such as malloc() and memcpy(). Further, they are tightly linked with Linux kernel system calls like stat(), read(), open(), and others.

4. Linux Kernel Source Code

At this point, we can dive right into the Linux kernel source code.

Since it’s usually most convenient to do so via the GitHub repository, we clone that for our analysis with the git command:

$ git clone https://github.com/torvalds/linux
Cloning into 'linux'...
remote: Enumerating objects: 9987666, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (188/188), done.
remote: Total 9987666 (delta 166), reused 200 (delta 66), pack-reused 9987666
Receiving objects: 100% (9987666/9987666), 4.67 GiB | 16.66 MiB/s, done.
Resolving deltas: 100% (8158666/8158666), done.
Updating files: 100% (83666/83666), done.

Importantly, this process can take a long time, as the whole repository with its history is around 5 GB. Alternatively, we can get the source code package for our current system or a snapshot of another specific version, which should be much smaller.
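If we only need the current state of the tree and not its history, one option is a shallow clone via the --depth option of git clone. The wrapper below is a sketch with a name of our own choosing. The example call is commented out because it needs network access:

```shell
#!/bin/sh
# Clone only the most recent commit of a repository.
# Usage: shallow_clone URL [DIRECTORY]
shallow_clone() {
    # Pass DIRECTORY only when the caller supplied one.
    git clone --depth 1 "$1" ${2:+"$2"}
}

# Example (requires network access):
#   shallow_clone https://github.com/torvalds/linux linux
```

A shallow clone transfers far less data, but commands that rely on history, such as git log or git bisect, only see what was fetched.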

4.1. Basic Structure

First, let’s see the top-level organization of the Linux kernel source code via ls:

$ ls linux/
arch     CREDITS        fs        ipc      lib          mm      samples   tools
block    crypto         include   Kbuild   LICENSES     net     scripts   usr
certs    Documentation  init      Kconfig  MAINTAINERS  README  security  virt
COPYING  drivers        io_uring  kernel   Makefile     rust    sound

Alphabetically first, the arch directory is critical, since it contains code that varies between platforms and architectures. This is in addition to the common kernel sources. This trend continues with other directories as well:

  • lib: kernel libraries
  • mm: memory management code
  • ipc: interprocess communications
  • net: networking code
  • crypto: cryptography implementations
  • include: header files

Naturally, many directories such as sound, security, drivers, and tools have fairly self-explanatory names.

Of course, Documentation can be very helpful when debugging and understanding the source code structure. Further, the init directory contains the initialization code, making it a great place to start when tracing the inner workings of the kernel. When it comes to filesystem storage, we go to fs.

Due to the complex nature of the whole kernel source code package, we might not always want or need to have all of this data available.

4.2. Header Files (include)

Sometimes, applications could require a subset of the source files for interfacing with the kernel. This is where headers come in.

Header files in C and in the Linux kernel source code include the necessary components for understanding the definitions and interface of the system. In other words, we might not see how a given function is implemented, but we can know it’s there and what data it expects and returns, i.e., its signature. This is often enough for applications to connect with the current kernel since its implementation is already in place, albeit in binary form. Metaphorically, we can see the headers as a blueprint description of the kernel.

For instance, we might want to write a driver or other module. To compile it on a given system, we need at least two headers:

  • include/linux/module.h: module loading and unloading logic and interface
  • include/linux/init.h: module initialization and destruction procedures

However, module.h also fetches include/linux/version.h, so it’s rarely a good idea to get only a subset of the headers.
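To get a feeling for how quickly such dependencies grow, we can list the headers that a given file pulls in directly. The sketch below only looks at angle-bracket #include lines and doesn't follow them recursively or evaluate preprocessor conditions. The direct_includes helper is a hypothetical name, and the example assumes the sources were cloned into ./linux:

```shell
#!/bin/sh
# List the headers that a C file includes with angle brackets.
# Usage: direct_includes FILE
direct_includes() {
    sed -n 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*<\([^>]*\)>.*/\1/p' "$1"
}

# Example, assuming the kernel sources are in ./linux:
if [ -r linux/include/linux/module.h ]; then
    direct_includes linux/include/linux/module.h
fi
```

Each listed header can include further ones in turn, which is why copying individual files tends to break builds.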

The main include directory and the include subdirectory under each [arch]itecture collectively hold all Linux header files.

Let’s see how we can extract them via make for i386:

$ cd linux
$ make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr

Notably, when working with architecture-specific sources, the include/asm path is a symbolic link to the actual directory for that architecture.

4.3. Kernel Packages

Due to the separation of the kernel interface headers from the whole codebase, we end up with two main kernel packages:

  • linux-source-<version>: complete sources
  • linux-headers-<version>: header files

In both cases, version can also be generic for the relevant metapackage.

The complete sources are useful for kernel modification, customization, rebuilding, and low-level development. On the other hand, header files are often required by system applications such as guest additions in hypervisors and security toolsets like OpenSSL.

Historically, separating the two also saved a considerable amount of resources on the system.
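To check whether matching headers are present for the running kernel, we can rely on two common conventions, both of which are assumptions here: the versioned package name follows the linux-headers-<version> pattern with the release string from uname, and an installed build tree is reachable via /lib/modules/<version>/build. The helper names below are our own:

```shell
#!/bin/sh
# Name of the header package that matches a kernel release.
# Usage: headers_package [RELEASE]
headers_package() {
    printf 'linux-headers-%s\n' "${1:-$(uname -r)}"
}

# Succeed if a build tree for the release is present.
# Usage: headers_installed [RELEASE] [ROOT]
# ROOT is an optional prefix, useful for inspecting a chroot.
headers_installed() {
    [ -e "${2:-}/lib/modules/${1:-$(uname -r)}/build" ]
}

if headers_installed; then
    echo "headers found for $(uname -r)"
else
    echo "consider installing $(headers_package)"
fi
```

Package naming differs between distributions, so the suggested name is only a starting point for a package search.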

5. Summary

In this article, we discussed the Linux kernel, source code, and header files.

In conclusion, although complex, the Linux kernel presents a fairly consistent interface, which application developers can access through header files.
