As system administrators, it’s crucial to understand the internal details of the operating system. This knowledge helps in troubleshooting services and processes on our system, and security researchers can use it to identify suspicious files. Understanding the structure of an ELF file is one way to build that knowledge.
In this tutorial, we’ll learn about the ELF file and its structure. We’ll also use readelf to inspect the structure of an ELF file.
ELF is short for Executable and Linkable Format. It’s a format used for storing binaries, libraries, and core dumps on disks in Linux and Unix-based systems.
Moreover, the ELF format is versatile. Its design isn’t tied to a particular processor or instruction set architecture, so the same format can describe binaries for many processor types. This is a significant reason why the format is so common compared to other executable file formats.
Generally, we write most programs in high-level languages such as C or C++. These programs cannot be directly executed on the CPU because the CPU doesn’t understand these instructions. Instead, we use a compiler that compiles the high-level language into object code. Using a linker, we also link the object code with shared libraries to get a binary file.
As a result, the binary file has instructions that the CPU can understand and execute. The binary file can adopt any format that defines the structure it should follow. On Linux and most Unix-based systems, the most common of these formats is ELF.
3. The Structure of the ELF File
The ELF file is divided into two parts. The first part is the ELF header, while the second is the file data.
Further, the file data is made up of the Program header table, Section header table, and Data.
The ELF header is always present in an ELF file. The Section header table matters at link time, when the linker creates an executable, while the Program header table matters at run time, when it helps load the executable into memory.
Next, let’s look at the ELF file structure:
Here, we see the different parts of the ELF file. Section names that begin with a dot are reserved for the system, while the rest are available to applications.
Next, let’s look at the different parts of the file in more detail.
3.1. ELF Header
Firstly, the ELF header is found at the start of the file. It contains metadata about the file.
For example, some of the metadata found in the ELF header includes information about whether the ELF file is 32-bit or 64-bit, whether it’s using little-endian or big-endian, the ELF version, and the architecture that the file requires.
In particular, the metadata in the ELF header tells the operating system and tools how to interpret the rest of the file, regardless of the processor architecture it targets.
We use the readelf command with the -h option to show the ELF header of an ELF file. In our case, we are reading the ls binary file:
$ readelf -h /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x6180
  Start of program headers:          64 (bytes into file)
  Start of section headers:          145256 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 29
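To make the first few fields more concrete, here is a minimal Python sketch that decodes the 16 identification bytes from the Magic line above. The helper name decode_ident and the lookup tables are our own illustration, not part of readelf, and the sketch only covers the values that appear in this tutorial:

```python
# The 16 identification bytes, copied from the "Magic" line of the readelf output.
E_IDENT = bytes.fromhex("7f454c46020101000000000000000000")

# Only the values discussed in this tutorial; other values map to "unknown".
CLASSES = {1: "ELF32", 2: "ELF64"}
ENCODINGS = {1: "little endian", 2: "big endian"}


def decode_ident(ident: bytes) -> dict:
    """Return the fields packed into the first 16 bytes of an ELF file."""
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": CLASSES.get(ident[4], "unknown"),
        "data": ENCODINGS.get(ident[5], "unknown"),
        "version": ident[6],
        "osabi": ident[7],
        "abi_version": ident[8],
    }


info = decode_ident(E_IDENT)
print(info)
```

To try the same function on a real file, we could pass it the first 16 bytes of the file, for example open("/bin/ls", "rb").read(16).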
3.2. ELF Header Details
Now, let’s take a closer look at what the fields in the ELF header structure represent:
| Field | Description |
|---|---|
| Magic | The first 16 bytes of the ELF header. The first four bytes (7f 45 4c 46) identify the file as an ELF, and the following bytes encode the class, data encoding, version, and OS/ABI. |
| Class | Indicates whether the file is a 32-bit or a 64-bit ELF. |
| Data | Specifies the data encoding, that is, the byte order of multi-byte values in the file. The two encodings are little-endian and big-endian. |
| Version | Identifies the ELF format version (set to 1). |
| OS/ABI | ABI is short for Application Binary Interface. This field identifies the operating system and ABI that the file targets. |
| ABI Version | Specifies the version of the ABI. |
| Type | Specifies the object file type. For instance, 2 is for an executable, 3 is for a shared object, and 4 is for a core file. Position-independent executables, such as ls above, are also reported as type 3 (DYN). |
| Machine | Specifies the processor architecture that the file requires. |
| Version | Identifies the object file version. |
| Entry point address | The address where the program starts executing. If the file has no entry point, the value in this field is 0. |
| Start of program headers | The offset in the file where the program header table starts. |
| Start of section headers | The offset in the file where the section header table starts. |
| Flags | Holds processor-specific flags for the file. |
| Size of this header | The size of the ELF header in bytes. |
| Size of program headers | The size of an individual program header. |
| Number of program headers | The number of program headers. |
| Size of section headers | The size of an individual section header. |
| Number of section headers | The number of section headers. |
| Section header string table index | The index of the section header table entry for the section that holds the section names. |
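As a rough illustration of how these fields are laid out, the following Python sketch unpacks a 64-byte header with the struct module. It assumes a 64-bit, little-endian ELF, as in our ls example, and rebuilds the header from the readelf values shown earlier instead of reading a real file. The field names follow the usual e_ prefix convention, and parse_header is a name we made up for this example:

```python
import struct
from collections import namedtuple

# Layout of a 64-bit little-endian ELF header: 64 bytes in total.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
Elf64Header = namedtuple(
    "Elf64Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

FILE_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_header(data: bytes) -> Elf64Header:
    """Unpack the first 64 bytes of a 64-bit little-endian ELF file."""
    return Elf64Header._make(ELF64_HEADER.unpack_from(data))


# Sample header rebuilt from the readelf -h output (62 is the x86-64 machine code).
sample = ELF64_HEADER.pack(
    bytes.fromhex("7f454c46020101000000000000000000"),
    3, 62, 1, 0x6180, 64, 145256, 0, 64, 56, 11, 64, 30, 29,
)

header = parse_header(sample)
# Each table spans (number of entries * entry size) bytes from its start offset.
ph_table_end = header.e_phoff + header.e_phnum * header.e_phentsize
sh_table_end = header.e_shoff + header.e_shnum * header.e_shentsize

print(FILE_TYPES[header.e_type], hex(header.e_entry))
print("program header table:", header.e_phoff, "to", ph_table_end)
print("section header table:", header.e_shoff, "to", sh_table_end)
```

The two computed end offsets show how the header alone is enough to locate both tables in the file.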
3.3. Program Header Table
Another part is the Program Header Table. The program header table stores information about segments. Each segment is made up of zero or more sections. The kernel uses this information at run time. It tells the kernel how to create the process and map the segments into memory.
To run a program, the kernel first reads the ELF header and the program header table. Second, it maps the segments marked LOAD in the program header table into memory and checks whether the file requests an interpreter through an INTERP entry. Finally, control passes to the interpreter if one is requested, or otherwise to the entry point of the executable itself.
We use the readelf command with the -l option to display the program headers of an ELF file:
$ readelf -l /bin/ls

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x6180
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000003538 0x0000000000003538  R      0x1000
  LOAD           0x0000000000004000 0x0000000000004000 0x0000000000004000
                 0x00000000000143c9 0x00000000000143c9  R E    0x1000
  LOAD           0x0000000000019000 0x0000000000019000 0x0000000000019000
                 0x0000000000008ab8 0x0000000000008ab8  R      0x1000
  LOAD           0x0000000000022350 0x0000000000023350 0x0000000000023350
                 0x0000000000001278 0x0000000000002568  RW     0x1000
  DYNAMIC        0x0000000000022dd8 0x0000000000023dd8 0x0000000000023dd8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x000000000001df0c 0x000000000001df0c 0x000000000001df0c
                 0x0000000000000944 0x0000000000000944  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000022350 0x0000000000023350 0x0000000000023350
                 0x0000000000000cb0 0x0000000000000cb0  R      0x1
...
Program headers are essential when running the executable because they tell the operating system all it needs to know to put the executable into memory and run it.
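To illustrate how a loader might walk this table, here is a simplified Python sketch. It assumes the 64-bit little-endian entry layout (56 bytes per entry, matching the header above) and rebuilds three of the entries from the readelf -l output instead of reading /bin/ls. The helper names are ours, and a real kernel loader does far more than this:

```python
import struct

# One 64-bit little-endian program header entry: 56 bytes.
PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD, PT_INTERP = 1, 3          # segment types
PF_X, PF_W, PF_R = 1, 2, 4         # permission bits


def flag_string(flags: int) -> str:
    """Render permission bits as letters, for example 5 -> 'RE'."""
    pairs = ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
    return "".join(ch for bit, ch in pairs if flags & bit)


def parse_program_headers(table: bytes) -> list:
    """Split a raw program header table into one dict per entry."""
    keys = ("type", "flags", "offset", "vaddr", "paddr", "filesz", "memsz", "align")
    return [dict(zip(keys, entry)) for entry in PHDR.iter_unpack(table)]


# Three entries rebuilt from the readelf -l output (INTERP and two LOADs).
table = b"".join(PHDR.pack(*row) for row in [
    (PT_INTERP, PF_R, 0x2A8, 0x2A8, 0x2A8, 0x1C, 0x1C, 0x1),
    (PT_LOAD, PF_R | PF_X, 0x4000, 0x4000, 0x4000, 0x143C9, 0x143C9, 0x1000),
    (PT_LOAD, PF_R | PF_W, 0x22350, 0x23350, 0x23350, 0x1278, 0x2568, 0x1000),
])

segments = parse_program_headers(table)
needs_interpreter = any(s["type"] == PT_INTERP for s in segments)
loads = [s for s in segments if s["type"] == PT_LOAD]

for s in loads:
    print(f"map file offset {s['offset']:#x} at {s['vaddr']:#x} ({flag_string(s['flags'])})")
print("interpreter requested:", needs_interpreter)
```

Only the LOAD entries describe memory that gets mapped, while the INTERP entry just signals that the dynamic linker has to run first.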
3.4. Section Header Table
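The section header table stores information about sections, such as each section’s name, type, address, file offset, and size. The linker uses this information at link time, when it combines object files into an executable or a shared library. The kernel doesn’t need it to run the program.
Shared libraries are resolved separately, just before the program is executed. A dynamic linker, the interpreter named in the INTERP program header, links the binary file with the shared libraries that it needs by loading them into memory. The dynamic linker’s implementation is specific to the operating system.
Additionally, the section header table lets the linker and other tools locate the symbol tables and relocation sections that hold the symbolic definitions and references of the program.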
We use the readelf command with the -S option to display the information in the section header of a file:
$ readelf -S /bin/ls
There are 30 section headers, starting at offset 0x23768:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         00000000000002a8  000002a8
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.gnu.bu[...] NOTE             00000000000002c4  000002c4
       0000000000000024  0000000000000000   A       0     0     4
  [ 3] .note.ABI-tag     NOTE             00000000000002e8  000002e8
       0000000000000020  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000000308  00000308
       00000000000000ac  0000000000000000   A       5     0     8
...
As we discussed earlier, section headers are significant at link time to link the executable with the libraries it needs to run successfully.
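A section header doesn’t store its name directly. Instead, it stores an offset into the section name string table, which is the section referenced by the "Section header string table index" field of the ELF header. The Python sketch below shows this lookup on a small hand-built table. It assumes the 64-bit little-endian layout (64 bytes per entry), the helper names are ours, and the file offset of the string table entry is a made-up placeholder:

```python
import struct

# One 64-bit little-endian section header entry: 64 bytes.
SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_NULL, SHT_PROGBITS, SHT_STRTAB = 0, 1, 3
SHF_ALLOC = 0x2  # shown as the "A" flag by readelf


def read_name(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at the given offset."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("ascii")


def parse_section_headers(table: bytes, strtab: bytes) -> list:
    """Split a raw section header table and resolve each section's name."""
    keys = ("name_offset", "type", "flags", "addr", "offset",
            "size", "link", "info", "addralign", "entsize")
    sections = []
    for entry in SHDR.iter_unpack(table):
        section = dict(zip(keys, entry))
        section["name"] = read_name(strtab, section["name_offset"])
        sections.append(section)
    return sections


# Hand-built string table: names start at offsets 0, 1, and 9.
strtab = b"\0.interp\0.shstrtab\0"
table = b"".join(SHDR.pack(*row) for row in [
    (0, SHT_NULL, 0, 0, 0, 0, 0, 0, 0, 0),
    (1, SHT_PROGBITS, SHF_ALLOC, 0x2A8, 0x2A8, 0x1C, 0, 0, 1, 0),
    (9, SHT_STRTAB, 0, 0, 0x1000, len(strtab), 0, 0, 1, 0),  # offset is hypothetical
])

sections = parse_section_headers(table, strtab)
names = [s["name"] for s in sections]
print(names)
```

The first entry has an empty name, which matches the NULL entry at index 0 in the readelf output.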
3.5. .text
The .text section holds the instructions that the program needs for it to run.
3.6. .rodata and .data
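.rodata stands for read-only data. It holds initialized data that the program can’t modify, such as constants and string literals, while .data holds initialized data that the program can modify. Together, these sections contain the actual, initialized data that the program needs in memory. The data segment takes up more space in memory than in the ELF file, so its MemSiz is larger than its FileSiz, to make room for uninitialized variables, which are kept in the .bss section.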
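We can check this against the last LOAD entry in the readelf -l output from section 3.3, where FileSiz is 0x1278 and MemSiz is 0x2568. The Python sketch below mimics, in a very simplified way, what happens when such a segment is loaded: the bytes from the file are copied, and the rest is filled with zeros. The function name load_segment and the placeholder file contents are assumptions made for this example:

```python
# Sizes of the last LOAD segment, taken from the readelf -l output.
FILE_SIZE = 0x1278
MEM_SIZE = 0x2568


def load_segment(file_bytes: bytes, mem_size: int) -> bytes:
    """Return the in-memory image: file contents followed by zero bytes."""
    if mem_size < len(file_bytes):
        raise ValueError("memory size cannot be smaller than file size")
    return file_bytes + bytes(mem_size - len(file_bytes))


# Placeholder contents standing in for the real bytes stored in the file.
file_bytes = bytes([0xAA]) * FILE_SIZE
image = load_segment(file_bytes, MEM_SIZE)

zero_filled = len(image) - FILE_SIZE
print(hex(zero_filled))  # 0x12f0 bytes reserved for uninitialized variables
```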
4. Conclusion
In this article, we learned about the ELF file and its structure. We also looked at using readelf to check different parts of an ELF file.