1. Introduction

Executable files are often binary in nature for faster execution. Segments are parts of such files that contain different types of information. Critically, if we attempt to perform an illegal operation on one of the segments, we might end up causing a so-called Segmentation fault error.

In this tutorial, we talk about segmentation faults and how to deal with them. First, we look at segments and how segmentation faults come about. Next, we write and compile a sample program that causes one. After that, we go through code analysis and understand why different code may lead to a segmentation fault. Then, we perform basic reverse engineering, debugging, and tracing as ways to get a better understanding of the problem and its root cause. Finally, we go over core dumps and bug reports.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. It should work in most POSIX-compliant environments unless otherwise specified.

2. Program Segmentation and Faults

As already explained, segments represent and separate different kinds of data within a binary executable or process. Even with interpreted languages such as Perl and Bash, executing a script still means running the executable file of the interpreter (perl, bash, and others).

Most programs have three or four main segments:

  • data: constants and predeclared variables
  • code: executable code in the form of binary instructions
  • stack: dynamic and temporary data
  • heap: run-time allocations (optional)
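To make the list above more concrete, here's a minimal sketch that records one sample address from each region. All names in it (initialized_global, region_addresses, sample_regions) are hypothetical and chosen only for illustration. The actual layout depends on the compiler, linker, and operating system:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example object that lives in the data segment. */
static int initialized_global = 42;

/* One sample address per region, stored as plain integers. */
struct region_addresses {
  uintptr_t data;
  uintptr_t code;
  uintptr_t stack;
  uintptr_t heap;
};

/* Fills *out with sample addresses; returns 0 on success, -1 on failure. */
int sample_regions(struct region_addresses *out) {
  assert(out != NULL);

  int local = 0;                               /* lives on the stack */
  int *allocated = malloc(sizeof *allocated);  /* lives on the heap */
  if (allocated == NULL) {
    return -1;
  }

  out->data = (uintptr_t)&initialized_global;
  out->code = (uintptr_t)&sample_regions;      /* address of executable code */
  out->stack = (uintptr_t)&local;
  out->heap = (uintptr_t)allocated;

  free(allocated);
  return 0;
}
```

Printing the four values usually shows that they fall into clearly separate address ranges, although the exact numbers change between runs on systems with address randomization.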

Each segment has a special pointer (register) that shows the current position within it:

  • BP (Base Pointer): holds address (offset) within the dynamic or temporary data
  • IP (Instruction Pointer): holds current instruction address (offset)
  • SP (Stack Pointer): holds current stack address (offset)

The pointer base is either at the start of its respective segment (DS Data Segment, CS Code Segment, SS Stack Segment) or somewhere within it (DI, SI). In other words, any current pointer value is usually an integer offset.

Critically, trying to run code within a non-code segment, read code as data, access a non-existent address, or perform a similar action can lead to a Segmentation fault, which is usually irrecoverable. In short, this means the program tries to access data outside of the segment where that data resides, because a combination of base and offset goes beyond the respective segment size. Due to this critical misbehavior, the kernel sends the program a SIGSEGV signal, which kills it by default.
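As a rough illustration of how the kernel ends such a program, the sketch below performs an invalid write in a child process and lets the parent inspect how the child ended. The function name signal_from_invalid_write is hypothetical, and the sketch assumes a POSIX system where a write through a null pointer raises the SIGSEGV signal:

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Performs an invalid write in a child process and reports what happened.
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited normally, or -1 if fork() or waitpid() failed.
 */
int signal_from_invalid_write(void) {
  pid_t child = fork();
  if (child < 0) {
    return -1;
  }

  if (child == 0) {
    /* The volatile pointer is loaded at run time, so the compiler
       cannot reason about its null value and remove the write. */
    int *volatile address = NULL;
    *address = 666;  /* invalid write to address 0 */
    _exit(0);        /* not reached if the write faults */
  }

  assert(child > 0); /* only the parent gets here */

  int status = 0;
  if (waitpid(child, &status, 0) < 0) {
    return -1;
  }
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system, we'd expect the return value to equal SIGSEGV. A shell relies on the same kind of wait status when it prints the Segmentation fault message.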

3. Example 0xbadc0de

To demonstrate, we compile a minimal executable file that causes a segmentation fault.

3.1. Write Code

First, we create a simple C file:

$ cat segfault.c
int main() {
  int *address = 0;
  *address = 666;

  return 0;
}

Here, we have a standard main() function, which [return]s 0 upon success. However, the first two lines of code in the main function are of interest in this case:

  • int *address = 0; declares a * pointer to an [int]eger variable at address 0
  • *address = 666; attempts to assign the value 666 to the address (0) pointed to by the pointer variable address

Now, we can create a binary file from this code.

3.2. Compile Binary Executable

At this point, let’s compile an executable binary from segfault.c via gcc:

$ gcc -O0 -s -nostdlib --entry main segfault.c -o segfault.bin

To minimize the footprint of our resulting file, we use a number of options:

  • -O0: 0 [O]ptimizations, meaning the compiler doesn’t attempt any special handling of the code
  • -s: omit symbol tables and relocation information
  • -nostdlib: no linking to the standard library, since we don’t use any standard library functions or optimizations
  • --entry main: assume the address of main() as the program entry point

Importantly, we minimize and simplify the final binary executable file for clarity when debugging later.

3.3. Run Binary Executable

After writing and compiling our code, let’s run the resulting executable:

$ ./segfault.bin
Segmentation fault

As expected, we get a Segmentation fault. Basically, the attempt to assign a value to the cell at memory address 0 fails, since nothing is mapped at that address in the address space of the process. Let’s investigate and show that in practice.

4. Analyze Code

If we write or have access to the source code of the problematic application, we might be able to understand where the problem comes from by looking at that code line by line. Moreover, many languages and their compilers or interpreters can warn about parts of code that may cause a segmentation fault.

Even if ours doesn’t, we can still make deductions based on a number of common cases.

4.1. Invalid Address Access

In our original example above, a bad pointer value causes an illegal memory access.

Sometimes, a simple syntax mistake can do the same. For example, scanf() expects its second argument to be an address, so it can write to that address.

To pass the address in C, we use the & ampersand symbol. If we fail to prefix the variable, we usually cause a segmentation fault.
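As a minimal sketch of the correct form, the hypothetical parse_number() helper below passes an address to the conversion function. Here, sscanf() stands in for scanf() so that the example doesn't need keyboard input. Both functions expect addresses for their conversion targets:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parses a decimal integer from text into *out.
 * Returns 1 on success and 0 if no number could be read.
 */
int parse_number(const char *text, int *out) {
  assert(text != NULL && out != NULL);

  int value = 0;
  /* &value passes the address of value; passing plain value would
     hand the integer 0 to sscanf() as if it were a pointer. */
  if (sscanf(text, "%d", &value) != 1) {
    return 0;
  }

  *out = value;
  return 1;
}
```

Many compilers can flag the missing & when format warnings are enabled, so it's worth keeping such warnings on.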

4.2. Array Bounds Overflow

One very common issue in programming is the off-by-one error.

While it can manifest itself in different ways, a variable, buffer, or array overflow is often what it leads to:

int main() {
  int array[666];

  for (int i = 0; i <= 666; i++) {
    array[i] = 0;
  }

  return 0;
}

Here, the problem lies in the fact that array can hold values in elements 0 to 665, while the last loop iteration assigns the i variable a value of 666 and uses it as an index to that array. Importantly, C programs don’t usually crash directly due to this, since a write just one element past the end typically lands in memory that’s still mapped for the process, such as the rest of the stack. Even so, this is undefined behavior and can silently corrupt adjacent data.

However, if we go much beyond the allocated space, a segmentation fault is all but guaranteed.
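One way to avoid the off-by-one error is to derive the loop bound from the array length and compare with < instead of <=. The sketch below uses a hypothetical zero_fill() helper and an ARRAY_LENGTH constant that mirrors the 666 elements of the example:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LENGTH 666

/* Zeroes exactly length elements: indices 0 through length - 1. */
void zero_fill(int *array, size_t length) {
  assert(array != NULL);

  for (size_t i = 0; i < length; i++) {  /* < instead of <= */
    array[i] = 0;
  }
}
```

For a local array declared as int array[ARRAY_LENGTH], a call such as zero_fill(array, sizeof array / sizeof array[0]) keeps the bound tied to the declaration.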

4.3. Bad Memory Allocation

Of course, explicit dynamic memory allocation problems are fairly hard to catch due to the potentially varying values and addresses:

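#include <stdlib.h>

int main() {
  int cell = 1;
  int *number;
  // room for exactly one int, so only number[0] is valid
  number = (int*) malloc(cell * sizeof(int));
  // out-of-bounds write: index 1 is past the allocation
  number[1] = 1;

  return 0;
}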

Here, any access beyond number[0] is a gamble and may overwrite other data or crash the program. Interestingly, a crash is usually preferable, as rampant data corruption can cause much more damage without warning.
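A defensive variant might track the element count next to the pointer and check every index against it. The allocate_numbers() and store_number() helpers below are hypothetical names for such a sketch, not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates count zero-initialized ints; returns NULL on failure. */
int *allocate_numbers(size_t count) {
  return calloc(count, sizeof(int));
}

/*
 * Stores value at index only if the index is within bounds.
 * Returns 0 on success and -1 if the index is out of range.
 */
int store_number(int *numbers, size_t count, size_t index, int value) {
  assert(numbers != NULL);

  if (index >= count) {
    return -1;  /* refuse the write instead of corrupting memory */
  }

  numbers[index] = value;
  return 0;
}
```

With a count of 1, a call to store_number() with index 1 returns -1 instead of writing past the allocation. The caller is still responsible for calling free() on the pointer.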

5. Reverse Engineering

If we don’t have access to the program code, we might still be able to take a glimpse at the instructions of the resulting binary executable and make deductions.

For example, let’s use objdump to disassemble (-d) our segfault.bin example file:

$ objdump -d --no-show-raw-insn segfault.bin

segfault.bin:     file format elf64-x86-64


Disassembly of section .text:

0000000000001000 <.text>:
    1000:       push   %rbp
    1001:       mov    %rsp,%rbp
    1004:       movq   $0x0,-0x8(%rbp)
    100c:       mov    -0x8(%rbp),%rax
    1010:       movl   $0x29a,(%rax)
    1016:       mov    $0x0,%eax
    101b:       pop    %rbp
    101c:       ret

When programming in a language like Assembly, we can declare different sections such as .text within our program.

Now, let’s go through the instructions:

  1. push %rbp saves the current base pointer on the stack
  2. mov %rsp,%rbp copies the stack pointer into the base pointer
  3. movq $0x0,-0x8(%rbp) places 0 ($0x0) in the stack slot eight bytes below the base pointer, i.e., the address variable
  4. mov -0x8(%rbp),%rax loads the same 0 into the %rax register
  5. movl $0x29a,(%rax) attempts to place 666 ($0x29a) at the address held in %rax (now 0)

At the last instruction, we attempt to access address 0 through %rax and write a value there, which results in a segmentation fault.

Naturally, going through many lines of machine code manually can become quite tedious. So, let’s check some alternatives.

6. Debugging

As usual, to understand the behavior of a given program, we can also use a specialized debugger on it. In this case, we choose the ubiquitous GNU Debugger (GDB).

First, let’s run gdb with segfault.bin:

$ gdb ./segfault.bin
[...]
(No debugging symbols found in ./segfault.bin)
(gdb)

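Of course, there are no debugging symbols, since we stripped them with -s and didn’t request any with -g during compilation. For known packages, we may be able to install separate debug symbol packages, while apt-get source downloads the corresponding source code.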

Since our executable is custom, we just continue and run the program:

(gdb) run
Starting program: /segfault.bin

Program received signal SIGSEGV, Segmentation fault.
0x0000666666666010 in ?? ()

As expected, we get a Segmentation fault via the SIGSEGV segmentation violation signal. At this point, backtrace could be helpful if we have debugging symbols. However, since we don’t, the output is limited:

(gdb) backtrace
#0  0x0000666666666010 in ?? ()
#1  0x0000000000000001 in ?? ()
#2  0x00007fffffffe6cd in ?? ()
#3  0x0000000000000000 in ?? ()

Instead, let’s continue and set a breakpoint at the location of the instruction that caused the error and rerun the program:

(gdb) break *0x0000666666666010
Breakpoint 1 at 0x666666666010
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /segfault.bin

Breakpoint 1, 0x0000666666666010 in ?? ()
(gdb)

Now, we can inspect the current and the following 3 [i]nstructions via the e[x]amine command and the $pc program counter, i.e., the instruction pointer register:

(gdb) x/4i $pc
=> 0x666666666010:      movl   $0x29a,(%rax)
   0x666666666016:      mov    $0x0,%eax
   0x66666666601b:      pop    %rbp
   0x66666666601c:      ret
(gdb)

Here, we see the same instruction as before causing the problem.

Yet, by using GDB, we can skip to the next instruction:

(gdb) set $pc = 0x666666666016
(gdb) x/3i $pc
=> 0x666666666016:      mov    $0x0,%eax
   0x66666666601b:      pop    %rbp
   0x66666666601c:      ret
(gdb)

Thus, we avoid the issue now, but may potentially have to consider the consequences further in the code.

Alternatively, we can also change the code in place:

(gdb) set *0x666666666010 = 0x90
(gdb) x/i $pc
=> 0x666666666010:      nop

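In this case, we replace the beginning of the instruction with a nop operation. Since the original movl instruction occupies six bytes (0x666666666010 through 0x666666666015), a single nop doesn’t cover it. So, to ensure proper alignment, we may need to overwrite the remaining bytes with further nop instructions in a so-called NOP sled.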

7. Tracing

Usually, with more complex programs, we might be able to understand potential issues by tracing.

7.1. strace

The strace command shows all system calls that a process makes as well as signals that it receives.

Let’s see a basic example:

$ strace whoami
execve("/usr/bin/whoami", ["whoami"], 0x6660dead6a60 /* 15 vars */) = 0
[...]
newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=542, ...}, 0) = 0
[...]
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=2356, ...}, AT_EMPTY_PATH) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2356
close(3)
[...]
write(1, "root\n", 5root
)                   = 5
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

In this case, we trace the execution of the whoami command, as it checks the /etc/nsswitch.conf database for reference and reads the relevant line of /etc/passwd.

In case of failures, we can look at the last system calls and deduce a possible cause or at least a good starting point for debugging.

Notably, we don’t see the actual acquisition of the current user.

7.2. ltrace

Similar to strace, ltrace runs a program and follows its execution. Unlike strace, ltrace only looks for library calls.

Let’s check how ltrace behaves with whoami:

$ ltrace whoami
strrchr("whoami", '/')                                        = nil
setlocale(LC_ALL, "")                                         = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale")              = "/usr/share/locale"
textdomain("coreutils")                                       = "coreutils"
__cxa_atexit(0x55b019d9a8e0, 0, 0x666019da2188, 0)            = 0
getopt_long(1, 0x7fff666334f8, "", 0x666019da1d20, nil)       = -1
__errno_location()                                            = 0x6660e4a986c0
geteuid()                                                     = 0
getpwuid(0, 0x7fff666334f8, 0x7f6666c752a0, 0x7f6666b70357)   = 0x6660e4c74a00
puts("root"root
)                                                  = 5
[...]
+++ exited (status 0) +++

As expected, we can see that whoami calls geteuid() to get the effective user ID (UID) of the calling process and then passes it to getpwuid() to look up the matching user entry.
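The two central library calls from the ltrace output can be combined in a few lines of C. This is only a sketch of the idea, not the actual whoami source, and the effective_user_name() name is hypothetical:

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copies the login name of the effective user into buffer.
 * Returns 0 on success and -1 if the user database has no entry.
 */
int effective_user_name(char *buffer, size_t size) {
  assert(buffer != NULL && size > 0);

  uid_t uid = geteuid();                 /* effective UID of this process */
  struct passwd *entry = getpwuid(uid);  /* lookup in the user database */
  if (entry == NULL) {
    return -1;
  }

  /* getpwuid() may reuse static storage, so we copy the name out. */
  snprintf(buffer, size, "%s", entry->pw_name);
  return 0;
}
```

Running a program built around this function under ltrace should show the same geteuid() and getpwuid() calls, while strace would show the underlying file reads.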

Both tracing methods can be very helpful, but rarely show enough information on their own for us to deduce the cause of an error.

8. Core Dump and Analysis

Depending on the settings of our system, we might see a (core dumped) message in addition to the Segmentation fault text:

$ ./segfault.bin
Segmentation fault (core dumped)

Core dumps are a way for the system (kernel) to log information after an irrecoverable error occurs.

8.1. Apport

One common handler of core dumps in Ubuntu is Apport. In fact, it’s often the default:

$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E

Here, we see that the core_pattern dump file location setting in the /proc pseudo-filesystem points to apport.
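The leading | character is what tells the kernel to pipe the dump to a program instead of writing it to a file. As a small sketch, we can perform the same check from C. Both function names below are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the pattern pipes dumps to a program, 0 otherwise. */
int is_piped_core_pattern(const char *pattern) {
  assert(pattern != NULL);
  return pattern[0] == '|';
}

/*
 * Reads the first line of /proc/sys/kernel/core_pattern into buffer.
 * Returns 0 on success and -1 if the file cannot be read.
 */
int read_core_pattern(char *buffer, size_t size) {
  assert(buffer != NULL && size > 0);

  FILE *file = fopen("/proc/sys/kernel/core_pattern", "r");
  if (file == NULL) {
    return -1;
  }

  char *line = fgets(buffer, (int)size, file);
  fclose(file);
  if (line == NULL) {
    return -1;
  }

  buffer[strcspn(buffer, "\n")] = '\0';  /* drop the trailing newline */
  return 0;
}
```

For the Apport value above, is_piped_core_pattern() returns 1, whereas a plain file name template such as core yields 0.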

Critically, Apport deals with faulty applications from known packages but usually ignores custom application crashes.

8.2. systemd-coredump

To generate a report for a custom binary executable, we can install the systemd-coredump service from the package with the same name:

$ apt-get install systemd-coredump

Once installed, we just run our program:

$ ./segfault.bin
Segmentation fault (core dumped)

Notably, now we should have the (core dumped) addition to the error output.

So, let’s list the core dumps via the coredumpctl command:

$ coredumpctl list
TIME                           PID UID GID SIG     COREFILE EXE           SIZE
Thu 2024-01-01 01:06:56 EST 106666   0   0 SIGSEGV present  /segfault.bin 7.8K

Then, we can check the actual file in /var/lib/systemd/coredump, the default core dump path of the systemd-coredump service:

$ ls /var/lib/systemd/coredump
'core.segfault\x2ebin.0.d77c46660eabdead01beef77b99674e0.106599.1704071216000000.zst'

Further, we can see more metadata about the dump via its PID number:

$ coredumpctl dump 106666

Notably, the dump is compressed via zstd, so we can output its contents via zstdcat:

$ zstdcat '/var/lib/systemd/coredump/core.segfault\x2ebin.0.d77c46660eabdead01beef77b99674e0.106599.1704071216000000.zst'

Still, due to the binary nature of most data within, it’s usually better to have specialized software such as GDB or a custom application for analysis.

9. Bug Report

When it comes to official packages and supported applications and code, reporting faults can be vital for the developer and other users.

To aid with our bug report, we can use a tool like Apport:

$ apport-bug <PROGRAM> --save bug-report.txt

Here, PROGRAM is a command or application from an official Ubuntu package.

For instance, let’s generate a bug report for the perl interpreter:

$ apport-bug perl --save bug-perl.txt

*** Collecting problem information

The collected information can be sent to the developers to improve the
application. This might take a few minutes.
...

Now, we can check the resulting bug-perl.txt report:

$ cat bug-perl.txt
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
Date: Thu Jan  4 11:35:16 2024
Dependencies:
 dpkg 1.21.1ubuntu2.2
 gcc-12-base 12.3.0-1ubuntu1~22.04
[...]
 perl 5.34.0-3ubuntu1.3
 perl-base 5.34.0-3ubuntu1.3
 perl-modules-5.34 5.34.0-3ubuntu1.3
 tar 1.34+dfsg-1ubuntu0.1.22.04.2
 zlib1g 1:1.2.11.dfsg-2ubuntu9.2
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2023-06-26 (666 days ago)
InstallationMedia: Ubuntu 22.04.3 LTS "Jammy Jellyfish" - Release amd64 (20230606.2)
Package: perl 5.34.0-3ubuntu1.3
PackageArchitecture: amd64
ProcCpuinfoMinimal:
 processor      : 3
 vendor_id      : GenuineIntel
 cpu family     : 6
 model          : 142
 model name     : Intel(R) Core(TM) i7-6660U CPU @ 6.66GHz
 [...]
ProcEnviron:
 LC_TIME=fi_FI.UTF-8
 LC_MONETARY=fi_FI.UTF-8
 TERM=xterm-256color
 PATH=(custom, no user)
 LC_ADDRESS=fi_FI.UTF-8
 [...]
ProcVersionSignature: Ubuntu 6.2.0-31.31~22.04.1-generic 6.2.15
[...]
SourcePackage: perl
Tags:  jammy
Uname: Linux 6.2.0-31-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True

With this information, we might be better equipped to understand how the environment affects a given issue when debugging it or submitting a report to the developer. We can also use the structure above to create custom reports.

10. Summary

In this article, we talked about segmentation faults and what to do when they happen.

In conclusion, although it’s not always possible to fully debug and resolve issues on our own, we can use many tools to perform preliminary checks before potentially submitting a bug report.
