1. Overview
In this tutorial, we’ll talk about memory management in Linux and one very interesting feature: memory overcommitting. We’ll also present a couple of ways we can manipulate its behavior so that our system behaves as expected.
Process memory management is a topic we’ll not discuss in detail here. Instead, we’ll focus on how Linux handles the memory space and what it does whenever it runs out of memory.
2. Memory Management in Linux
Let’s start with how Linux manages memory in the first place. Even if we generally refer to memory, there are two types of memory: physical and virtual memory.
Physical memory is the main memory, usually Random Access Memory (RAM). Only the kernel can access it directly, and it’s the only memory that the CPU can address directly.
In this article, we use virtual memory to mean secondary memory allocated in disk storage, which Linux calls swap. (Strictly speaking, virtual memory is the broader abstraction that gives each process its own address space, and swap is only its disk-backed part.) We can compensate for shortages of physical memory by using some hard drive space, which is usually less scarce than physical memory.
Linux has a subsystem that deals with memory management. It has several tasks, such as handling the virtual memory, allocating memory for the kernel and user spaces, demand paging, and mapping files into the address space of processes.
We can configure the kernel settings for memory at runtime with sysctl. We can also query the filesystem under /proc to retrieve extra information. For example, the directory /proc/sys/vm/ contains information for virtual memory operations.
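As a minimal sketch of both access paths, we can read the same tunable through /proc and through sysctl. The helper name vm_setting and its optional directory argument are our own illustrative choices, not part of any tool:

```shell
#!/bin/sh
# Print the value of one virtual-memory tunable.
# $1: tunable name, e.g. overcommit_memory
# $2: directory to read from (defaults to /proc/sys/vm; overridable for testing)
vm_setting() {
    vm_dir="${2:-/proc/sys/vm}"
    cat "$vm_dir/$1"
}

# Reading through /proc ...
if [ -r /proc/sys/vm/overcommit_memory ]; then
    echo "overcommit_memory (via /proc): $(vm_setting overcommit_memory)"
fi

# ... and through sysctl should report the same value.
if command -v sysctl >/dev/null 2>&1; then
    sysctl vm.overcommit_memory 2>/dev/null || true
fi
```

The sysctl name is simply the path below /proc/sys with the slashes replaced by dots, so vm.overcommit_memory corresponds to /proc/sys/vm/overcommit_memory.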
2.1. Memory Overcommitting
We’ve just mentioned that there is an entity, the Linux memory management subsystem, that handles both physical and virtual memory. It allocates memory for the processes when these request it.
In principle, it seems that if there is no memory (physical or virtual) available, the memory management subsystem should just halt the execution of the process with an error stating that there is not enough memory. However, a process can allocate more memory than is currently available and the Linux memory management subsystem will allow it, which we know as memory overcommit.
The kernel allows memory overcommitting on the expectation that a process will use less memory than it has allocated. In general, a given program requests more memory than it ends up using. This often happens because, in some programming languages, it’s simpler to allocate a large data structure up front than to grow it by reallocating. In that case, if the program is not optimized, we might allocate far more memory than we ever fill.
The Linux memory management subsystem allows memory overcommitting to optimize resources. We can run more processes at the same time assuming that the memory they use is less than the memory they allocate.
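To get a rough idea of whether a system is currently overcommitted, we can compare what the kernel has promised to processes with what actually exists. The sketch below assumes the MemTotal, SwapTotal, and Committed_AS fields of /proc/meminfo. The helper name meminfo_kb is hypothetical:

```shell
#!/bin/sh
# Print the value in kB of one field from a meminfo-style file.
# $1: field name without the colon, e.g. Committed_AS
# $2: file to read (defaults to /proc/meminfo; overridable for testing)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Physical memory (MemTotal):           $(meminfo_kb MemTotal) kB"
    echo "Swap (SwapTotal):                     $(meminfo_kb SwapTotal) kB"
    echo "Promised to processes (Committed_AS): $(meminfo_kb Committed_AS) kB"
fi
```

If Committed_AS is larger than the sum of MemTotal and SwapTotal, the processes have been promised more memory than the system could deliver if they all used their allocations at once.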
2.2. Out-Of-Memory Killer
So, now our system has more memory allocated than the total available memory. Again, this is not an issue as long as the memory actually used is less than the total available memory. But processes might end up using all the memory that the memory management subsystem has assigned them.
In that case, the Linux memory management subsystem needs to do something and it invokes the out-of-memory killer.
The purpose of the out-of-memory (OOM) killer is to pick a process and kill it to avoid other, potentially larger, problems. The system only calls the OOM killer when available memory is critically low, other solutions such as flushing caches have failed, and all the memory is full.
2.3. OOM Killer Selection Heuristics
A set of rules determines which process the OOM killer will choose. These rules have been refined over time, since every process would prefer that another process be killed. From the perspective of the user, it’s annoying when the OOM killer picks one of their processes instead of another. From the system perspective, some processes are far more essential to the functioning of the system than user programs.
Each process has a badness score, which the OOM killer uses to decide which process to kill.
We can check the OOM score under the /proc filesystem for a given process (in this case, the process with a PID of 95538):
$ cat /proc/95538/oom_score
800
The system determines this value based on several factors. The score takes into account the work that the process has done and that would be lost, the uptime of the application, and the memory that the system would recover.
Moreover, the OOM killer tries to kill the minimum number of processes (ideally just one). If any process is in the middle of a swapoff() system call, the OOM killer selects it first, since that system call removes a swap file from the system.
The score of child processes is added to the score of the parent process as long as they don’t share memory, so forked processes are also good candidates to be killed.
We’ve listed only some of the rules here. The actual heuristics are more complex.
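Whatever the exact heuristics are, we can inspect their outcome by ranking the running processes by their current oom_score. The following is a minimal sketch. The function name top_oom_scores and its optional proc-root argument are illustrative assumptions:

```shell
#!/bin/sh
# List the processes with the highest oom_score, one "score pid name" line each.
# $1: how many processes to show (default 5)
# $2: proc root (defaults to /proc; overridable for testing)
top_oom_scores() {
    count="${1:-5}"
    proc_root="${2:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        # A process may exit while we iterate, so skip entries we cannot read.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$score ${dir##*/} $name"
    done | sort -k1,1nr | awk -v n="$count" 'NR <= n'
}

if [ -d /proc/self ]; then
    top_oom_scores 5
fi
```

The process on the first line is the one the OOM killer would most likely pick if the system ran out of memory right now.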
2.4. OOM Killer Diagnosis
The OOM killer goes through the scores of the processes and selects the process with the highest oom_score. Then, it kills the process and all its child processes.
One concern against the OOM killer is the loss of stability. If the choice of program is right (imagine an isolated user process), the system can recover. However, if the selected process is more critical, killing it might endanger the stability of the system.
But how can we know that our system has run out of memory? We can check the output of dmesg to see if the OOM killer has killed any process. Depending on our configuration of the Linux memory management subsystem, there are two possible things we can look for.
If memory overcommitting is enabled, we should look for Out of memory in the output of dmesg:
$ sudo dmesg | grep "Out of memory" -A 1
[1540.913510] Out of memory: Kill process 95538 (java) score 800 or sacrifice child
[1540.913515] Killed process 95538 (java) total-vm:1518461kB, anon-rss:622724kB, file-rss:5128kB
We use the -A 1 option to display one line after each match to get extra information. We can see that the OOM killer picked the process 95538, which had a score of 800.
The total size of the virtual memory is listed under total-vm. Under RSS (which stands for Resident Set Size) we see two values referring to the primary memory: anon-rss is the resident anonymous memory (allocated blocks that are not backed by a file) and file-rss is the resident memory that is mapped from files.
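When the kernel log contains many entries, it can help to reduce each kill report to the fields we care about. The sketch below assumes the "Killed process" line format shown above. The function name oom_victims is our own:

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "pid name anon_rss_kB"
# for every "Killed process" entry. Other lines are ignored.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}

# Demonstration on the sample line from this article.
printf '%s\n' '[1540.913515] Killed process 95538 (java) total-vm:1518461kB, anon-rss:622724kB, file-rss:5128kB' | oom_victims
# prints: 95538 java 622724

# On a live system we would feed it the kernel log instead:
#   sudo dmesg | oom_victims
```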
If memory overcommitting is disabled (we’ll see how to disable it later), we need to look for segfault in the output of dmesg:
$ sudo dmesg | grep "segfault"
[1562.081591] java: segfault at 0 ip 00400559 sp 5bc7b1b0 error 6 in myapp[400000+1000]
Having segfault at 0 means that the process tried to access memory at address 0, that is, it dereferenced a null pointer. A likely cause is a failed memory allocation whose result the program didn’t check.
3. Tweaking the Linux Memory Management Subsystem
We’ll discuss two solutions to tweak the operation of the Linux memory management subsystem. There are other, more involved solutions that we can also take, such as shifting the OOM killing policy to the user space. However, they are outside of the scope of this article since they require deeper insights into memory management.
Before discussing the two approaches, if we’re facing OOM events frequently, we can always opt for getting more memory. We can either buy more physical memory or increase the size of the swap space.
3.1. Changing the Memory Overcommitting Parameters
Under the /proc filesystem, there is one file that controls how the system accounts for the memory. This memory overcommitting parameter is located under /proc/sys/vm/overcommit_memory and can have three values: 0, 1, or 2.
When overcommit_memory = 0 or overcommit_memory = 1, we enable memory overcommitting and the system might allocate more memory than what is readily available. However, very large requests might be denied with overcommit_memory = 0.
With overcommit_memory = 2, we disable memory overcommitting: allocation requests that exceed the commit limit fail immediately, which protects our system against out-of-memory problems.
By default, most Linux distributions have overcommit_memory = 0.
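As a small sketch, we can check which mode is active and translate it into the behavior described above. The function describe_overcommit_mode is a hypothetical helper. The commented commands show the usual sysctl way of changing the value:

```shell
#!/bin/sh
# Translate an overcommit_memory value into a short description.
describe_overcommit_mode() {
    case "$1" in
        0) echo "overcommit enabled; very large requests may be denied" ;;
        1) echo "overcommit enabled; requests are not denied" ;;
        2) echo "overcommit disabled" ;;
        *) echo "unknown mode: $1"; return 1 ;;
    esac
}

if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    echo "mode $mode: $(describe_overcommit_mode "$mode")"
fi

# To change the mode until the next reboot (requires root):
#   sudo sysctl -w vm.overcommit_memory=2
# To keep it across reboots, a line such as this can go in /etc/sysctl.conf:
#   vm.overcommit_memory = 2
```

Before switching a production system to mode 2, it’s worth testing the applications we run, since programs that rely on large unused allocations may start failing.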
3.2. Temporary Adjustment of OOM-Killer Heuristics
The overcommit_memory parameter is a general way of changing the whole behavior of the Linux memory management subsystem. However, there is yet another way we can tweak the behavior of the selection heuristics of the OOM killer.
We can adjust whether specific processes will be easier or harder to kill while leaving the overall memory overcommitting settings unaffected. There are several ways to do it.
There are two parameters inside the /proc filesystem of each process, /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj, that adjust the overall oom_score.
By default, the values for all processes are set to zero (which doesn’t mean that the /proc/<pid>/oom_score value has to be zero):
$ cd /proc/407859
$ grep . oom*
oom_adj:0
oom_score:666
oom_score_adj:0
Both parameters have similar effects: the lower the value we set, the lower the possibility that the OOM-killer selects that process. However, their granularity varies.
The old oom_adj can only have values between -17 and +15. The new oom_score_adj can vary between -1000 and +1000. With either oom_adj=-17 or oom_score_adj=-1000, that process won’t be killed.
We can change the value of either oom_adj or oom_score_adj with echo and inspect how this changes the computed oom_score:
$ echo 200 > oom_score_adj
$ grep . oom*
oom_adj:3
oom_score:800
oom_score_adj:200
Since both parameters control the same settings, changing one will also modify the other. The system will also update the value of the oom_score after changing either value.
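The link between the two scales can be sketched with integer arithmetic. We assume here that the kernel derives oom_adj by scaling oom_score_adj by 17/1000 and truncating, with the maximum of 1000 mapped to 15. This matches the example above (200 becomes 3), but the exact rule is an assumption on our part and the function name is hypothetical:

```shell
#!/bin/sh
# Approximate the oom_adj value reported for a given oom_score_adj.
# Assumption: oom_adj = oom_score_adj * 17 / 1000 (truncated), and 1000 maps to 15.
score_adj_to_adj() {
    if [ "$1" -eq 1000 ]; then
        echo 15
    else
        echo $(( $1 * 17 / 1000 ))
    fi
}

score_adj_to_adj 200     # prints: 3
score_adj_to_adj -1000   # prints: -17
```

Because the old scale is much coarser, several oom_score_adj values collapse into the same oom_adj value, which is one reason to prefer oom_score_adj.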
We can also view and adjust the oom_score_adj with the choom command. To view the current values, we just need to provide the PID of the process with the -p flag:
$ choom -p 407859
pid 407859's current OOM score: 800
pid 407859's current OOM score adjust value: 200
To modify the value, we provide the PID with the -p flag and the new oom_score_adj value with the -n flag:
$ choom -p 407859 -n 800
pid 407859's OOM score adjust value changed from 200 to 800
With choom we’re simply changing the value in the /proc/<pid>/oom_score_adj as we did with echo.
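If choom isn’t available, a small wrapper around the /proc file can add the range check that a bare echo lacks. This is only a sketch. The function name set_oom_score_adj and its optional proc-root argument are our own choices:

```shell
#!/bin/sh
# Write a new oom_score_adj for a process after validating the value.
# $1: PID
# $2: new value, an integer between -1000 and 1000
# $3: proc root (defaults to /proc; overridable for testing)
set_oom_score_adj() {
    pid="$1"
    value="$2"
    proc_root="${3:-/proc}"

    # Accept only an optional leading minus sign followed by digits.
    digits="${value#-}"
    case "$digits" in
        ''|*[!0-9]*) echo "not an integer: $value" >&2; return 1 ;;
    esac

    if [ "$value" -lt -1000 ] || [ "$value" -gt 1000 ]; then
        echo "value out of range: $value" >&2
        return 1
    fi

    echo "$value" > "$proc_root/$pid/oom_score_adj"
}

# Example (lowering the value typically requires root):
#   set_oom_score_adj 407859 200
```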
3.3. Persistent Adjustment of OOM-Killer Heuristics
The changes we made in the previous section are just temporary since they are associated with the PID of the process. If we rerun a process, it will get a new PID with the default values, and the OOM killer might select it if we run out of memory. Nevertheless, we can specify in our init daemon that we want to adjust the OOM values.
The exact details depend on the init daemon that we’re using. If we are using systemd, there is a parameter that we can set up in the service file, called OOMScoreAdjust, which automatically sets up the specified value for the executed processes.
If our init daemon is upstart, we can use the oom score stanza in the job configuration to set the value for a certain application.
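For systemd, one way to apply the setting without editing the original service file is a drop-in. The sketch below generates one. The service name myapp.service, the file name oom.conf, and the function name write_oom_dropin are hypothetical:

```shell
#!/bin/sh
# Write a systemd drop-in that sets OOMScoreAdjust for a service.
# $1: service name (hypothetical example: myapp.service)
# $2: OOMScoreAdjust value, between -1000 and 1000
# $3: base directory (defaults to /etc/systemd/system; overridable for testing)
write_oom_dropin() {
    dropin_dir="${3:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/oom.conf" <<EOF
[Service]
OOMScoreAdjust=$2
EOF
}

# Typical use as root, followed by a reload and a restart of the service:
#   write_oom_dropin myapp.service -500
#   systemctl daemon-reload
#   systemctl restart myapp.service
```

Every process that the service starts afterwards inherits the adjusted value, so we no longer depend on a particular PID.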
4. Conclusion
In this article, we’ve talked about memory management in Linux and what overcommitting memory means.
Our system can let processes allocate more memory than is available. This is not a problem as long as they don’t use it all, but if they do, processes might run out of memory. In that case, the Out-Of-Memory killer selects a task to be killed to free some memory space.
We’ve discussed a couple of ways we can tweak the operation to better suit our needs: by changing the overall memory overcommitting parameters or by changing the OOM killer heuristics.