1. Overview
In this tutorial, we’ll learn a few different methods to execute untrusted processes in a safe, sandboxed environment in Linux.
2. Sandboxing in Linux
A sandbox is a safe, isolated environment for executing untrusted programs. The isolation is usually achieved by restricting the ability of processes in the environment to access system resources through various controls.
The main goal of sandboxing an application is to prevent a bad process from compromising the entire system. For example, a security researcher wanting to execute malware to study its behavior would run it in a highly restrictive environment. This allows the researcher to learn more about the malware without risking the entire system.
In Linux, there are various means of sandboxing a process. Concretely, the chroot command, Linux namespaces, resource access controls like seccomp, and virtual machines are all viable ways to create a sandbox environment. These methods vary in their working principles and degrees of isolation.
It’s important to note that sandboxing is not foolproof. Security researchers have discovered ways to escape a sandbox environment for all the methods we’ll be discussing. Therefore, we should always exercise caution and assess whether we can tolerate the damage in the event of an escape.
3. The chroot Command
The chroot command is a command-line tool in Linux that runs a command with a different apparent root directory. The change applies to that process and all of its child processes. This modified environment is often referred to as a chroot jail.
The idea is that by changing the apparent root directory for a process, we can prevent the process from seeing and accessing the rest of the filesystem. Therefore, in the event of the process malfunctioning, the harm it causes will be limited to the designated directory only.
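To make the idea concrete, here is a minimal sketch of populating a jail directory by hand. The populate_jail helper and the /tmp/jail path are hypothetical names used only for illustration, and the sketch assumes ldd is available to list the shared libraries a binary needs:

```shell
# Sketch: copy a binary and the shared libraries it links against
# into a jail directory. populate_jail is a hypothetical helper name.
populate_jail() {
    jail=$1
    binary=$2

    # Copy the executable, preserving its path under the jail root
    mkdir -p "$jail$(dirname "$binary")"
    cp "$binary" "$jail$binary"

    # ldd lists the shared libraries; keep only the absolute paths
    for lib in $(ldd "$binary" 2>/dev/null | grep -o '/[^ ]*'); do
        mkdir -p "$jail$(dirname "$lib")"
        cp "$lib" "$jail$lib"
    done
}

# Example usage (chroot itself requires root privileges):
#   populate_jail /tmp/jail /bin/sh
#   sudo chroot /tmp/jail /bin/sh
```

Inside the resulting shell, only the files copied under the jail directory should be visible, which also means that any other tool we want to use has to be copied in the same way.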
3.1. The jailkit Command
Creating a chroot jail can be laborious, as we’ll need to manually populate the directory with the executables and shared libraries our process needs. The jailkit package is a set of command-line utilities that uses chroot under the hood to make creating a jail environment more convenient.
To obtain these utilities, we can install the jailkit package using our system’s package manager, such as apt:
$ sudo apt-get install -y jailkit
3.2. Creating a Jail Environment with jailkit
We can create a jail using the jk_init command from the jailkit package. The command takes as input the directory we designate as the root directory for the jail environment. Additionally, it takes a variable list of section names, each of which identifies a group of tools that we want to enable in the jail environment. These sections are defined in the /etc/jailkit/jk_init.ini file.
For example, we can create a jail at the /home/jail directory and enable a basic shell and the scp utility in the jail:
$ sudo jk_init -v -j /home/jail basicshell scp
Importantly, we’ll need root access to initialize the jail. Hence, we’ll need to prefix our jk_init command with the sudo command.
3.3. Jailing the User
Once we’ve created the jail, we can place existing users into the jail using the jk_jailuser command. For example, we can jail the user dave using jk_jailuser:
$ sudo jk_jailuser -m -j /home/jail dave
The command above specifies the /home/jail directory as the apparent root directory for the user dave. Furthermore, we can check the user’s entry in /etc/passwd to verify that the login shell for dave is now /usr/sbin/jk_chrootsh. The jk_chrootsh program ensures that the user dave is dropped into the jail on each login.
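One way to perform this check from the command line is sketched below. The login_shell helper is a hypothetical name; it relies only on the fact that the login shell is the seventh colon-separated field of a passwd entry:

```shell
# Print the login shell (7th colon-separated field) of a
# passwd-format entry read from standard input.
# login_shell is a hypothetical helper name.
login_shell() {
    cut -d: -f7
}

# Example usage on a system where the user dave exists:
#   getent passwd dave | login_shell
```

If the user was jailed as described above, the command should print /usr/sbin/jk_chrootsh.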
Logging in as the user dave now will place us in the jail. This jail provides a simple sandbox environment that isolates its filesystem from the host’s filesystem. Therefore, any damage to the filesystem caused by malicious processes is limited to the files under the jail’s apparent root.
Notably, chroot on its own is usually not a reliable sandbox as it only isolates the filesystem. Specifically, the jailed process still shares other host resources, like the process tree, network stack, and user IDs, with the rest of the system.
Typically, sandbox solutions combine the chroot functionality with other security measures on Linux to provide more comprehensive isolation. One of the features that enhances the isolation is Linux namespaces.
4. Linux Namespace
Linux namespaces are a kernel feature that partitions kernel resources into independent and isolated units. We can then assign these units to different processes, effectively restricting which resources each process can see and use.
Container technologies like LXC and Docker use Linux namespaces extensively as their building blocks. Each container instance is essentially an isolated process that runs inside its own set of namespaces.
Compared to the chroot method, the namespace approach offers more isolation. Specifically, each container instance has its own mount points, process IDs, network stack, inter-process communication (IPC) resources, UTS (hostname), user IDs, and cgroup view.
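To see which namespaces a process belongs to, we can inspect the symbolic links under /proc/<pid>/ns. The following is a small sketch; list_namespaces is a hypothetical helper name, and it assumes a Linux system with /proc mounted:

```shell
# Print the namespace identifiers of a process.
# Defaults to the current shell when no PID is given.
# list_namespaces is a hypothetical helper name.
list_namespaces() {
    pid=${1:-$$}
    for ns in /proc/"$pid"/ns/*; do
        # Each entry is a symlink such as "pid:[<inode number>]"
        printf '%s -> %s\n' "$(basename "$ns")" "$(readlink "$ns")"
    done
}

list_namespaces
```

Two processes share a namespace when the identifiers match. Running the same function on the host and inside a container, or inside a shell started with a tool such as unshare, should show different identifiers for the namespaces that were unshared.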
4.1. Restricting Docker Container Compute Resources
To create a sandbox environment with a Docker container, we can start a container from a pre-built image and drop into a shell in that environment. Specifically, we can use the docker run command to start an Alpine Linux container:
$ sudo docker run -it --cpus="0.5" --memory=512m alpine sh
The command above starts a Docker container based on the Alpine Linux image and launches an interactive shell. Critically, we’ve also restricted the container’s access to CPU and memory resources using the --cpus and --memory options. Docker enforces these limits through control groups (cgroups), which complement the isolation provided by namespaces. This can prevent any rogue process in the container from hogging the system resources.
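We can confirm the limits from inside the container by reading the cgroup files. The sketch below assumes the host uses cgroup v2, where the limits are exposed as /sys/fs/cgroup/cpu.max and /sys/fs/cgroup/memory.max; the two helper names are hypothetical:

```shell
# Convert the contents of cpu.max ("<quota> <period>" or
# "max <period>") into a number of CPUs.
cpu_limit() {
    awk '{ if ($1 == "max") print "unlimited"; else printf "%.2f\n", $1 / $2 }'
}

# Convert the byte count in memory.max into mebibytes.
# The value "max" means that no limit is set.
memory_limit_mib() {
    awk '{ if ($1 == "max") print "unlimited"; else printf "%d\n", $1 / 1048576 }'
}

# Example usage inside the container (paths assume cgroup v2):
#   cpu_limit < /sys/fs/cgroup/cpu.max
#   memory_limit_mib < /sys/fs/cgroup/memory.max
```

On a cgroup v1 host, the files have different names and locations, so the paths in the usage example would need to be adjusted.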
4.2. Disabling Network
One additional step we can take to further isolate the Docker container is to disable its network. To do that, we can pass the --network none flag when we create and run a Docker container:
$ sudo docker run -it --network=none --cpus="0.5" --memory=512m alpine sh
The command above creates an Alpine Linux container that only has a loopback device. Specifically, the container will not be able to connect to the host or the public network. To verify that, we can print the list of network devices:
/ # ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5. Resource Access Control
Another approach to sandboxing processes in Linux is to restrict their access to system resources through policy. When a process requests system resources, the kernel enforces the restriction by consulting the policy. This is different from the approaches we’ve seen so far, in which we create an isolated set of resources and let the process do whatever it likes with them.
In this camp, there’s the seccomp feature in Linux, which puts the process into a restricted state in which it can only make a subset of syscalls. In the most restrictive mode, seccomp only allows the read, write, exit, and sigreturn syscalls. When the process attempts to make a prohibited syscall, the kernel either logs the attempt or terminates the process with a SIGKILL.
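To check whether a process is currently running under seccomp, we can read the Seccomp field in /proc/<pid>/status, where 0 means disabled, 1 means strict mode, and 2 means filter mode. The helper below is a minimal sketch with a hypothetical name:

```shell
# Translate the Seccomp field of a /proc/<pid>/status file
# (read from standard input) into a readable mode name.
# seccomp_mode is a hypothetical helper name.
seccomp_mode() {
    awk '/^Seccomp:/ {
        if ($2 == 0) print "disabled"
        else if ($2 == 1) print "strict"
        else if ($2 == 2) print "filter"
    }'
}

# Example usage for the current process:
#   seccomp_mode < /proc/self/status
```

Strict mode corresponds to the most restrictive setting described above, while filter mode means that a custom syscall filter is attached to the process.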
Typically, sandbox solutions combine Linux namespaces with access control to achieve defense in depth. In the event one layer fails, there’s another layer to fall back on for securing the system.
For instance, a Docker container runs with a default seccomp profile that disables around 44 system calls. One such syscall is the acct syscall. By disabling the acct syscall, we prevent containers from being able to disable their own resource limits.
We can specify a different seccomp profile for our container using the --security-opt seccomp option when creating the container:
$ sudo docker run -it --security-opt seccomp=/path/to/seccomp-profile.json alpine sh
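As an illustration of what such a profile can look like, the sketch below writes a small profile that allows every syscall except mkdir and mkdirat. The write_seccomp_profile helper and the file name are hypothetical, and the profile is intentionally simplistic:

```shell
# Write a minimal seccomp profile to the given path.
# The profile allows all syscalls by default and makes
# mkdir and mkdirat fail with an error.
# write_seccomp_profile is a hypothetical helper name.
write_seccomp_profile() {
    cat > "$1" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["mkdir", "mkdirat"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
EOF
}

# Example usage:
#   write_seccomp_profile ./deny-mkdir.json
#   sudo docker run -it --security-opt seccomp=./deny-mkdir.json alpine sh
```

With this profile, attempts to create a directory inside the container should fail. Note that a custom profile replaces Docker’s default one, so an allow-by-default profile like this is less restrictive overall and is only suitable for experimentation.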
6. Virtual Machine
The virtual machine offers the highest level of isolation among the solutions we’ve discussed. In contrast to the namespace approach, virtual machine instances do not share the host’s kernel. Instead, each instance runs its own guest operating system.
6.1. Sandboxing an Insecure Program in a Virtual Machine
The idea is that we first create a virtual machine instance that runs a guest operating system on our host. Step-by-step tutorials for creating a virtual machine are outlined in separate VirtualBox and QEMU articles.
After that, the virtual machine instance provides us with a sandbox environment in which we can run the potentially unsafe program. Since the guest runs its own operating system on virtualized hardware, a process running in the virtual machine cannot directly touch the host’s resources. Furthermore, if a rogue process breaks the virtual machine instance, we can simply restore it to its original state, for example from a snapshot taken beforehand.
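For VirtualBox, restoring a known-good state could be scripted with the VBoxManage snapshot command, as sketched below. The function names, the VM name sandbox-vm, and the snapshot name clean are all hypothetical. The VBOXMANAGE variable is an assumption added so that the commands can be previewed without VirtualBox installed:

```shell
# Command used to talk to VirtualBox. Setting VBOXMANAGE=echo
# prints the arguments instead of running anything.
VBOXMANAGE=${VBOXMANAGE:-VBoxManage}

# Take a snapshot of the VM before running the untrusted program.
# Arguments: <vm name> <snapshot name>
snapshot_clean_state() {
    "$VBOXMANAGE" snapshot "$1" take "$2"
}

# Power off the VM and roll it back to the saved snapshot.
# Arguments: <vm name> <snapshot name>
reset_to_clean_state() {
    "$VBOXMANAGE" controlvm "$1" poweroff
    "$VBOXMANAGE" snapshot "$1" restore "$2"
}

# Example usage:
#   snapshot_clean_state sandbox-vm clean
#   ... run the untrusted program inside the VM ...
#   reset_to_clean_state sandbox-vm clean
```

Other hypervisors such as QEMU offer their own snapshot mechanisms, so the same workflow applies even though the commands differ.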
The price we pay for higher isolation is higher compute resource usage, as we need to run an entire guest operating system. Besides that, start-up and tear-down take much longer. Additionally, virtual machine images are typically bulky, in the range of 5 to 20 GB.
7. Conclusion
In this tutorial, we’ve learned that sandboxing in Linux refers to creating an isolated environment for a process. Then, we’ve looked at the different approaches to achieving isolation, each with its pros and cons.
We’ve seen that the chroot command offers isolation for the filesystem only. Then, the namespace-based solution provides further isolation through various Linux namespaces. Besides that, we’ve also learned about a policy-based approach, which restricts a process’s access to syscalls or resources on the system.
Finally, we’ve learned that the virtual machine-based approach offers the highest isolation as it emulates the entire operating system stack.