1. Introduction

Docker’s layered architecture is central to containerization: each Dockerfile instruction that changes the filesystem, such as RUN, COPY, or ADD, creates a new, immutable layer. These layers stack up to form a complete image, each building on the previous one. Because of this layering, removing files or directories is more complex than it seems. Deleting a file in a higher layer hides it from the final image’s filesystem, but the file still exists in the lower layer, consuming space and bloating the image.

In this tutorial, we’ll explore strategies to actively remove files and directories across Docker layers, ensuring we fully eliminate them from the final image. By applying these techniques, we can optimize Docker images for both performance and size.

2. Understanding Docker Layers and File System

In Docker, images are built in layers, where each layer represents the set of file changes produced by one instruction in the Dockerfile. When the build executes instructions like RUN, ADD, or COPY, Docker creates a new layer. These layers stack on top of each other to form a single image. Each new layer contains only the changes made by that specific instruction, while all the previous layers remain unchanged.

This layered approach allows Docker to reuse layers across multiple images, which helps optimize storage and reduce build times. For example, if two images use the same base layer, Docker pulls that layer only once and reuses it across both images. This is one reason Docker images are so efficient.
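To make the cost of a late deletion concrete, here is a minimal sketch that models two layers as plain directories. It is a toy model, not how Docker stores layers on disk: the directory names layer1 and layer2 and the 64 KiB stand-in file are assumptions made for illustration. The only detail borrowed from real images is that a deletion is recorded as a whiteout entry named .wh.<name>.

```shell
#!/bin/sh
# Toy model of two image layers as plain directories (not real Docker storage).
work=$(mktemp -d)
mkdir -p "$work/layer1/dir" "$work/layer2"

# Layer 1: stands in for "RUN mkdir dir && ... download a file".
# A 64 KiB file of zeros plays the role of the downloaded content.
dd if=/dev/zero of="$work/layer1/dir/index.html" bs=1024 count=64 2>/dev/null

# Layer 2: stands in for "RUN rm -rf dir". It only records a marker saying
# that dir is gone; the bytes in layer 1 are untouched.
: > "$work/layer2/.wh.dir"

layer1_bytes=$(wc -c < "$work/layer1/dir/index.html" | tr -d ' ')
layer2_bytes=$(wc -c < "$work/layer2/.wh.dir" | tr -d ' ')
total_bytes=$((layer1_bytes + layer2_bytes))

echo "layer 1: $layer1_bytes bytes"
echo "layer 2: $layer2_bytes bytes"
echo "image total: $total_bytes bytes"

rm -rf "$work"
```

In this model, the total never drops below the size of the first layer, which is the behavior the rest of this tutorial works around.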

3. Common Pitfall: Directory Removal in Docker Layers

One common mistake developers encounter when removing directories across Docker layers stems from how the file paths are handled. Let’s look at a Dockerfile to explain this issue:

FROM alpine
RUN mkdir dir && cd dir && wget http://google.com

In the second line, mkdir dir creates a new directory called dir, and cd dir changes the current working directory to dir. Then, wget http://google.com downloads the page into that directory as index.html.

RUN rm -rf dir

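In the third line, RUN rm -rf dir removes the directory named dir, resolved against the working directory of that instruction. At first glance, this may seem like it should undo everything the previous command did. However, it doesn’t work as expected.

The cd from the previous RUN doesn’t carry over: each RUN starts a new shell in the directory set by WORKDIR (/ by default), so rm -rf dir does find /dir and hides it from the final image’s filesystem. The problem is that the previous RUN has already committed dir and the downloaded file to its own layer. The new layer only records the deletion, so the image still carries the file’s bytes and doesn’t get any smaller.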
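The following sketch imitates two RUN instructions outside Docker to show what happens to the working directory between them. It rests on one assumption: each sh -c call stands in for one RUN, since Docker runs every RUN in a fresh shell that starts in the directory set by WORKDIR. The variable root is a hypothetical stand-in for that directory, and touch replaces the download.

```shell
#!/bin/sh
# Each `sh -c` below stands in for one RUN instruction: a fresh shell that
# starts in the same directory (the role WORKDIR plays during a build).
root=$(mktemp -d)

# "RUN mkdir dir && cd dir && <create a file>"
( cd "$root" && sh -c 'mkdir dir && cd dir && touch index.html' )

# "RUN pwd": the cd from the previous shell is gone.
second_cwd=$( cd "$root" && sh -c 'pwd' )

# "RUN rm -rf dir": resolved against the starting directory, so dir is found.
( cd "$root" && sh -c 'rm -rf dir' )

if [ -e "$root/dir" ]; then dir_still_visible=yes; else dir_still_visible=no; fi
echo "second shell started in: $second_cwd"
echo "dir still visible: $dir_still_visible"

rm -rf "$root"
```

The second shell starts back in the original directory, so a separate rm -rf dir does locate the directory. In a real build, what remains is the copy of dir stored in the layer of the first RUN.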

Note: Using the -f flag in rm -rf suppresses errors, which can lead to confusion. If dir doesn’t exist or the command fails for another reason, we don’t receive any indication of the error. Therefore, it’s best to avoid using -f unless necessary, as it can hide real problems and lead to unnoticed failures.
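As a quick illustration of the note above, this sketch compares the exit status of rm with and without -f on a path that doesn’t exist. The name missing_dir is hypothetical, and the commands run in a throwaway directory. In a Dockerfile, a non-zero exit status from a RUN command stops the build, which is exactly the signal that -f hides.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Without -f, removing a missing path fails with a non-zero exit status.
rm -r missing_dir 2>/dev/null
status_without_f=$?

# With -f, the same mistake is silently ignored and the status is 0.
rm -rf missing_dir
status_with_f=$?

echo "rm -r  exit status: $status_without_f"
echo "rm -rf exit status: $status_with_f"

cd / && rm -r "$scratch"
```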

To avoid this problem, let’s look at a few approaches.

4. Using Relative Paths for Directory Removal

Using cd with relative paths is a simple yet effective way to manage directory removal in Docker. This method allows us to navigate into directories, perform actions, and clean up afterward, making sure the directories we create don’t bloat the image. By using relative paths, we avoid relying on absolute directory locations, which can make the process flexible when working with different directories inside a Docker image.

Now, let’s create a directory and remove it using the relative path:

RUN mkdir dir && cd dir && wget http://google.com && cd .. && rm -rf dir
  • mkdir dir: Creates a directory named dir.
  • cd dir: Navigates into the newly created directory.
  • wget http://google.com: Downloads a file (in this case, from Google) into the directory.
  • cd ..: Moves back to the parent directory.
  • rm -rf dir: Deletes the dir directory and its contents.

The key point is to return to the parent directory before removing dir: while we’re still inside it, rm -rf dir looks for dir/dir, which doesn’t exist, so nothing is deleted. Because creation and removal happen in a single RUN instruction, dir never ends up in any layer.
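The effect of the cd .. step can be checked in an ordinary shell. This is a minimal sketch under two assumptions: the temporary directory base is a hypothetical stand-in for the build’s working directory, and touch replaces the download so the example works offline.

```shell
#!/bin/sh
base=$(mktemp -d)

# Attempt 1: still inside dir, so "dir" resolves to dir/dir, which is missing.
# Thanks to -f, rm reports nothing and the directory survives.
cd "$base" || exit 1
mkdir dir && cd dir && touch index.html && rm -rf dir
if [ -d "$base/dir" ]; then survived_without_cd=yes; else survived_without_cd=no; fi

# Reset, then attempt 2: step back to the parent first, then remove.
cd "$base" || exit 1
rm -rf dir
mkdir dir && cd dir && touch index.html && cd .. && rm -rf dir
if [ -d "$base/dir" ]; then survived_with_cd=yes; else survived_with_cd=no; fi

echo "without 'cd ..': dir survived = $survived_without_cd"
echo "with 'cd ..':    dir survived = $survived_with_cd"

cd / && rm -rf "$base"
```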

5. Using Absolute Paths for Directory Removal

Using absolute paths for directory removal in Docker provides precise control, ensuring the correct directory is targeted, regardless of the current working directory. This is particularly useful for handling nested directories or when the working directory is unclear.

Let’s demonstrate the use of absolute paths for directory creation and removal:

RUN mkdir /dir && cd /dir && wget http://google.com && rm -rf /dir
  • mkdir /dir: Creates a directory named dir at the root (/) level.
  • cd /dir: Navigates into the newly created /dir directory.
  • wget http://google.com: Downloads a file into the /dir directory.
  • rm -rf /dir: Removes the entire /dir directory and its contents using the absolute path.

In this case, absolute paths, such as /dir, ensure the correct directory is targeted regardless of the current working directory. Unlike relative paths, there’s no need to cd back to the parent directory before removing the directory, since the absolute path is specified.
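The same behavior can be tried locally. In this sketch, the variable target is a hypothetical stand-in for /dir, placed under a temporary directory so the example doesn’t write to the real root, and touch replaces the download. It assumes a typical Linux shell, where rm can remove the directory the shell is currently in.

```shell
#!/bin/sh
# $target plays the role of /dir; a temp location avoids touching the real root.
target="$(mktemp -d)/dir"

# No "cd .." is needed: the absolute path doesn't depend on where we are.
mkdir "$target" && cd "$target" && touch index.html && rm -rf "$target"
if [ -e "$target" ]; then absolute_removed=no; else absolute_removed=yes; fi

echo "removed while still inside it: $absolute_removed"

# The shell's previous working directory no longer exists, so move somewhere
# valid before cleaning up the temporary parent.
cd / && rm -rf "$(dirname "$target")"
```

After the removal, the shell is left in a directory that no longer exists, so any further commands in the same RUN should use absolute paths or cd elsewhere first.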

6. Leveraging the WORKDIR Command

The WORKDIR instruction simplifies managing the current working directory in Dockerfiles. By setting the working directory for all subsequent instructions, we eliminate the need for multiple cd commands, making our Dockerfiles cleaner and more readable. This instruction is especially useful when we’re performing multiple operations in different directories and want to ensure consistent context throughout.

Let’s use the WORKDIR instruction to handle the directory removal issue:

FROM alpine
RUN mkdir dir

First, we create a directory named dir. This directory is used for file operations in subsequent steps.

WORKDIR dir
RUN wget http://google.com

Here, we set dir as the working directory using the WORKDIR command. All subsequent commands now run inside this directory. For instance, wget downloads the file within the dir folder, saving us from manually navigating there with cd.

WORKDIR /
RUN ls

Now, we switch the working directory back to the root (/) with another WORKDIR command. We can use ls to verify the contents of the root directory, ensuring we’ve exited the dir directory.

RUN rm -r dir

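Finally, we clean up by removing the dir directory from the root using rm -r.

By using WORKDIR, we manage directory context explicitly throughout our Dockerfile. It reduces the chance of mistakes, especially when switching between directories, and ensures that the commands are executed in the intended locations. Keep in mind that each RUN instruction here still produces its own layer, so this version removes dir from the final filesystem while the downloaded file stays in the earlier layer. To keep it out of the image entirely, the download and the removal need to share a single RUN instruction.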

7. Leveraging Docker VOLUMEs

Another way to keep data out of image layers is to use volumes. Docker volumes are storage locations that live outside the image layers and are mounted into containers, so data written to a volume at run time doesn’t become part of the image. This keeps the image lean and prevents unnecessary files from persisting in layers.

Now, let’s use a volume to manage file or directory removal from a Docker image:

FROM alpine
RUN mkdir dir
VOLUME /vol

First, we create a directory named dir. Then, we declare a volume with the VOLUME /vol instruction, which marks /vol as a mount point whose contents live outside the image layers. This volume can store files we want to move out of the image.

RUN cp -r dir /vol

We copy the dir directory into /vol. Note that Docker’s Dockerfile reference warns about changing a volume’s data during the build after the volume has been declared: the legacy builder discards such changes, while BuildKit keeps them, so the result of this step depends on the builder in use.

RUN rm -rf dir

Next, we remove the dir directory from the image’s filesystem. As with any deletion in a separate RUN instruction, the layer created by RUN mkdir dir still contains the directory.

With this approach, the dir directory no longer appears in the final image’s filesystem. To keep data out of the layers entirely, it’s safer to write it to the volume when the container runs, since volumes are attached to containers rather than stored in the image.

8. Conclusion

In this article, we explored various strategies for managing files and directories across Docker layers, which is crucial for maintaining lightweight and efficient images.

By utilizing relative and absolute paths, the WORKDIR instruction, or the VOLUME instruction, we can control which files remain visible in the final image, and by creating and removing temporary files within a single RUN instruction, we keep them out of the layers altogether. Each method offers its own advantages, from simplicity to flexibility in handling temporary data. Incorporating these techniques into our Docker builds helps reduce image size, which in turn shortens pulls and deployments.

Understanding Docker’s layered filesystem is key to mastering image optimization and is an essential practice in DevOps.