
1. Overview

When building Docker images for Python applications, waiting for packages to reinstall on every Docker image build can become frustrating. For instance, in regions with slower internet connections, it can derail productivity. To resolve this and avoid unnecessary package reinstallation, we need to understand layer caching in Docker.

In this tutorial, we’ll first discuss the problem, create a simple project to demonstrate how Docker rebuilds images, and finally, improve our Dockerfile.

2. Problem Statement

We want to prevent the reinstallation of Python packages on every Docker image build when the packages haven’t changed.

2.1. Example Dockerfile

Here’s a Dockerfile structure that leads to redundant installations:

FROM python:3.10-slim

WORKDIR /app
ADD . /app
RUN pip install -r requirements.txt

CMD ["python", "app.py"]

When we make a code change and rebuild the image, Docker re-executes all steps starting from ADD . /app. As a consequence, the cache for pip install -r requirements.txt is invalidated, and all packages are reinstalled even though they haven't changed.

The combination of this Docker structure and iterative development results in slower builds because of repeated pip install, wasted bandwidth and CPU resources, and poor caching behavior.

2.2. A Quick Look at Docker Layer Caching

Let’s explore how Docker builds images to understand why separating COPY instructions is crucial.

Docker builds images in layers: each instruction, such as COPY, RUN, or ADD, creates a new layer. If a layer hasn't changed, Docker reuses the cached version of that layer instead of rebuilding it. However, if an instruction's input changes (for instance, when the copied files differ), Docker invalidates that layer and every layer after it.

That’s why ADD . /app forces the reinstallation of packages since it includes all the source code and files. Any small change to a source file invalidates the cache for the pip install step, forcing package reinstallation. Even renaming a Python file or updating a comment inside a file can break the cache for subsequent layers.

With this in mind, developers can structure Dockerfiles more intentionally for maximum build efficiency. Thus, the instruction COPY requirements.txt ./ helps Docker avoid unnecessary rebuilds unless the dependency file itself changes.
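To make the cascade concrete, here's a small Python sketch of the idea. This is not Docker's actual implementation, only a conceptual model: each layer's cache key depends on its own input and on the key of the layer before it, so changing one layer invalidates everything downstream:

```python
import hashlib

def build(layers, cache):
    """Simulate a layered build. Each layer's cache key is a hash of its
    own input chained with the previous layer's key, so a change to one
    layer forces every later layer to rebuild."""
    prev_key, actions = "", []
    for name, content in layers:
        key = hashlib.sha256((prev_key + content).encode()).hexdigest()
        if cache.get(name) == key:
            actions.append(f"CACHED {name}")
        else:
            actions.append(f"RUN {name}")
            cache[name] = key
        prev_key = key
    return actions

cache = {}
v1 = [("ADD . /app", "app.py v1 + requirements.txt"),
      ("pip install", "requirements.txt")]
print(build(v1, cache))  # first build: every layer runs

v2 = [("ADD . /app", "app.py v2 + requirements.txt"),  # only app.py changed
      ("pip install", "requirements.txt")]
print(build(v2, cache))  # ADD changed, so pip install reruns as well
```

Even though the pip install layer's own input (requirements.txt) is unchanged, its key depends on the ADD layer before it, so it reruns, which mirrors the behavior we see with the unoptimized Dockerfile.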

3. Recreating the Problem

Let’s set up a simple Python project to walk through the problem and its solution:

$ tree flask-demo
flask-demo
├── app.py
├── Dockerfile
└── requirements.txt

0 directories, 3 files

The tree command above displays the structure of our project.

First, let’s create the file app.py:

$ cat app.py
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, Docker!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

After this, let’s create the file Dockerfile:

$ cat Dockerfile
FROM python:3.10-slim

WORKDIR /app

# Add source code and requirements in one step
ADD . /app

# Install dependencies
RUN pip install -r requirements.txt

CMD ["python", "app.py"]

Finally, let’s create requirements.txt:

$ cat requirements.txt
flask==2.3.2

With the project files and their content in place, let's build the image:

$ docker build -t flask-demo .
[+] Building 72.5s (9/9) FINISHED                                                                          docker:desktop-linux
 ...
 => [1/4] FROM docker.io/library/python:3.10-slim@sha256:49454d2bf78a48f217eb25ecbcb4b5face313fea6a6e82706465a6990303ada2 47.0s
 ...
 => [2/4] WORKDIR /app                                                                                                     1.4s
 => [3/4] ADD . /app                                                                                                       1.1s
 => [4/4] RUN pip install -r requirements.txt                                                                             14.6s
 ...

The command above adds the entire directory (requirements.txt and app.py files) into the image and installs the dependencies listed in requirements.txt.

Now, let’s slightly modify app.py, for instance by changing the return string in the / route to “Hello, World!”, and rebuild the image:

$ docker build -t flask-demo .
[+] Building 24.1s (9/9) FINISHED                                                                         docker:desktop-linux
 ...
 => [1/4] FROM docker.io/library/python:3.10-slim@sha256:49454d2bf78a48f217eb25ecbcb4b5face313fea6a6e82706465a6990303ada2 0.0s
 ...
 => CACHED [2/4] WORKDIR /app                                                                                             0.0s
 => [3/4] ADD . /app                                                                                                      1.0s
 => [4/4] RUN pip install -r requirements.txt                                                                            15.3s
 ...

During the image rebuild process, pip install runs again even though requirements.txt hasn’t changed.

To clarify, Docker builds images in steps and reuses previous results where it can. However, ADD . /app copies all our files at once, and Docker checksums the copied content to decide whether the layer can be reused. So, even if we modify only one file, the checksum changes and the cache for that layer becomes invalid, forcing Docker to repeat every step that comes after it, including reinstalling packages.

4. Optimizing the Dockerfile With Layer Caching

To prevent unnecessary reinstallations of packages, we need to separate the addition of requirements.txt and the installation step from the rest of the application code. This modification ensures Docker caches the pip install layer as long as requirements.txt doesn’t change.

Let’s begin by optimizing our Dockerfile:

$ cat Dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies early for caching
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy only the rest of the app after dependencies are installed
COPY . .

CMD ["python", "app.py"]

Here’s a breakdown of the modifications:

  • COPY requirements.txt ./ — ensures Docker only looks at this single file to determine cache invalidation
  • RUN pip install — executes only when requirements.txt changes
  • COPY . . — copies the rest of the application code after dependencies are installed

Now, if we change app.py but not requirements.txt, Docker skips the package installation and uses the cached layer.

4.1. Demonstrating the Optimization

Once we modify the Dockerfile and optimize it, let’s build the image:

$ docker build -t flask-demo .
[+] Building 25.5s (10/10) FINISHED                                                                       docker:desktop-linux
 ...
 => [1/5] FROM docker.io/library/python:3.10-slim@sha256:49454d2bf78a48f217eb25ecbcb4b5face313fea6a6e82706465a6990303ada2 0.0s
 ...
 => CACHED [2/5] WORKDIR /app                                                                                             0.1s
 => [3/5] COPY requirements.txt ./                                                                                        0.9s
 => [4/5] RUN pip install --no-cache-dir -r requirements.txt                                                             15.7s
 => [5/5] COPY . .                                                                                                        1.4s
 ...

Since the Dockerfile changed, the new layers aren't cached yet, and pip install executes on this first build.

Next, let’s modify app.py and then rebuild:

$ docker build -t flask-demo .
[+] Building 6.7s (10/10) FINISHED                                                                        docker:desktop-linux
 ...
 => [1/5] FROM docker.io/library/python:3.10-slim@sha256:49454d2bf78a48f217eb25ecbcb4b5face313fea6a6e82706465a6990303ada2 0.0s
 ...
 => CACHED [2/5] WORKDIR /app                                                                                             0.0s
 => CACHED [3/5] COPY requirements.txt ./                                                                                 0.0s
 => CACHED [4/5] RUN pip install --no-cache-dir -r requirements.txt                                                       0.0s
 => [5/5] COPY . .                                                                                                        0.9s
 ...

Here, Docker skips the pip install step.

Finally, let’s modify requirements.txt and then rebuild:

$ echo "requests==2.31.0" >> requirements.txt && docker build -t flask-demo .
[+] Building 30.2s (10/10) FINISHED                                                                       docker:desktop-linux
 ...
 => [1/5] FROM docker.io/library/python:3.10-slim@sha256:49454d2bf78a48f217eb25ecbcb4b5face313fea6a6e82706465a6990303ada2 0.0s
 ...
 => CACHED [2/5] WORKDIR /app                                                                                             0.0s
 => [3/5] COPY requirements.txt ./                                                                                        0.8s
 => [4/5] RUN pip install --no-cache-dir -r requirements.txt                                                             21.4s
 => [5/5] COPY . .                                                                                                        1.1s
 ...

Due to an update in the requirements.txt file, Docker reruns the pip install step as expected.

Notably, we added --no-cache-dir to pip install:

RUN pip install --no-cache-dir -r requirements.txt

The addition prevents pip from saving downloaded packages to its local cache, thereby reducing image size. Since Docker already caches the entire layer, --no-cache-dir keeps the image small without hurting build performance.
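If we prefer not to repeat the flag on every pip command, pip also reads the PIP_NO_CACHE_DIR environment variable, so we can set it once near the top of the Dockerfile. This is a sketch of the same optimized Dockerfile using that variant; the flag form above works just as well:

```dockerfile
FROM python:3.10-slim

# Disable pip's download cache for every subsequent pip command
ENV PIP_NO_CACHE_DIR=1

WORKDIR /app

COPY requirements.txt ./
RUN pip install -r requirements.txt

COPY . .

CMD ["python", "app.py"]
```

The caching behavior of the layers themselves is identical; only the way we disable pip's own package cache differs.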

4.2. Using a .dockerignore File

When building the Docker image, Docker sends the whole project directory to the Docker engine as the build context. This context often includes files the image doesn't need, such as temporary files or version control data. To avoid this, we can create a .dockerignore file to exclude unnecessary files and directories:

$ cat .dockerignore
__pycache__/
*.pyc
*.pyo
*.pyd
.env
.git

The .dockerignore file instructs Docker to skip these files during the build. As a result, we get faster builds, a smaller build context, and fewer accidental cache invalidations caused by unrelated file changes. Thus, .dockerignore makes our Docker builds more efficient.
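To get a feel for what ends up in the build context, here's a rough Python approximation of the filtering using fnmatch-style patterns. Docker's real matching rules are richer (Go's filepath.Match semantics plus ** support), so treat this only as an illustration:

```python
import fnmatch

# Patterns loosely based on our .dockerignore above
IGNORE_PATTERNS = ["__pycache__/*", "*.pyc", "*.pyo", "*.pyd", ".env", ".git/*"]

def excluded(path, patterns=IGNORE_PATTERNS):
    """Rough approximation of .dockerignore matching via fnmatch;
    Docker's actual rules differ in edge cases."""
    return any(fnmatch.fnmatch(path, p) for p in patterns)

files = ["app.py", "requirements.txt", "app.pyc",
         "__pycache__/app.cpython-310.pyc", ".git/HEAD"]
context = [f for f in files if not excluded(f)]
print(context)  # only app.py and requirements.txt reach the build context
```

Only the files that survive the filter are hashed for cache decisions, which is why ignoring volatile files like caches prevents pointless layer invalidation.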

4.3. Locking Each Dependency to a Specific Version

Caching can still misbehave if requirements.txt changes often — for instance, if requirements.txt contains loosely defined package versions.

To make Docker caching even more effective, we can pin our Python package versions in the requirements.txt file. To clarify, we lock each dependency to a specific version:

$ cat requirements.txt
flask==2.3.2
requests==2.31.0

On the other hand, let’s consider unpinned versions:

$ cat requirements.txt
flask
requests

In this case, a build that hits the cache keeps the old versions, while a build on a fresh machine or CI runner may pull newer releases. As a result, images built from the same requirements.txt can differ, and a rebuilt dependency layer may install a different set of packages even though the file itself hasn't changed.

So, when we pin versions, we get more consistent caching and more predictable builds, since our application always runs with the same package versions.
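A quick way to catch loose specifiers before they reach a build is a small check script. This is our own illustrative helper, not a standard tool:

```python
def unpinned(requirements_text):
    """Return requirement lines that aren't locked to an exact
    version with '=='; comments and blank lines are skipped."""
    loose = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "==" not in line:
            loose.append(line)
    return loose

pinned = "flask==2.3.2\nrequests==2.31.0\n"
loose = "flask\nrequests>=2.0\n"
print(unpinned(pinned))  # []
print(unpinned(loose))   # ['flask', 'requests>=2.0']
```

In practice, running pip freeze > requirements.txt in a working environment is a simple way to capture exact versions for every installed package.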

5. Conclusion

In this article, we explored how to avoid unnecessary reinstallation of Python packages when building Docker images. First, we started with a simple Dockerfile that leads to repeated package reinstallations and slow builds. Then, we demonstrated how Docker’s layer caching works and how it can be optimized.

By separating requirements.txt from the rest of the code, we enable Docker to cache the pip install step. As a result, the image build time is reduced during iterative development.

Additionally, we can use a .dockerignore file to reduce build context size and enhance caching reliability. Finally, we can lock each dependency to a specific version for consistent and predictable builds.