1. Introduction

Building a Docker image can take a while. The build may pull base images, copy files, and download and install packages, to name a few common tasks. To avoid repeating that work on every run, docker build uses a cache.

In this tutorial, we’ll learn more about the build process and when it’s better to avoid the cache.

2. About the Docker Build Cache

Docker images are built in layers, where each layer corresponds to an instruction in the Dockerfile. Layers stack on top of each other, each one building on the result of the previous ones.

Let’s now see a simple Dockerfile to illustrate how the build process works:

FROM alpine:latest
RUN apk add --no-cache bash
ADD entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]

The image above runs a script, entrypoint.sh, that prints the current date and time and then sleeps for 60 seconds, repeating forever:

#!/bin/bash

while :
do
  echo $(date)
  sleep 60
done

When we build the image for the first time, all four steps are executed. On the next build, the output looks different:

$ docker build -t print-date-time .
Sending build context to Docker daemon  3.072kB
Step 1/4 : FROM alpine:latest
 ---> a24bb4013296
Step 2/4 : RUN apk add --no-cache bash
 ---> Using cache
 ---> 52f7aaec5411
Step 3/4 : ADD entrypoint.sh /
 ---> Using cache
 ---> 66ba9eee7c3c
Step 4/4 : ENTRYPOINT ["/entrypoint.sh"]
 ---> Using cache
 ---> 91a39deabc0b
Successfully built 91a39deabc0b
Successfully tagged print-date-time:latest

The Dockerfile hadn’t changed, so the build reused the cached result of the previous build for every step. If we had changed an instruction, Docker would have rebuilt that step and every step after it.

The Docker build process also checks for changes in files added with the ADD or COPY instructions. In our example, if we had changed the entrypoint.sh script, the layers for steps 3 and 4 would be rebuilt.
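To make the file check more concrete, here is a minimal shell sketch. It doesn't call Docker at all: cksum is only a stand-in for whatever fingerprint Docker computes internally (an assumption for illustration, not Docker's actual algorithm), and the temporary directory plays the role of a throwaway build context.

```shell
# Throwaway build context holding a copy of the example script.
ctx=$(mktemp -d)
printf '#!/bin/bash\nwhile :\ndo\n  date\n  sleep 60\ndone\n' > "$ctx/entrypoint.sh"

# Fingerprint the file as it was at the time of the first build.
# cksum is only a stand-in for Docker's internal checksum.
before=$(cksum < "$ctx/entrypoint.sh")

# Edit the script without touching the Dockerfile.
printf '# trivial edit\n' >> "$ctx/entrypoint.sh"
after=$(cksum < "$ctx/entrypoint.sh")

# A changed fingerprint means the cached layer can no longer be reused.
if [ "$before" = "$after" ]; then
  result="ADD entrypoint.sh / would be served from the cache"
else
  result="ADD entrypoint.sh / and every later step would be rebuilt"
fi
echo "$result"

rm -rf "$ctx"
```

The point of the sketch is that the Dockerfile text is identical in both cases. Only the content of the added file differs, and that alone is enough to invalidate the cache from that step onwards.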

This optimization saves time when creating our image, as we’ll probably build and run it many times. However, there are times when the cache prevents the image from picking up changes.

3. When Not to Use the Cache

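A Dockerfile may contain instructions to download and install tools. In the previous example, we installed bash. A script this simple could also run under Alpine’s default shell with a different shebang line, but the instruction serves to illustrate an issue.

Let’s say the Bash package is updated on Alpine Linux to fix a security issue. The text of the RUN instruction in our Dockerfile is still the same, so Docker reuses the cached layer and the image keeps the old package. Note that the --no-cache flag passed to apk add only tells apk not to keep its package index locally; it’s unrelated to the Docker build cache.

The same problem appears when a Dockerfile clones a Git repository: the git clone command may never change, but the contents of the repository will.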

The simplest solution to avoid these issues is to just not use the cache at all:

$ docker build -t print-date-time --no-cache .

The --no-cache option makes this build ignore the cache, so every step of the Dockerfile is executed again.

The FROM instruction is the only one that is not affected by --no-cache. If the base image is already present on the machine, it won’t be pulled again. We can make the build check the registry for a newer version of the base image with the --pull option:

$ docker build -t print-date-time --pull .

The --pull option is useful in our example because the latest tag is bound to change often.
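The two options can also be combined when we want a build that reuses neither cached layers nor a stale base image. Below is a minimal sketch of a wrapper for that. The function name fresh_build and the DOCKER override variable are hypothetical names chosen for this example; only the build, --pull, --no-cache, and -t parts come from the Docker CLI.

```shell
# Hypothetical helper: rebuild an image without cached layers
# and without reusing a locally stored base image.
# DOCKER can be overridden (for example DOCKER=echo) to preview
# the command instead of running it.
fresh_build() {
  tag=$1
  context=${2:-.}   # build context defaults to the current directory
  "${DOCKER:-docker}" build --pull --no-cache -t "$tag" "$context"
}

# Usage: fresh_build print-date-time .
```

Running it with DOCKER=echo prints the command that would be executed, which is a cheap way to check the arguments before starting a slow build.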

Note that while these are helpful tools for local development, we should be careful about pulling latest in production environments. There, it’s safer to pin a specific tag instead of latest. That way, we avoid the unpleasant surprise of an unexpected upstream change that breaks the image.
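As a rough illustration of pinning, the sketch below rewrites the FROM line of a copy of our example Dockerfile. The version 3.12 is only an example value, and the temporary directory and file names are assumptions made for this snippet; in practice we'd simply edit the FROM line by hand and pick a version we've tested.

```shell
# Work on a throwaway copy so the real Dockerfile is not modified.
dir=$(mktemp -d)
cat > "$dir/Dockerfile" <<'EOF'
FROM alpine:latest
RUN apk add --no-cache bash
ADD entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
EOF

# Pin the base image to a specific tag (3.12 is an example version).
sed 's/^FROM alpine:latest$/FROM alpine:3.12/' "$dir/Dockerfile" > "$dir/Dockerfile.pinned"

# Show the first line of the pinned file.
pinned_from=$(head -n 1 "$dir/Dockerfile.pinned")
echo "$pinned_from"
```

With a pinned tag, the base image only changes when we edit the Dockerfile ourselves, and that edit also invalidates the cache for every following step.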

4. Conclusion

In this short tutorial, we learned how the Docker build cache shortens our image build times.

We also saw cases where this optimization works against us. The --no-cache and --pull options are handy in those situations.
