Caching Maven Dependencies with Docker

1. Introduction

In this tutorial, we’ll show how to build Maven projects in Docker. First, we’ll start with a simple, single-module Java project and show how to dockerize the build process using multi-stage builds in Docker. Next, we’ll show how to use BuildKit to cache dependencies between builds. Finally, we’ll cover how to exploit the layer cache in multi-module applications.

2. Multi-Stage Layered Build

For this article, we’ll create a simple Java application with Guava as a dependency and package it as a fat JAR using the maven-assembly-plugin. The code and Maven configuration are omitted from this article, as they are not the main topic.

Multi-stage builds are a great way to optimize the Docker build process. They let us keep the whole process in a single file and help us keep the Docker image as small as possible. In the first stage, we’ll run a Maven build and create our fat JAR, and in the second stage, we’ll copy the JAR and define an entry point:

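# Stage 1: build the fat JAR with Maven
FROM maven:alpine AS build
ENV HOME=/usr/app
RUN mkdir -p $HOME
WORKDIR $HOME
# Copy the whole project (pom.xml and sources) into the image
ADD . $HOME
RUN mvn package

# Stage 2: runtime image that contains only the JAR
FROM openjdk:8-jdk-alpine
COPY --from=build /usr/app/target/single-module-caching-1.0-SNAPSHOT-jar-with-dependencies.jar /app/runner.jar
ENTRYPOINT ["java", "-jar", "/app/runner.jar"]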

This approach lets us keep the final Docker image smaller since it won’t contain Maven executables or our source code.

Let’s create the Docker image:

docker build -t maven-caching .

Next, let’s start a container from the image:

docker run maven-caching

When we change something in the code and re-run the build, we’ll notice that only the steps before ADD are served from the cache and complete immediately. The ADD step and the Maven package step run again, so Maven downloads all dependencies from scratch. Since our code changes more often than our project dependencies, we can separate dependency download from code compilation and improve build time using the Docker layer cache:

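FROM maven:alpine AS build
ENV HOME=/usr/app
RUN mkdir -p $HOME
WORKDIR $HOME
# Copy only pom.xml first, so this layer changes only when pom.xml changes
ADD pom.xml $HOME
# Download dependencies and plugins; --fail-never keeps going even though there are no sources yet
RUN mvn verify --fail-never
# Copy the sources; a code change invalidates the cache only from this step on
ADD . $HOME
RUN mvn package

FROM openjdk:8-jdk-alpine
COPY --from=build /usr/app/target/single-module-caching-1.0-SNAPSHOT-jar-with-dependencies.jar /app/runner.jar
ENTRYPOINT ["java", "-jar", "/app/runner.jar"]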

Subsequent builds in which we change only our code will be much faster: as long as pom.xml is unchanged, Docker reuses the cached layer that already contains the downloaded dependencies.
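Because ADD . $HOME copies the entire build context, files that play no part in the build, such as a local target directory left over from an IDE or a host-side Maven run, can invalidate that layer needlessly. The following is a minimal sketch, assuming the commands run from the project root and that build output and Git metadata are the only things worth excluding. The entries are illustrative and not part of the original project:

```shell
# Create a .dockerignore in the current directory (assumed to be the project root).
# The entries below are illustrative assumptions; adjust them to the real project layout.
cat > .dockerignore <<'EOF'
**/target
.git
EOF
```

With this file in place, changes under any target directory no longer alter the build context, so they cannot invalidate the ADD layer.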

3. Caching Using BuildKit

Docker version 18.09 introduced BuildKit as an overhaul of the existing build system. The idea behind the overhaul is to improve performance, storage management, and security. We can leverage BuildKit to keep state between multiple builds. This way, Maven won’t download dependencies every time, because its local repository is kept in a persistent cache. On Docker Engine 23.0 and later, BuildKit is already the default builder. On older versions, we enable it by editing the daemon.json file:

...
{
  "features": {
    "buildkit": true
  }
}
...
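On Docker versions where BuildKit isn’t the default, it can also be switched on through the DOCKER_BUILDKIT environment variable instead of editing daemon.json. The sketch below assumes a POSIX shell and reuses the maven-caching image tag from the earlier build command. The guard around docker build is only there so the snippet is safe to paste in a directory without a Dockerfile:

```shell
# Enable BuildKit for this shell session only, without touching daemon.json
export DOCKER_BUILDKIT=1

# Build only when the Docker CLI and a Dockerfile are actually available
if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    docker build -t maven-caching .
fi
```

The variable affects only commands started from this shell, which makes it convenient for trying BuildKit before changing the daemon configuration.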

After enabling BuildKit, we can change our Dockerfile to:

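# syntax=docker/dockerfile:1
# The directive above must be the first line; it selects a Dockerfile frontend that supports RUN --mount
FROM maven:alpine AS build
ENV HOME=/usr/app
RUN mkdir -p $HOME
WORKDIR $HOME
ADD . $HOME
# Keep Maven's local repository (/root/.m2) in a BuildKit cache mount that survives between builds
RUN --mount=type=cache,target=/root/.m2 mvn -f $HOME/pom.xml clean package

FROM openjdk:8-jdk-alpine
COPY --from=build /usr/app/target/single-module-caching-1.0-SNAPSHOT-jar-with-dependencies.jar /app/runner.jar
ENTRYPOINT ["java", "-jar", "/app/runner.jar"]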

When we change the code or the pom.xml file, Docker will always re-run the ADD step and the Maven command. Build time will be the longest on the first run since Maven will have to download all dependencies. Subsequent runs will find the dependencies in the cache mount and execute much faster.

This approach relies on BuildKit’s build cache as storage for dependencies, so the cached repository is lost whenever that cache is pruned. Sometimes, we’ll also have to force Maven to update our dependencies by adding the -U flag to the Maven command in the Dockerfile.
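When the cached repository itself becomes stale or corrupted, one option is to discard the cache mounts and let the next build download everything again. The sketch below wraps the Docker CLI’s builder prune command in a small function. The function name is hypothetical, and the command removes all cache mounts known to the builder, not only the Maven one:

```shell
# Hypothetical helper: remove the data stored by RUN --mount=type=cache steps.
# Note that this clears every cache mount of the builder, not just /root/.m2.
clear_build_cache_mounts() {
    # --force skips the confirmation prompt; the filter restricts pruning to cache mounts
    docker builder prune --force --filter type=exec.cachemount
}
```

Calling clear_build_cache_mounts before a build has the same effect as a first run: Maven downloads every dependency once and then reuses the fresh cache.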

4. Caching for Multi-Module Maven Projects

In previous sections, we showed how we could leverage different methods to speed up the build time of Docker images for a single-module Maven project. For more complex applications, these methods are not optimal. Multi-module Maven projects usually have one module that is the entry point of our application. One or more modules contain our logic and are listed as dependencies.

Since submodules are listed as dependencies, any change to a submodule’s code invalidates the Docker layer cache and triggers Maven to download all dependencies again. The BuildKit solution is good in most cases, but as we said, it may require forced updates from time to time to fetch updated submodules. To avoid such situations, we can separate our project into layers and use Maven incremental builds:

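FROM maven:alpine AS build
ENV HOME=/usr/app
RUN mkdir -p $HOME
WORKDIR $HOME

# Copy the parent and module POMs first, so the dependency layers depend only on them
ADD pom.xml $HOME
ADD core/pom.xml $HOME/core/pom.xml
ADD runner/pom.xml $HOME/runner/pom.xml

# Download the dependencies of core, then build it and install it into the local repository
RUN mvn -pl core verify --fail-never
ADD core $HOME/core
RUN mvn -pl core install
# Download the dependencies of runner, which can now resolve core locally
RUN mvn -pl runner verify --fail-never
ADD runner $HOME/runner
# Package both modules to produce the final fat JAR
RUN mvn -pl core,runner package

FROM openjdk:8-jdk-alpine
COPY --from=build /usr/app/runner/target/runner-0.0.1-SNAPSHOT-jar-with-dependencies.jar /app/runner.jar
ENTRYPOINT ["java", "-jar", "/app/runner.jar"]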

In this Dockerfile, we first copy all pom.xml files, then incrementally build each submodule, and finally package the whole application. The rule of thumb is to build the submodules that change more frequently later in the chain, so that a change in them invalidates as few cached layers as possible.
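As the number of modules grows, the list of ADD lines for the module POMs has to be kept in sync by hand. The following sketch prints those lines for every module directory directly below the project root. It assumes modules are not nested any deeper and that a find implementation with -mindepth and -maxdepth (GNU or BSD) is available. The function name is hypothetical:

```shell
# Hypothetical helper: print one ADD instruction per module pom.xml.
# Assumes each module lives exactly one directory below the project root.
print_pom_add_lines() {
    (
        # Work relative to the project root (first argument, default: current directory)
        cd "${1:-.}" || exit 1
        # Depth 2 skips the parent pom.xml and anything inside nested or target directories
        find . -mindepth 2 -maxdepth 2 -name pom.xml | sort | while IFS= read -r pom; do
            rel="${pom#./}"
            # $HOME stays literal here so that Docker expands it during the build
            printf 'ADD %s $HOME/%s\n' "$rel" "$rel"
        done
    )
}
```

Running print_pom_add_lines in the project root prints the module ADD lines in alphabetical order, ready to paste below the ADD for the parent pom.xml. The order of these lines doesn’t affect caching, since all POMs are copied before the first Maven step.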

5. Conclusion

In this article, we covered how to build Maven projects using Docker. First, we exploited layering to cache the parts that don’t change frequently. Next, we used BuildKit to keep state between builds. Finally, we showed how to build multi-module Maven projects with incremental builds. As always, the complete code can be found over on GitHub.
