1. Overview

When containerizing a Node.js application with Docker, we may encounter a situation where each minor change in the source code triggers Docker to rerun npm install during the build. For instance, code changes that don’t even affect dependencies may invalidate the cache of the RUN npm install step. Rerunning this step unnecessarily can drastically reduce productivity, especially since npm install is often the slowest step in the build.

In this tutorial, we’ll demonstrate the potentially redundant repeated execution of RUN npm install, explore how the Docker cache works, and optimize a Dockerfile to leverage caching for faster builds. Throughout, we’ll use a simple Node.js application as an example.

2. Basic Setup

To begin with, let’s understand why the RUN npm install command may rerun without need.

2.1. Sample Dockerfile

To demonstrate, let’s create a sample Dockerfile:

FROM node:20

WORKDIR /app

COPY . .

RUN npm install

CMD ["node", "index.js"]

This fairly minimal setup uses version 20 of the node image, sets up the working directory, copies the project files, installs the necessary modules, and runs the index.js file via Node.

2.2. Build Process

In particular, when we run the docker build . command, Docker performs several steps in order:

  1. sets the base image to node:20
  2. sets the working directory to /app
  3. copies everything into /app
  4. runs npm install
  5. sets the command to run index.js

The order matters because each step builds on the result of the one before it. Thus, we can refer to the steps by their position in this sequence.

2.3. Redundant Reruns

In the Dockerfile, the COPY . . instruction copies all source code, including package.json, package-lock.json, and index.js. If any of the copied files changes, Docker invalidates the cache for that COPY layer and for every layer after it, including RUN npm install. Therefore, Docker reruns npm install after every code change, even if package.json didn’t change.

Docker uses a layered cache system. In particular, each instruction (like COPY or RUN) creates a new image layer. Docker reuses the layer from cache once it concludes that the inputs to a layer haven’t changed.

Therefore, the RUN npm install instruction needs to depend only on the state of package.json and package-lock.json. To explain, if these files don’t change, Docker can reuse the cached layer. However, if we copy the entire project (COPY . .) before we run npm install, Docker invalidates the cache.
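To make the cascade concrete, here is a minimal sketch in Node.js that models the first four steps of the sample Dockerfile. It is a simplified model rather than Docker’s actual implementation: the planBuild function and the inputs field are hypothetical names introduced only for illustration.

```javascript
// Simplified model of Docker's layer cache; NOT Docker's real algorithm.
// "inputs" (hypothetical field) lists the context files each step reads.
const naiveDockerfile = [
  { instruction: 'FROM node:20', inputs: [] },
  { instruction: 'WORKDIR /app', inputs: [] },
  { instruction: 'COPY . .', inputs: ['package.json', 'package-lock.json', 'index.js'] },
  { instruction: 'RUN npm install', inputs: [] },
];

// Returns one entry per step with a status of 'CACHED' or 'REBUILT'.
function planBuild(steps, changedFiles) {
  let cacheBroken = false;
  return steps.map((step) => {
    // A step misses the cache when one of its inputs changed...
    if (step.inputs.some((file) => changedFiles.includes(file))) {
      cacheBroken = true;
    }
    // ...and every step after a miss is rebuilt as well.
    return { instruction: step.instruction, status: cacheBroken ? 'REBUILT' : 'CACHED' };
  });
}

for (const { instruction, status } of planBuild(naiveDockerfile, ['index.js'])) {
  console.log(`${status.padEnd(8)} ${instruction}`);
}
```

In this model, changing only index.js marks both COPY . . and RUN npm install as rebuilt, which matches the redundant rerun described above.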

3. Sample Solution

Let’s explore a Node app example.

3.1. Create Project Structure

First, let’s use the commands mkdir, cd, and npm init -y to create the working directory, navigate into it, and create the package.json file respectively:

$ mkdir node-docker-cache && cd node-docker-cache && npm init -y
Wrote to /home/peter/Desktop/BAELDUNG/OPS/node-docker-cache/package.json:

{
  "name": "docker-cache",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}

Next, let’s create the file index.js:

// index.js
const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;

app.get('/', (req, res) => {
  res.send('Hello from Docker!');
});

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

After that, we install express:

$ npm install express

Now, let’s look at the project structure:

$ ls
index.js  node_modules  package.json  package-lock.json

Above, we use the ls command to display the project files.

3.2. Create an Optimized Dockerfile

Now, we can optimize the Dockerfile we saw earlier:

# Use Node.js base image
FROM node:20

# Create app directory
WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm install

# Copy the rest of the app
COPY . .

# Expose port and run
EXPOSE 3000
CMD ["node", "index.js"]

To leverage layer caching, we modify the Dockerfile to copy the package*.json files first, then run npm install, and only after that, copy the rest of the source code.

Now, when we change the source code (index.js) and not package.json, Docker uses the cached npm install layer. Meanwhile, if we update dependencies, Docker detects the changed package*.json files and runs npm install again as expected. Hence, we now get the correct behavior when dependencies change, as well as fast builds during development.
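As a rough illustration of why this ordering helps, the sketch below hashes only the manifest files, loosely mirroring how the COPY package*.json step depends solely on the files it copies. The fingerprint helper, the temporary project, and the file contents are all assumptions made for the example, not part of Docker or of the sample app.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: hashes the names and contents of the given files.
function fingerprint(dir, fileNames) {
  const hash = crypto.createHash('sha256');
  for (const name of [...fileNames].sort()) {
    hash.update(name);
    hash.update(fs.readFileSync(path.join(dir, name)));
  }
  return hash.digest('hex');
}

// Create a throwaway project in a temp directory so the sketch is self-contained.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
fs.writeFileSync(path.join(dir, 'package.json'), '{"dependencies":{"express":"*"}}');
fs.writeFileSync(path.join(dir, 'package-lock.json'), '{"lockfileVersion":3}');
fs.writeFileSync(path.join(dir, 'index.js'), "console.log('v1');");

const manifests = ['package.json', 'package-lock.json'];
const before = fingerprint(dir, manifests);

// Changing only the source code leaves the manifest fingerprint untouched.
fs.writeFileSync(path.join(dir, 'index.js'), "console.log('v2');");
const afterCodeChange = fingerprint(dir, manifests);

// Changing a manifest produces a different fingerprint.
fs.writeFileSync(path.join(dir, 'package.json'), '{"dependencies":{"express":"*","cors":"*"}}');
const afterDependencyChange = fingerprint(dir, manifests);

console.log('code change keeps fingerprint:', before === afterCodeChange);
console.log('dependency change keeps fingerprint:', before === afterDependencyChange);

// Clean up the temporary project.
fs.rmSync(dir, { recursive: true, force: true });
```

The first comparison holds and the second doesn’t, which corresponds to the dependency layer being reused after a code-only change and rebuilt after a dependency change.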

3.3. Build and Run the Container

In this step, let’s build the image my-node-app and run a container from it:

$ docker build -t my-node-app . && docker run -p 3000:3000 my-node-app
[+] Building 28.4s (10/10) FINISHED                                                                             docker:desktop-linux
 ...
 => [1/5] FROM docker.io/library/node:20@sha256:691ef3fccb415741c5f5ecb39cc5f5a9b8122b84c5ffda53cf68f4a4963f45ff                9.6s
 ...
 => [2/5] WORKDIR /app                                                                                                          1.0s
 => [3/5] COPY package*.json ./                                                                                                 1.1s
 => [4/5] RUN npm install                                                                                                       5.4s
 => [5/5] COPY . .                                                                                                              1.4s
...

When we navigate to http://localhost:3000, we see Hello from Docker!.

4. Testing the Cache Behavior

To ensure we get the correct results, let’s demonstrate the cache behavior.

4.1. Modify index.js

To start, we change the response in index.js:

res.send('Hello from Docker cache!');

Once this is done, we rebuild the image:

$ docker build -t my-node-app .
[+] Building 7.2s (10/10) FINISHED                                                                              docker:desktop-linux
 ...
 => [1/5] FROM docker.io/library/node:20@sha256:e0e264aeac056ed69fbc71329038713d68bd62ed449ee835280a7e6a5c29e89d                0.6s
 ...
 => CACHED [4/5] RUN npm install                                                                                                0.0s
 => [5/5] COPY . .                                                                                                              1.4s
 ...

The output marks the RUN npm install step as CACHED, so Docker reuses the existing layer and only reruns the final COPY . . step, as expected.

4.2. Modify package.json

To modify package.json, let’s install a specific version of express:

$ npm install express@<version>

added 6 packages, removed 5 packages, changed 24 packages, and audited 69 packages in 12s
...

Then, let’s rebuild the image again:

$ docker build -t my-node-app .
[+] Building 22.3s (10/10) FINISHED                                                                             docker:desktop-linux
 ...
 => [1/5] FROM docker.io/library/node:20@sha256:e0e264aeac056ed69fbc71329038713d68bd62ed449ee835280a7e6a5c29e89d                0.0s
 ...
 => [4/5] RUN npm install                                                                                                       7.9s
 => [5/5] COPY . .                                                                                                              1.6s
 ...

Here, Docker detects that dependencies may have changed and runs npm install again. This time, the cache is updated.

5. Adding the .dockerignore File

Typically, Docker sends the entire contents of the current directory to the Docker daemon as the build context when building images. This context includes everything, even files that don’t belong in the image, such as node_modules and .git, which can negatively affect caching.

For instance, if node_modules or temporary files change frequently, a broad instruction like COPY . . picks up those changes, and Docker invalidates that layer and the ones after it. We can create a .dockerignore file to explicitly exclude unwanted files and directories:

node_modules
npm-debug.log
Dockerfile
.dockerignore
.git

Adding the file ensures Docker:

  • avoids unintended cache invalidation when files outside the dependencies change
  • doesn’t copy unnecessary files into the image
  • handles a smaller and more consistent build context

We can think of .dockerignore as .gitignore, but for Docker builds. During the image build process, the .dockerignore file tells Docker which files to leave out of the build context.
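The effect of these patterns on the build context can be sketched in a few lines of Node.js. The isIgnored function is a hypothetical, deliberately simplified matcher: it only handles plain names and *.ext patterns at the top level of the context, whereas real .dockerignore files support more syntax, such as ** and ! exceptions. The file list is also made up for the example.

```javascript
// Patterns taken from the .dockerignore example in this section.
const patterns = ['node_modules', 'npm-debug.log', 'Dockerfile', '.dockerignore', '.git'];

// Simplified matcher (assumption): a pattern is compared against the top-level
// path segment only, so ignoring a directory also ignores everything inside it.
function isIgnored(filePath, ignorePatterns) {
  const topLevel = filePath.split('/')[0];
  return ignorePatterns.some((pattern) => {
    if (pattern.startsWith('*.')) {
      // "*.log" matches any top-level entry ending in ".log".
      return topLevel.endsWith(pattern.slice(1));
    }
    return topLevel === pattern;
  });
}

// Hypothetical project listing.
const projectFiles = [
  'package.json',
  'package-lock.json',
  'index.js',
  'node_modules/express/index.js',
  '.git/HEAD',
  'npm-debug.log',
  'Dockerfile',
];

const buildContext = projectFiles.filter((file) => !isIgnored(file, patterns));
console.log(buildContext);
```

Under these assumptions, only package.json, package-lock.json, and index.js remain in the context, so changes inside node_modules or .git can no longer affect the COPY . . layer.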

6. Troubleshooting Docker Cache

Let’s look at a few common ways to troubleshoot the Docker cache in general.

6.1. Accidental Build Context Changes

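The Docker cache relies heavily on the build context. If a file within this context changes and a COPY instruction picks it up, Docker invalidates that layer and every layer after it, even when the image doesn’t actually need the file. A .dockerignore file keeps such files out of the context: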

# .dockerignore
node_modules
.git
*.log
*.env

Above, we exclude files that change often but aren’t needed in the image, which reduces the chance of unrelated files invalidating the cache. Notably, if we exclude too much, files that the application needs never reach the image, so we should review the patterns carefully.

6.2. Broad COPY Instructions

Using the instruction COPY . . early in the Dockerfile can invalidate all subsequent cache layers, even if the actual dependencies haven’t changed:

COPY . .
RUN npm install

To remedy this situation, we can do partial copies according to the files needed at a particular step:

COPY package*.json ./
RUN npm install
COPY . .

In this case, we added a granular copy of data used by npm install earlier in the build and moved the COPY . . instruction after npm install.

6.3. Lockfile Not Included

If we only copy package.json and not package-lock.json, Docker caching may become unpredictable. For example, a change that only touches the lockfile wouldn’t invalidate the COPY layer, so Docker could reuse a stale npm install layer.

Therefore, we should always copy both files:

COPY package*.json ./

The instruction above copies both package.json and package-lock.json.
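To check which file names the wildcard covers, we can sketch the match in Node.js. The globToRegExp helper is hypothetical and only understands the * wildcard; Docker itself applies Go’s filepath.Match rules to COPY sources, so this is an approximation for simple patterns.

```javascript
// Converts a pattern that only uses "*" into an anchored regular expression.
// Simplification (assumption): other glob syntax such as "?" or "[...]" is
// treated as literal text here.
function globToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('[^/]*'); // "*" matches any run of characters except "/"
  return new RegExp(`^${escaped}$`);
}

const manifestPattern = globToRegExp('package*.json');

for (const name of ['package.json', 'package-lock.json', 'index.js']) {
  console.log(name, manifestPattern.test(name));
}
```

Both manifest files match the pattern, while index.js doesn’t, so a change to index.js leaves this COPY layer untouched.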

6.4. Forcing Rebuilds When Debugging

To ensure the correct behavior without any caching, we can also force a clean rebuild:

$ docker build --no-cache -t my-node-app .

Above, the command disables the cache. Hence, we can use this for debugging or fresh dependency installations, but should avoid it for regular development builds.

6.5. Debugging With --progress=plain

If caching still seems unclear, and we need to verify whether a specific step was cached or rebuilt, we can add the --progress=plain flag when building:

$ docker build --progress=plain -t my-node-app .
...

#4 [1/5] FROM docker.io/library/node:20@sha256:e0e264aeac056ed69fbc71329038713d68bd62ed449ee835280a7e6a5c29e89d
#4 DONE 0.0s

#5 [internal] load build context
#5 ...

#6 [4/5] RUN npm install
#6 CACHED

#7 [2/5] WORKDIR /app
#7 CACHED

#8 [3/5] COPY package*.json ./
#8 CACHED

#9 [5/5] COPY . .
#9 CACHED
...

Above, Docker shows how each layer was handled:

  • CACHED: reused from a previous build
  • DONE: executed anew

This way, we can see detailed logs that show exactly which layers were reused from the cache and the ones that were rebuilt.
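For longer logs, a small script can summarize the status of each step. The sketch below is a hypothetical helper, not a Docker tool: it assumes the line shapes shown in the output above (a numbered step line such as [4/5], followed by a CACHED or DONE line with the same # prefix) and uses a trimmed copy of that output with the image digest omitted.

```javascript
// Trimmed sample modeled on the --progress=plain output above (digest omitted).
const sampleLog = `
#4 [1/5] FROM docker.io/library/node:20
#4 DONE 0.0s

#5 [internal] load build context
#5 ...

#6 [4/5] RUN npm install
#6 CACHED

#7 [2/5] WORKDIR /app
#7 CACHED

#8 [3/5] COPY package*.json ./
#8 CACHED

#9 [5/5] COPY . .
#9 CACHED
`;

// Pairs each numbered build step with the CACHED/DONE line sharing its "#N" id.
function summarizeSteps(log) {
  const names = new Map(); // id -> step description
  const statuses = new Map(); // id -> 'CACHED' or 'DONE'
  for (const line of log.split('\n')) {
    const step = line.match(/^#(\d+) \[(\d+\/\d+)\] (.+)$/);
    if (step) {
      names.set(step[1], `[${step[2]}] ${step[3]}`);
      continue;
    }
    const status = line.match(/^#(\d+) (CACHED|DONE)\b/);
    if (status && names.has(status[1])) {
      statuses.set(status[1], status[2]);
    }
  }
  return [...names].map(([id, step]) => ({ step, status: statuses.get(id) ?? 'UNKNOWN' }));
}

const summary = summarizeSteps(sampleLog);
for (const { step, status } of summary) {
  console.log(`${status.padEnd(7)} ${step}`);
}
```

Lines such as [internal] load build context are skipped on purpose, since they don’t correspond to Dockerfile instructions.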

7. Conclusion

In this article, we explored the caching of the RUN npm install instruction in the Dockerfile.

Without optimization, containerizing a Node.js app with Docker can slow down a workflow. To be specific, the RUN npm install step can become a major bottleneck when Docker reruns it unnecessarily on every change. By understanding how Docker caching works and restructuring the Dockerfile to copy only package*.json before running npm install, we enable much faster build times. As a result, Docker caches the dependency layer properly and only reruns npm install when our actual dependencies change.

Additionally, we explored using the .dockerignore file. As a result, we can prevent accidental cache invalidation from unnecessary files.