1. Introduction

Containers are a way of isolating part of the operating system (OS) as a separate environment. Docker takes the idea of containers and turns it into a manageable unit that associates the data with its metadata. These units are Docker images: static, layered bundles with all components necessary to get a container up and running around a given application.

In this tutorial, we explore what comprises a Docker image, as well as how we can build images and then explore and compare them via dive and container-diff. First, we go through the structure of a Docker container image and build a custom one. Next, we turn to dive and its functionality with practical examples. Finally, we go over container-diff for image analysis and comparison.

We tested the code on Debian 12 (Bookworm) with GNU Bash 5.2.15 and Docker 20.10.24. Unless otherwise specified, it should work in most POSIX-compliant environments.

2. Docker Container Image

Container images comprise all the information required to run an isolated environment for an application:

  • code
  • libraries
  • dependencies
  • runtime

Organizations like the Open Container Initiative (OCI) and the Cloud Native Computing Foundation (CNCF) work on specifying the format and features of containers as open standards.

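Since Docker is one of the founders of OCI and its image format served as the basis for the standard, we mainly concern ourselves with the OCI image-spec, which defines the format of container images, including Docker images. Critically, in Docker’s case, this format is organized around the filesystem: an image is mostly a stack of filesystem changes.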

2.1. Layered Structure

In fact, Docker groups files into read-only filesystem layers and stores them within an image. The image format describes this via three elements:

  • manifest (JSON): references the configuration and each layer by digest; for multi-platform images, a higher-level index points to one such manifest per platform
  • configuration (JSON): metadata, the list of root filesystem layer digests, and the history of the image build
  • layer set: the actual filesystem data, usually stored as compressed tar archives
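To look at these elements ourselves, one option is to export an image via docker save and list the JSON metadata files inside the resulting tar archive. This is a minimal sketch: list_image_metadata is a hypothetical helper name of our own, and the exact file layout depends on the Docker version, so newer releases may include additional files such as index.json:

```shell
# list_image_metadata: print the JSON metadata files (manifest, configuration)
# inside the tar archive that docker save produces for an image
# (hypothetical helper; file names vary between Docker versions)
list_image_metadata() {
  local image="$1"
  # stream the archive to stdout and list its members without extracting them
  docker save "$image" | tar -tf - | grep '\.json$'
}
```

With Docker 20.10, running list_image_metadata debian:latest should show manifest.json along with a configuration file named after the image ID.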

Notably, the first layers usually come from a minimal base image called the parent. Often, we use a ready-made open-source image such as debian, ubuntu, alpine, or others already available on registries like Docker Hub.

Similar to virtual machine snapshots or the journal mechanism of native Linux filesystems, each subsequent layer records a set of changes relative to the layers below it. For example, we might want to run several steps:

  1. get parent (base) image as start of new image
  2. create a new directory in the image
  3. download updated package index files to the image
  4. install a package
  5. copy host data to the image
  6. set the default command

Let’s convert the steps to a Dockerfile:

$ cat Dockerfile
FROM debian:latest
RUN mkdir -p /home/baeldung/
RUN apt-get update
RUN apt-get install -y vim
COPY file /home/baeldung/file
CMD ["vim", "--version"]

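Here, we can see all six steps. Notably, each step adds an entry to the image history, but only the steps that change the filesystem, such as RUN and COPY, produce a new layer, while metadata-only steps such as CMD merely update the image configuration. When the time comes to start the container, a union filesystem such as overlay2 stacks the read-only image layers and adds a thin writable layer on top, so the container can modify files via copy-on-write without altering the image itself.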
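Since only instructions that change the filesystem add layers on top of the parent, a quick way to estimate the layer count is to count the RUN, COPY, and ADD lines. Here’s a rough sketch, assuming a simple single-stage Dockerfile with one instruction per line; count_layer_instructions is a hypothetical helper, not part of Docker:

```shell
# count_layer_instructions: rough estimate of how many layers a Dockerfile
# adds on top of its parent image (hypothetical helper, not part of Docker)
count_layer_instructions() {
  local dockerfile="${1:-Dockerfile}"
  # RUN, COPY, and ADD change the filesystem; FROM and CMD don't add layers here
  grep --count --ignore-case --extended-regexp \
    '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$dockerfile"
}
```

For our six-step Dockerfile, this yields 4, so together with the single layer of debian:latest, we can expect five layers in total.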

2.2. Optimization

Notably, every RUN instruction executes in a separate temporary container and generates a new layer.

Because of this, we can optimize a Dockerfile for build speed and final image size by combining related commands into a single RUN instruction:

$ cat Dockerfile
FROM debian:latest
RUN mkdir --parents /home/baeldung/ && \
    apt-get update && \
    apt-get install -y vim
COPY file /home/baeldung/file
CMD ["vim", "--version"]

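Now, we generate only three layers: the parent layer, one for the combined RUN instruction, and one for COPY. Further, the build should be faster, since all shell operations run in the same container, so Docker doesn’t have to create and commit an intermediate layer after each of them.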
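Going one step further, a common practice is to delete the downloaded package index files within the same RUN instruction, since files removed in a later layer still occupy space in the layer that added them. The sketch below writes such a variant as Dockerfile.slim, a file name we picked for illustration, assuming the container doesn’t need the package index at runtime:

```shell
# write an optimized Dockerfile variant; the quoted 'EOF' keeps the
# backslashes and the glob literal instead of letting the shell expand them
cat > Dockerfile.slim <<'EOF'
FROM debian:latest
RUN mkdir --parents /home/baeldung/ && \
    apt-get update && \
    apt-get install -y vim && \
    rm -rf /var/lib/apt/lists/*
COPY file /home/baeldung/file
CMD ["vim", "--version"]
EOF
```

Then, docker build --file Dockerfile.slim --tag repox:slim . builds it under a hypothetical repox:slim tag, which we can later compare against repox:tax.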

2.3. Building Image

Let’s build the image as repox:tax:

$ docker build . --tag repox:tax
[+] Building 11.2s (10/10) FINISHED
 => [internal] load .dockerignore                                                                0.0s
 => => transferring context: 2B                                                                  0.0s
 => [internal] load build definition from Dockerfile                                             0.0s
 => => transferring dockerfile: 181B                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:latest                                 1.9s
 => [1/5] FROM docker.io/library/debian@sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90  3.5s
 => => resolve docker.io/library/debian@sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90  0.0s
 => => sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90ec349528b53ce2c9 1.85kB / 1.85kB   0.0s
 => => sha256:f46a268570dff2bdf3b243362802f0e60f511fc396f134952cb1458bd2b2f40c 529B / 529B       0.0s
 => => sha256:e3cbd207d8e55effc51a4738ed80bd81141d0f50c91bd83f9b18d404c129a8a1 1.46kB / 1.46kB   0.0s
 => => sha256:6a299ae9cfd996c1149a699d36cdaa76fa332c8e9d66d6678fa9a231d9ead04 49.58MB / 49.58MB  1.2s
 => => extracting sha256:6a299ae9cfd996c1149a699d36cdaa76fa332c8e9d66d6678fa9a231d9ead04c        2.2s
 => [internal] load build context                                                                0.0s
 => => transferring context: 25B                                                                 0.0s
 => [2/5] RUN mkdir -p /home/baeldung/                                                           0.4s
 => [3/5] RUN apt-get update                                                                     2.2s
 => [4/5] RUN apt-get install -y vim                                                             2.7s
 => [5/5] COPY file /home/baeldung/file                                                          0.0s
 => exporting to image                                                                           0.3s
 => => exporting layers                                                                          0.3s
 => => writing image sha256:666d0107200d83915f9674066600f1b70d9ef61bcd7badde107b4901e351e3e5     0.0s
 => => naming to docker.io/library/repox:tax                                                     0.0s

The build finished successfully, and we can see each step. Notably, every RUN step executes in a new temporary container, whose filesystem changes Docker then commits as a new layer.
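To check which steps actually produced layers, we can list the image history along with the size of each entry. A small wrapper around the standard docker image history command might look like this, with list_layers being a name of our own:

```shell
# list_layers: show the size and creating instruction of each history entry,
# newest first; entries with 0B added no (or only negligible) data
list_layers() {
  docker image history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' "$1"
}
```

For instance, list_layers repox:tax should show non-zero sizes for the two apt-get steps.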

2.4. Basic Parent Comparison

At this point, we can check the images we have:

$ docker image list
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
repox        tax       666d0107200d   1 minute ago     178MB

Normally, one of the main aims when creating base or parent images is to keep the footprint minimal, while ensuring full functionality.

So, let’s pull the parent image debian:latest separately:

$ docker pull debian:latest
latest: Pulling from library/debian
6a299ae9cfd9: Already exists
Digest: sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90ec349528b53ce2c9
Status: Downloaded newer image for debian:latest
docker.io/library/debian:latest

Next, we compare the sizes:

$ docker image list
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
repox        tax       666d0107200d   2 minutes ago    178MB
debian       latest    e3cbd207d8e5   1 minute ago     117MB

As it turns out, our new image is around 60MB bigger than the original due to the package index data, the new file, and the Vim installation.
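To put an exact number on this difference, we can compare the byte sizes that docker image inspect reports. This sketch assumes decimal megabytes (1MB = 1000000 bytes), which matches how docker image list turns 116551795 bytes into 117MB; size_diff_mb is a hypothetical helper:

```shell
# size_diff_mb: print how many decimal megabytes image $1 adds on top of image $2
size_diff_mb() {
  local new_size old_size
  # .Size is the image size in bytes
  new_size=$(docker image inspect --format '{{.Size}}' "$1") || return 1
  old_size=$(docker image inspect --format '{{.Size}}' "$2") || return 1
  # shell integer division truncates the fractional part
  echo $(( (new_size - old_size) / 1000000 ))
}
```

We call it with the custom image first and the parent second, e.g., size_diff_mb repox:tax debian:latest.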

2.5. Basic Inspection

By using docker inspect, we can get a general overview of an image via its identifier:

$ docker inspect e3cbd207d8e5
[
  {
    "Id": "sha256:e3cbd207d8e55effc51a4738ed80bd81141d0f50c91bd83f9b18d404c129a8a1",
    "RepoTags": [
      "debian:latest"
    ],
    "RepoDigests": [
      "debian@sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90ec349528b53ce2c9"
    ],
    "Parent": "",
    "Comment": "",
    "Created": "2024-01-31T01:31:24.460285844Z",
    "Container": "664651423fa834f57c458239fdae8b80dc09eda542e7e1362aef7a1fb50a2fec",
    "ContainerConfig": {
      [...]
    },
    "DockerVersion": "20.10.23",
    "Author": "",
    "Config": {
      [...]
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Size": 116551795,
    "VirtualSize": 116551795,
    "GraphDriver": {
      "Data": {
        "MergedDir": "/var/snap/docker/common/var-lib-docker/overlay2/86c2dfa7b060b535169f3bbef84a058ffa000213ced4cccf366ed50ef571be0c/merged",
        "UpperDir": "/var/snap/docker/common/var-lib-docker/overlay2/86c2dfa7b060b535169f3bbef84a058ffa000213ced4cccf366ed50ef571be0c/diff",
        "WorkDir": "/var/snap/docker/common/var-lib-docker/overlay2/86c2dfa7b060b535169f3bbef84a058ffa000213ced4cccf366ed50ef571be0c/work"
      },
      "Name": "overlay2"
    },
    "RootFS": {
      "Type": "layers",
      "Layers": [
        "sha256:1dae5147cd293b16e7b8c93f778dbf7ceff5c81c2b2704d3e5a98d331cdbe0ab"
      ]
    },
    "Metadata": {
      "LastTagTime": "0001-01-01T00:00:00Z"
    }
  }
]

Along with the layer digests, we see the size, metadata, storage driver paths, and more.
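When we only need a few fields, the --format option of docker inspect accepts a Go template instead of printing the whole JSON document. As a sketch, image_summary (our own helper name) prints the number of layers and the size in bytes:

```shell
# image_summary: print the layer count and size (in bytes) of an image
# using a Go template; len is a built-in template function
image_summary() {
  docker image inspect --format 'layers={{len .RootFS.Layers}} size={{.Size}}' "$1"
}
```

Based on the output above, image_summary e3cbd207d8e5 should print layers=1 size=116551795.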

3. dive

Because of the specifics around Docker and OCI images, we might want to explore a given image more thoroughly and see what it contains. Although docker inspect can be helpful in this regard, there are more specialized tools for the purpose, such as dive.

In terms of inspection capabilities, dive is comparable to skopeo. However, dive provides a piece of information that other tools don’t: the optimization potential of an image, in the form of wasted space and an efficiency score.

3.1. Install

When working on Debian-based distributions, we can download the dive DEB file and install it via dpkg:

$ DIVE_VERSION=$(curl --silent 'https://api.github.com/repos/wagoodman/dive/releases/latest' | perl -n0we 'print $1 if /"tag_name": "v(.*?)"/;')
$ curl --location --output dive.deb "https://github.com/wagoodman/dive/releases/latest/download/dive_${DIVE_VERSION}_linux_amd64.deb"
$ dpkg --install dive.deb

Here, we use curl to query the GitHub API for the latest release and extract its exact version with a Perl one-liner. Then, we download the matching dive.deb and install it via dpkg. We can run similar commands to get the latest RPM file.

For a universal installation method, we can also use snap:

$ snap install dive

Importantly, if we choose this method, we might need to use --classic for writing report files back to the filesystem.

Further, we could also clone the git repository and install it from the sources.

Finally, there’s a public dive container image, so we can just use docker itself:

$ alias dive='docker run --rm --tty --interactive --volume=/var/run/docker.sock:/var/run/docker.sock wagoodman/dive'

Effectively, this method of deployment creates a temporary container from the wagoodman/dive image and bind-mounts the host Docker socket /var/run/docker.sock at the same path inside the container via --volume, so dive can talk to the Docker daemon on the host. Then, we can use the new alias to build or directly inspect any image.
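If we want to use a local dive binary when one is installed and fall back to the container image otherwise, a small function can replace the alias. This is only a sketch with a hypothetical name, run_dive, and it assumes the Docker socket lives at the default /var/run/docker.sock path:

```shell
# run_dive: prefer a dive binary from PATH, otherwise run the wagoodman/dive image
run_dive() {
  # type -P only searches PATH, so aliases and functions named dive are ignored
  if type -P dive >/dev/null; then
    command dive "$@"
  else
    docker run --rm --tty --interactive \
      --volume=/var/run/docker.sock:/var/run/docker.sock \
      wagoodman/dive "$@"
  fi
}
```

Either way, run_dive e3cbd207d8e5 opens the same interface.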

3.2. Basic Usage and Navigation

First, let’s again list the current images we have:

$ docker image list
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
repox        tax       666d0107200d   2 minutes ago    178MB
debian       latest    e3cbd207d8e5   1 minute ago     117MB

Next, we can fire up dive with the debian base image identifier:

$ dive e3cbd207d8e5
Image Source: docker://e3cbd207d8e5
Fetching image... (this can take a while for large images)
Analyzing image...
Building cache...
[...]

At this point, we should see a basic tmux-like terminal user interface (TUI) with several panes:

                                                   │ Current Layer Contents ├────────────────────────
┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━    ├── bin → usr/bin
Cmp   Size  Command                                ├── boot
    116 MB  FROM e8391b19d63a54a                   ├── dev
                                                   ├── etc
                                                   │   ├── .pwd.lock
                                                   │   ├── adduser.conf
                                                   │   ├── alternatives
                                                   [...]
│ Layer Details ├─────────────────────────────────    │   │   ├── rmt.8.gz → /usr/share/man/man8/rmt-tar
                                                   │   │   ├── which → /usr/bin/which.debianutils
Tags:   (unavailable)                              │   │   ├── which.1.gz → /usr/share/man/man1/which
Id:     e8391b19d63a54a23f580bd888d66669d876dd261e │   │   ├── which.de1.gz → /usr/share/man/de/man1/
71f257008afb3c4d213d1d                             │   │   ├── which.es1.gz → /usr/share/man/es/man1/
Digest: sha256:1dae5147cd293b16e7b8c93f778dbf7ceff │   │   ├── which.fr1.gz → /usr/share/man/fr/man1/
5c81c2b2704d3e5a98d331cdbe0ab                      │   │   ├── which.it1.gz → /usr/share/man/it/man1/
Command:                                           │   │   ├── which.ja1.gz → /usr/share/man/ja/man1/
#(nop) ADD file:6d9e71f0d3afb0b288cf2c06425795d528 │   │   ├── which.pl1.gz → /usr/share/man/pl/man1/
a142872692072ab1cd1ad275b67d1f in /                │   │   └── which.sl1.gz → /usr/share/man/sl/man1/
                                                   │   ├── apt
                                                   │   │   ├── apt.conf.d
                                                   [...]
│ Image Details ├─────────────────────────────────    │   │   │   ├── docker-gzip-indexes
                                                   │   │   │   └── docker-no-languages
Image name: e3cbd207d8e5                           │   │   ├── auth.conf.d
Total Image size: 116 MB                           │   │   ├── keyrings
Potential wasted space: 0 B                        │   │   ├── preferences.d
Image efficiency score: 100 %                      │   │   ├── sources.list.d
                                                   │   │   │   └── debian.sources
Count   Total Space  Path                          │   │   └── trusted.gpg.d
                                                   │   │       ├── debian-archive-bookworm-automatic.
                                                   [...] 
 ^C Quit | Tab Switch view | ^F Filter | ^L Show layer changes | ^A Show aggregated changes |    x   |

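In this view, the Layers pane lists each layer along with the command that created it, Layer Details describes the current selection, and Image Details summarizes the whole image. Meanwhile, the Current Layer Contents pane on the right shows the filesystem tree.

We can use the arrow keys to navigate within a view. To switch between the left and right halves of the vertical split, we hit Tab. The Space key expands and collapses directories in the Current Layer Contents view.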

3.3. Non-interactive Output

If we have write access to the host filesystem, we can also leverage the --json option to export the analysis data:

$ dive --json debian_image_data.json e3cbd207d8e5
$ cat debian_image_data.json
{
  "layer": [
    {
      "index": 0,
      "id": "e8391b19d63a54a23f580bd888d66669d876dd261e71f257008afb3c4d213d1d",
      "digestId": "sha256:1dae5147cd293b16e7b8c93f778dbf7ceff5c81c2b2704d3e5a98d331cdbe0ab",
      "sizeBytes": 116542893,
      "command": "#(nop) ADD file:6d9e71f0d3afb0b288cf2c06425795d528a142872692072ab1cd1ad275b67d1f in / "
    }
  ],
  "image": {
    "sizeBytes": 116542893,
    "inefficientBytes": 0,
    "efficiencyScore": 1,
    "fileReference": []
  }
}

This output is much more concise than that of docker inspect. Yet, it includes data that docker inspect doesn’t provide, such as the number of inefficient bytes and the efficiency score.
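To use such a report in scripts, we can pull out individual numeric fields. A jq query such as .image.efficiencyScore is the more robust choice, but if jq isn’t available, a sed-based sketch like the hypothetical dive_json_field helper works, assuming the pretty-printed layout shown above with one key per line:

```shell
# dive_json_field: print the first numeric value stored under the given key
# in a dive --json report (relies on one "key": value pair per line)
dive_json_field() {
  local field="$1" report="$2"
  sed -n "s/.*\"${field}\": *\([0-9.]*\).*/\1/p" "$report" | head -n 1
}
```

For example, dive_json_field efficiencyScore debian_image_data.json should print 1 for the report above.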

Notably, we can also get a summary screen without interaction by setting the CI environment variable to true:

$ CI=true dive e3cbd207d8e5
  Using default CI config
Image Source: docker://e3cbd207d8e5
Fetching image... (this can take a while for large images)
Analyzing image...
  efficiency: 100.0000 %
  wastedBytes: 0 bytes (0 B)
  userWastedPercent: NaN %
Inefficient Files:
Count  Wasted Space  File Path
None
Results:
  PASS: highestUserWastedPercent
  SKIP: highestWastedBytes: rule disabled
  PASS: lowestEfficiency
Result:PASS [Total:3] [Passed:2] [Failed:0] [Warn:0] [Skipped:1]

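In this case, we see a 100.0000 % efficiency, meaning dive found no duplicated or removed files that waste space. This is exactly as expected for a single-layer parent image, since waste can only arise when a later layer modifies or deletes files from an earlier one. For the same reason, userWastedPercent is NaN: dive excludes the base layer from that ratio, which leaves nothing to divide by.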
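The PASS and SKIP results above come from configurable rules, whose names match the ones in the output. Per the dive documentation, we can pass a YAML rule file via --ci-config, so a sketch of a stricter setup writes one with threshold values that are only examples, not recommendations:

```shell
# write a rule file for dive's CI mode; thresholds are example values
cat > .dive-ci <<'EOF'
rules:
  # fail if the efficiency ratio (0-1) drops below this value
  lowestEfficiency: 0.95
  # fail if at least this much space is wasted
  highestWastedBytes: 20MB
  # fail if wasted space reaches this ratio of the non-base layers
  highestUserWastedPercent: 0.20
EOF
```

Afterward, CI=true dive --ci-config .dive-ci 666d0107200d evaluates our custom image against these rules, including the highestWastedBytes check that was skipped above.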

3.4. Custom Image Analysis

So, let’s open our custom-built image and check the basic details in comparison to its base:

$ dive docker://666d0107200d

Looking at the Layers pane, we can immediately notice that, unlike its parent, the custom image has more than one layer:

┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cmp   Size  Command
    116 MB  FROM b1d04cf5ba033d6
    0 B     RUN /bin/sh -c mkdir -p /home/baeldung/ # buildkit
    19 MB   RUN /bin/sh -c apt-get update # buildkit
    42 MB   RUN /bin/sh -c apt-get install -y vim # buildkit
    0 B     COPY file /home/baeldung/file # buildkit

Again, each layer represents a filesystem change. This is why we don’t see a layer for CMD ["vim", "--version"], which only modifies the image configuration metadata.

Naturally, we also see a decrease in the Image efficiency score:

│ Image Details ├──────────────────────────────────────

Image name: 666d0107200d
Total Image size: 178 MB
Potential wasted space: 1.9 MB
Image efficiency score: 99 %

Count   Total Space  Path
    2        1.5 MB  /var/cache/debconf/templates.dat
    2        158 kB  /var/lib/dpkg/status-old
    2        158 kB  /var/lib/dpkg/status
    2         11 kB  /var/lib/apt/extended_states
    2        9.2 kB  /etc/ld.so.cache
    2        9.0 kB  /var/log/apt/eipp.log.xz
    2        8.8 kB  /var/cache/debconf/config.dat

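Notably, the listed files are mostly package manager state and cache data that already exist in the parent layer and get rewritten by the later package management steps. Since each layer stores its own copy, the image contains both versions of these files, which dive counts as wasted space.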

3.5. Key Shortcuts

To work with the interactive interface of dive, we can also use several hotkeys:

  • exit: Ctrl+C, Q
  • file filter: Ctrl+F
  • aggregate layer view: Ctrl+A
  • current layer view: Ctrl+L
  • toggle showing [A]dded, [R]emoved, [M]odified, [U]nmodified files: Ctrl+<LETTER>

This way, we can navigate the data more conveniently.

3.6. Configuration

Finally, although we usually won’t need it, dive can also be configured via several files:

  • $XDG_CONFIG_HOME/dive/*.y[a]ml
  • $XDG_CONFIG_DIRS/dive/*.y[a]ml
  • $HOME/.config/dive/*.y[a]ml
  • $HOME/.dive.y[a]ml

Let’s see an annotated example configuration:

$ cat $HOME/.dive.yaml
# "docker" or "podman"
container-engine: docker
# analyze despite errors
ignore-errors: false
log:
  enabled: true
  path: ./dive.log
  level: info

# key changes
keybinding:
  # Global bindings
  quit: ctrl+c
  toggle-view: tab
  filter-files: ctrl+f, ctrl+slash

  # Layer view specific bindings
  compare-all: ctrl+a
  compare-layer: ctrl+l

  # File view specific bindings
  toggle-collapse-dir: space
  toggle-collapse-all-dir: ctrl+space
  toggle-added-files: ctrl+a
  toggle-removed-files: ctrl+r
  toggle-modified-files: ctrl+m
  toggle-unmodified-files: ctrl+u
  toggle-filetree-attributes: ctrl+b
  page-up: pgup
  page-down: pgdn

diff:
  # change default files shown for diff
  hide:
    - added
    - removed
    - modified
    - unmodified

filetree:
  # default collapse state
  collapse-dir: false

  # proportion between vertical view widths
  pane-width: 0.5

  # file attribute toggling
  show-attributes: true

layer:
  # show aggregate layer changes by default
  show-aggregated-changes: false

Thus, we can also change the keyboard shortcuts along with the interface behavior.

4. container-diff

When creating different container images, there are times when we might also want to compare them. For example, root filesystem, layer, and other changes sometimes play a role in deployments.

For this purpose, the container-diff tool can be invaluable.

4.1. Install

Before using container-diff, we install it via an official channel:

$ curl --location --remote-name 'https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64' && \
install container-diff-linux-amd64 /usr/local/bin/container-diff

If install isn’t available, we can manually copy the binary to a directory in our PATH and make it executable via chmod +x. In any case, we should end up with access to the main executable.

4.2. Differences

With container-diff, we can perform comparisons and analysis for many categories, each selected via the --type option:

  • history
  • metadata
  • layer
  • file (filesystem)
  • size, sizelayer
  • apt, aptlayer
  • rpm, rpmlayer
  • node
  • pip

To perform a difference check, we use the diff subcommand and supply both images as repository:tag strings with the daemon:// prefix, which loads them from the local Docker daemon. Notably, we first specify the base image, so we get an idea of what was added or changed.

For example, let’s perform a basic filesystem comparison between the images we already have:

$ container-diff diff --type file daemon://debian:latest daemon://repox:tax

-----File-----

These entries have been added to debian:latest:
FILE                                SIZE
/etc/alternatives/editor            18B
/etc/alternatives/editor.1.gz
[...]

These entries have been deleted from debian:latest: None

These entries have been changed between debian:latest and repox:tax:
FILE                                SIZE1        SIZE2
/var/lib/dpkg/status                75K          79.2K
/var/lib/dpkg/status-old            75K          79.2K
/var/lib/apt/extended_states        5K           5.3K
/etc/ld.so.cache                    4.4K         4.5K
/var/log/apt/eipp.log.xz            4.3K         4.5K
/var/lib/dpkg/diversions            98B          268B
/var/lib/dpkg/diversions-old        29B          187B

The output can become quite long due to the package index files from apt-get update and the many files that the vim installation adds.

Of course, we can use the --type option multiple times and specify other categories as well.
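To avoid repeating --type for each category, we can wrap the call in a small function. In this sketch, diff_images is a hypothetical name, and both images are assumed to be available in the local Docker daemon:

```shell
# diff_images: compare two local images across several container-diff types
# usage: diff_images BASE_IMAGE TARGET_IMAGE TYPE...
diff_images() {
  local base="$1" target="$2"
  shift 2
  local args=() type
  # build one --type option per requested category
  for type in "$@"; do
    args+=(--type "$type")
  done
  container-diff diff "${args[@]}" "daemon://${base}" "daemon://${target}"
}
```

For example, diff_images debian:latest repox:tax file apt size compares the filesystem, the installed Debian packages, and the overall size in one run.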

4.3. Analysis

On the other hand, we can analyze a single container image via the analyze subcommand:

$ container-diff analyze --type history daemon://repox:tax

-----History-----

Analysis for repox:tax:
-/bin/sh -c #(nop) ADD file:6d9e71f0d3afb0b288cf2c06425795d528a142872692072ab1cd1ad275b67d1f in /
-/bin/sh -c #(nop)  CMD ["bash"]
-RUN /bin/sh -c mkdir -p /home/baeldung/ # buildkit
-RUN /bin/sh -c apt-get update # buildkit
-RUN /bin/sh -c apt-get install -y vim # buildkit
-COPY file /home/baeldung/file # buildkit
-CMD ["vim" "--version"]

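In this case, we check the history of the repox:tax image to reconstruct the build instructions. Notably, the first two entries come from the debian:latest parent image, while the rest correspond to our Dockerfile.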
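Since the history entries closely resemble Dockerfile instructions, a rough sed-based sketch can strip the shell wrappers and BuildKit markers. history_to_instructions is a hypothetical helper, and its output only approximates the original Dockerfile; for instance, the exec-form CMD keeps the missing commas from the history output:

```shell
# history_to_instructions: turn container-diff history lines into
# Dockerfile-like instructions (rough approximation, reads stdin)
history_to_instructions() {
  # keep only entries, which start with a single dash, then clean them up
  grep '^-[^-]' |
    sed -e 's/^-//' \
        -e 's|^/bin/sh -c #(nop) *||' \
        -e 's|^RUN /bin/sh -c \(.*\) # buildkit$|RUN \1|' \
        -e 's| # buildkit$||'
}
```

We can pipe the analysis into it via container-diff analyze --type history daemon://repox:tax | history_to_instructions.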

5. Summary

In this article, we explored the structure of Docker images and two major utilities for container image analysis and comparison: dive and container-diff.

In conclusion, working with containers inevitably requires knowledge of the image structure and ways to check and compare contents for analysis, optimization, and security.
