1. Overview

Docker images should be as small as possible while still containing all the necessary binaries and packages. Smaller images lead to efficient resource utilization, faster deployments, lower bandwidth usage, and cost savings. So, we must continuously watch the size of our Docker images.

In this tutorial, we’ll learn how to sort Docker images by size using the docker images command to help manage them efficiently.

2. Scenario Setup

In this section, we’ll prepare a few Docker images of varying sizes to simulate our scenario.

2.1. Pre-Built Docker Images

The most convenient way to get a set of Docker images is to pull pre-built images from a public Docker registry such as Docker Hub. For this purpose, let’s write the docker_pull_images.sh script to pull multiple images conveniently:

$ cat docker_pull_images.sh
#!/bin/bash
images=(nginx:latest alpine:latest ubuntu:latest httpd:latest)
for image in "${images[@]}"
do
    docker pull "${image}"
done

We’ve specified multiple Docker images in the images array and used the docker pull command to pull those images iteratively.

Now, let’s execute the script and pull the images:

$ ./docker_pull_images.sh

Lastly, we can verify that we’ve got the specified images locally using the docker images command:

$ docker images
REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
alpine       latest    3cc203321400   7 days ago    7.66MB
ubuntu       latest    bf9e5b7213b4   10 days ago   69.2MB
nginx        latest    2a4fbb36e966   2 weeks ago   192MB
httpd        latest    ce6083df2933   2 weeks ago   195MB

It’s important to note that, by default, the docker images command lists the images by creation timestamp, with the most recently created image first.

2.2. Custom Large Docker Image

The docker images command shows the image size in a human-readable format. Further, most publicly available Docker images are optimized for size, so their sizes are usually reported in MB.

However, we want to ensure that our solution works for sizes in different units, especially GB. For this purpose, let’s write a custom Dockerfile using ubuntu:latest as the base image:

$ cat Dockerfile
FROM ubuntu:latest
RUN dd if=/dev/urandom of=/bloatfile.isbloat bs=1M count=4k

We used the dd command to write 4096 blocks of 1MiB each, that is, 4GiB of random data, into the /bloatfile.isbloat file. For this purpose, we used the special file, /dev/urandom, as a pseudo-random data source.
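As a quick sanity check on that figure, the arithmetic can be sketched in shell. This assumes GNU dd’s suffix conventions, where bs=1M means 1024*1024 bytes and count=4k means 4*1024 blocks. The variable names are illustrative only:

```shell
#!/bin/bash
# Assumption: GNU dd semantics, where bs=1M is 1024*1024 bytes
# and count=4k is 4*1024 blocks.
block_size=$((1024 * 1024))
block_count=$((4 * 1024))
bloat_bytes=$((block_size * block_count))

# Total number of bytes written to the bloat file.
echo "bytes: ${bloat_bytes}"

# The same amount expressed in decimal megabytes (integer division).
echo "decimal MB: $((bloat_bytes / 1000000))"
```

The result is 4294967296 bytes, or roughly 4.29GB in decimal units. Together with the layers of the ubuntu base image, a reported image size slightly above 4.3GB is therefore plausible.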

Now, we can use the docker build command to build this image with the customlargeimage name and the latest tag:

$ docker build . -t customlargeimage:latest

Finally, we can confirm that we’ve got all the Docker images that we need to simulate our use case:

$ docker images
REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
customlargeimage   latest    b237404065c7   2 minutes ago   4.36GB
alpine             latest    3cc203321400   8 days ago      7.66MB
ubuntu             latest    bf9e5b7213b4   11 days ago     69.2MB
nginx              latest    2a4fbb36e966   2 weeks ago     192MB
httpd              latest    ce6083df2933   2 weeks ago     195MB

Since we just built the customlargeimage image, it comes first in the listing because it has the most recent creation timestamp, even though it’s the largest image.

3. Using sort -h

In this section, we’ll use the sort command-line utility in combination with the docker images command to solve our use case of sorting the images by size.

3.1. -n vs. -h Option

The sort command has two options, namely, -n and -h, to sort data numerically. Further, the -n option seems more intuitive when sorting numerical fields. So, it’s crucial to understand why we need to use the -h option over the -n option for our use case.

Let’s sort two sizes using the -n option:

$ echo -e "7MB\n1GB" | sort -n
1GB
7MB

We notice that 1GB appears before 7MB, even though 1GB > 7MB. This is because the sort -n command only considers the leading numeric part of each line and ignores the unit suffix. In this case, 1GB has a prefix of 1, while 7MB has a prefix of 7.

To overcome this issue, we must ensure that we’re using the -h option to perform a human-readable sort:

$ echo -e "7MB\n1GB" | sort -h
7MB
1GB

Fantastic! The sorting looks correct now because the -h option considered the suffixes denoting the size while sorting.
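To see the same behavior on values with decimal points and mixed units, here is a small sketch that sorts a few sample sizes in both directions. It assumes GNU sort, and the sizes are hard-coded sample values rather than output from a Docker daemon:

```shell
#!/bin/bash
# Sample sizes only; these are not read from a live Docker daemon.
sizes=$(printf '%s\n' 192MB 4.36GB 7.66MB 69.2MB)

# LC_ALL=C keeps "." as the decimal separator regardless of the user's locale.
smallest_first=$(printf '%s\n' "$sizes" | LC_ALL=C sort -h)
largest_first=$(printf '%s\n' "$sizes" | LC_ALL=C sort -hr)

echo "$smallest_first"
echo "---"
echo "$largest_first"
```

Adding the -r option reverses the order, which is handy when we’re mostly interested in the largest images.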

3.2. With awk and cut

We intend to use the sort -h command for sorting the images by size, so we need the size to appear at the start of each output line. To achieve this, let’s use awk to duplicate the last column and put it at the front:

$ docker images | awk '{print $NF, $0}'
SIZE REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
4.36GB customlargeimage   latest    b237404065c7   2 minutes ago   4.36GB
7.66MB alpine             latest    3cc203321400   8 days ago      7.66MB
69.2MB ubuntu             latest    bf9e5b7213b4   11 days ago     69.2MB
192MB nginx              latest    2a4fbb36e966   2 weeks ago     192MB
195MB httpd              latest    ce6083df2933   2 weeks ago     195MB

We refer to the last field using $NF, where NF is the built-in variable holding the number of fields, and to the entire input record using $0.

Now, we can pipe the output to the sort -h command for sorting:

$ docker images | awk '{print $NF, $0}' | sort -h
SIZE REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
7.66MB alpine             latest    3cc203321400   8 days ago      7.66MB
69.2MB ubuntu             latest    bf9e5b7213b4   11 days ago     69.2MB
192MB nginx              latest    2a4fbb36e966   2 weeks ago     192MB
195MB httpd              latest    ce6083df2933   2 weeks ago     195MB
4.36GB customlargeimage   latest    b237404065c7   3 minutes ago   4.36GB

We’ve sorted the images by size. However, the first column is no longer required.

Finally, we can pipe the output to the cut command to remove the redundant first column by using the -f2- option with a space as the delimiter:

$ docker images | awk '{print $NF, $0}' | sort -h | cut -f2- -d' '
REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
alpine             latest    3cc203321400   8 days ago      7.66MB
ubuntu             latest    bf9e5b7213b4   11 days ago     69.2MB
nginx              latest    2a4fbb36e966   2 weeks ago     192MB
httpd              latest    ce6083df2933   2 weeks ago     195MB
customlargeimage   latest    b237404065c7   4 minutes ago   4.36GB

Great! It looks like we nailed this one.
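The header row stays on top in this pipeline only because a non-numeric line sorts before the sizes in ascending order. If we want the largest image first, one possible approach is to print the header separately and sort only the remaining rows. The following is a minimal sketch under that assumption, where sort_images_largest_first is a hypothetical helper name and the listing is sample data:

```shell
#!/bin/bash
# Hypothetical helper: sorts a `docker images`-style listing read from stdin,
# largest image first, while keeping the header as the first line.
sort_images_largest_first() {
    local header
    # Consume and print the header before the rows reach sort.
    IFS= read -r header
    printf '%s\n' "$header"
    # Decorate each row with its size, sort in reverse, then drop the decoration.
    awk '{print $NF, $0}' | LC_ALL=C sort -hr | cut -d' ' -f2-
}

# Sample listing standing in for real `docker images` output.
sample_listing='REPOSITORY         TAG       SIZE
alpine             latest    7.66MB
customlargeimage   latest    4.36GB
nginx              latest    192MB'

sorted_listing=$(printf '%s\n' "$sample_listing" | sort_images_largest_first)
echo "$sorted_listing"
```

With a running Docker daemon, the same helper could be used as docker images | sort_images_largest_first.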

4. With the --format Option

In this section, we’ll still need the sort -h command for sorting. However, we’ll use the --format option available with the docker images command to filter and format the output.

4.1. Understanding the Fields

Docker supports Go templates with the --format option. Further, we also get basic functionality, such as formatting the output as JSON objects, out of the box.

Let’s use the json function to see the template fields that are available for use:

$ docker image list --format "{{json . }}"
{"Containers":"N/A","CreatedAt":"2023-10-07 04:43:07 +0530 IST","CreatedSince":"5 minutes ago","Digest":"\u003cnone\u003e","ID":"b237404065c7","Repository":"customlargeimage","SharedSize":"N/A","Size":"4.36GB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"4.364GB"}
{"Containers":"N/A","CreatedAt":"2023-09-29 02:09:34 +0530 IST","CreatedSince":"8 days ago","Digest":"\u003cnone\u003e","ID":"3cc203321400","Repository":"alpine","SharedSize":"N/A","Size":"7.66MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"7.66MB"}
{"Containers":"N/A","CreatedAt":"2023-09-25 15:47:45 +0530 IST","CreatedSince":"11 days ago","Digest":"\u003cnone\u003e","ID":"bf9e5b7213b4","Repository":"ubuntu","SharedSize":"N/A","Size":"69.2MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"69.19MB"}
{"Containers":"N/A","CreatedAt":"2023-09-20 21:44:10 +0530 IST","CreatedSince":"2 weeks ago","Digest":"\u003cnone\u003e","ID":"2a4fbb36e966","Repository":"nginx","SharedSize":"N/A","Size":"192MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"192.1MB"}
{"Containers":"N/A","CreatedAt":"2023-09-20 15:16:35 +0530 IST","CreatedSince":"2 weeks ago","Digest":"\u003cnone\u003e","ID":"ce6083df2933","Repository":"httpd","SharedSize":"N/A","Size":"195MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"194.8MB"}

We can see that the fields of interest for our use case are Size, Repository, and Tag. All the remaining fields add noise to our use case, but they could be very relevant to some other use cases.

Now, let’s filter the output to show only the necessary fields, such that image size shows up as the first column in the output:

$ docker image list --format "{{.Size}} {{.Repository}}:{{.Tag}}"
4.36GB customlargeimage:latest
7.66MB alpine:latest
69.2MB ubuntu:latest
192MB nginx:latest
195MB httpd:latest

Great! We’ve built a good understanding of the template fields to use in our scenario of sorting the images by their sizes.
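Because the size is already the first column in this format, the lines can go straight to sort -h. As an illustration, the following sketch picks the N largest entries from such a listing. Here, largest_images is a hypothetical helper name, and the input is a copy of the sample output above rather than live data:

```shell
#!/bin/bash
# Hypothetical helper: prints the N largest entries from "SIZE NAME" lines on stdin.
largest_images() {
    local count=$1
    # Reverse human-numeric sort puts the biggest sizes first.
    LC_ALL=C sort -hr | head -n "$count"
}

# Sample lines in the "{{.Size}} {{.Repository}}:{{.Tag}}" layout.
sample_listing='4.36GB customlargeimage:latest
7.66MB alpine:latest
69.2MB ubuntu:latest
192MB nginx:latest
195MB httpd:latest'

top_two=$(printf '%s\n' "$sample_listing" | largest_images 2)
echo "$top_two"
```

For this sample, the helper reports customlargeimage:latest followed by httpd:latest.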

4.2. Sorting With Tabular Output

Since we’ve decided to show only two fields in the output, there isn’t much confusion about columns. Nevertheless, we can make it more readable by using the table directive:

$ docker image list --format "table {{.Size}}\t{{.Repository}}:{{.Tag}}"
SIZE      REPOSITORY:TAG
4.36GB    customlargeimage:latest
7.66MB    alpine:latest
69.2MB    ubuntu:latest
192MB     nginx:latest
195MB     httpd:latest

Further, let’s pipe this output to the sort -h command for sorting:

$ docker image list --format "table {{.Size}}\t{{.Repository}}:{{.Tag}}" | sort -h
SIZE      REPOSITORY:TAG
7.66MB    alpine:latest
69.2MB    ubuntu:latest
192MB     nginx:latest
195MB     httpd:latest
4.36GB    customlargeimage:latest

Perfect! The tabular report shows the images sorted by their sizes.

5. Using jq

Using the jq command to filter and format the JSON output produced by Docker commands is quite common. Let’s see how to use jq with the docker images command to sort the images by size.

5.1. Add and Remove Fields

Let’s recall that we can use the --format option with the docker images command to show JSON output:

$ docker images --format '{{json .}}'
{"Containers":"N/A","CreatedAt":"2023-09-29 02:09:34 +0530 IST","CreatedSince":"7 days ago","Digest":"\u003cnone\u003e","ID":"3cc203321400","Repository":"alpine","SharedSize":"N/A","Size":"7.66MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"7.66MB"}
{"Containers":"N/A","CreatedAt":"2023-09-25 15:47:45 +0530 IST","CreatedSince":"10 days ago","Digest":"\u003cnone\u003e","ID":"bf9e5b7213b4","Repository":"ubuntu","SharedSize":"N/A","Size":"69.2MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"69.19MB"}
{"Containers":"N/A","CreatedAt":"2023-09-20 21:44:10 +0530 IST","CreatedSince":"2 weeks ago","Digest":"\u003cnone\u003e","ID":"2a4fbb36e966","Repository":"nginx","SharedSize":"N/A","Size":"192MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"192.1MB"}
{"Containers":"N/A","CreatedAt":"2023-09-20 15:16:35 +0530 IST","CreatedSince":"2 weeks ago","Digest":"\u003cnone\u003e","ID":"ce6083df2933","Repository":"httpd","SharedSize":"N/A","Size":"195MB","Tag":"latest","UniqueSize":"N/A","VirtualSize":"194.8MB"}

It’s important to note that each line in the output is a valid JSON object. However, the entire output isn’t a valid JSON array.

To work with the whole listing at once, we can use the -s (slurp) option with jq, which reads all the input objects into a single JSON array. Additionally, we can use the .[] operator to iterate over each member of the array and keep only a few fields, such as Repository and Size, in the output:

$ docker images --format '{{json .}}' | jq -sc '.[] | {Repository, Size}'
{"Repository":"customlargeimage","Size":"4.36GB"}
{"Repository":"alpine","Size":"7.66MB"}
{"Repository":"ubuntu","Size":"69.2MB"}
{"Repository":"nginx","Size":"192MB"}
{"Repository":"httpd","Size":"195MB"}

We used the -c option to show a compact version of the output.
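To make the effect of the -s option more concrete, here is a small sketch that runs jq with and without it on two hand-written JSON lines. The sample objects only mimic the shape of Docker’s output and contain just two fields:

```shell
#!/bin/bash
# Two sample JSON objects, one per line, mimicking the shape of Docker's output.
sample_lines='{"Repository":"alpine","Size":"7.66MB"}
{"Repository":"nginx","Size":"192MB"}'

# Without -s, jq runs the filter once per input object.
per_object=$(printf '%s\n' "$sample_lines" | jq -c '{Repository}')

# With -s, jq first collects every object into one array,
# so array functions such as map() become available.
as_array=$(printf '%s\n' "$sample_lines" | jq -sc 'map(.Repository)')

echo "$per_object"
echo "$as_array"
```

The first invocation prints one object per line, whereas the second one prints a single array containing both repository names.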

Now, since the size is in a human-readable format, we’ll need to convert the value of the Size field to a common unit for direct comparison. Before doing so, let’s preserve the original value in an additional field, originalSize, by using the + operator within the map function:

$ docker image list --format "{{json . }}" | jq -sc '.[] | {Repository, Size}' \
| jq -s 'map(. + {"originalSize": .Size})'
[
  {
    "Repository": "customlargeimage",
    "Size": "4.36GB",
    "originalSize": "4.36GB"
  },
  {
    "Repository": "alpine",
    "Size": "7.66MB",
    "originalSize": "7.66MB"
  },
  {
    "Repository": "ubuntu",
    "Size": "69.2MB",
    "originalSize": "69.2MB"
  },
  {
    "Repository": "nginx",
    "Size": "192MB",
    "originalSize": "192MB"
  },
  {
    "Repository": "httpd",
    "Size": "195MB",
    "originalSize": "195MB"
  }
]

Great! We’ve got the necessary JSON output from the docker images command to further use for our sorting use case.

5.2. Mapping Sizes to Common Unit

We can now map the Size field for each JSON object to a numerical value such that all values for each member are in bytes. Further, we’ll restrict our mapping to practical sizes in GB, MB, kB, and B units. Notably, Docker abbreviates kilobytes as kB, with a lowercase k.

Let’s see the mapping in action where we use the map(), endswith(), and gsub() functions, along with if-else constructs:

$ docker image list --format "{{json . }}" | jq -sc '.[] | {Repository, Size}' \
| jq -s 'map(. + {"originalSize": .Size})' \
| jq -r '
    map(
        if .Size | endswith("GB") then
            .Size |= (gsub("GB"; "") | tonumber * 1024 * 1024 * 1024)
        else
            if .Size | endswith("MB") then
                .Size |= (gsub("MB"; "") | tonumber * 1024 * 1024)
            else
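                if .Size | endswith("kB") then
                    # Docker abbreviates kilobytes as "kB" (lowercase k).
                    .Size |= (gsub("kB"; "") | tonumber * 1024)
                else
                    # Plain byte counts such as "512B".
                    .Size |= (gsub("B"; "") | tonumber)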
                end
            end
        end
    )'
[
  {
    "Repository": "customlargeimage",
    "Size": 4681514352.64,
    "originalSize": "4.36GB"
  },
  {
    "Repository": "alpine",
    "Size": 8032092.16,
    "originalSize": "7.66MB"
  },
  {
    "Repository": "ubuntu",
    "Size": 72561459.2,
    "originalSize": "69.2MB"
  },
  {
    "Repository": "nginx",
    "Size": 201326592,
    "originalSize": "192MB"
  },
  {
    "Repository": "httpd",
    "Size": 204472320,
    "originalSize": "195MB"
  }
]

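The output is as expected, wherein Size is now a numerical field, while originalSize holds the original value in human-readable format. Docker reports sizes in decimal units, where 1MB is 1,000,000 bytes, so the 1024-based multipliers slightly overstate the actual byte counts. However, the relative order of the images stays the same, which is all we need for sorting.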

5.3. Sorting in Action

We can now pipe the output generated so far to another jq invocation that uses the sort_by() function to sort the members by the Size field:

$ docker image list --format "{{json . }}" | jq -sc '.[] | {Repository, Size}' \
| jq -s 'map(. + {"originalSize": .Size})' \
| jq -r '
    map(
        if .Size | endswith("GB") then
            .Size |= (gsub("GB"; "") | tonumber * 1024 * 1024*1024)
        else
            if .Size | endswith("MB") then
                .Size |= (gsub("MB"; "") | tonumber * 1024*1024)
            else
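                if .Size | endswith("kB") then
                    # Docker abbreviates kilobytes as "kB" (lowercase k).
                    .Size |= (gsub("kB"; "") | tonumber * 1024)
                else
                    # Plain byte counts such as "512B".
                    .Size |= (gsub("B"; "") | tonumber)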
                end
            end
        end
    )' \
| jq 'sort_by(.Size)' \
| jq -r 'map([.Repository, .originalSize] | @tsv)[]'
alpine    7.66MB
ubuntu    69.2MB
nginx    192MB
httpd    195MB
customlargeimage    4.36GB

Fantastic! It looks like we’ve nailed this one.

Additionally, we must note that we passed the final output to one more jq invocation, which uses the @tsv format string to print each image as a row of tab-separated values.
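The separate jq stages can also be folded into a single invocation. The following is only a sketch of that idea, not a replacement for the step-by-step pipeline: sort_json_by_size and to_bytes are hypothetical names, the conversion assumes Docker’s decimal suffixes (B, kB, MB, GB, TB), and the input is hand-written sample data:

```shell
#!/bin/bash
# Hypothetical helper: sorts JSON lines with Repository and Size fields by size.
# Assumption: sizes use decimal suffixes (B, kB, MB, GB, TB); other formats
# are not handled by this sketch.
sort_json_by_size() {
    jq -rs '
        # Split "7.66MB" into a number and a unit, then scale to bytes.
        def to_bytes:
            capture("^(?<value>[0-9.]+)(?<unit>[kMGT]?B)$")
            | (.value | tonumber)
              * ({"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}[.unit]);
        sort_by(.Size | to_bytes)
        | .[]
        | [.Repository, .Size]
        | @tsv'
}

# Sample JSON lines; "tinyimage" is a made-up entry to exercise the kB branch.
sample_lines='{"Repository":"customlargeimage","Size":"4.36GB"}
{"Repository":"alpine","Size":"7.66MB"}
{"Repository":"tinyimage","Size":"13.3kB"}
{"Repository":"nginx","Size":"192MB"}'

sorted_rows=$(printf '%s\n' "$sample_lines" | sort_json_by_size)
echo "$sorted_rows"
```

Since the sort key is computed on the fly, this variant leaves the Size field untouched and doesn’t need the originalSize copy. With a running Docker daemon, it could be fed by docker images --format '{{json .}}'.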

6. Conclusion

In this article, we learned how to sort Docker images by size using the docker images command. Furthermore, we explored multiple command-line utilities, such as sort, awk, cut, and jq, while solving our use case.

Lastly, we also developed a good understanding of the --format option available with the docker images command.
