1. Introduction

Kubernetes is a distributed container orchestration framework. To use it, we write definitions for resources such as pods and storage, and then apply them to a cluster of nodes. At that point, Kubernetes decides where each workload runs and where it gets its storage. However, it’s sometimes hard to see the final distribution, especially of storage, for a given node.

In this tutorial, we explore steps to see Kubernetes storage usage for a particular cluster node. First, we briefly refresh our knowledge about Kubernetes deployments and their storage needs. After that, we perform a general storage usage check on the node of interest. Next, we check container images and their sizes. Then, we turn to the container runtime and its storage needs. Finally, we deal with pod storage usage discovery.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. Unless otherwise specified, it should work in most POSIX-compliant environments.

2. Kubernetes Storage

When it comes to storage used by Kubernetes, we should consider a number of sources. While storage within Kubernetes is mainly distributed as PersistentVolume (PV) or PersistentVolumeClaim (PVC) resources, other parts of the whole framework, such as container images and the container runtime, can also take up space.

Let’s check the current Kubernetes cluster to have an overview of its state and elements:

$ kubectl get all --all-namespaces
NAMESPACE              NAME                                             READY   STATUS    RESTARTS   AGE
default                pod/pod0                                         1/1     Running   1          1h
kube-system            pod/coredns-5dd5756b68-69h86                     1/1     Running   0          24h
kube-system            pod/etcd-xost                                    1/1     Running   1          24h
kube-system            pod/kube-apiserver-xost                          1/1     Running   0          24h
kube-system            pod/kube-controller-manager-xost                 1/1     Running   0          24h
kube-system            pod/kube-proxy-55ht8                             1/1     Running   0          24h
kube-system            pod/kube-scheduler-xost                          1/1     Running   6          24h
kube-system            pod/storage-provisioner                          1/1     Running   0          24h
kubernetes-dashboard   pod/dashboard-metrics-scraper-7fd5cb4ddc-7sqqq   1/1     Running   0          24h
kubernetes-dashboard   pod/kubernetes-dashboard-8694d4445c-hnt9w        1/1     Running   0          24h

NAMESPACE              NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default                service/kubernetes                  ClusterIP   10.96.0.1        <none>        443/TCP                  24h
kube-system            service/kube-dns                    ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   24h
kubernetes-dashboard   service/dashboard-metrics-scraper   ClusterIP   10.106.175.119   <none>        8000/TCP                 24h
kubernetes-dashboard   service/kubernetes-dashboard        ClusterIP   10.105.26.231    <none>        80/TCP                   24h

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           kubernetes.io/os=linux   24h

NAMESPACE              NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
kube-system            deployment.apps/coredns                     1/1     1            1           24h
kubernetes-dashboard   deployment.apps/dashboard-metrics-scraper   1/1     1            1           24h
kubernetes-dashboard   deployment.apps/kubernetes-dashboard        1/1     1            1           24h

NAMESPACE              NAME                                                   DESIRED   CURRENT   READY   AGE
kube-system            replicaset.apps/coredns-5dd5756b68                     1         1         1       24h
kubernetes-dashboard   replicaset.apps/dashboard-metrics-scraper-7fd5cb4ddc   1         1         1       24h
kubernetes-dashboard   replicaset.apps/kubernetes-dashboard-8694d4445c        1         1         1       24h

Apart from the dashboard and pod0, this is a fairly empty deployment.

On a lower level, the container runtime might use other means such as an overlay filesystem to expose storage:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
[...]
overlay          49G   25G   22G  54% /var/lib/docker/overlay2/349a603c4759f3f92c0e506f7668ebd06661d8a3b56fcbd1f06b071f49c81883/merged
overlay          49G   25G   22G  54% /var/lib/docker/overlay2/a4fdbc126518168bad7f2977f16eb666f7f33f47b67d9d75d2f74b96c3474eb9/merged
shm              64M     0   64M   0% /var/lib/docker/containers/55b6adb8afcd66659f32d1899993c2311f7d110fb430ce4ff8b9714f81c536a4/mounts/shm
shm              64M     0   64M   0% 
[...]
overlay          49G   25G   22G  54% /var/lib/docker/overlay2/733c46aa655aee3b736cafe66629e32e0d9f4156671608243117b95a32c716a9/merged

Because of such abstractions, it can sometimes be hard to establish the exact storage resources that are currently available on a node and how much of them are in use.
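Notably, every overlay line in the listing above reports the same Size, Used, and Avail values, which suggests that they describe the filesystem backing /var/lib/docker rather than each container separately. As a minimal sketch, assuming df output arrives on standard input, a small helper can count the overlay mounts and the distinct value combinations among them. The overlay_summary name is our own and not part of any tool:

```shell
# Hypothetical helper: summarize overlay lines from df output on stdin.
# Assumption: identical Size/Used/Avail values mean a shared backing filesystem.
overlay_summary() {
  awk '
    $1 == "overlay" {
      mounts++
      key = $2 " " $3 " " $4          # Size, Used, Avail columns
      if (!(key in seen)) {
        seen[key] = 1
        distinct++
      }
    }
    END {
      printf "%d overlay mounts, %d distinct size/used/avail combination(s)\n", mounts, distinct
    }
  '
}

# Possible usage on the node:
#   df -h | overlay_summary
```

If the helper reports a single combination, adding up the Used column of the overlay lines would count the same disk several times.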

So, let’s take a step-by-step methodical approach to discover what part of the Kubernetes ecosystem takes up the most storage.

3. General Storage Usage Check

As with any other storage allocation issue, we begin by checking the overall usage.

To do this, we first install ncdu:

$ apt install ncdu

After that, we run it for the / filesystem root on the Kubernetes node of interest:

$ ncdu /

The tool then performs a scan, which may take considerable time depending on the speed, size, and current load of the storage medium.

After that, we get results in the form of a navigable list, sorted by size in descending order:

ncdu 1.18 ~ Use the arrow keys to navigate, press ? for help
--- / ---------------------------------------------------------------
.  66.6 GiB [#############################] /var
    6.0 GiB [######                       ] /usr
    1.6 GiB [#                            ] /root
    1.0 GiB [                             ] /home
  385.1 MiB [                             ] /opt
   98.5 MiB [                             ] /boot
   15.3 MiB [                             ] /etc
    5.6 MiB [                             ] /run
    2.1 MiB [                             ] /mnt
    1.0 MiB [                             ] /dev
   57.0 KiB [                             ] /tmp
e  16.0 KiB [                             ] /lost+found
    8.0 KiB [                             ] /media
    8.0 KiB [                             ] /srv
    8.0 KiB [                             ] /nfs
e   4.0 KiB [                             ] /test
.   0.0   B [                             ] /proc
    0.0   B [                             ] /sys
 Total disk usage:  78.7 GiB  Apparent size: 128.1 TiB  Items: 666016

Here, we can use several keys to navigate:

  • Up Arrow and Down Arrow: move the focus up or down
  • Right Arrow or Return: enter the directory in focus
  • Left Arrow or Backspace: go to the parent directory

For instance, if we go to /var/lib/, we can see the minikube installation directory, which takes up around 300MB mainly due to its binaries.

Since dockerd is the container runtime for this particular Kubernetes deployment, we can also check /var/lib/docker/:

ncdu 1.18 ~ Use the arrow keys to navigate, press ? for help
--- /var/lib/docker -------------------------------------------------
                                            /..
    6.6 GiB [#############################] /overlay2
  666.0 MiB [###                          ] /volumes
    7.0 MiB [                             ] /image
    3.3 MiB [                             ] /containers
    1.6 MiB [                             ] /buildkit
   96.0 KiB [                             ] /network
   16.0 KiB [                             ] /plugins
    8.0 KiB [                             ] /tmp
e   4.0 KiB [                             ] /trust
e   4.0 KiB [                             ] /swarm
e   4.0 KiB [                             ] /runtimes
    4.0 KiB [                             ]  nuke-graph-directory.sh
    4.0 KiB [                             ]  engine-id

 Total disk usage:   7.2 GiB  Apparent size:   4.4 GiB  Items: 66609

As expected, we can see the overlay2 and volumes directories are taking up the most space, since that’s how Docker organizes storage. Yet, we can’t be sure which exact containers are the main culprits by just looking at the raw filesystem.

For example, Kubernetes might represent only a part of the Docker usage.
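When an interactive browser isn’t practical, for instance inside a script, a rough one-level equivalent can be sketched with du and sort. The top_dirs name and its defaults are our own assumptions, and the -d option requires a du that supports it, such as the GNU version:

```shell
# Hypothetical helper: list the largest entries one level below a directory.
# $1: directory to scan (default /), $2: number of lines to show (default 10)
top_dirs() {
  dir=${1:-/}
  count=${2:-10}
  # -x stays on one filesystem, -k reports KiB, -d 1 limits the depth to one level
  du -x -k -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Possible usage on the node:
#   top_dirs /var/lib/docker 5
```

The first line of the output is the total for the directory itself, followed by its largest subdirectories in descending order of size in KiB.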

4. Container Runtime Images

Kubernetes orchestrates containers. Images are one of the relatively hidden storage costs of containers. Although often stripped and minimalistic, they still take up space.

Let’s see how to get container image sizes with Docker:

$ docker image list --all
REPOSITORY                                TAG        IMAGE ID       CREATED         SIZE
minapi-minapi                             latest     9b8f05946672   6 days ago      88.2MB
debian                                    latest     c9786667d5fe   3 weeks ago     117MB
debian                                    bullseye   52d643040b9a   4 weeks ago     124MB
python                                    latest     ae66048b7429   8 weeks ago     1.02GB
[...]

Even though some of these are only base images, we can already see that values in the SIZE column can reach gigabytes.

Notably, Kubernetes images usually have the k8s or kubernetes string in their name:

registry.k8s.io/kube-apiserver            v1.28.3    537434729123   5 months ago    126MB
registry.k8s.io/kube-controller-manager   v1.28.3    10baa1ca1706   5 months ago    122MB
registry.k8s.io/kube-scheduler            v1.28.3    6d1b4fd1b182   5 months ago    60.1MB
registry.k8s.io/kube-proxy                v1.28.3    bfc896cf80fb   5 months ago    73.1MB
registry.k8s.io/etcd                      3.5.9-0    73deb9a3f702   10 months ago   294MB
registry.k8s.io/coredns/coredns           v1.10.1    ead0a4a53df8   13 months ago   53.6MB
registry.k8s.io/pause                     3.9        e6f181688397   17 months ago   744kB
kubernetesui/dashboard                    <none>     07655ddf2eeb   18 months ago   246MB
kubernetesui/metrics-scraper              <none>     115053965e86   22 months ago   43.8MB
gcr.io/k8s-minikube/storage-provisioner   v5         6e38f40d628d   2 years ago     31.5MB

This way, we can distinguish most pod-related images from those that only standalone Docker containers use.
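To put a number on the images that match this naming hint, we can sketch a helper that adds up their sizes. It assumes input lines of the form repository and size, with the decimal kB, MB, and GB units that Docker prints. The k8s_image_total name is hypothetical:

```shell
# Hypothetical helper: sum the sizes of images whose repository name
# contains "k8s" or "kubernetes". Reads "<repository> <size>" lines on stdin.
k8s_image_total() {
  awk '
    $1 ~ /k8s|kubernetes/ {
      unit = $2;  sub(/^[0-9.]+/, "", unit)      # for example "MB"
      value = $2; sub(/[A-Za-z]+$/, "", value)   # for example "126"
      if (unit == "B") factor = 1 / 1000000
      else if (unit == "kB") factor = 1 / 1000
      else if (unit == "MB") factor = 1
      else if (unit == "GB") factor = 1000
      else next                                  # unknown unit: skip the line
      total += value * factor
    }
    END { printf "%.1f MB\n", total }
  '
}

# Possible usage on the node:
#   docker image list --format '{{.Repository}} {{.Size}}' | k8s_image_total
```

Since images can share layers, the result is best read as an upper bound rather than the exact space used on disk.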

5. Container Runtime Storage Usage

Since containers usually make up a big part of the storage usage for a Kubernetes deployment, we list the mapping between containers and directories.

When it comes to Docker, we can use one compound command to map containers to directories:

$ docker inspect --format=$'{{.Name}}\n >>> {{.GraphDriver.Data.MergedDir}}\n' $(docker ps --all --quiet)

Let’s break this command down:

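  • inspect shows selected data about a container (here, its name and merged filesystem path), laid out according to the --format template
  • $'' is Bash ANSI-C quoting, which turns each \n in the template into a real newline
  • ps lists --all containers, including stopped ones, while --quiet limits the output to container identifiers
  • $() is a command substitution, so the identifiers from ps become the arguments of inspect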

For instance, there are already some familiar mappings from the df overlay listing we saw earlier:

[...]
/k8s_POD_kube-apiserver-xost_kube-system_b11cd851d3b912861b5862cb512d0521_0
 >>> /var/lib/docker/overlay2/a4fdbc126518168bad7f2977f16eb666f7f33f47b67d9d75d2f74b96c3474eb9/merged
/k8s_kubernetes-dashboard_kubernetes-dashboard-8694d4445c-hnt9w_kubernetes-dashboard_cb1dc601-ecfe-42d3-b590-ca79877ae036_0
 >>> /var/lib/docker/overlay2/733c46aa655aee3b736cafe66629e32e0d9f4156671608243117b95a32c716a9/merged
[...]

Thus, we can focus on specific containers, especially those with the k8s_ prefix.
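Judging by the listing above, these names seem to follow a k8s_<container>_<pod>_<namespace>_<uid>_<attempt> pattern. Assuming that pattern holds, we can sketch a filter that keeps only the Kubernetes containers and shows the namespace and pod next to each directory. The k8s_container_pods name and the tab-separated input format are our own choices:

```shell
# Hypothetical helper: keep only k8s_ containers and print
# "<namespace>/<pod> (<container>) <directory>".
# Reads "<container name><TAB><directory>" lines on stdin.
k8s_container_pods() {
  awk -F '\t' '
    $1 ~ /^\/k8s_/ {
      n = split($1, part, "_")
      if (n < 6) next       # name does not match the assumed pattern
      printf "%s/%s (%s) %s\n", part[4], part[3], part[2], $2
    }
  '
}

# Possible usage on the node:
#   docker inspect --format '{{.Name}}{{"\t"}}{{.GraphDriver.Data.MergedDir}}' \
#     $(docker ps --all --quiet) | k8s_container_pods
```

From there, running du -sh on a directory of interest shows how much space that container view holds. For the overlay2 driver, the UpperDir field of the same inspect data points at the writable layer alone.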

6. Kubernetes Pods Storage Usage

Going higher up the chain, we can discover which Kubernetes pod is associated with a given (large) directory:

$ kubectl get pods --all-namespaces --output=jsonpath='
  {range .items[*]}{@.metadata.name}
  {" >>> volumes: "}{@.spec.volumes}
  {" >>> volumeMounts: "}{@..volumeMounts}
  {"\n"}{end}'

This kubectl command uses the get subcommand to extract the name and volumes data for all pods in --all-namespaces.

To do so, it uses a special jsonpath that goes through several steps:

  1. {range .items[*]}: iterate through all items (pods)
  2. {@.metadata.name}: item name
  3. {" >>> volumes: "}: visual formatting
  4. {@.spec.volumes}: the item volumes [spec]ifications
  5. {" >>> volumeMounts: "}: visual formatting
  6. {@..volumeMounts}: the item volumeMounts, found at any depth via recursive descent
  7. {"\n"}: visual formatting
  8. {end}: close the range loop

Thus, we acquire all volume information, including directories.
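When only the pods backed by a PVC matter, a narrower query can be easier to read. As a sketch, we assume a jsonpath template that prints the namespace, pod name, and claim names separated by tabs, and then filter out the pods without any claim. The pods_with_claims name is hypothetical:

```shell
# Hypothetical helper: print "<namespace>/<pod>: <claims>" for pods that
# reference at least one PVC. Reads tab-separated lines on stdin.
pods_with_claims() {
  awk -F '\t' '$3 != "" { printf "%s/%s: %s\n", $1, $2, $3 }'
}

# Possible usage:
#   kubectl get pods --all-namespaces --output=jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' \
#     | pods_with_claims
```

Pods that only use other volume types, such as hostPath or emptyDir, produce an empty third field and are skipped.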

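To get further information about a given PV or PVC, we can use the describe subcommand and the respective resource NAME. Since a resource can’t be retrieved by name across all namespaces, we specify the namespace of a PVC explicitly, while a PV is cluster-scoped and needs no namespace:

$ kubectl describe pv <NAME>
$ kubectl describe pvc <NAME> --namespace=<NAMESPACE>

Of course, we can also get a more script-friendly version via get:

$ kubectl get pv <NAME> --output=json
$ kubectl get pvc <NAME> --namespace=<NAMESPACE> --output=json

This way, the Capacity, Storage Class, and Access Modes fields can give us an idea of any quotas or storage limits.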
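To see how much storage all PVs declare together, we can sketch a helper that adds up their capacities. It assumes input lines with a name and a capacity in the binary Ki, Mi, Gi, or Ti units, and only counts any other notation as skipped. The pv_capacity_total name is our own:

```shell
# Hypothetical helper: sum PV capacities given "<name> <capacity>" lines on stdin.
# Only the binary suffixes Ki, Mi, Gi, and Ti are handled in this sketch.
pv_capacity_total() {
  awk '
    NR == 1 && $1 == "NAME" { next }             # skip the column header
    {
      unit = $2;  sub(/^[0-9.]+/, "", unit)
      value = $2; sub(/[A-Za-z]+$/, "", value)
      if (unit == "Ki") mib = value / 1024
      else if (unit == "Mi") mib = value
      else if (unit == "Gi") mib = value * 1024
      else if (unit == "Ti") mib = value * 1024 * 1024
      else { skipped++; next }                   # other notation: count and skip
      total += mib
    }
    END { printf "%d Mi total, %d skipped\n", total, skipped }
  '
}

# Possible usage:
#   kubectl get pv --output=custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage \
#     | pv_capacity_total
```

The total reflects declared capacity, not the space actually consumed on the node, so it complements the filesystem checks from the earlier sections instead of replacing them.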

7. Summary

In this article, we talked about Kubernetes node storage allocation and discovery.

In conclusion, the storage usage around a Kubernetes deployment can vary significantly, so knowing how to analyze and limit it can be critical.
