Last updated: April 10, 2025
When setting up a Kubernetes cluster, we sometimes encounter the error, “It seems like the kubelet isn’t running or healthy.” This message appears during the kubeadm init process and stops the cluster from fully initializing.
Essentially, kubeadm can’t confirm that the kubelet, a critical component on each node, is working correctly. Therefore, the initialization process times out, leaving us with an incomplete cluster. This is a common roadblock, but we can fix it.
In this tutorial, we’ll explore the common causes of this error and how to resolve it effectively.
The kubelet is like the on-the-ground manager for each node in our Kubernetes cluster. It’s responsible for keeping everything running smoothly, from managing the lifecycle of pods and containers to monitoring resource usage and reporting back to the control plane.
The kubelet ensures the deployed containers run and remain healthy on the nodes. It communicates directly with the API server, receiving instructions and reporting the status of the node and its workloads.
Consequently, a healthy kubelet is absolutely essential for a functioning cluster. Without it, our deployments won’t work as expected.
Now, let’s look at what the “kubelet isn’t running or healthy” error actually means. When we run kubeadm init, it performs a series of important pre-flight checks. One key check involves verifying that the kubelet on the control plane node is running and healthy.
Specifically, kubeadm tries to reach the kubelet's health endpoint, a simple HTTP endpoint (on port 10248 by default) that reports whether the kubelet is healthy. If the kubelet isn't running, or if it's running but unhealthy (for example, due to misconfiguration), this health check fails.
This failure triggers the “kubelet isn’t running or healthy” error, preventing kubeadm from proceeding with cluster initialization.
There are several potential reasons why we may encounter this error. Let’s outline some common causes that can disrupt our kubelet‘s healthy operation.
Sometimes, our kubelet process fails to start. We might experience this when required packages or dependencies are missing or our service file has errors. Let’s check the status of kubelet:
$ sudo systemctl status kubelet
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
10-kubeadm.conf
Active: inactive (dead)
Docs: http://kubernetes.io/docs/
If the process isn't running, as shown in the output above, we know that the kubelet failed to start, which directly affects cluster health.
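If the service is inactive, a minimal first step, assuming the kubelet packages are already installed, is to start it and enable it at boot:
$ sudo systemctl enable --now kubelet
$ sudo systemctl status kubelet
If it still refuses to start, the journal logs usually point to the missing dependency or broken unit file.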
Sometimes, our Docker and kubelet settings may conflict. We encounter issues when Docker uses one cgroup driver while our kubelet expects another. Let’s check Docker’s cgroup driver using docker info:
$ docker info | grep -i cgroup
Cgroup Driver: systemd
Cgroup Version: 2
cgroupns
Then, we need to align the settings in our kubelet configuration. A misconfiguration of these settings prevents the kubelet from managing resources properly.
We sometimes face network misconfigurations that block health checks. Firewall rules or missing port allowances can prevent the kubelet from reaching the API server. Let’s verify connectivity by testing the health endpoint:
$ curl -sSL http://localhost:10248/healthz
If this command fails, we need to review our firewall settings and verify that the required ports (such as 6443 for the API server) are open. In cloud or virtualized environments, security group rules or host firewall configurations might also interfere.
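As an additional connectivity test, we can probe the API server port from the node; here, <control-plane-ip> is a placeholder for our control plane address, and we assume netcat is available:
$ nc -vz <control-plane-ip> 6443
A timeout on this check usually points to a firewall or security group rule rather than the kubelet itself.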
We sometimes face authentication issues when our kubeconfig file contains incorrect server URLs, outdated tokens, or mismatched contexts. Therefore, we need to ensure that our kubeconfig points to the correct API server endpoint (usually on port 6443) and that our tokens are valid.
Misconfigured credentials prevent the kubelet from authenticating with the API server, which can also result in an error.
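On kubeadm-based nodes, the kubelet's own kubeconfig usually lives at /etc/kubernetes/kubelet.conf, so we can quickly confirm which endpoint it targets:
$ grep server /etc/kubernetes/kubelet.conf
The server line should show our control plane address on port 6443.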
Let’s now walk through the process of diagnosing and resolving the common kubelet issues that disrupt our cluster initialization.
First, let’s check whether the kubelet runs as expected:
$ sudo systemctl status kubelet
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
10-kubeadm.conf
Active: active (running) since Fri 2025-02-14 12:04:18 UTC; 4 days ago
Docs: http://kubernetes.io/docs/
Main PID: 5671 (kubelet)
Tasks: 17 (limit: 1404)
Memory: 39.7M
CPU: 3h 26min 7.604s
CGroup: /system.slice/kubelet.service
...
This command prints the current status, showing if the kubelet is active or has encountered errors. Next, let’s use journalctl to examine detailed logs:
$ sudo journalctl -xeu kubelet
Reviewing these logs enables us to quickly identify error messages such as “connection refused” or cgroup mismatches. These logs help us pinpoint whether the kubelet fails at startup or crashes shortly thereafter.
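To narrow the output down, we can limit the time window and filter for common failure keywords; the keywords below are just a starting point:
$ sudo journalctl -u kubelet --since "10 minutes ago" --no-pager | grep -iE "error|failed|refused"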
Now, let’s verify that our container runtime operates correctly. We can list all running containers to ensure Kubernetes components are running:
$ docker ps -a | grep kube
0bc850d0f8f5 ba04bb24b957 "/storage-provisioner" 4 days ago Up 4 days k8s_storage-provisioner_storage-provisioner_kube-system_cdf46079-517d-4e90-8333-1b3240ac98a5_2
11a1af66f4b3 2f6c962e7b83 "/coredns -conf /etc…" 4 days ago Up 4 days k8s_coredns_coredns-668d6bf9bc-f8nxh_kube-system_43fd7dcb-32fa-4eb6-9878-9edacb3baf58_1
a42f00408748 ba04bb24b957 "/storage-provisioner" 4 days ago Exited (1) 4 days ago k8s_storage-provisioner_storage-provisioner_kube-system_cdf46079-517d-4e90-8333-1b3240ac98a5_1
1cde9ee777c1 2f50386e20bf "/usr/local/bin/kube…" 4 days ago Up 4 days k8s_kube-proxy_kube-proxy-h5hlr_kube-system_476b8492-69e0-4dba-979f-86770a5fa00c_1
c76ba99a9b3e registry.k8s.io/pause:3.10 "/pause" 4 days ago Up 4 days k8s_POD_coredns-668d6bf9bc-f8nxh_kube-system_43fd7dcb-32fa-4eb6-9878-9edacb3baf58_1
a52e13d23495 registry.k8s.io/pause:3.10 "/pause" 4 days ago Up 4 days k8s_POD_storage-provisioner_kube-system_cdf46079-517d-4e90-8333-1b3240ac98a5_1
...
This command lists all Kubernetes-related containers. Next, let’s check individual container logs to spot issues. For example, we can inspect the logs of the kube-apiserver container with:
$ docker logs <kube-apiserver-container-id>
Let’s repeat this process for other critical containers like etcd or the kube-controller-manager. These logs reveal if a container crashed, a control plane component failed to start, or the images were outdated.
The runtime’s health is crucial because any container failure may cause the kubelet to report that it isn’t healthy.
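As a quick sanity check on the runtime itself, let's also confirm that the Docker daemon is active:
$ sudo systemctl status docker
If the runtime is down, the kubelet can't start any containers, and the health check fails for that reason alone.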
Often, we face misconfigurations between Docker and the kubelet, especially regarding cgroup drivers. First, let’s check Docker’s cgroup driver:
$ docker info | grep -i cgroup
Next, let's align our kubelet settings to match Docker's driver. We can open the kubelet configuration file, commonly located at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, and update the --cgroup-driver flag. For instance, if Docker uses cgroupfs, we modify the file to include:
Environment="KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs"
After saving the changes, we reload systemd‘s configuration files and restart the kubelet:
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
Then, let’s recheck the kubelet status to confirm the cgroup drivers align properly. This alignment ensures that our resource management remains consistent across the system.
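Notably, on newer kubeadm-based setups, the driver is usually set through the cgroupDriver field in /var/lib/kubelet/config.yaml rather than a command-line flag; assuming that default path, we can verify it with:
$ grep -i cgroupdriver /var/lib/kubelet/config.yaml
Whichever mechanism applies, the value should match the driver Docker reports.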
Furthermore, let’s address any system-level settings that could impact the kubelet. Swap must be turned off for proper kubelet operation. To disable swap temporarily:
$ sudo swapoff -a
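We can confirm that no swap devices remain active; empty output means swap is off:
$ swapon --show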
Then, let’s open the /etc/fstab file to comment out any swap entries, ensuring swap remains disabled after a reboot:
$ sudo vim /etc/fstab
Let’s add a comment (#) before any line referencing swap:
# UUID=xxxx-xxxx none swap sw 0 0
With the file edited, swap stays disabled across reboots. Moreover, we need to verify that our network settings allow the kubelet to communicate with the control plane. For example, let's check that firewall rules permit access to port 10248:
$ sudo ufw status
If necessary, we can add rules to open the required ports. These steps help prevent networking issues that might block the kubelet from functioning correctly.
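For example, on hosts that use ufw, opening the ports mentioned above could look like this, with the exact list depending on our setup:
$ sudo ufw allow 6443/tcp
$ sudo ufw allow 10248/tcp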
Finally, let’s reset and reinitialize the cluster using kubeadm. We can reset the cluster state with:
$ sudo kubeadm reset
Then, we reinitialize the cluster:
$ sudo kubeadm init
If we need to join worker nodes again, we can run the join command provided by the kubeadm init output.
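The exact command, including the token and CA certificate hash, appears in the kubeadm init output; its general shape is the following, with placeholder values:
$ sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>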
Hence, by re-running these commands, we ensure that all components synchronize correctly and that the kubelet can register with the control plane without conflicts.
In this article, we explored the most common causes behind the “It seems like the kubelet isn’t running or healthy” error. We saw how several issues, including misconfigured cgroup drivers and network problems, can disrupt the kubelet‘s operation and lead to this frustrating error.
Additionally, we learned to address these faults head-on by verifying configurations, inspecting logs, and adjusting settings. By following the troubleshooting guide, we can confidently diagnose the root cause of the error and restore the functionality of our Kubernetes cluster.