Last updated: April 10, 2025
When setting up a Kubernetes cluster, we sometimes encounter the error, “It seems like the kubelet isn’t running or healthy.” This message appears during the kubeadm init process and stops the cluster from fully initializing.
Essentially, kubeadm can’t confirm that the kubelet, a critical component on each node, is working correctly. Therefore, the initialization process times out, leaving us with an incomplete cluster. This is a common roadblock, but we can fix it.
In this tutorial, we’ll explore the common causes of this error and how to resolve it effectively.
The kubelet is like the on-the-ground manager for each node in our Kubernetes cluster. It’s responsible for keeping everything running smoothly, from managing the lifecycle of pods and containers to monitoring resource usage and reporting back to the control plane.
The kubelet ensures the deployed containers run and remain healthy on the nodes. It communicates directly with the API server, receiving instructions and reporting the status of the node and its workloads.
Consequently, a healthy kubelet is absolutely essential for a functioning cluster. Without it, our deployments won’t work as expected.
Now, let’s look at what the “kubelet isn’t running or healthy” error actually means. When we run kubeadm init, it performs a series of important pre-flight checks. One key check involves verifying that the kubelet on the control plane node is running and healthy.
Specifically, kubeadm tries to reach the kubelet's health endpoint, a simple HTTP endpoint (on port 10248 by default) that reports whether the kubelet is healthy. If the kubelet isn't running, or if it's running but unhealthy (for example, due to misconfiguration), this health check fails.
This failure triggers the “kubelet isn’t running or healthy” error, preventing kubeadm from proceeding with cluster initialization.
There are several potential reasons why we may encounter this error. Let’s outline some common causes that can disrupt our kubelet‘s healthy operation.
Sometimes, our kubelet process fails to start. We might experience this when required packages or dependencies are missing or our service file has errors. Let’s check the status of kubelet:
$ sudo systemctl status kubelet
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
10-kubeadm.conf
Active: inactive (dead)
Docs: http://kubernetes.io/docs/
If the process isn't running, as shown in the output above, we know that the kubelet failed to start, which directly affects cluster health.
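If the service is inactive, a minimal first step, assuming the kubelet packages are already installed, is to start it and enable it at boot:
$ sudo systemctl enable --now kubelet
$ sudo systemctl status kubelet
If it still refuses to start, the journal logs usually point to the missing dependency or broken unit file.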
Sometimes, our Docker and kubelet settings may conflict. We encounter issues when Docker uses one cgroup driver while our kubelet expects another. Let’s check Docker’s cgroup driver using docker info:
$ docker info | grep -i cgroup
Cgroup Driver: systemd
Cgroup Version: 2
cgroupns
Then, we need to align the settings in our kubelet configuration. A misconfiguration of these settings prevents the kubelet from managing resources properly.
We sometimes face network misconfigurations that block health checks. Firewall rules or missing port allowances can prevent the kubelet from reaching the API server. Let’s verify connectivity by testing the health endpoint:
$ curl -sSL http://localhost:10248/healthz
If this command fails, we need to review our firewall settings and verify that the required ports (such as 6443 for the API server) are open. In cloud or virtualized environments, security group rules or host firewall configurations might also interfere.
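As an additional connectivity test, we can probe the API server port from the node; here, <control-plane-ip> is a placeholder for our control plane address, and we assume netcat is available:
$ nc -vz <control-plane-ip> 6443
A timeout on this check usually points to a firewall or security group rule rather than the kubelet itself.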
We sometimes face authentication issues when our kubeconfig file contains incorrect server URLs, outdated tokens, or mismatched contexts. Therefore, we need to ensure that our kubeconfig points to the correct API server endpoint (usually on port 6443) and that our tokens are valid.
Misconfigured credentials prevent the kubelet from authenticating with the API server, which can also result in an error.
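On kubeadm-based nodes, the kubelet's own kubeconfig usually lives at /etc/kubernetes/kubelet.conf, so we can quickly confirm which endpoint it targets:
$ grep server /etc/kubernetes/kubelet.conf
The server line should show our control plane address on port 6443.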
Let’s now walk through the process of diagnosing and resolving the common kubelet issues that disrupt our cluster initialization.
First, let’s check whether the kubelet runs as expected:
$ sudo systemctl status kubelet
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
10-kubeadm.conf
Active: active (running) since Fri 2025-02-14 12:04:18 UTC; 4 days ago
Docs: http://kubernetes.io/docs/
Main PID: 5671 (kubelet)
Tasks: 17 (limit: 1404)
Memory: 39.7M
CPU: 3h 26min 7.604s
CGroup: /system.slice/kubelet.service
...
This command prints the current status, showing if the kubelet is active or has encountered errors. Next, let’s use journalctl to examine detailed logs:
$ sudo journalctl -xeu kubelet
Reviewing these logs enables us to quickly identify error messages such as “connection refused” or cgroup mismatches. These logs help us pinpoint whether the kubelet fails at startup or crashes shortly thereafter.
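To narrow the output down, we can limit the time window and filter for common failure keywords; the keywords below are just a starting point:
$ sudo journalctl -u kubelet --since "10 minutes ago" --no-pager | grep -iE "error|failed|refused"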
Now, let’s verify that our container runtime operates correctly. We can list all running containers to ensure Kubernetes components are running:
$ docker ps -a | grep kube
0bc850d0f8f5 ba04bb24b957 "/storage-provisioner" 4 days ago Up 4 days k8s_storage-provisioner_storage-provisioner_kube-system_cdf46079-517d-4e90-8333-1b3240ac98a5_2
11a1af66f4b3 2f6c962e7b83 "/coredns -conf /etc…" 4 days ago Up 4 days k8s_coredns_coredns-668d6bf9bc-f8nxh_kube-system_43fd7dcb-32fa-4eb6-9878-9edacb3baf58_1
a42f00408748 ba04bb24b957 "/storage-provisioner" 4 days ago Exited (1) 4 days ago k8s_storage-provisioner_storage-provisioner_kube-system_cdf46079-517d-4e90-8333-1b3240ac98a5_1
1cde9ee777c1 2f50386e20bf "/usr/local/bin/kube…" 4 days ago Up 4 days k8s_kube-proxy_kube-proxy-h5hlr_kube-system_476b8492-69e0-4dba-979f-86770a5fa00c_1
c76ba99a9b3e registry.k8s.io/pause:3.10 "/pause" 4 days ago Up 4 days k8s_POD_coredns-668d6bf9bc-f8nxh_kube-system_43fd7dcb-32fa-4eb6-9878-9edacb3baf58_1
a52e13d23495 registry.k8s.io/pause:3.10 "/pause" 4 days ago Up 4 days k8s_POD_storage-provisioner_kube-system_cdf46079-517d-4e90-8333-1b3240ac98a5_1
...
This command lists all Kubernetes-related containers. Next, let’s check individual container logs to spot issues. For example, we can inspect the logs of the kube-apiserver container with:
$ docker logs <kube-apiserver-container-id>
Let’s repeat this process for other critical containers like etcd or the kube-controller-manager. These logs reveal if a container crashed, a control plane component failed to start, or the images were outdated.
The runtime’s health is crucial because any container failure may cause the kubelet to report that it isn’t healthy.
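As a quick sanity check on the runtime itself, let's also confirm that the Docker daemon is active:
$ sudo systemctl status docker
If the runtime is down, the kubelet can't start any containers, and the health check fails for that reason alone.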
Often, we face misconfigurations between Docker and the kubelet, especially regarding cgroup drivers. First, let’s check Docker’s cgroup driver:
$ docker info | grep -i cgroup
Next, let's align our kubelet settings to match Docker's driver. We can open the kubelet configuration file, commonly located at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, and update the --cgroup-driver flag. For instance, if Docker uses cgroupfs, we modify the file to include:
Environment="KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs"
After saving the changes, we reload systemd‘s configuration files and restart the kubelet:
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
Then, let’s recheck the kubelet status to confirm the cgroup drivers align properly. This alignment ensures that our resource management remains consistent across the system.
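Notably, on newer kubeadm-based setups, the driver is usually set through the cgroupDriver field in /var/lib/kubelet/config.yaml rather than a command-line flag; assuming that default path, we can verify it with:
$ grep -i cgroupdriver /var/lib/kubelet/config.yaml
Whichever mechanism applies, the value should match the driver Docker reports.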
Furthermore, let’s address any system-level settings that could impact the kubelet. Swap must be turned off for proper kubelet operation. To disable swap temporarily:
$ sudo swapoff -a
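We can confirm that no swap devices remain active; empty output means swap is off:
$ swapon --show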
Then, let’s open the /etc/fstab file to comment out any swap entries, ensuring swap remains disabled after a reboot:
$ sudo vim /etc/fstab
Let’s add a comment (#) before any line referencing swap:
# UUID=xxxx-xxxx none swap sw 0 0
With the file edited, swap stays disabled across reboots. Moreover, we need to verify that our network settings allow the kubelet to communicate with the control plane. For example, let's check that firewall rules permit access to port 10248:
$ sudo ufw status
If necessary, we can add rules to open the required ports. These steps help prevent networking issues that might block the kubelet from functioning correctly.
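For example, on hosts that use ufw, opening the ports mentioned above could look like this, with the exact list depending on our setup:
$ sudo ufw allow 6443/tcp
$ sudo ufw allow 10248/tcp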
Finally, let’s reset and reinitialize the cluster using kubeadm. We can reset the cluster state with:
$ sudo kubeadm reset
Then, we reinitialize the cluster:
$ sudo kubeadm init
If we need to join worker nodes again, we can run the join command provided by the kubeadm init output.
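The exact command, including the token and CA certificate hash, appears in the kubeadm init output; its general shape is the following, with placeholder values:
$ sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>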
Hence, by re-running these commands, we ensure that all components synchronize correctly and that the kubelet can register with the control plane without conflicts.
In this article, we explored the most common causes behind the “It seems like the kubelet isn’t running or healthy” error. We saw how several issues, including misconfigured cgroup drivers and network problems, can disrupt the kubelet‘s operation and lead to this frustrating error.
Additionally, we learned to address these faults head-on by verifying configurations, inspecting logs, and adjusting settings. By following the troubleshooting guide, we can confidently diagnose the root cause of the error and restore the functionality of our Kubernetes cluster.