Pod garbage collection isn’t just about cleaning up deleted pods; it’s a critical mechanism for managing cluster resources and preventing runaway resource consumption.
Let’s see it in action. Imagine you have a deployment that scales up rapidly, creating dozens of pods. When you scale it down, or when pods fail and are replaced, the Kubernetes control plane needs to eventually remove the old, terminating pods from etcd. If this cleanup process stalls, these "ghost" pods can consume valuable etcd storage and clutter API server lists, slowing down operations.
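Here’s a typical scenario: a pod goes into a Terminating state. Under normal circumstances the kubelet stops its containers, any finalizers are cleared, and the API server then removes the object. But what if it gets stuck?
The primary component responsible for garbage collection of pods is the pod garbage collector, a controller that runs inside the kube-controller-manager. It deletes terminated pods (phase Succeeded or Failed) once their number exceeds --terminated-pod-gc-threshold (12500 by default), and it removes orphaned pods bound to nodes that no longer exist. Like every controller, it works through the API server rather than reading or writing etcd directly.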
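
To spot candidates before working through the causes below, one option is to filter the pod list for the Terminating status. This is a minimal sketch rather than an official tool: the function name terminating_pods is hypothetical, and it assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE NAME READY STATUS RESTARTS AGE).

```shell
#!/bin/sh
# Hypothetical helper: print namespace/name for every pod whose STATUS
# column reads "Terminating".
# Assumes stdin is the output of `kubectl get pods -A --no-headers`,
# so the fourth whitespace-separated field is STATUS.
terminating_pods() {
  awk '$4 == "Terminating" { print $1 "/" $2 }'
}

# Typical use against a live cluster (not executed here):
#   kubectl get pods -A --no-headers | terminating_pods
```

A pod that shows up here only briefly is usually just finishing its grace period; the ones worth investigating are those that stay listed well past it.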
Common Causes for Stuck Pod Garbage Collection
- kube-controller-manager Not Running or Crashing:
  - Diagnosis: Check the status of the kube-controller-manager pods in the kube-system namespace:
      kubectl get pods -n kube-system | grep kube-controller-manager
    If the pods are not Running or are in a CrashLoopBackOff state, this is your primary suspect. Examine their logs:
      kubectl logs <kube-controller-manager-pod-name> -n kube-system
  - Fix: Address the underlying reason for the kube-controller-manager crashing. This could be resource starvation (CPU/memory), misconfiguration in its manifest, or issues with its dependencies (such as API server connectivity, or etcd behind the API server). Restarting the control plane components or resolving the resource issue will allow it to resume garbage collection.
  - Why it works: The pod garbage collector runs inside the kube-controller-manager. If that process isn’t running, terminated and orphaned pods are never collected.
- Etcd Performance Issues:
  - Diagnosis: Monitor etcd’s performance. High latency, frequent disconnections, or high CPU/disk I/O on etcd nodes slow down every API server read and write, and the controller manager depends on those. Check etcd logs for errors related to disk performance or network issues.
      # On etcd nodes
      journalctl -u etcd -f
      # Or check metrics if you have Prometheus/Grafana set up
  - Fix: Optimize etcd performance. This might involve ensuring etcd nodes have fast SSDs, adequate CPU and RAM, and a stable network. For persistent issues, consider scaling up etcd resources or tuning etcd’s --auto-compaction-retention and --quota-backend-bytes settings.
  - Why it works: Etcd is the cluster’s brain. If it’s sluggish, the API server is sluggish too, and the kube-controller-manager cannot efficiently list and delete terminated pods.
- API Server Overload or Unresponsiveness:
  - Diagnosis: If the API server is overloaded with requests, the kube-controller-manager might experience timeouts or delays when trying to interact with it. Check API server logs and metrics for high latency or error rates.
      # On API server nodes
      kubectl logs <api-server-pod-name> -n kube-system
      # Check API server metrics for request latency and error counts
  - Fix: Scale up the API server replicas or ensure it has sufficient resources. Identify and address the source of excessive API requests if possible.
  - Why it works: The kube-controller-manager relies on the API server to communicate with etcd and other cluster components. API server unresponsiveness directly impacts the controller manager’s operations.
- Network Partition or Firewall Issues:
  - Diagnosis: Verify that the kube-controller-manager pods can reach the API server endpoints and that the path to the etcd cluster is open (the API server is the component that actually talks to etcd). Check network policies, firewall rules, and CNI configurations.
      # From a kube-controller-manager pod
      kubectl exec -it <kube-controller-manager-pod-name> -n kube-system -- nc -vz <etcd-host> <etcd-port>
      kubectl exec -it <kube-controller-manager-pod-name> -n kube-system -- nc -vz <api-server-host> <api-server-port>
    These commands assume nc is present in the container image. Many control plane images are minimal and ship without it, in which case run the same nc checks from the control plane node itself.
  - Fix: Correct any misconfigured network policies, firewall rules, or CNI settings that are blocking communication between the kube-controller-manager, API server, and etcd.
  - Why it works: If the controller manager can’t reach the API server, or the API server can’t reach etcd, garbage collection requests never complete.
- Stuck Pod Finalizers:
  - Diagnosis: A pod stuck in Terminating state with no apparent progress often indicates an issue with its finalizers. A finalizer is a key listed in metadata.finalizers that tells Kubernetes to wait for a specific controller to perform some cleanup before the object can be deleted. If the controller responsible for a finalizer fails or is not implemented correctly, the pod will remain stuck.
      kubectl get pod <stuck-pod-name> -n <namespace> -o yaml
    Look for the metadata.finalizers field. If it’s present and the associated controller isn’t acting, that’s the problem.
  - Fix: Manually remove the finalizer from the pod’s definition. This is typically done by editing the pod object directly:
      kubectl edit pod <stuck-pod-name> -n <namespace>
    Then, remove the entry under metadata.finalizers:. Caution: This bypasses the normal cleanup process associated with the finalizer. Ensure you understand which finalizer is present and what cleanup it was intended to perform.
  - Why it works: By removing the finalizer, you tell Kubernetes that the external cleanup process is no longer required, allowing the API server to finish deleting the pod from etcd.
- Resource Quotas or Limit Ranges:
  - Diagnosis: While less common for stuck garbage collection, aggressive resource quotas or limit ranges could, in theory, impact the kube-controller-manager’s ability to operate if it’s running as a pod itself and constrained. Check the kube-controller-manager pod’s resource requests/limits and any relevant quotas in its namespace.
  - Fix: Adjust resource quotas or limit ranges if they are excessively restrictive for control plane components.
  - Why it works: Ensures the kube-controller-manager has sufficient resources to perform its background tasks.
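
For the stuck-finalizer cause in particular, checking pods one at a time gets tedious on a busy cluster. The sketch below lists every pod that has been marked for deletion and still carries at least one finalizer. It is an illustration under stated assumptions: the function name pods_blocked_by_finalizers is hypothetical, the input is expected to come from the kubectl custom-columns query shown in the comment, and kubectl is assumed to print `<none>` for unset fields. The exact rendering of the finalizer list may vary between kubectl versions, so the function passes it through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: report pods that have a deletionTimestamp (column 3)
# and a non-empty finalizer list (column 4 onward).
# Assumes stdin comes from:
#   kubectl get pods -A --no-headers -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,DELETED:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
pods_blocked_by_finalizers() {
  awk '$3 != "<none>" && $4 != "<none>" {
    out = $1 "/" $2
    # A list with several finalizers may contain spaces, so keep every
    # remaining field rather than only the fourth.
    for (i = 4; i <= NF; i++) out = out " " $i
    print out
  }'
}
```

Each line of output names a pod and the finalizers holding it, which tells you which controller to investigate before resorting to manual removal.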
After resolving these issues, you’ll likely encounter the next challenge: ensuring your cluster state is consistent and that no lingering objects (like Terminating namespaces or other stuck resources) remain.