CoreDNS health and readiness probes are how Kubernetes knows if your DNS service is actually working, not just if the process is running.
Here’s a look at a typical CoreDNS configuration in a Kubernetes cluster:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
This Corefile is the heart of CoreDNS. Let’s break down how probes fit into this.
The health plugin reports whether the CoreDNS process is up. It exposes an HTTP endpoint at /health (by default on port 8080) that Kubernetes probes can query. The lameduck 5s directive is a shutdown setting, not a startup grace period: when CoreDNS is told to stop, it keeps running for 5 more seconds before exiting, so queries that still arrive while Kubernetes removes the pod from the Service endpoints are answered rather than dropped.
The ready plugin, on the other hand, indicates whether CoreDNS is ready to serve traffic. It exposes an HTTP endpoint at /ready (by default on port 8181) that returns 200 OK once every plugin able to report readiness has signaled that it is ready. Kubernetes uses this to decide whether a pod should receive traffic: while the readiness probe fails, the pod is left out of the Service endpoints and gets no new queries.
Kubernetes’s built-in kubelet on each node is responsible for executing these probes. For CoreDNS, this typically means two kubelet checks:
- Liveness Probe: This checks the health endpoint. If it fails, kubelet restarts the CoreDNS container.
- Readiness Probe: This checks the ready endpoint. If it fails, kubelet removes the pod from the Service endpoints.
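As a rough sketch, the two probes usually appear in the CoreDNS Deployment’s container spec along these lines. The paths and ports follow the plugin defaults mentioned earlier; the timing values (initialDelaySeconds, timeoutSeconds, failureThreshold) are illustrative assumptions, so compare them with what your cluster actually deploys.

```yaml
# Fragment of the CoreDNS Deployment (spec.template.spec.containers[0]).
# Timing values are illustrative assumptions, not required settings.
livenessProbe:
  httpGet:
    path: /health        # served by the health plugin
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  failureThreshold: 5    # restart only after 5 consecutive failures
readinessProbe:
  httpGet:
    path: /ready         # served by the ready plugin
    port: 8181
    scheme: HTTP
```

If you move either plugin to a different port in the Corefile, the matching probe port has to change with it.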
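The kubernetes plugin handles DNS resolution for services within the cluster. It’s configured to be authoritative for cluster.local and to answer reverse lookups under in-addr.arpa and ip6.arpa. The pods insecure option makes CoreDNS answer pod-style names (such as 10-0-0-5.default.pod.cluster.local) with the IP address encoded in the name, without checking that a pod with that IP exists; it has nothing to do with TLS. The upstream option historically controlled how external CNAME targets (for example, ExternalName services) were resolved; it has been deprecated, and recent CoreDNS releases ignore or reject it, so check your version before copying that line. fallthrough in-addr.arpa ip6.arpa applies only to the reverse zones: a reverse lookup the plugin cannot answer is passed to the next plugin (in this case, forward) instead of being answered with NXDOMAIN.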
The prometheus plugin exposes metrics on port 9153, which are invaluable for monitoring the health and performance of CoreDNS.
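One common way to get these metrics scraped is to annotate the DNS Service, as sketched below. The prometheus.io/* annotations are a widely used convention rather than a Kubernetes feature, so they only help if your Prometheus scrape configuration looks for them. The port name metrics is likewise an assumption based on typical cluster defaults.

```yaml
# Fragment of the DNS Service in kube-system (not a complete manifest).
metadata:
  annotations:
    prometheus.io/scrape: "true"   # convention; requires a matching scrape config
    prometheus.io/port: "9153"     # must match the port in "prometheus :9153"
spec:
  ports:
  - name: metrics                  # assumed port name
    port: 9153
    targetPort: 9153
    protocol: TCP
```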
The forward plugin is configured to forward DNS queries it cannot answer locally to the DNS servers specified in /etc/resolv.conf. The max_concurrent 1000 limits the number of simultaneous forward requests, preventing overload.
cache 30 caches DNS responses for up to 30 seconds (records with a shorter TTL expire sooner), significantly speeding up repeated lookups for the same domains.
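loop detects simple forwarding loops (for example, CoreDNS forwarding queries back to itself via /etc/resolv.conf) and halts the CoreDNS process if it finds one.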
reload allows CoreDNS to automatically reload its configuration when the Corefile changes, without requiring a pod restart.
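loadbalance shuffles the order of A, AAAA, and MX records in each answer, so clients that always pick the first record are spread across the available addresses. It does not balance across upstream servers; the forward plugin chooses among its upstreams on its own.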
The health plugin’s endpoint is typically queried via curl http://<coredns-pod-ip>:8080/health from a node or pod that can reach the pod IP. A successful response is a 200 OK status code with the body OK. The ready plugin’s endpoint is curl http://<coredns-pod-ip>:8181/ready, and it answers the same way once CoreDNS is ready.
A common pitfall is leaving the health or ready plugin out of the Corefile while the Deployment still defines probes against ports 8080 and 8181. CoreDNS will start and answer DNS queries, but nothing listens on the probe ports, so the liveness probe fails and kubelet restarts the container repeatedly, while the failing readiness probe keeps the pod out of the Service endpoints. If the probes are removed from the Deployment as well, Kubernetes falls back to treating the container as alive while its process runs and ready as soon as it starts, which says nothing about whether DNS is actually being served.
Another point of confusion can be the lameduck setting in the health plugin. It only matters during shutdown, for example in a rolling update or a node drain. If the value is too low, CoreDNS can exit before every node has stopped routing queries to the pod, and clients see brief timeouts. If it is very high, rollouts slow down, and once it exceeds the pod’s terminationGracePeriodSeconds the kubelet kills the container before the lameduck period finishes.
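Because lameduck delays shutdown, the pod’s termination grace period needs to be longer than the lameduck duration, or the kubelet will kill the container first. A minimal sketch of that relationship, assuming the 5s lameduck from the Corefile above and an illustrative grace period of 30 seconds:

```yaml
# Fragment of the CoreDNS Deployment pod template.
spec:
  template:
    spec:
      # Assumed value; keep it comfortably above the lameduck duration (5s here),
      # otherwise the container is killed before lameduck completes.
      terminationGracePeriodSeconds: 30
```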
The forward plugin’s configuration, particularly the upstream servers it uses, is critical. If the upstream DNS servers are unreachable or misconfigured, CoreDNS will fail to resolve external names, impacting every workload that relies on it for external connectivity. You can test this from a pod that has DNS tools installed, using nslookup google.com or dig google.com. The official CoreDNS image is minimal and typically ships without a shell or these utilities, so exec-ing into the CoreDNS pod itself usually does not work.
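For running nslookup or dig, a throwaway pod is often the easiest route. The manifest below is a sketch: the pod name dnsutils is hypothetical, and the image is an assumption (any image that ships nslookup or dig will do).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils            # hypothetical name
  namespace: default
spec:
  containers:
  - name: dnsutils
    # Assumed image; substitute any image that includes nslookup/dig.
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    command: ["sleep", "infinity"]   # keep the pod alive for exec sessions
  restartPolicy: Always
```

Once the pod is running, kubectl exec -it dnsutils -- nslookup google.com exercises the forward path, and kubectl exec -it dnsutils -- nslookup kubernetes.default exercises the cluster-internal path.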
The reload plugin, while convenient, can hide problems if the Corefile is malformed. CoreDNS will attempt to reload, fail, and log an error, and it generally keeps serving with the previous configuration, so the probes keep passing and nothing restarts. The broken Corefile then surfaces later, when the pod restarts for some other reason and cannot start. After changing the ConfigMap, check the CoreDNS logs for a parsing error.
The kubernetes plugin’s fallthrough directive controls what happens when the plugin is authoritative for a zone but has no record for the requested name. In this Corefile it is limited to in-addr.arpa and ip6.arpa, so reverse lookups for IPs outside the cluster continue to the forward plugin, while a missing name under cluster.local is answered with NXDOMAIN straight away. External names such as google.com never match the plugin’s zones, so they reach forward with or without fallthrough.
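When troubleshooting, you’ll often find yourself checking the Corefile (mounted at /etc/coredns/Corefile in the pod, and usually easier to read with kubectl get configmap coredns -n kube-system -o yaml), examining the logs of the CoreDNS pods using kubectl logs -n kube-system <coredns-pod-name>, and testing the probe endpoints directly. If your CoreDNS image includes curl, kubectl exec -n kube-system <coredns-pod-name> -- curl <endpoint> works; with a minimal image, kubectl port-forward -n kube-system <coredns-pod-name> 8080:8080 followed by a local curl http://localhost:8080/health is a common alternative.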
The next logical step after ensuring robust health and readiness probing is to optimize CoreDNS performance, often by tuning its caching behavior and understanding the impact of upstream DNS resolvers.