CoreDNS, when deployed for high availability on Kubernetes, doesn’t distribute queries across its own instances. That job belongs to Kubernetes: readiness checks decide which CoreDNS pods are eligible for traffic, and the Service’s endpoint selection decides which of them receives each query.
Let’s see this in action. Imagine a Kubernetes Service object targeting your CoreDNS pods. This Service is what your applications will use to resolve DNS queries.
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  selector:
    k8s-app: kube-dns # This selects your CoreDNS pods
  ports:
    - name: dns
      port: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      protocol: TCP
When a pod in your cluster needs to resolve kubernetes.default.svc.cluster.local, it sends a DNS query to the kube-dns Service IP. Kubernetes (in practice kube-proxy or an equivalent dataplane component) then forwards this query to one of the CoreDNS pods currently listed as ready endpoints of that Service. The magic isn’t in CoreDNS deciding which of its peers to talk to; it’s Kubernetes’ Service abstraction ensuring that your query lands on an available and responsive CoreDNS instance.
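The selection step can be pictured with a small model. This is a minimal sketch, not how kube-proxy is implemented: the Endpoint class, the pod names, and the IP addresses are hypothetical, and the uniform random choice is an assumption (the real behavior depends on the proxy mode in use).

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    """One CoreDNS pod behind the kube-dns Service (hypothetical model)."""
    pod_name: str
    ip: str
    ready: bool


def pick_backend(endpoints, rng=random):
    """Pick one ready endpoint at random; return None if none are ready."""
    ready = [ep for ep in endpoints if ep.ready]
    if not ready:
        return None
    return rng.choice(ready)


# Hypothetical pods and addresses, for illustration only.
endpoints = [
    Endpoint("coredns-a", "10.244.0.5", ready=True),
    Endpoint("coredns-b", "10.244.1.7", ready=False),  # failing its readiness probe
    Endpoint("coredns-c", "10.244.2.9", ready=True),
]

# Simulate 200 queries and collect which pods ever received one.
chosen = {pick_backend(endpoints).pod_name for _ in range(200)}
print(sorted(chosen))
```

The unready pod never appears in the result, which is the whole point: CoreDNS instances never coordinate with each other, they are simply included in or excluded from the candidate set.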
For high availability, you’d typically deploy multiple CoreDNS replicas. A common pattern is to manage these replicas with a Deployment (a StatefulSet also works, but CoreDNS keeps no per-pod state that would require one).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  replicas: 3 # At least 2 for HA, 3 is a good starting point
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      containers:
        - name: coredns
          image: coredns/coredns:1.10.1 # Use a recent, stable version
          args:
            - "-conf=/etc/coredns/Corefile"
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: 8080 # Default port of the health plugin
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8181 # Default port of the ready plugin (not 8080)
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: config-volume
              mountPath: /etc/coredns
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
              - key: Corefile
                path: Corefile
The livenessProbe and readinessProbe are critical. The livenessProbe tells Kubernetes whether the CoreDNS container is still running and responsive. If it fails failureThreshold times in a row, the kubelet restarts the container. The readinessProbe tells Kubernetes whether the CoreDNS pod is ready to accept traffic. While it is failing, the pod is removed from the endpoints of the kube-dns Service and receives no new queries, but it is not restarted. This combination ensures that traffic is only directed to healthy, ready instances.
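The threshold logic behind both probes can be sketched as a small state machine. This is a simplified model, not kubelet code: the ProbeTracker class is hypothetical, it ignores timing (periodSeconds, timeoutSeconds, initialDelaySeconds), and it assumes the container has already passed its first probe.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Tracks consecutive probe results against the configured thresholds."""
    failure_threshold: int = 3   # matches failureThreshold in the manifest
    success_threshold: int = 1   # Kubernetes default for successThreshold
    passing: bool = True         # assumption: the first probe already succeeded
    _failures: int = 0
    _successes: int = 0

    def record(self, ok: bool) -> bool:
        """Feed one probe result and return the resulting overall state."""
        if ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_threshold:
                self.passing = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failure_threshold:
                self.passing = False
        return self.passing


tracker = ProbeTracker(failure_threshold=3)
results = [True, False, False, True, False, False, False, True]
states = [tracker.record(r) for r in results]
print(states)
# [True, True, True, True, True, True, False, True]
```

Two isolated failures change nothing because the success in between resets the counter. Only the third consecutive failure flips the state, so with periodSeconds set to 10 a pod must be failing for a few probe periods, roughly 20 to 30 seconds, before Kubernetes acts on it.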
The Corefile itself is where you configure CoreDNS’s behavior. For high availability, you’re primarily focused on ensuring your cluster DNS is handled correctly and that any upstream resolvers are configured robustly.
.:53 {
    errors
    health {
        lameduck 5s # Give it 5 seconds to shut down gracefully
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
The health and ready plugins within the Corefile expose the HTTP endpoints that the Kubernetes probes check: /health on port 8080 and /ready on port 8181 by default. The loadbalance plugin is not for distributing incoming requests from your application pods, and it does not choose upstream resolvers either. It shuffles the order of A, AAAA, and MX records within each response, so clients that always take the first record end up spread across the available addresses. This is a common point of confusion.
When a CoreDNS pod receives a query for an external domain (e.g., google.com) and is configured to forward it, the forward plugin within that specific CoreDNS instance chooses among the upstream servers listed in /etc/resolv.conf (or wherever you’ve defined them). Its policy option controls that choice: random (the default), round_robin, or sequential. Neither plugin influences how Kubernetes directs traffic to your CoreDNS pods.
The prometheus plugin exposes metrics on port 9153, which you can scrape for monitoring. The cache plugin helps reduce load on upstream resolvers by caching DNS records. The reload plugin allows CoreDNS to pick up changes to its Corefile without restarting.
The most surprising aspect of CoreDNS HA on Kubernetes is how little CoreDNS itself is involved in the "high availability" aspect of serving your cluster’s DNS queries. Kubernetes handles the distribution and health checking of the CoreDNS pods. CoreDNS’s own plugins only shape what happens after a query has arrived: loadbalance reorders the records in an answer and forward picks an upstream. Neither plays any part in inbound request distribution.
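Upstream selection inside a single CoreDNS instance is governed by the forward plugin’s policy option, which accepts random, round_robin, or sequential. The sketch below models those three behaviors. It is an illustration under simplifying assumptions, not the plugin’s code: make_picker is a hypothetical helper, the upstream addresses are made up, and health checking of upstreams is left out entirely.

```python
import itertools
import random


def make_picker(upstreams, policy="random", rng=random):
    """Return a zero-argument function that yields the next upstream to try."""
    if not upstreams:
        raise ValueError("at least one upstream is required")
    if policy == "random":
        return lambda: rng.choice(upstreams)
    if policy == "round_robin":
        cycle = itertools.cycle(upstreams)
        return lambda: next(cycle)
    if policy == "sequential":
        # Always prefer the first upstream; later ones are fallbacks only.
        return lambda: upstreams[0]
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical upstream resolver addresses.
upstreams = ["10.0.0.2", "10.0.0.3"]

next_upstream = make_picker(upstreams, policy="round_robin")
print([next_upstream() for _ in range(3)])
# ['10.0.0.2', '10.0.0.3', '10.0.0.2']
```

Each CoreDNS pod makes this choice independently, so the policy affects only the traffic leaving that pod and has no bearing on which pod a client query reaches.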
If you want to see how Kubernetes directs traffic, kubectl exec into a pod that has dig installed and query the kube-dns Service IP. It resolves consistently. If you then kill one CoreDNS pod, dig continues to work, and you can watch the traffic shift to the remaining healthy pods through CoreDNS’s Prometheus metrics, or through its request logs if you add the log plugin, which the Corefile above does not enable.
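One way to observe the shift is to compare the request counter of each pod before and after the failure. The sketch below sums the coredns_dns_requests_total counter from metrics text in the Prometheus exposition format, as served by each pod on port 9153 under /metrics. The total_requests helper is hypothetical, and the sample text is invented for illustration rather than captured from a real cluster.

```python
import re

# Matches "name{labels} value" or "name value" sample lines.
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)


def total_requests(metrics_text, metric="coredns_dns_requests_total"):
    """Sum every sample of one counter across all of its label sets."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match and match.group("name") == metric:
            total += float(match.group("value"))
    return total


# Invented sample, shaped like CoreDNS metrics output.
sample = """
# TYPE coredns_dns_requests_total counter
coredns_dns_requests_total{family="1",proto="udp",server="dns://:53",type="A",zone="."} 120
coredns_dns_requests_total{family="1",proto="udp",server="dns://:53",type="AAAA",zone="."} 30
coredns_cache_hits_total{server="dns://:53",type="success"} 75
"""

print(total_requests(sample))
# 150.0
```

Taking this total for every CoreDNS pod at two points in time shows which pods absorbed the queries: the counters of the surviving pods grow faster once the killed pod has left the Service’s endpoints.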
The next challenge you’ll likely encounter is optimizing CoreDNS performance and understanding its advanced configuration options, particularly around caching and upstream forwarding strategies.