CoreDNS is failing to resolve internal Kubernetes service names, resulting in NXDOMAIN errors for pods trying to reach other services.
Common Causes and Fixes for CoreDNS NXDOMAIN Errors
- CoreDNS Pods Not Running or Crashing:
  - Diagnosis: Check the status of CoreDNS pods in the kube-system namespace.

        kubectl get pods -n kube-system -l k8s-app=kube-dns

    Look for pods in CrashLoopBackOff, Error, or Terminating states. If there are no pods, it means the deployment failed.
  - Fix: If pods are crashing, check their logs for specific errors.

        kubectl logs <coredns-pod-name> -n kube-system

    Common reasons for crashing include insufficient resources (CPU/memory) or misconfiguration in the Corefile. Increase resource requests/limits in the CoreDNS deployment manifest if necessary.

        # Example snippet from CoreDNS deployment manifest
        resources:
          requests:
            cpu: "100m"
            memory: "70Mi"
          limits:
            cpu: "200m"
            memory: "140Mi"

    This provides the necessary headroom for CoreDNS to operate, preventing out-of-memory or CPU starvation issues that lead to crashes.
  - Why it works: CoreDNS needs stable, running instances to perform DNS lookups. When pods fail, the DNS service becomes unavailable.
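The pod status check above can be scripted so that unhealthy pods stand out without reading the table by eye. This is a minimal sketch, assuming the default kubectl get pods column order (NAME, READY, STATUS); the function name unhealthy_pods is illustrative and not part of kubectl.

```shell
#!/bin/sh
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print the name of every pod that is not Running or not fully ready.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # READY looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | unhealthy_pods
```

An empty result means every listed CoreDNS pod is Running and ready. It does not detect the case where no pods exist at all, so check that separately.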
- Incorrect Corefile Configuration:
  - Diagnosis: Examine the Corefile ConfigMap used by CoreDNS.

        kubectl get configmap coredns -n kube-system -o yaml

    Look for syntax errors, incorrect zone definitions, or missing essential plugins. A common mistake is an incorrect root zone (.) configuration or a missing kubernetes plugin.
  - Fix: Edit the Corefile ConfigMap and correct any errors.

        kubectl edit configmap coredns -n kube-system

    Ensure it looks similar to this, with the kubernetes plugin correctly configured for the cluster domain (usually cluster.local):

        apiVersion: v1
        data:
          Corefile: |
            .:53 {
                errors
                health {
                    lameduck 5s
                }
                ready
                kubernetes cluster.local in-addr.arpa ip6.arpa {
                    pods insecure
                    fallthrough in-addr.arpa ip6.arpa
                }
                prometheus :9153
                forward . /etc/resolv.conf {
                    max_concurrent 1000
                }
                cache 30
                loop
                reload
                loadbalance
            }
        kind: ConfigMap
        metadata:
          name: coredns
          namespace: kube-system
          # ... other metadata

    The kubernetes plugin is crucial; it tells CoreDNS how to resolve *.svc.cluster.local and *.pod.cluster.local names by querying the Kubernetes API. The forward directive handles external lookups.
  - Why it works: The Corefile is the configuration brain of CoreDNS. Correctly defining the kubernetes plugin ensures it knows how to query the cluster’s DNS records.
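A quick way to confirm that an essential plugin is present is to search the Corefile text for a line that begins with the plugin name. The following is a rough sketch only: corefile_has_plugin is a hypothetical helper, and it checks that a directive exists, not that its arguments are valid.

```shell
#!/bin/sh
# corefile_has_plugin PLUGIN: read Corefile text on stdin and succeed (exit 0)
# if some line starts with the given plugin name as a whole word.
corefile_has_plugin() {
  grep -Eq "^[[:space:]]*$1([[:space:]]|\$)"
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' \
#     | corefile_has_plugin kubernetes || echo "kubernetes plugin missing"
```

Running the same check for forward covers the external-lookup path as well.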
- Service Discovery Issues (API Server Unreachable):
  - Diagnosis: If the kubernetes plugin in the Corefile is misconfigured or CoreDNS cannot reach the Kubernetes API server, it won’t be able to discover services. Check CoreDNS logs for errors related to the kubernetes plugin or API server connectivity.

        kubectl logs <coredns-pod-name> -n kube-system | grep "kubernetes.*failed"

  - Fix: Ensure CoreDNS pods have network connectivity to the Kubernetes API server. This usually involves checking network policies, CNI configuration, or firewall rules. If restrictive network policies apply to the CoreDNS pods, this traffic might be blocked.

        # Example: Check if CoreDNS pod can curl the API server
        kubectl exec -n kube-system <coredns-pod-name> -- curl -k https://kubernetes.default.svc.cluster.local

    The CoreDNS image is often minimal and may not include curl; in that case, run the same request from a temporary debug pod, using the API server’s ClusterIP (kubectl get svc kubernetes -n default) if cluster names do not resolve. If the request fails, investigate network policies in kube-system or upstream network configurations.
  - Why it works: The kubernetes plugin relies on watching API server endpoints and services. If it can’t reach the API server, it can’t populate its internal cache of cluster services.
- Incorrect resolv.conf in Pods:
  - Diagnosis: Pods are configured to use CoreDNS as their DNS server via their resolv.conf file. Check the resolv.conf of a pod that’s experiencing NXDOMAIN errors.

        kubectl exec <pod-name> -- cat /etc/resolv.conf

    The nameserver entry should point to the ClusterIP of the CoreDNS service.
  - Fix: The resolv.conf is typically managed by the kubelet. Ensure kubelet is configured correctly to provide DNS to pods. If it’s incorrect, restarting kubelet or checking its configuration (/var/lib/kubelet/config.yaml or command-line flags) might be needed. The ClusterIP for the kube-dns service should be correct.

        kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'

    This IP should be listed as the nameserver in /etc/resolv.conf.
  - Why it works: Pods use the resolv.conf to know which DNS server to query. If this file points to the wrong IP or an unreachable server, DNS resolution will fail.
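The comparison between a pod’s resolv.conf and the kube-dns ClusterIP can be automated. This is a small sketch under stated assumptions: resolv_nameservers and uses_dns_ip are illustrative helper names, and <pod-name> remains a placeholder.

```shell
#!/bin/sh
# resolv_nameservers: read resolv.conf text on stdin and print each
# nameserver address, one per line.
resolv_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# uses_dns_ip EXPECTED_IP: succeed if EXPECTED_IP is among the nameservers
# found in the resolv.conf text on stdin.
uses_dns_ip() {
  resolv_nameservers | grep -Fxq "$1"
}

# Usage against a live cluster (requires kubectl access):
#   dns_ip=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
#   kubectl exec <pod-name> -- cat /etc/resolv.conf | uses_dns_ip "$dns_ip" \
#     || echo "pod is not using $dns_ip"
```

The match is exact and whole-line, so an address that merely contains the expected IP as a prefix is not accepted.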
- CoreDNS Service Not Available or Incorrect ClusterIP:
  - Diagnosis: Verify that the kube-dns service exists and has the correct ClusterIP.

        kubectl get svc kube-dns -n kube-system

    The output should show a CLUSTER-IP and PORT(S) like 53/UDP,53/TCP.
  - Fix: If the service is missing or has an incorrect ClusterIP, it needs to be recreated or fixed. This is often tied to the Kubernetes control plane’s DNS configuration. Ensure the kube-dns service definition is present in the Kubernetes control plane’s manifest or is being managed correctly.

        # Example: If kube-dns service is missing, apply its default definition
        kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/kube-dns/kube-dns-svc.yaml

    Confirm that this manifest path exists for your Kubernetes version before applying it, and that the recreated service receives the ClusterIP that the kubelet hands out to pods. This ensures a stable endpoint for pods to query.
  - Why it works: The kube-dns service acts as the stable, internal IP address that all pods are configured to use for DNS resolution. If this service is broken, pods can’t find CoreDNS.
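The port check on the kube-dns service can be expressed as a small test on the PORT(S) column. This is a sketch only, assuming the default kubectl get svc column order in which PORT(S) is the fifth field; has_dns_ports is a hypothetical helper name.

```shell
#!/bin/sh
# has_dns_ports PORTS: succeed if the comma-separated PORT(S) string
# (e.g. "53/UDP,53/TCP,9153/TCP") exposes port 53 over both UDP and TCP.
has_dns_ports() {
  ports=",$1,"   # pad with commas so each entry can be matched exactly
  case "$ports" in *,53/UDP,*) ;; *) return 1 ;; esac
  case "$ports" in *,53/TCP,*) ;; *) return 1 ;; esac
  return 0
}

# Usage against a live cluster (requires kubectl access):
#   ports=$(kubectl get svc kube-dns -n kube-system --no-headers | awk '{ print $5 }')
#   has_dns_ports "$ports" || echo "kube-dns is not exposing 53/UDP and 53/TCP"
```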
- Network Policies Blocking DNS Traffic:
  - Diagnosis: If network policies are in place, they might be preventing pods from reaching the CoreDNS service on port 53 (UDP/TCP). Check network policy definitions in the namespace where the client pod resides and in kube-system.

        kubectl get networkpolicy -n <client-pod-namespace>
        kubectl get networkpolicy -n kube-system

    Look for policies that restrict egress traffic to kube-system or specifically to the kube-dns service’s ClusterIP.
  - Fix: Add an allow rule to the relevant network policy that permits egress traffic from pods to the CoreDNS pods behind the kube-dns service (or all pods in kube-system) on port 53. Because a podSelector on its own only matches pods in the policy’s own namespace, the rule also needs a namespaceSelector for kube-system.

        # Example: Allow egress to kube-dns service
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        metadata:
          name: allow-dns
          namespace: <client-pod-namespace>
        spec:
          podSelector: {} # Applies to all pods in the namespace
          policyTypes:
          - Egress
          egress:
          - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system # CoreDNS runs in kube-system
              podSelector:
                matchLabels:
                  k8s-app: kube-dns # Label for CoreDNS pods
            ports:
            - protocol: UDP
              port: 53
            - protocol: TCP
              port: 53

    Network policies are additive, so this rule explicitly permits DNS traffic even when other policies make egress deny-by-default.
  - Why it works: Network policies enforce traffic segmentation. Once any egress policy selects a pod, traffic to CoreDNS is blocked unless a rule allows it, which prevents resolution.
After fixing NXDOMAIN issues, the next error you’ll likely encounter is a CrashLoopBackOff on the CoreDNS pods if the underlying resource constraints haven’t been addressed.