CoreDNS is failing to resolve internal Kubernetes service names, resulting in NXDOMAIN errors for pods trying to reach other services.

Common Causes and Fixes for CoreDNS NXDOMAIN Errors

  1. CoreDNS Pods Not Running or Crashing:

    • Diagnosis: Check the status of CoreDNS pods in the kube-system namespace.
      kubectl get pods -n kube-system -l k8s-app=kube-dns
      
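      Look for pods in CrashLoopBackOff, Error, or Terminating states. If no pods are listed, the CoreDNS Deployment is missing or has been scaled to zero replicas.
    • Fix: If pods are crashing, check their logs for specific errors. Append --previous to read the output of the last crashed container instance.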
      kubectl logs <coredns-pod-name> -n kube-system
      
      Common reasons for crashing include insufficient resources (CPU/memory) or misconfiguration in the Corefile. Increase resource requests/limits in the CoreDNS deployment manifest if necessary.
      # Example snippet from CoreDNS deployment manifest
      resources:
        requests:
          cpu: "100m"
          memory: "70Mi"
        limits:
          cpu: "200m"
          memory: "140Mi"
      
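      Treat these values as a starting point and size them to the cluster. A container that exceeds its memory limit is OOM-killed and restarted, which shows up as CrashLoopBackOff. A tight CPU limit causes throttling instead, which slows lookups and can fail the health probes.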
    • Why it works: CoreDNS needs stable, running instances to perform DNS lookups. When pods fail, the DNS service becomes unavailable.
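The status check in item 1 can be scripted. The following is a minimal sketch rather than part of kubectl: unhealthy_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...).

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the name and
# status of every pod that is not Running with all of its containers ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # READY column looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3
  }'
}

# Usage against a live cluster:
#   kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | unhealthy_pods
```

Empty output means every listed CoreDNS pod is Running and ready. It does not distinguish that case from no pods being listed at all, so check the pod count separately.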
  2. Incorrect Corefile Configuration:

    • Diagnosis: Examine the Corefile ConfigMap used by CoreDNS.
      kubectl get configmap coredns -n kube-system -o yaml
      
      Look for syntax errors, incorrect zone definitions, or missing essential plugins. A common mistake is an incorrect . (root zone) configuration or a missing kubernetes plugin.
    • Fix: Edit the Corefile ConfigMap and correct any errors.
      kubectl edit configmap coredns -n kube-system
      
      Ensure it looks similar to this, with the kubernetes plugin correctly configured for the cluster domain (usually cluster.local):
      apiVersion: v1
      data:
        Corefile: |
          .:53 {
              errors
              health {
                 lameduck 5s
              }
              ready
              kubernetes cluster.local in-addr.arpa ip6.arpa {
                 pods insecure
                 fallthrough in-addr.arpa ip6.arpa
              }
              prometheus :9153
              forward . /etc/resolv.conf {
                 max_concurrent 1000
              }
              cache 30
              loop
              reload
              loadbalance
          }
      kind: ConfigMap
      metadata:
        name: coredns
        namespace: kube-system
      # ... other metadata
      
      The kubernetes plugin is crucial; it tells CoreDNS how to resolve *.svc.cluster.local and *.pod.cluster.local names by querying the Kubernetes API. The forward directive handles external lookups.
    • Why it works: The Corefile is the configuration brain of CoreDNS. Correctly defining the kubernetes plugin ensures it knows how to query the cluster’s DNS records.
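A quick way to confirm that the kubernetes plugin is present for the expected zone is to scan the Corefile text. This is a rough sketch under stated assumptions: corefile_serves_zone is a hypothetical helper, and it only looks for a kubernetes line that names the zone; it does not validate the rest of the Corefile syntax.

```shell
# Succeeds if the Corefile text on stdin has a `kubernetes` plugin line that
# lists the given zone (for example cluster.local).
corefile_serves_zone() {
  awk -v zone="$1" '
    $1 == "kubernetes" {
      for (i = 2; i <= NF; i++) if ($i == zone) found = 1
    }
    END { exit !found }
  '
}

# Usage against a live cluster:
#   kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' \
#     | corefile_serves_zone cluster.local && echo "kubernetes plugin present"
```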
  3. Service Discovery Issues (API Server Unreachable):

    • Diagnosis: If the kubernetes plugin in the Corefile is misconfigured or CoreDNS cannot reach the Kubernetes API server, it won’t be able to discover services. Check CoreDNS logs for errors related to kubernetes plugin or API server connectivity.
      kubectl logs <coredns-pod-name> -n kube-system | grep -i "kubernetes.*failed"
      
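    • Fix: Ensure CoreDNS pods have network connectivity to the Kubernetes API server. This usually involves checking network policies, CNI configuration, or firewall rules. The CoreDNS image normally ships without a shell or curl, so kubectl exec into the CoreDNS pod is not an option. Test from a temporary pod in kube-system instead, and target the API server's ClusterIP so the check does not itself depend on DNS. The pod name api-check and the curlimages/curl image below are arbitrary choices.
      # Example: look up the API server ClusterIP, then curl it from a throwaway pod
      kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}'
      kubectl run api-check -n kube-system --rm -it --restart=Never --image=curlimages/curl -- curl -k -m 5 https://<api-server-cluster-ip>/version
      
      Any HTTP response, even 401 or 403, proves connectivity. If the request times out, investigate network policies in kube-system or upstream network configurations. The temporary pod may be scheduled on a different node than CoreDNS, so treat a success as a hint rather than proof.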
    • Why it works: The kubernetes plugin relies on watching API server endpoints and services. If it can’t reach the API server, it can’t populate its internal cache of cluster services.
  4. Incorrect resolv.conf in Pods:

    • Diagnosis: Pods are configured to use CoreDNS as their DNS server via their resolv.conf file. Check the resolv.conf of a pod that’s experiencing NXDOMAIN errors.
      kubectl exec <pod-name> -- cat /etc/resolv.conf
      
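      The nameserver entry should point to the ClusterIP of the CoreDNS service (or to the node-local cache address if NodeLocal DNSCache is deployed), and the search line should include <namespace>.svc.cluster.local, svc.cluster.local, and cluster.local.
    • Fix: The resolv.conf is written by the kubelet from its clusterDNS and clusterDomain settings (/var/lib/kubelet/config.yaml, or the --cluster-dns and --cluster-domain flags). If the values are wrong, correct them and restart the kubelet; existing pods must be recreated to pick up the new file. Also check the pod's dnsPolicy: with Default, or with hostNetwork: true and the default ClusterFirst policy, the pod inherits the node's resolv.conf and cannot resolve cluster names. Look up the ClusterIP of the kube-dns service: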
      kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
      
      This IP should be listed as the nameserver in /etc/resolv.conf.
    • Why it works: Pods use the resolv.conf to know which DNS server to query. If this file points to the wrong IP or an unreachable server, DNS resolution will fail.
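The comparison between the pod's resolv.conf and the kube-dns ClusterIP can be automated. A minimal sketch, assuming a hypothetical helper named has_nameserver that receives the resolv.conf text on stdin and the expected IP as its only argument:

```shell
# Succeeds if the resolv.conf text on stdin lists the given IP as a nameserver.
has_nameserver() {
  awk -v ip="$1" '
    $1 == "nameserver" && $2 == ip { found = 1 }
    END { exit !found }
  '
}

# Usage against a live cluster (replace <pod-name> with the affected pod):
#   dns_ip=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
#   kubectl exec <pod-name> -- cat /etc/resolv.conf | has_nameserver "$dns_ip" \
#     && echo "pod points at kube-dns" || echo "nameserver mismatch"
```

A mismatch is expected on clusters that run NodeLocal DNSCache, where pods are pointed at the node-local address instead.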
  5. CoreDNS Service Not Available or Incorrect ClusterIP:

    • Diagnosis: Verify that the kube-dns service exists and has the correct ClusterIP.
      kubectl get svc kube-dns -n kube-system
      
      The output should show a CLUSTER-IP and PORT(S) like 53/UDP,53/TCP,9153/TCP.
    • Fix: If the service is missing, or its ClusterIP does not match the address the kubelet hands to pods (clusterDNS), it needs to be recreated with the correct IP. The DNS add-on is normally managed by the cluster installer or the cloud provider, so restore it through that tool rather than applying a raw manifest. The add-on manifests in the kubernetes/kubernetes repository are templates with unfilled placeholders and cannot be applied as-is.
      # Example: on a kubeadm cluster, re-apply the CoreDNS add-on, including the kube-dns service (run on a control-plane node)
      kubeadm init phase addon coredns
      
      This restores a stable endpoint for pods to query. Pass --config if the cluster uses a non-default service subnet, and review the coredns ConfigMap afterwards if the Corefile was customized.
    • Why it works: The kube-dns service acts as the stable, internal IP address that all pods are configured to use for DNS resolution. If this service is broken, pods can’t find CoreDNS.
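The port check on the kube-dns service can be reduced to a small test. This is only a sketch: exposes_dns_ports is a hypothetical helper, and it assumes the PORT(S) value is passed in the comma-separated form that kubectl get svc prints.

```shell
# Succeeds if the comma-separated PORT(S) value includes both 53/UDP and 53/TCP.
exposes_dns_ports() {
  case ",$1," in *",53/UDP,"*) ;; *) return 1 ;; esac
  case ",$1," in *",53/TCP,"*) ;; *) return 1 ;; esac
}

# Usage against a live cluster (PORT(S) is the fifth column of the default output):
#   ports=$(kubectl get svc kube-dns -n kube-system --no-headers | awk '{print $5}')
#   exposes_dns_ports "$ports" && echo "DNS ports exposed" || echo "DNS ports missing"
```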
  6. Network Policies Blocking DNS Traffic:

    • Diagnosis: If network policies are in place, they might be preventing pods from reaching the CoreDNS service on port 53 (UDP/TCP). Check network policy definitions in the namespace where the client pod resides and in kube-system.
      kubectl get networkpolicy -n <client-pod-namespace>
      kubectl get networkpolicy -n kube-system
      
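      NetworkPolicies have no explicit deny rules; they isolate the pods they select. Look for policies that select the client pods with policyTypes Egress but do not allow port 53 to kube-system, and for Ingress policies in kube-system that select the CoreDNS pods without admitting the client namespace.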
    • Fix: Add an allow rule to the relevant network policy that permits egress traffic from pods to the kube-dns service (or all pods in kube-system) on port 53.
      # Example: Allow egress to kube-dns service
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-dns
        namespace: <client-pod-namespace>
      spec:
        podSelector: {} # Applies to all pods in the namespace
        policyTypes:
        - Egress
        egress:
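        - to:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: kube-system # CoreDNS runs in kube-system; label is set automatically on Kubernetes 1.21+
            podSelector:
              matchLabels:
                k8s-app: kube-dns # Label for CoreDNS pods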
          ports:
          - protocol: UDP
            port: 53
          - protocol: TCP
            port: 53
      
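      This explicitly permits DNS traffic. Network policies are additive, so the rule takes effect alongside any more restrictive policy that selects the same pods.
    • Why it works: Network policies enforce traffic segmentation. Once any policy with policyTypes Egress selects a pod, all egress that is not explicitly allowed is dropped, including DNS queries to CoreDNS.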

The next error you’ll likely encounter after fixing NXDOMAIN issues is a CrashLoopBackOff on CoreDNS pods if the underlying resource constraints haven’t been addressed.
