CoreDNS failed to resolve external hostnames because it couldn’t reach the upstream DNS servers it was configured to use.

This isn’t just a generic "network problem"; it’s a breakdown in the path between your Kubernetes cluster and the outside world. CoreDNS, the cluster’s internal DNS server, answers queries for cluster names (such as Services under cluster.local) itself and forwards everything else to upstream DNS servers. When that forwarding path breaks, names outside the cluster stop resolving, even though in-cluster names may still work.
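The split between answering locally and forwarding can be pictured with a deliberately simplified Python model. It assumes the default cluster.local domain and ignores the reverse zones and the rest of the CoreDNS plugin chain; route_query is a hypothetical helper, not part of CoreDNS.

```python
# Deliberately simplified model, not CoreDNS source code: a query whose name
# falls inside a zone served by the kubernetes plugin is answered locally;
# everything else is handed to the forward plugin's upstream servers.
CLUSTER_ZONES = ("cluster.local.",)  # assumed cluster domain; reverse zones omitted


def route_query(qname: str, cluster_zones: tuple = CLUSTER_ZONES) -> str:
    """Return "kubernetes" for in-cluster names and "forward" for the rest."""
    name = qname.lower()
    if not name.endswith("."):
        name += "."  # compare fully qualified names
    for zone in cluster_zones:
        if name == zone or name.endswith("." + zone):
            return "kubernetes"
    return "forward"


if __name__ == "__main__":
    for q in ("kube-dns.kube-system.svc.cluster.local", "example.com"):
        print(q, "->", route_query(q))
```

In this model every failure described below lives on the "forward" branch: the in-cluster branch never leaves CoreDNS.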

Here’s how to diagnose and fix it, starting with the most common culprits:

1. NetworkPolicy Blocking Egress Traffic

  • Diagnosis: Check if any NetworkPolicy resources are restricting egress traffic from the kube-system namespace (where CoreDNS typically runs) to the internet or specific IP ranges.
    kubectl get networkpolicy -n kube-system
    
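    Look for policies that select the CoreDNS pods and list Egress in policyTypes. Such a policy blocks egress if it has no egress rules at all, or if its ipBlock entries don’t cover your upstream DNS server IPs (e.g., 8.8.8.8/32, 1.1.1.1/32). Note that an egress rule with an empty or omitted to field does the opposite: it allows traffic to every destination.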
  • Fix: If a restrictive NetworkPolicy is found, either modify it to allow egress to your upstream DNS servers on UDP/TCP port 53, or create a new policy that explicitly permits this traffic.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-coredns-egress
      namespace: kube-system
    spec:
      podSelector:
        matchLabels:
          k8s-app: kube-dns # This label often targets CoreDNS pods
      policyTypes:
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: 8.8.8.8/32 # Example: Google Public DNS
        ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
      # Add rules for any other upstream DNS servers, plus the Kubernetes API server, which the kubernetes plugin must reach
    
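  • Why it works: NetworkPolicy is Kubernetes’ way of enforcing network segmentation, and it only takes effect if your CNI plugin enforces it. By default, pods can communicate freely; once any policy that restricts egress selects a pod, that pod may only send traffic that some policy explicitly allows. Policies are additive, so this one adds DNS queries (UDP/TCP 53) to the listed upstream IPs to the allowed set without editing the restrictive policy.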
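Reading policies by eye is error-prone because of how egress rules combine. The following Python sketch encodes the semantics described above: a pod is only restricted once some policy that selects it covers Egress, an omitted or empty to or ports field matches everything, and policies are additive. It assumes the input is the items list from kubectl get networkpolicy -n kube-system -o json. egress_allowed and its helpers are hypothetical names; only matchLabels selectors are evaluated, named ports are treated as non-matching, and the code cannot tell whether your CNI actually enforces NetworkPolicy.

```python
import ipaddress


def _selects(policy: dict, pod_labels: dict) -> bool:
    # Simplification: only spec.podSelector.matchLabels is evaluated;
    # matchExpressions is ignored. An empty selector selects every pod.
    selector = policy["spec"].get("podSelector") or {}
    wanted = selector.get("matchLabels") or {}
    return all(pod_labels.get(k) == v for k, v in wanted.items())


def _restricts_egress(spec: dict) -> bool:
    types = spec.get("policyTypes")
    if types is None:  # API default: Egress applies only if an egress section exists
        return "egress" in spec
    return "Egress" in types


def _peer_matches(peer: dict, ip) -> bool:
    block = peer.get("ipBlock")
    if block is None:  # podSelector/namespaceSelector peers never match an external IP
        return False
    if ip not in ipaddress.ip_network(block["cidr"], strict=False):
        return False
    return not any(ip in ipaddress.ip_network(c, strict=False)
                   for c in block.get("except", []))


def _port_matches(entry: dict, protocol: str, port: int) -> bool:
    if entry.get("protocol", "TCP") != protocol:  # protocol defaults to TCP
        return False
    p = entry.get("port")
    if p is None:  # no port given: every port of this protocol
        return True
    if not isinstance(p, int):  # named ports cannot be resolved for an external server
        return False
    return p <= port <= entry.get("endPort", p)


def egress_allowed(policies, pod_labels, dest_ip, protocol="UDP", port=53) -> bool:
    """True if the given kube-system policies let the pod reach dest_ip:port."""
    ip = ipaddress.ip_address(dest_ip)
    relevant = [p for p in policies
                if _selects(p, pod_labels) and _restricts_egress(p["spec"])]
    if not relevant:
        return True  # no policy isolates this pod for egress
    for policy in relevant:
        for rule in policy["spec"].get("egress") or []:
            peers, ports = rule.get("to") or [], rule.get("ports") or []
            peer_ok = not peers or any(_peer_matches(x, ip) for x in peers)
            port_ok = not ports or any(_port_matches(x, protocol, port) for x in ports)
            if peer_ok and port_ok:  # rules and policies are additive: one match suffices
                return True
    return False
```

For example, after loading the kubectl JSON, egress_allowed(items, {"k8s-app": "kube-dns"}, "8.8.8.8", "UDP") returning False means no policy lets CoreDNS query that server over UDP. Check TCP as well, since large responses fall back to TCP.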

2. Incorrect Upstream DNS Configuration in CoreDNS ConfigMap

  • Diagnosis: CoreDNS uses a ConfigMap (often named coredns) in the kube-system namespace to define its behavior. Examine this ConfigMap for the forward directive.
    kubectl get configmap coredns -n kube-system -o yaml
    
    Look for a section like this within the data.Corefile:
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        # This is the crucial part:
        forward . 8.8.8.8 1.1.1.1 {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    
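    Ensure the addresses listed in the forward directive are correct and reachable from the CoreDNS pods. Many default installations use forward . /etc/resolv.conf instead of explicit IPs; because CoreDNS pods typically run with dnsPolicy: Default, that file is inherited from the node, so in that case check the node’s resolv.conf.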
  • Fix: Edit the coredns ConfigMap to correct the IP addresses in the forward directive.
    kubectl edit configmap coredns -n kube-system
    
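    Update the forward line with the correct upstream DNS server IPs (e.g., your internal DNS servers, or public ones like 8.8.8.8 and 1.1.1.1). With the reload plugin enabled (as in the example above), CoreDNS picks up the change once the updated ConfigMap reaches the pods, which can take a minute or more; otherwise, run kubectl rollout restart deployment coredns -n kube-system.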
  • Why it works: The forward plugin tells CoreDNS where to send DNS queries it can’t resolve locally (like google.com). If these upstream IPs are wrong or unreachable, CoreDNS will fail to get answers for external hostnames. Correcting them ensures CoreDNS has valid destinations for its forwarded queries.
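Before editing, it can help to extract the forward targets mechanically. This Python sketch is an assumption-laden helper, not a full Corefile parser: it handles one forward directive per line, treats # as the start of a comment, and only validates plain IPs, dns:// targets, and IPv4 host:port forms. forward_directives and target_problems are hypothetical names. Feed it the output of kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}'.

```python
import ipaddress


def forward_directives(corefile: str) -> list:
    """Return (from_zone, [targets]) for every 'forward' line in a Corefile."""
    found = []
    for raw in corefile.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop Corefile comments
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] != "forward":
            continue
        targets = []
        for tok in tokens[2:]:
            if tok == "{":  # start of the plugin's option block
                break
            targets.append(tok)
        found.append((tokens[1], targets))
    return found


def target_problems(target: str) -> list:
    """Flag forward targets that look wrong; an empty list means nothing found."""
    if target.startswith("/"):
        return []  # a resolv.conf path: the upstreams come from that file
    scheme, sep, rest = target.partition("://")
    if sep and scheme != "dns":
        return []  # e.g. tls:// targets are not checked by this sketch
    host = rest if sep else target
    if host.count(":") == 1:  # IPv4 with an explicit port, e.g. 8.8.8.8:53
        host = host.split(":", 1)[0]
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return [f"{target}: not a valid IP address"]
    if addr.is_loopback:
        # Inside the pod's network namespace (no hostNetwork) this is the pod itself.
        return [f"{target}: loopback address, not a reachable upstream"]
    return []
```

A typo such as 8.8.8.888 or a stray 127.0.0.53 copied from a node using systemd-resolved shows up immediately, before the edited ConfigMap is rolled out.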

3. CoreDNS Pods Not Running or Crashing

  • Diagnosis: Check the status of the CoreDNS pods.
    kubectl get pods -n kube-system -l k8s-app=kube-dns
    
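    If pods are in CrashLoopBackOff, Error, or Terminating states, investigate their logs (add --previous to see the output of a container that has already restarted) and their status.
    kubectl logs <coredns-pod-name> -n kube-system
    kubectl describe pod <coredns-pod-name> -n kube-system
    
    Look for Corefile parsing or forwarding errors in the logs, and for failing readiness/liveness probes or an OOMKilled last state in the describe output.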
  • Fix:
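    • Resource Limits: If kubectl describe pod shows the container’s last state as OOMKilled (this appears in the pod status, not in the logs), increase the memory requests/limits for the CoreDNS pods in their Deployment. CPU limits cause throttling and slow responses, not OOM kills.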
      kubectl edit deployment coredns -n kube-system
      
      Adjust resources.requests and resources.limits accordingly.
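    • Configuration Errors: If logs show Corefile parsing errors, or the loop plugin reporting a detected forwarding loop (common when the node’s resolv.conf points at a local stub resolver such as 127.0.0.53), fix the forward directive in the ConfigMap (as in point 2).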
    • Node Issues: If pods are stuck Pending, check node resources or taints/tolerations.
  • Why it works: CoreDNS must be running and healthy to perform its DNS resolution duties. Addressing resource constraints or configuration errors allows the CoreDNS pods to start and operate correctly.
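The status checks above can be scripted against kubectl get pods -n kube-system -l k8s-app=kube-dns -o json. This Python sketch (triage is a hypothetical helper) reads only standard Pod status fields: phase, and each container’s state, lastState, ready, and restartCount. It doesn’t read logs, so parsing errors still need kubectl logs.

```python
import json


def triage(pods_json: str) -> list:
    """Summarise CoreDNS pod problems from `kubectl get pods ... -o json` output."""
    findings = []
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            findings.append(f"{name}: Pending; check node resources and taints/tolerations")
        for cs in status.get("containerStatuses", []):
            who = f"{name}/{cs['name']}"
            state = cs.get("state", {})
            waiting = state.get("waiting", {}).get("reason")
            last = cs.get("lastState", {}).get("terminated", {}).get("reason")
            if last == "OOMKilled":
                findings.append(f"{who}: last run was OOMKilled; raise the memory limit")
            if waiting == "CrashLoopBackOff":
                findings.append(f"{who}: CrashLoopBackOff (restarts: {cs.get('restartCount', 0)}); "
                                "check kubectl logs --previous")
            if "running" in state and not cs.get("ready", False):
                findings.append(f"{who}: running but not ready; "
                                "check the readiness probe in kubectl describe pod")
    return findings
```

An empty result means every CoreDNS container is running and ready, which shifts suspicion to the network checks in the following sections.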

4. Node-Level DNS Configuration Issues (iptables/kube-proxy)

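  • Diagnosis: Pods reach CoreDNS through the kube-dns Service IP, and kube-proxy programs the iptables (or IPVS) rules that route that traffic to the CoreDNS pods. Node-level rules can also interfere with DNS traffic entering or leaving the node. Check the iptables rules on the node where the CoreDNS pod is running.
    # SSH into the node and run (matches port-53 rules and kube-proxy's kube-dns rules):
    sudo iptables-save | grep -E 'dports? 53\b|kube-dns'
    
    Look for rules that might be dropping or misdirecting UDP/TCP traffic on port 53, especially to the node’s IP address or the cluster’s service IP for DNS. Also check that kube-proxy is running on the node. In most clusters it runs as a pod (usually from a DaemonSet) rather than a systemd service:
    kubectl get pods -n kube-system -o wide | grep kube-proxy
    # Only where kube-proxy runs as a systemd service on the node:
    sudo systemctl status kube-proxy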
    
  • Fix: This is often the trickiest. It might involve:
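    • Restarting kube-proxy on the affected node: if it runs from a DaemonSet, delete that node’s kube-proxy pod so the DaemonSet recreates it (kubectl delete pod <kube-proxy-pod-name> -n kube-system); where it is a systemd service, run sudo systemctl restart kube-proxy.
    • Manually flushing iptables rules (use with extreme caution): sudo iptables -F. Without -t this flushes only the filter table (kube-proxy’s Service rules live mostly in the nat table), and it also removes any host firewall rules; restart kube-proxy afterwards so it re-creates its rules.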
    • Re-provisioning the node if the iptables state is severely corrupted.
    • Ensuring kube-proxy is configured to use the correct network mode (e.g., iptables or ipvs).
  • Why it works: kube-proxy programs the rules that route traffic to Services, including the kube-dns Service. If these rules are broken, traffic intended for CoreDNS might be dropped or sent to the wrong place, even if CoreDNS itself is healthy. Restoring or correcting these rules re-establishes the correct network paths.
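A plain grep for 53 also matches IP addresses, packet counters, and ports like 5353. As an alternative, this Python sketch parses iptables-save output, records which table each rule belongs to, and keeps only rules that match destination port 53 or mention kube-dns. dns_rules and blocking_rules are hypothetical names; negated matches (! --dport 53), port ranges, and IPVS mode are not handled. In iptables mode, kube-proxy typically installs a REJECT rule for a Service with no ready endpoints, so a REJECT mentioning kube-dns usually points back at section 3.

```python
import shlex


def dns_rules(iptables_save: str) -> list:
    """Return (table, chain, target, rule) for rules that mention port 53 or kube-dns."""
    table, hits = None, []
    for line in iptables_save.splitlines():
        line = line.strip()
        if line.startswith("*"):  # "*filter", "*nat", ... starts a table section
            table = line[1:]
            continue
        if not line.startswith("-A "):  # skip chain declarations and COMMIT
            continue
        tokens = shlex.split(line)  # keeps quoted --comment text as one token
        ports = [tokens[i + 1] for i, tok in enumerate(tokens[:-1])
                 if tok in ("--dport", "--dports")]
        # Simplification: port ranges (1:1024) and negation (!) are not interpreted.
        on_dns_port = any("53" in p.split(",") for p in ports)
        if not (on_dns_port or "kube-dns" in line):
            continue
        target = tokens[tokens.index("-j") + 1] if "-j" in tokens else None
        hits.append((table, tokens[1], target, line))
    return hits


def blocking_rules(iptables_save: str) -> list:
    """Subset of dns_rules() whose target drops or rejects the packet."""
    return [hit for hit in dns_rules(iptables_save) if hit[2] in ("DROP", "REJECT")]
```

To use it, save the rules on the node with sudo iptables-save > rules.txt and pass the file’s contents to blocking_rules; each hit names the table and chain to inspect.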

5. Firewall Rules on Nodes or Network Infrastructure

  • Diagnosis: Even if NetworkPolicy is permissive, host firewalls (firewalld, ufw, iptables directly on the node) or external network firewalls (e.g., cloud provider security groups, corporate firewalls) might be blocking egress from your Kubernetes nodes to the upstream DNS servers on UDP/TCP port 53.
    # On a node, test TCP port 53 on an upstream DNS server.
    # telnet only exercises TCP; most DNS queries use UDP, so also run the dig test from section 6.
    # ping is not a substitute: ICMP is often filtered differently from port 53.
    telnet 8.8.8.8 53
    
    Check cloud provider security group rules, AWS NACLs, Azure NSGs, or your on-premises firewall configurations.
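  • Fix: Update firewall rules on the nodes or network infrastructure to allow egress from your Kubernetes nodes to your chosen upstream DNS servers on UDP and TCP port 53. Pod traffic leaving the cluster is usually source-NATed to the node IP, so rules normally need to match node IPs; if your CNI routes pod IPs without SNAT, allow the pod CIDR as well.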
  • Why it works: This is a fundamental network connectivity issue. If the network path is blocked at the infrastructure level, CoreDNS’s requests simply won’t reach their destination, regardless of Kubernetes-internal configurations.
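telnet’s output can be ambiguous about why a connection failed. This Python sketch, using only the standard socket module, distinguishes a completed handshake, an active refusal, and a silent timeout; a timeout is the usual signature of a security group or firewall dropping packets. tcp_probe is a hypothetical name, and like telnet it tests TCP only.

```python
import socket


def tcp_probe(host: str, port: int = 53, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and classify the outcome.

    Like telnet, this only exercises TCP; it says nothing about UDP port 53.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # the host, or a firewall that rejects rather than drops
    except socket.timeout:
        return "timeout"  # no reply at all: typical of packets being silently dropped
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host
```

Run tcp_probe("8.8.8.8") and tcp_probe("1.1.1.1") on a node; "timeout" for every upstream strongly suggests an egress block at the infrastructure level rather than a problem with any single server.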

6. DNS Server Unavailability or Misconfiguration

  • Diagnosis: The upstream DNS servers themselves might be down, overloaded, or misconfigured.
    # From a node, try to resolve a hostname directly using the upstream server:
    dig @8.8.8.8 google.com
    
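    If this fails consistently while another resolver (for example, dig @1.1.1.1 google.com) answers from the same node, the problem is with that upstream server rather than CoreDNS or the network path. If every upstream fails, revisit the firewall checks in section 5.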
  • Fix: Switch to different, known-good upstream DNS servers in the CoreDNS ConfigMap or troubleshoot the upstream DNS infrastructure.
  • Why it works: If the server CoreDNS forwards to is broken, no CoreDNS or Kubernetes configuration can produce answers. Querying it directly from a node isolates the problem to the external DNS infrastructure.
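When dig isn’t installed on a node, the same direct test can be approximated in Python with the standard library. This sketch hand-builds a single RFC 1035 query, sends it over UDP straight to the upstream server (bypassing CoreDNS), and reports the response code and answer count. build_query and query_upstream are hypothetical names; the sketch supports IPv4 servers only, does not validate label lengths, and does not parse the answer records.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal RFC 1035 query for `name` (QTYPE 1 = A, class IN, RD set)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # flags: RD=1; QDCOUNT=1
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def query_upstream(server: str, name: str, port: int = 53, timeout: float = 3.0) -> dict:
    """Send one UDP query straight to `server` and summarise the reply header.

    Raises socket.timeout if nothing comes back, which points at the path or server.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, port))
        reply, _ = sock.recvfrom(4096)
    reply_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return {
        "id_matches": reply_id == struct.unpack("!H", query[:2])[0],
        "rcode": flags & 0x000F,  # 0 NOERROR, 2 SERVFAIL, 3 NXDOMAIN, 5 REFUSED
        "answers": ancount,
    }
```

A socket.timeout from query_upstream("8.8.8.8", "google.com") suggests the path or the server is down, while a reply with rcode 2 (SERVFAIL) means the server answered but could not resolve the name.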

Once external names resolve again, a common next issue to check is pods being unable to reach Services within the cluster because of kube-proxy or CNI misconfiguration.
