CoreDNS failed to resolve external hostnames because it couldn’t reach the upstream DNS servers it was configured to use.
This isn’t just a "network problem"; it’s a failure in the intricate dance between your Kubernetes cluster and the outside world. CoreDNS, the cluster’s internal DNS resolver, is supposed to act as a gateway, forwarding requests it can’t handle internally to external DNS servers. When that gateway jams, nothing outside the cluster is discoverable by name.
Here’s how to diagnose and fix it, starting with the most common culprits:
1. NetworkPolicy Blocking Egress Traffic
- Diagnosis: Check whether any `NetworkPolicy` resources restrict egress traffic from the `kube-system` namespace (where CoreDNS typically runs) to the internet or to specific IP ranges.
  ```shell
  kubectl get networkpolicy -n kube-system
  ```
  Look for policies that select the CoreDNS pods and either list `Egress` in `policyTypes` without any `egress` rules (which denies all egress), or whose `ipBlock` entries don’t include your upstream DNS server IPs (e.g., `8.8.8.8/32`, `1.1.1.1/32`). An `egress` rule with an empty `to` field matches every destination, so it is not the restrictive case.
- Fix: If a restrictive `NetworkPolicy` is found, either modify it to allow egress to your upstream DNS servers on UDP/TCP port 53, or create a new policy that explicitly permits this traffic. If this ends up being the only policy that selects the CoreDNS pods, also allow egress to the Kubernetes API server, which CoreDNS needs to reach.
  ```yaml
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-coredns-egress
    namespace: kube-system
  spec:
    podSelector:
      matchLabels:
        k8s-app: kube-dns # This label often targets CoreDNS pods
    policyTypes:
      - Egress
    egress:
      - to:
          - ipBlock:
              cidr: 8.8.8.8/32 # Example: Google Public DNS
        ports:
          - protocol: UDP
            port: 53
          - protocol: TCP
            port: 53
      # Add other necessary egress rules for other upstream DNS servers or services
  ```
- Why it works: `NetworkPolicy` is Kubernetes’ way of enforcing network segmentation. By default, pods can communicate freely, but once any policy with an `Egress` type selects a pod, only explicitly allowed egress gets through. Policies are additive, so this one grants CoreDNS pods permission to send DNS queries (UDP/TCP 53) to the listed external IP addresses on top of whatever other policies already allow.
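To check mechanically whether an upstream DNS IP is covered by a policy’s `ipBlock` CIDR, a small helper can do the comparison. This is a minimal POSIX shell sketch, not part of any Kubernetes tooling: `ip_to_int` and `ip_in_cidr` are hypothetical helper names, it assumes IPv4 addresses in `a.b.c.d/len` form, and it ignores any `except` entries in the `ipBlock`.

```shell
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on "." into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if IPv4 address $1 lies inside CIDR $2, e.g. 8.8.8.0/24.
ip_in_cidr() {
  net=${2%/*}        # network part before the slash
  bits=${2#*/}       # prefix length after the slash
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  fi
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, `ip_in_cidr 8.8.8.8 8.8.8.8/32 && echo covered` prints `covered`, while the same check for `1.1.1.1` against that CIDR fails, which would point to a missing `ipBlock` entry.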
2. Incorrect Upstream DNS Configuration in CoreDNS ConfigMap
- Diagnosis: CoreDNS uses a `ConfigMap` (often named `coredns`) in the `kube-system` namespace to define its behavior. Examine this `ConfigMap` for the `forward` directive.
  ```shell
  kubectl get configmap coredns -n kube-system -o yaml
  ```
  Look for a section like this within the `data.Corefile`:
  ```
  .:53 {
      errors
      health {
          lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      # This is the crucial part:
      forward . 8.8.8.8 1.1.1.1 {
          max_concurrent 1000
      }
      cache 30
      loop
      reload
      loadbalance
  }
  ```
  Ensure the IP addresses listed in the `forward` directive are correct and reachable from your cluster nodes.
- Fix: Edit the `coredns` ConfigMap to correct the IP addresses in the `forward` directive.
  ```shell
  kubectl edit configmap coredns -n kube-system
  ```
  Update the `forward` line with the correct upstream DNS server IPs (e.g., your internal DNS servers, or public ones like `8.8.8.8` and `1.1.1.1`).
- Why it works: The `forward` plugin tells CoreDNS where to send DNS queries it can’t resolve locally (like `google.com`). If these upstream IPs are wrong or unreachable, CoreDNS will fail to get answers for external hostnames. Correcting them ensures CoreDNS has valid destinations for its forwarded queries.
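When the Corefile is long, it can help to pull out just the upstream targets of the `forward` directive. The following is a minimal sketch using `awk`; `corefile_upstreams` is a hypothetical helper name, and it assumes the upstreams sit on the same line as `forward`, as in the example Corefile.

```shell
# Read a Corefile on stdin and print each upstream listed on a "forward" line.
corefile_upstreams() {
  awk '$1 == "forward" {
    # Field 2 is the zone (e.g. "."); upstreams start at field 3
    # and end at the opening brace of the options block, if any.
    for (i = 3; i <= NF; i++) {
      if ($i == "{") break
      print $i
    }
  }'
}
```

One way to feed it, assuming the ConfigMap is named `coredns`, is `kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | corefile_upstreams`. Entries are printed as written, so a value such as `/etc/resolv.conf` means CoreDNS inherits its upstreams from that file rather than from fixed IPs.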
3. CoreDNS Pods Not Running or Crashing
- Diagnosis: Check the status of the CoreDNS pods.
  ```shell
  kubectl get pods -n kube-system -l k8s-app=kube-dns
  ```
  If pods are in `CrashLoopBackOff`, `Error`, or `Terminating` states, investigate their logs.
  ```shell
  kubectl logs <coredns-pod-name> -n kube-system
  ```
  Look for errors indicating configuration parsing issues, failing health checks, or resource exhaustion.
- Fix:
  - Resource Limits: If `kubectl describe pod <coredns-pod-name> -n kube-system` reports `OOMKilled` as the last termination reason, or the logs point to similar resource issues, increase the CPU and memory requests/limits for the CoreDNS pods in their Deployment.
    ```shell
    kubectl edit deployment coredns -n kube-system
    ```
    Adjust `resources.requests` and `resources.limits` accordingly.
  - Configuration Errors: If logs show parsing errors in the `Corefile`, fix the `ConfigMap` (as in point 2).
  - Node Issues: If pods are stuck `Pending`, check node resources or taints/tolerations.
- Why it works: CoreDNS must be running and healthy to perform its DNS resolution duties. Addressing resource constraints or configuration errors allows the CoreDNS pods to start and operate correctly.
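To spot unhealthy replicas quickly, the default `kubectl get pods` table can be filtered for pods that are not `Running` or not fully ready. This is a small sketch; `unhealthy_pods` is a hypothetical helper name, and it assumes the default column order (`NAME READY STATUS RESTARTS AGE`).

```shell
# Read `kubectl get pods` table output on stdin and print
# "<name> <status> <ready>" for every pod that needs attention.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
  }'
}
```

It can be used as `kubectl get pods -n kube-system -l k8s-app=kube-dns | unhealthy_pods`; empty output means every CoreDNS pod is running and ready.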
4. Node-Level DNS Configuration Issues (iptables/kube-proxy)
- Diagnosis: Kubernetes uses `kube-proxy` and `iptables` (or IPVS) to route DNS traffic. Sometimes, node-level network configurations can interfere. Check `iptables` rules on the node where the CoreDNS pod is running.
  ```shell
  # SSH into the node and run:
  sudo iptables-save | grep 53
  ```
  This match is broad and also hits any line that merely contains `53`, so narrow it to `--dport 53` if the output is noisy. Look for rules that might be dropping or misdirecting UDP/TCP traffic on port 53, especially to the node’s IP address or the cluster’s service IP for DNS. Also check that `kube-proxy` is running correctly on the node.
  ```shell
  sudo systemctl status kube-proxy
  ```
  This applies when `kube-proxy` runs as a systemd service. On clusters where it runs as a DaemonSet (as kubeadm sets it up), check its pods with `kubectl get pods -n kube-system -l k8s-app=kube-proxy` instead.
- Fix: This is often the trickiest. It might involve:
  - Restarting `kube-proxy` on the affected node: `sudo systemctl restart kube-proxy`.
  - Manually flushing `iptables` rules (use with extreme caution): `sudo iptables -F`. Without `-t`, this flushes only the `filter` table.
  - Re-provisioning the node if the `iptables` state is severely corrupted.
  - Ensuring `kube-proxy` is configured to use the correct network mode (e.g., `iptables` or `ipvs`).
- Why it works: `kube-proxy` manages the cluster’s network rules, including those for DNS. If these rules are broken, traffic intended for CoreDNS might be dropped or sent to the wrong place, even if CoreDNS itself is healthy. Restoring or correcting these rules re-establishes the correct network paths.
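The `iptables-save` output on a busy node can run to thousands of lines, so a tighter filter helps. The sketch below keeps only rules that match destination port 53 exactly and whose target is `DROP` or `REJECT`; `dns_block_rules` is a hypothetical helper name, and it will not see rules that block DNS by other means (for example by address only, or in IPVS mode).

```shell
# Read `iptables-save` output on stdin and print rules that
# drop or reject traffic to destination port 53.
dns_block_rules() {
  # First grep: exact "--dport 53" (not 5353, 530, ...).
  # Second grep: only DROP/REJECT targets. "|| true" keeps the
  # exit status clean when nothing matches.
  grep -E -- '--dport 53( |$)' | grep -E -- '-j (DROP|REJECT)' || true
}
```

Run it on the node as `sudo iptables-save | dns_block_rules`. Any line it prints is worth tracing back to the firewall tool or policy that created it.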
5. Firewall Rules on Nodes or Network Infrastructure
- Diagnosis: Even if `NetworkPolicy` is permissive, host firewalls (`firewalld`, `ufw`, `iptables` directly on the node) or external network firewalls (e.g., cloud provider security groups, corporate firewalls) might be blocking egress from your Kubernetes nodes to the upstream DNS servers on UDP/TCP port 53.
  ```shell
  # On a node, try to directly ping or telnet to an upstream DNS server
  # (This might be blocked by a firewall too, but can indicate general connectivity)
  # Note: telnet only exercises TCP port 53; most DNS queries use UDP.
  telnet 8.8.8.8 53
  ```
  Check cloud provider security group rules, AWS NACLs, Azure NSGs, or your on-premises firewall configurations.
- Fix: Update firewall rules on the nodes or network infrastructure to allow egress traffic from your Kubernetes nodes (specifically their node IPs) to your chosen upstream DNS servers on UDP and TCP port 53.
- Why it works: This is a fundamental network connectivity issue. If the network path is blocked at the infrastructure level, CoreDNS’s requests simply won’t reach their destination, regardless of Kubernetes-internal configurations.
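Because firewalls often treat UDP and TCP differently, it is worth probing port 53 over both transports. The following sketch uses `dig` with `+notcp` and `+tcp`; `check_dns_transport` is a hypothetical helper name, `google.com` is just a default test name, and the check relies on `dig` exiting non-zero when no reply arrives.

```shell
# Query upstream server $1 for name $2 (default: google.com)
# once over UDP and once over TCP, and report each result.
check_dns_transport() {
  server=$1
  name=${2:-google.com}
  for proto in udp tcp; do
    if [ "$proto" = tcp ]; then flag=+tcp; else flag=+notcp; fi
    # Short timeout and a single try so a blocked path fails fast.
    if dig "$flag" +time=2 +tries=1 "@$server" "$name" >/dev/null 2>&1; then
      echo "$proto ok"
    else
      echo "$proto FAILED"
    fi
  done
}
```

Calling `check_dns_transport 8.8.8.8` from a node prints one line per transport. A result where only one of the two fails suggests a protocol-specific firewall rule rather than a general routing problem.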
6. DNS Server Unavailability or Misconfiguration
- Diagnosis: The upstream DNS servers themselves might be down, overloaded, or misconfigured.
  ```shell
  # From a node, try to resolve a hostname directly using the upstream server:
  dig @8.8.8.8 google.com
  ```
  If this fails consistently, the problem is with the upstream server (or the network path to it, see point 5), not CoreDNS.
- Fix: Switch to different, known-good upstream DNS servers in the CoreDNS `ConfigMap` or troubleshoot the upstream DNS infrastructure.
- Why it works: If the target you’re forwarding requests to is broken, your forwarding service can’t succeed. This isolates the problem to the external DNS infrastructure.
After fixing these, you’ll likely encounter the next common issue: pods being unable to reach services within the cluster due to kube-proxy or CNI misconfigurations.