The DNS resolver is failing to complete queries because it can't negotiate EDNS (Extension Mechanisms for DNS, in practice EDNS(0)) with upstream servers.
This usually shows up as slow lookups, intermittent resolution failures, or error statuses such as SERVFAIL (server failure) or FORMERR in the output of tools like dig. EDNS is crucial for modern DNS: it lets a client advertise a UDP payload size larger than the classic 512-byte limit, and it carries flags such as the DNSSEC OK (DO) bit. When it's disabled or not properly negotiated, queries that need large responses, DNSSEC-signed answers in particular, can time out or fail outright.
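To make "negotiating EDNS" concrete: the client adds an OPT pseudo-record (type 41, defined in RFC 6891) to the additional section of its query. The record's CLASS field advertises the largest UDP payload the client can receive, and its TTL field carries the extended RCODE, the EDNS version and the DO bit. The sketch below builds such a query using only Python's standard library; the function names, the fixed query ID and the default size of 1232 are illustrative choices, not taken from any particular resolver.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as uncompressed DNS wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"  # the empty root label terminates the name


def build_edns_query(name: str, qtype: int = 1, udp_size: int = 1232,
                     dnssec_ok: bool = False, query_id: int = 0x1234) -> bytes:
    """Build a recursive query that carries an EDNS(0) OPT pseudo-record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT RR)
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # QCLASS=IN
    # OPT RR: root owner name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL = extended RCODE (8 bits) | version (8 bits) | DO bit + zero bits (16 bits)
    ttl = 0x00008000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, ttl, 0)  # RDLENGTH=0: no options
    return header + question + opt
```

The resulting bytes can be sent to port 53 over UDP as-is. Real clients randomize the query ID; it is fixed here only so the output is repeatable.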
Cause 1: Firewall Blocking UDP Port 53
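- Diagnosis: Run `tcpdump` on your DNS resolver's network interface, filtering for UDP port 53, and attempt a DNS query from a client. Look for outgoing queries from the resolver that never receive a response, which suggests they are being dropped somewhere on the path.
```
sudo tcpdump -n -i eth0 udp port 53
```
- Fix: Configure your firewall (e.g., `iptables`, `ufw`, or cloud provider security groups) to explicitly allow DNS traffic on port 53 from your DNS resolver to the internet. Allow TCP as well as UDP, because resolvers retry over TCP when a UDP response is truncated, and make sure the replies are allowed back in. For `iptables` (use `-I` instead of `-A` if an earlier rule already drops this traffic):
```
sudo iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
sudo iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT
sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
```
- Why it works: Firewalls, especially stateful ones or those with overly aggressive egress filtering, can block or drop UDP packets on port 53 if they don't match expected patterns or if the connection state isn't properly maintained, preventing EDNS options from being exchanged. Some firewalls with DNS inspection features also drop DNS messages larger than 512 bytes, which breaks EDNS specifically while small queries keep working.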
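The `tcpdump` check can also be scripted: send one EDNS query straight to a server and see whether anything comes back. This is a minimal sketch assuming IPv4 and Python's standard `socket` module; `udp_probe` and `minimal_edns_query` are hypothetical helper names, and a `None` result only says that no matching reply arrived in time, not where the packet was dropped.

```python
import socket
import struct


def minimal_edns_query(name: str, udp_size: int = 4096, query_id: int = 0xBEEF) -> bytes:
    """Build an A-record query with an EDNS(0) OPT RR advertising udp_size."""
    labels = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.rstrip(".").split("."))
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)  # RD set, ARCOUNT=1
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)         # QTYPE=A, QCLASS=IN
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)        # OPT RR, no options
    return header + question + opt


def udp_probe(server: str, port: int = 53, name: str = "example.com",
              udp_size: int = 4096, timeout: float = 2.0):
    """Send one EDNS query over UDP. Return the response bytes, or None if no
    matching reply arrived before the timeout (dropped, filtered, or no server)."""
    query = minimal_edns_query(name, udp_size)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            data, _ = sock.recvfrom(65535)
        except (socket.timeout, ConnectionRefusedError, ConnectionResetError):
            return None
    # A genuine reply echoes our query ID and has the QR (response) bit set.
    if len(data) >= 12 and data[:2] == query[:2] and data[2] & 0x80:
        return data
    return None
```

For example, if `udp_probe("8.8.8.8")` returns `None` while `dig +tcp @8.8.8.8 example.com` succeeds, UDP filtering on the path is the likely culprit.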
Cause 2: Network Address Translation (NAT) Device Issues
- Diagnosis: If your DNS resolver is behind a NAT device (router, firewall), run an EDNS-enabled query from a machine outside your network to your resolver. If that works but internal clients fail, suspect the NAT device. Comparing `dig +bufsize=512` with `dig +bufsize=4096` through the NAT also helps: if only the larger buffer fails, large or fragmented UDP packets are being dropped. Also check the NAT device's logs for UDP fragmentation or port exhaustion errors.
- Fix: Ensure the NAT device forwards fragmented UDP packets (the setting may be labelled "UDP fragmentation" or "IP fragmentation") and is not suffering from port exhaustion. Some older or poorly configured NAT devices struggle with the larger UDP packets characteristic of EDNS. Update the NAT device's firmware if an update is available.
- Why it works: EDNS lets both sides use UDP payloads well above 512 bytes (4096 bytes is a common advertised size). On a standard 1500-byte Ethernet MTU, any DNS message larger than 1472 bytes over IPv4 has to be IP-fragmented. Some NAT devices, particularly older ones or those with buggy firmware, mangle or drop these fragments, or run out of ephemeral ports for NAT mappings, which disrupts EDNS negotiation.
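The fragmentation arithmetic is easy to check. This sketch assumes a 1500-byte Ethernet MTU and IPv4 headers without options; it shows where the 1472-byte figure comes from and why a 4096-byte response cannot cross the link as a single packet. The function names are illustrative.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # fixed IPv6 header
UDP_HEADER = 8


def max_unfragmented_dns_payload(mtu: int = 1500, ipv6: bool = False) -> int:
    """Largest DNS message that fits in a single packet on a link with this MTU."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - UDP_HEADER


def ipv4_fragment_count(dns_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed for a UDP DNS message of this size."""
    ip_payload = dns_payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)
```

With these defaults, `max_unfragmented_dns_payload()` gives 1472 (1452 over IPv6), and a 4096-byte response needs three IPv4 fragments, every one of which has to make it through the NAT device for the answer to be usable.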
Cause 3: Upstream DNS Server Configuration
- Diagnosis: Manually query an upstream DNS server (e.g., 8.8.8.8) with EDNS enabled using `dig`, and specify a large EDNS payload size. If this fails consistently, the upstream server might be misconfigured or blocking EDNS; if it also fails against a large public resolver like this one, suspect your own network path instead (Causes 1 and 2).
```
dig @8.8.8.8 google.com +edns=0 +dnssec +bufsize=4096
```
Check the response code (`status:`) and the `flags` in the `dig` output, and confirm that an `OPT PSEUDOSECTION` is present.
- Fix: Switch to a different, reliable upstream DNS provider that fully supports EDNS. If you control the upstream servers, ensure their DNS software (e.g., BIND, Unbound) is configured to accept EDNS queries and has adequate buffer sizes. For BIND, ensure `recursion yes;` is set and check `max-udp-size` in `named.conf.options`.
- Why it works: The upstream DNS server might be old, misconfigured, or intentionally configured to disable EDNS support for security or performance reasons, causing your resolver to fail when it tries to negotiate.
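Instead of reading `dig` output by eye, you can decode a response yourself. This sketch uses only `struct` to walk a raw DNS response (for instance, the bytes returned by a UDP query) and reports the full RCODE together with the server's OPT record: its advertised UDP size, EDNS version and DO bit. A missing OPT record, often together with FORMERR, suggests the server did not accept the EDNS query. `parse_edns` is a hypothetical helper and assumes a well-formed message; it does no bounds checking.

```python
import struct


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_edns(msg: bytes):
    """Return (rcode, opt): opt is a dict describing the OPT RR, or None if
    the response carried no EDNS record."""
    _id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    rcode = flags & 0x000F
    offset = 12
    for _ in range(qd):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE + QCLASS
    opt = None
    for index in range(an + ns + ar):
        offset = _skip_name(msg, offset)
        rtype, rclass, ttl, rdlen = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10 + rdlen
        if rtype == 41 and index >= an + ns:  # OPT belongs in the additional section
            rcode |= (ttl >> 24) << 4         # upper 8 bits of the 12-bit extended RCODE
            opt = {"udp_size": rclass, "version": (ttl >> 16) & 0xFF,
                   "do": bool(ttl & 0x8000)}
    return rcode, opt
```

An RCODE of 1 (FORMERR) with `opt` equal to `None` is the classic signature of a server that rejects EDNS; an RCODE of 16 (BADVERS) means it understood EDNS but not the version requested.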
Cause 4: DNS Resolver Software Configuration
- Diagnosis: Review your DNS resolver's configuration file (e.g., `/etc/bind/named.conf.options` for BIND, `/etc/unbound/unbound.conf` for Unbound). Look for any explicit settings that might disable EDNS or limit UDP packet sizes, such as a BIND `server` statement containing `edns no;`.
- Fix: Ensure EDNS is not explicitly disabled. For BIND, ensure `max-udp-size` is set to a reasonable value (e.g., `max-udp-size 4096;`). For Unbound, ensure `edns-buffer-size` is set appropriately (e.g., `edns-buffer-size: 1472`).
  - BIND `named.conf.options` example:
```
options {
    directory "/var/cache/bind";
    recursion yes;
    allow-query { localhost; localnets; };  // restrict to your own clients; "any" makes an open resolver
    listen-on { any; };
    max-udp-size 4096;  // Ensure this is set
    dnssec-validation yes;
    // "dnssec-enable yes;" is obsolete from BIND 9.16 (DNSSEC is always on); omit it on current versions
};
```
  - Unbound `unbound.conf` example:
```
server:
    interface: 0.0.0.0
    access-control: 127.0.0.0/8 allow   # add your own client networks; 0.0.0.0/0 makes an open resolver
    hide-identity: yes
    hide-version: yes
    edns-buffer-size: 1472   # fits a 1500-byte IPv4 MTU; DNS Flag Day 2020 recommends 1232
    prefetch: yes
    num-threads: 4
    harden-dnssec-stripped: yes
```
- Why it works: The DNS resolver software itself might have a setting limiting the UDP payload size or explicitly disabling EDNS. Setting these to appropriate values allows the resolver to properly advertise and negotiate EDNS capabilities.
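To audit configuration files quickly, a regex scan catches the EDNS-related directives discussed above. This is a rough sketch, not a parser for either configuration language: the pattern list (`edns no;`, `max-udp-size`, `edns-udp-size`, `edns-buffer-size`) is limited to the settings mentioned here, `find_edns_settings` is a hypothetical helper, and the comment stripping would misfire on a `#` or `//` inside a quoted string.

```python
import re

# Hypothetical pattern set covering the EDNS-related settings named in this section.
PATTERNS = {
    "bind_edns_disabled": re.compile(r"\bedns\s+no\s*;"),
    "bind_max_udp_size": re.compile(r"\bmax-udp-size\s+(\d+)\s*;"),
    "bind_edns_udp_size": re.compile(r"\bedns-udp-size\s+(\d+)\s*;"),
    "unbound_edns_buffer_size": re.compile(r"^\s*edns-buffer-size:\s*(\d+)", re.MULTILINE),
}


def find_edns_settings(config_text: str) -> dict:
    """Report EDNS-related settings found in BIND or Unbound config text."""
    # Drop comments first so commented-out settings are not reported.
    text = re.sub(r"/\*.*?\*/", "", config_text, flags=re.DOTALL)  # C-style block comments
    text = re.sub(r"(#|//).*", "", text)                            # line comments
    found = {}
    for key, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if not matches:
            continue
        # Patterns with a capture group yield buffer sizes; "edns no" only signals presence.
        found[key] = [int(m) for m in matches] if pattern.groups else True
    return found
```

Passing in the contents of `named.conf.options` or `unbound.conf` returns only the settings that are actually active, so an empty result means the software's built-in defaults apply.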
Cause 5: DNSSEC Interaction Issues
- Diagnosis: If you have DNSSEC validation enabled on your resolver, try disabling it temporarily and see if resolution improves. If it does, the issue is likely related to how EDNS and DNSSEC are interacting, possibly due to packet size limitations when DNSSEC records are included.
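- Fix: Ensure your network path and upstream servers can handle the larger UDP packets required for DNSSEC responses, which depend on EDNS, and that TCP port 53 is open for responses that get truncated. If using BIND, ensure `dnssec-validation yes;` is set and `max-udp-size` is sufficiently large (e.g., 4096); `dnssec-enable` is obsolete from BIND 9.16 onward. For Unbound, keep `harden-dnssec-stripped: yes` (the default), so that answers whose DNSSEC data was stripped in transit fail visibly instead of being accepted as insecure.
- Why it works: DNSSEC responses can be significantly larger than standard DNS responses, especially when they include RRSIG and DNSKEY records. EDNS is what allows these larger responses over UDP, and its DO bit is how the resolver asks for DNSSEC records in the first place. If network devices or upstream servers cannot handle these larger EDNS-enabled UDP packets, DNSSEC validation fails. A resolver that keeps timing out on large responses may also fall back to smaller buffer sizes or stop using EDNS with that server, which looks like a broader EDNS negotiation failure.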
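The fallback path matters most for DNSSEC. When an answer does not fit the advertised buffer, the server sets the TC (truncated) bit and the client retries over TCP, where each message is prefixed with a 2-byte length (RFC 1035, section 4.2.2). If the oversized UDP reply is silently dropped instead, for example as lost fragments, the client never sees TC and simply times out. The helpers below are an illustrative sketch of those two wire-level rules, not code taken from BIND or Unbound.

```python
import struct

TC_FLAG = 0x0200  # TC ("truncated") bit in the 16-bit header flags word


def is_truncated(response: bytes) -> bool:
    """True if the server signalled that the full answer did not fit over UDP."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length for DNS over TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS messages over TCP are limited to 65535 bytes")
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(stream: bytes) -> bytes:
    """Extract the first length-prefixed DNS message from received TCP bytes."""
    (length,) = struct.unpack("!H", stream[:2])
    if len(stream) < 2 + length:
        raise ValueError("incomplete message; keep reading from the socket")
    return stream[2:2 + length]
```

A client that sees `is_truncated(reply)` resends the same query with `frame_for_tcp` over a TCP connection to port 53, which is why blocking TCP/53 breaks exactly the large DNSSEC answers.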
Cause 6: Client-Side Network Issues
- Diagnosis: Test DNS resolution from multiple clients on the network. If only specific clients experience the issue, check their local network configuration, firewalls, or VPN clients. Run `dig` with EDNS from an affected client:
```
dig google.com +edns=0
```
Look for the `OPT PSEUDOSECTION:` header and its `; EDNS: version: 0, flags: ...; udp: ...` line in the output; if they are missing, the response came back without EDNS.
- Fix: Ensure client-side firewalls (such as Windows Firewall or the macOS firewall) are not blocking outbound UDP port 53 traffic or interfering with larger packets. If using a VPN, test resolution with the VPN disconnected.
- Why it works: Client-side software, particularly aggressive firewalls or VPN clients, can interfere with network traffic, including DNS queries that use EDNS and require larger UDP payloads.
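When comparing several clients, it helps to pull the EDNS line out of `dig` output automatically. This sketch assumes dig's usual layout (an `;; OPT PSEUDOSECTION:` header followed by a `; EDNS: version: 0, flags: ...; udp: N` line), which can vary slightly between dig versions; `edns_from_dig_output` is a hypothetical helper, and the sample strings in the usage below are illustrative rather than captured output.

```python
import re

# Matches lines such as "; EDNS: version: 0, flags: do; udp: 1232"
EDNS_LINE = re.compile(r";\s*EDNS:\s*version:\s*(\d+),\s*flags:\s*([^;]*);\s*udp:\s*(\d+)")


def edns_from_dig_output(output: str):
    """Return EDNS details from dig's OPT PSEUDOSECTION, or None if it is absent."""
    if "OPT PSEUDOSECTION" not in output:
        return None
    match = EDNS_LINE.search(output)
    if match is None:
        return None
    version, flags, udp = match.groups()
    return {"version": int(version), "flags": flags.split(), "udp": int(udp)}
```

For a client whose answers come back with `{"version": 0, "flags": [], "udp": 1232}` while another client gets `None` from the same resolver, the difference lies on the second client's side.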
After fixing these, the next common issue you might encounter is a "SERVFAIL" error specifically related to DNSSEC validation, indicating that while EDNS is working, the security records themselves are causing problems, often due to upstream server issues or network filtering of DNSSEC-specific records.