CoreDNS is forwarding your clients' DNS queries to an upstream resolver, and you're seeing SERVFAIL responses for some or all of those clients.
The core issue is that CoreDNS, when configured for DNS-over-TLS (DoT), is failing to establish a secure connection to its upstream DoT resolver, or it’s receiving an invalid response from that resolver. This can happen for a variety of reasons, from network misconfiguration to incorrect certificate validation.
Here’s a breakdown of the common causes and how to fix them:
1. Upstream DoT Server Unreachable
The most frequent culprit is that CoreDNS simply can’t reach the upstream DoT server on its designated port (usually 853).
Diagnosis:
From the server running CoreDNS, send a query directly to the upstream DoT server with dig's +tls flag (available in dig from BIND 9.18 onward). Use a known public DoT server such as Cloudflare's (cloudflare-dns.com, IP 1.1.1.1).
dig @1.1.1.1 -p 853 google.com A +tls
If this fails with a timeout or connection refused, the problem is network-level.
Fix: Ensure that port 853 is open in your firewall for outbound connections from your CoreDNS server to the upstream DoT server. If you’re using a cloud provider, check their security group or network ACL rules.
# Example using iptables (-I places the rule ahead of any existing DROP/REJECT rules; adjust the destination as needed)
sudo iptables -I OUTPUT -p tcp --dport 853 -d 1.1.1.1 -j ACCEPT
This allows outbound TCP traffic on port 853 to the specified destination. DoT (RFC 7858) runs over TCP only, so a UDP rule for port 853 is not needed.
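If dig is not available, or you want to script the check, the TCP part of this diagnosis can be sketched in Python. This is a minimal sketch (the function name check_tcp_853 is hypothetical); it only distinguishes an open port, an actively refused connection, and a silent timeout, and it does not attempt a TLS handshake.

```python
import socket


def check_tcp_853(host: str, port: int = 853, timeout: float = 5.0) -> str:
    """Classify a plain TCP connection attempt to a DoT port.

    Returns "open", "refused", "timeout", or "error: <reason>".
    Only the TCP path is tested; no TLS handshake is performed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # An RST came back: the host is reachable but nothing listens on the port.
        return "refused"
    except socket.timeout:
        # No answer at all: typically a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "network unreachable" or an ICMP reject.
        return f"error: {exc}"
```

Run it on the CoreDNS host, for example check_tcp_853("1.1.1.1"). A "timeout" usually suggests a firewall dropping packets, while "refused" means the host answered but nothing is listening on that port.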
2. Incorrect Upstream DoT Server Name in Corefile
CoreDNS needs to know which DoT server to connect to and which name to expect on its certificate. If the upstream address in your Corefile is wrong, lacks the tls:// prefix, or the TLS server name is misspelled, queries will fail.
Diagnosis:
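Examine the forward plugin in the server block that handles the failing queries.
.:53 {
    forward . tls://1.1.1.1 {
        tls_servername cloudflare-dns.com
    }
    # ... other plugins
}
In this example, the tls:// prefix tells the forward plugin to connect to 1.1.1.1 over TLS (port 853 by default), and tls_servername sets the name sent via SNI and checked against the server's certificate. Without the tls:// prefix, forward sends plain DNS to port 53 instead. If you switch to a different provider, such as dns.example.com, both the IP address and tls_servername must change to match it.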
Fix:
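Ensure the address in the forward directive is the IP address of the DoT server you intend to use, prefixed with tls://. The forward plugin does not resolve hostnames for its upstreams (each destination must be an IP address or a resolv.conf-style file), so a line like forward . cloudflare-dns.com fails when CoreDNS parses the Corefile. Put the hostname in tls_servername instead, and check that it is spelled exactly as it appears on the server's certificate.
# Using Cloudflare's DoT service: IP addresses in forward, hostname in tls_servername
.:53 {
    forward . tls://1.1.1.1 tls://1.0.0.1 {
        tls_servername cloudflare-dns.com
    }
    # ...
}
This tells CoreDNS to forward queries over TLS to 1.1.1.1 and 1.0.0.1 and to validate both servers' certificates against the name cloudflare-dns.com.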
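A quick sanity check for the destination arguments of a forward line can be sketched in Python. The helper below is hypothetical and is not CoreDNS's own parser; it only checks the two mistakes discussed in this section, a hostname where an IP address is required and a missing tls:// prefix. The forward plugin also accepts a resolv.conf-style file path, which this sketch does not handle.

```python
import ipaddress


def check_forward_upstream(to: str) -> str:
    """Classify one destination argument of a CoreDNS `forward` line.

    Hypothetical lint helper. Returns:
      "ok"                 - tls:// followed by an IP literal
      "plain-dns"          - IP literal without tls:// (queries go out unencrypted)
      "not-an-ip"          - a hostname, which forward does not resolve
      "unsupported-scheme" - any other scheme prefix
    """
    scheme, sep, rest = to.partition("://")
    if not sep:
        scheme, rest = "dns", to  # no scheme prefix means plain DNS
    host = rest
    if host.startswith("["):
        # Bracketed IPv6 with a port, e.g. [2606:4700:4700::1111]:853
        end = host.find("]")
        host = host[1:end] if end != -1 else host
    elif host.count(":") == 1:
        # IPv4 with a port, e.g. 1.1.1.1:853
        host = host.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return "not-an-ip"
    if scheme == "tls":
        return "ok"
    if scheme == "dns":
        return "plain-dns"
    return "unsupported-scheme"
```

For example, check_forward_upstream("cloudflare-dns.com") flags the hostname, and check_forward_upstream("1.1.1.1") flags the missing tls:// prefix.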
3. TLS Certificate Validation Failure
CoreDNS, by default, will validate the TLS certificate presented by the upstream DoT server against your system’s trusted CA store. If the certificate is expired, self-signed, or issued by an untrusted CA, validation will fail.
Diagnosis:
Check the CoreDNS logs for messages related to certificate verification errors. You might see entries like x509: certificate signed by unknown authority or x509: certificate has expired.
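When the logs are long, a small filter can pick out the certificate errors. This sketch assumes the error text from Go's crypto/x509 package reaches your logs verbatim (whether it does depends on how CoreDNS logging is configured); the function names and the hint strings are this sketch's own interpretation, not CoreDNS output.

```python
from typing import Iterable, List, Optional, Tuple

# Substrings of error messages produced by Go's crypto/x509 package,
# mapped to a likely cause. The hints are this sketch's interpretation.
X509_HINTS = {
    "certificate signed by unknown authority": "untrusted CA: update the CA bundle or configure a CA file",
    "certificate has expired": "expired certificate or wrong system clock",
    "certificate is valid for": "name mismatch: check tls_servername",
    "doesn't contain any IP SANs": "IP used as server name: set tls_servername",
}


def classify_log_line(line: str) -> Optional[str]:
    """Return a hint for the first known x509 error found in the line, else None."""
    for needle, hint in X509_HINTS.items():
        if needle in line:
            return hint
    return None


def scan_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, hint) for each line containing a known x509 error."""
    results = []
    for number, line in enumerate(lines, start=1):
        hint = classify_log_line(line)
        if hint is not None:
            results.append((number, hint))
    return results
```

For example, scan_lines(open("coredns.log")) on a saved log file (the file name is a placeholder) lists each matching line with its likely cause.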
Fix:
- For public DoT servers: This usually points to a stale or missing CA bundle. Make sure the host (or the container image CoreDNS runs from) has current CA certificates. On most Linux systems, this means updating the ca-certificates package:
# Debian/Ubuntu
sudo apt update && sudo apt install --only-upgrade ca-certificates
# RHEL/CentOS/Fedora
sudo yum update ca-certificates
- For private DoT servers: If you're using your own DoT server or a private CA, you need to tell CoreDNS to trust that CA. In the forward plugin, tls with a single argument names a CA file used to verify the upstream's certificate, and tls_servername sets the name that certificate must match. In your Corefile:
.:53 {
    forward . tls://10.0.0.53 {
        tls /etc/coredns/my-ca.pem
        tls_servername my.internal.dns
    }
    # ...
}
Here 10.0.0.53 is a placeholder for your internal DoT server's IP address. Place your CA's public certificate (e.g., my-ca.pem) on the CoreDNS server and make sure the CoreDNS process can read it. This explicitly trusts that certificate chain.
4. Upstream DoT Server Not Running or Misconfigured
The upstream DoT server itself might not be running, or it might be misconfigured to not accept DoT connections.
Diagnosis:
Try connecting to the upstream DoT server using openssl s_client to test its TLS handshake directly.
openssl s_client -connect 1.1.1.1:853 -servername cloudflare-dns.com
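If this handshake fails from the CoreDNS host while other DoT servers work, the problem is most likely with the upstream server rather than CoreDNS. If it succeeds, check that the output reports "Verify return code: 0 (ok)", which confirms the certificate chain also validates.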
Fix:
Consult the documentation for your upstream DoT server (e.g., Unbound, Knot Resolver, or another CoreDNS instance) to ensure it's configured to listen on port 853 with TLS enabled and that its certificates are correctly set up. Restart the upstream DoT service after making changes.
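If the handshake succeeds but queries still fail, it can help to send one real query over the TLS connection. DoT (RFC 7858) frames each DNS message with a two-byte length prefix over TCP; the sketch below builds a minimal RFC 1035 query by hand and reports the response code. It is an illustration, not a full DNS client: the helper names are hypothetical, and it sends no EDNS options.

```python
import secrets
import socket
import ssl
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1) -> bytes:
    """Minimal DNS query: random ID, RD flag set, one question of class IN."""
    header = struct.pack("!HHHHHH", secrets.randbelow(0x10000), 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in (part for part in name.rstrip(".").split(".") if part):
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes) -> str:
    """RCODE is the low 4 bits of the flags word (bytes 2-3 of the header)."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, str(code))


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver a message in several chunks."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def dot_query(ip: str, server_name: str, name: str,
              port: int = 853, timeout: float = 5.0) -> str:
    """Send one query over DoT and return the response's RCODE name."""
    query = build_query(name)
    ctx = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server_name) as tls:
            tls.sendall(struct.pack("!H", len(query)) + query)  # 2-byte length prefix
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            return rcode_of(recv_exact(tls, length))
```

For example, dot_query("1.1.1.1", "cloudflare-dns.com", "google.com") returns the response code; a SERVFAIL here, with CoreDNS out of the path, points at the upstream resolver itself.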
5. CoreDNS TLS Configuration Issues
While less common, there might be subtle issues with how CoreDNS itself is configured for TLS.
Diagnosis:
Review the tls_servername option inside the forward block. Ensure the name is correct and matches a name on the certificate presented by the upstream server.
Fix:
If you forward to an IP address but validate against a hostname, make sure the upstream server's certificate is valid for that hostname. The name in tls_servername does not need to resolve in DNS: CoreDNS uses it only for SNI and certificate verification, while the connection itself goes to the IP address given in forward.
# Example: Forwarding to an IP, but validating against a hostname
.:53 {
    forward . tls://1.1.1.1 {
        tls_servername cloudflare-dns.com # This name must match the certificate
    }
    # ...
}
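If the upstream server's certificate lists the IP address itself as a subject alternative name, validating against the IP may work. However, most DoT servers' certificates are issued for hostnames, so setting tls_servername is the safer choice.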
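To see which names a server's certificate actually covers, and therefore what tls_servername can be set to, the subject alternative names can be read with Python's ssl module. This sketch keeps chain verification on but disables hostname checking so a name mismatch does not abort the connection; name_matches is a simplified wildcard check, not a full RFC 6125 implementation, and all function names are hypothetical.

```python
import ipaddress
import socket
import ssl
from typing import Dict, List, Optional, Tuple

San = Tuple[str, str]  # e.g. ("DNS", "dns.example") or ("IP Address", "192.0.2.1")


def cert_names(peercert: Dict) -> List[San]:
    """Return (type, value) SAN pairs from an SSLSocket.getpeercert() dict."""
    return [(kind, value) for kind, value in peercert.get("subjectAltName", ())]


def name_matches(server_name: str, sans: List[San]) -> bool:
    """Exact DNS or IP match, or a single left-most '*.' wildcard label."""
    name = server_name.lower().rstrip(".")
    for kind, value in sans:
        value = value.strip().lower()
        if kind == "IP Address":
            try:
                if ipaddress.ip_address(value) == ipaddress.ip_address(name):
                    return True
            except ValueError:
                pass  # server_name is a hostname, or the SAN value is malformed
        elif kind == "DNS":
            if value == name:
                return True
            # "*.example.net" covers "a.example.net" but not "example.net" or "a.b.example.net"
            if value.startswith("*.") and "." in name and name.split(".", 1)[1] == value[2:]:
                return True
    return False


def fetch_sans(ip: str, sni: Optional[str] = None,
               port: int = 853, timeout: float = 5.0) -> List[San]:
    """Connect with chain verification on but hostname checking off, then list SANs."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # verify_mode stays CERT_REQUIRED, so getpeercert() is populated
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=sni) as tls:
            return cert_names(tls.getpeercert())
```

For example, sans = fetch_sans("1.1.1.1", sni="cloudflare-dns.com") followed by name_matches("cloudflare-dns.com", sans) shows whether that name would pass. If the chain itself is untrusted, fetch_sans raises instead, which points back to the CA issues described earlier.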
6. MTU Issues or Network Path Problems
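Sometimes DoT failures come from the network path, particularly Maximum Transmission Unit (MTU) mismatches. Because DoT runs over TCP, segment sizes are normally kept in check by the TCP MSS and path MTU discovery, but if ICMP "fragmentation needed" messages are filtered somewhere on the path, large TLS records can be silently dropped and the connection stalls.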
Diagnosis:
This is harder to diagnose directly. You might see intermittent failures or failures only with larger DNS responses. A ping with a large packet size to the upstream server can sometimes reveal MTU issues, though this doesn’t directly test DoT.
# Test ping with a large payload (e.g., 1400 bytes)
ping -s 1400 -M do 1.1.1.1
Fix: If MTU issues are suspected, you might need to lower the MTU on your CoreDNS server's network interface or on intermediate network devices, or clamp the TCP MSS on a router along the path. This is a more advanced network troubleshooting step. In cloud environments, check whether the interface uses jumbo frames: a large interface MTU works inside the provider's network but depends on working path MTU discovery once traffic leaves it for an internet resolver.
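The packet-size arithmetic behind the ping test: ping -s sets the ICMP payload, so the IPv4 packet on the wire is 28 bytes larger (a 20-byte IP header plus an 8-byte ICMP header), and ping -s 1400 therefore sends a 1428-byte packet. The sketch below, assuming IPv4 without header options, converts between MTU, ping payload, and TCP MSS, and binary-searches the largest payload that passes a caller-supplied probe. The ping_probe helper is hypothetical and assumes Linux iputils ping.

```python
import subprocess
from typing import Callable

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8
TCP_HEADER = 20   # bytes, assuming no TCP options


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment (MSS) over IPv4 for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def largest_passing(probe: Callable[[int], bool], lo: int = 0, hi: int = 1472) -> int:
    """Binary-search the largest size in [lo, hi] for which probe(size) is True.

    Assumes probe is monotonic (passes up to some size, fails above it)
    and that probe(lo) passes.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def ping_probe(host: str) -> Callable[[int], bool]:
    """Probe using Linux iputils ping with the don't-fragment flag (-M do)."""
    def probe(size: int) -> bool:
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe
```

For example, largest_passing(ping_probe("1.1.1.1")) + 28 estimates the path MTU toward the upstream. If ICMP echo is blocked entirely, every probe fails and the result is meaningless, so treat this as a hint rather than a measurement of the TCP path.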
After resolving these issues, the next problems you're likely to hit are SERVFAILs from a different upstream DoT resolver if you have several configured, or client timeouts and connection errors if CoreDNS isn't listening on port 53 over both UDP and TCP.