DNS Latency: The Invisible App Killer

DNS resolution is often a silent bottleneck, and aggressively caching and prefetching DNS records can shave off hundreds of milliseconds from your application’s latency.

Let’s see this in action. Imagine a web server processing requests. For each new domain it needs to resolve, it first consults its local cache. If the record is there and valid, it’s an instant hit. If not, it queries upstream DNS servers. We can dramatically improve this by tuning the cache’s Time-To-Live (TTL) values and implementing prefetching.

Consider a typical scenario: a web application serving user requests. When a user navigates to a page that loads resources from multiple external domains (e.g., CDNs, ad networks, analytics services), each of those domains needs to be resolved via DNS.

# A DNS lookup with a cold cache (the resolver has to ask upstream)
dig google.com

# Output might look like this (simplified):
# ;; QUESTION SECTION:
# ;google.com.                    IN      A
#
# ;; ANSWER SECTION:
# google.com.             300     IN      A       142.250.190.142
#
# ;; Query time: 45 msec

Now, if we resolve google.com again immediately:

# The same lookup again, now answered from the resolver's cache
dig google.com

# Output might look like this (simplified):
# ;; QUESTION SECTION:
# ;google.com.                    IN      A
#
# ;; ANSWER SECTION:
# google.com.             300     IN      A       142.250.190.142
#
# ;; Query time: 0 msec  <-- Notice the difference!

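The "Query time" drops to near zero because the local resolver (like systemd-resolved or dnsmasq) found the answer in its cache. This assumes dig is querying that local resolver, which is the case when /etc/resolv.conf points at it. The 300 in the answer section is the TTL (Time-To-Live) in seconds, indicating how long resolvers may treat the record as valid. A caching resolver normally reports the remaining TTL, so on repeated queries this number counts down.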
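To observe the same effect from application code rather than from dig, a rough sketch like the one below times a single lookup. The helper name time_lookup and its resolve parameter are made up for illustration. By default it calls socket.getaddrinfo, which goes through the OS resolver path, so whether a second call is faster depends on a caching resolver being configured locally.

```python
import socket
import time
from typing import Callable, Optional


def time_lookup(host: str,
                resolve: Optional[Callable[[str], object]] = None) -> float:
    """Return how long one resolution of `host` takes, in milliseconds.

    `resolve` is a hypothetical hook so the timing logic can be exercised
    without network access; by default the OS resolver is used.
    """
    if resolve is None:
        def resolve(name: str) -> object:
            return socket.getaddrinfo(name, 80, proto=socket.IPPROTO_TCP)

    start = time.perf_counter()
    resolve(host)
    return (time.perf_counter() - start) * 1000.0
```

Calling time_lookup("example.com") twice in a row and comparing the two numbers gives a quick, if noisy, indication of whether a local cache is answering the second request.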

The core problem this solves is the inherent latency of distributed systems. Every network request, including DNS lookups, adds to the total time a user waits for your application. DNS is a hierarchical, distributed database, and querying it involves round trips across the network. By keeping frequently accessed DNS records closer to the application, we reduce these round trips.

Internally, DNS resolvers maintain a cache. When a client (your application or the OS resolver) requests a record, the resolver first checks its cache. If a matching record exists and its TTL hasn’t expired, the resolver returns the cached record immediately. If not, it forwards the query to an upstream DNS server (like your ISP’s DNS or a public resolver like 8.8.8.8).
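The lookup path described above can be sketched as a toy cache. This is a minimal illustration, not how any particular resolver is implemented: the class name TtlDnsCache, the upstream callable, and the injectable clock are all assumptions made so the logic is easy to follow and test.

```python
import time
from typing import Callable, Dict, List, Tuple

# (addresses, ttl_seconds) as returned by the hypothetical upstream callable
Answer = Tuple[List[str], int]


class TtlDnsCache:
    """Toy resolver cache: serve unexpired entries, otherwise ask upstream."""

    def __init__(self, upstream: Callable[[str], Answer],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream
        self._clock = clock
        # name -> (addresses, absolute expiry time)
        self._entries: Dict[str, Tuple[List[str], float]] = {}
        self.hits = 0
        self.misses = 0

    def resolve(self, name: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            # Cached and still within its TTL: no network round trip.
            self.hits += 1
            return entry[0]
        # Missing or expired: one upstream query, then remember the answer.
        self.misses += 1
        addrs, ttl = self._upstream(name)
        self._entries[name] = (addrs, now + ttl)
        return addrs
```

The hits and misses counters make it easy to see how much upstream traffic a given TTL saves for a particular request pattern.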

The key levers you control are:

1. Resolver Cache Configuration: This involves setting parameters for the DNS resolver software running on your server.
   - Cache Size: How many records can be stored?
   - Cache TTL: The authoritative server sets each record’s TTL, but you can influence how long your resolver keeps it. Some resolvers let you set a minimum or maximum cache TTL that overrides the advertised value.
   - Negative Caching: How long to cache the fact that a domain doesn’t exist.
2. Application-Level Caching: Many applications (web servers, databases, custom clients) have their own internal DNS caches. Tuning these is crucial.
3. Prefetching: This proactive approach involves resolving domain names before they are actually needed.

Let’s look at configuring systemd-resolved, a common DNS resolver on modern Linux systems.

Improving systemd-resolved Caching:

systemd-resolved caches DNS records. Its configuration is typically found in /etc/systemd/resolved.conf.

[Resolve]
DNS=8.8.8.8 1.1.1.1
#FallbackDNS=
#Domains=
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#DNSOverTLS=no
# Caching is enabled by default; uncomment only to change it
#Cache=yes
#DNSStubListener=yes
#ReadEtcHosts=yes

Cache=yes is the default, so caching is on unless you turn it off; Cache=no-negative keeps positive caching but skips negative answers. systemd-resolved honors each record’s own TTL, subject to built-in limits on cache size and on how long an entry may live. Those limits are compiled in and cannot be changed in resolved.conf. After editing the file, apply it with sudo systemctl restart systemd-resolved. If you need more control, consider dnsmasq or unbound.

Using dnsmasq for Finer Cache Control:

dnsmasq is a lightweight DNS forwarder and DHCP server often used for local caching. Its configuration is in /etc/dnsmasq.conf.

# Increase the cache size (default is 150)
cache-size=1000

# Set a minimum TTL for cached records (e.g., 5 minutes)
# Records advertised with a shorter TTL are kept for this long instead.
# dnsmasq caps this option at 3600 seconds unless it is recompiled.
min-cache-ttl=300

# Set a maximum TTL for cached records (e.g., 1 day)
# This prevents very long TTLs from sticking around too long.
max-cache-ttl=86400

# Use specific upstream DNS servers
server=8.8.8.8
server=1.1.1.1

After modifying /etc/dnsmasq.conf, restart the service: sudo systemctl restart dnsmasq. The min-cache-ttl is particularly powerful, ensuring that even records with very short advertised TTLs are kept in cache for at least min-cache-ttl seconds, preventing frequent lookups.
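The combined effect of min-cache-ttl and max-cache-ttl amounts to clamping the advertised TTL into a range. The sketch below models that behaviour as described in this section; the function name effective_ttl is an assumption for illustration and the code is not taken from dnsmasq itself.

```python
def effective_ttl(advertised: int, min_ttl: int = 300, max_ttl: int = 86400) -> int:
    """Clamp an advertised TTL (in seconds) into [min_ttl, max_ttl]."""
    if min_ttl > max_ttl:
        raise ValueError("min_ttl must not exceed max_ttl")
    return max(min_ttl, min(advertised, max_ttl))


# Defaults mirror the example config: min-cache-ttl=300, max-cache-ttl=86400
for advertised in (30, 300, 3600, 604800):
    print(f"advertised={advertised}s -> cached for {effective_ttl(advertised)}s")
```

With these settings a record advertised at 30 seconds is kept for 300 seconds, a one-hour record is left alone, and a week-long record is cut to one day.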

Prefetching DNS Records:

Prefetching involves identifying domains that will likely be needed soon and performing DNS lookups for them in advance.

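1. Application-Level Prefetching: Many web servers and application frameworks offer DNS caching or prefetching features. Nginx does not prefetch, but its resolver directive lets it look up upstream names at runtime and cache the answers. Note that a literal hostname in proxy_pass is resolved only once, at startup or reload; the resolver is used when the hostname is held in a variable.

http {
    # Resolver used for names Nginx looks up at runtime; valid=30s overrides
    # the record's own TTL and caches each answer for 30 seconds
    resolver 8.8.8.8 1.1.1.1 valid=30s;

    server {
        listen 80;
        server_name example.com;

        location / {
            # A literal hostname here would be resolved only once, at startup
            # or reload. Using a variable makes Nginx resolve it at request
            # time through the resolver above and cache it for 'valid'.
            set $backend "backend.example.com";
            proxy_pass http://$backend;
        }
    }
}

The valid=30s parameter tells Nginx to cache each DNS answer for 30 seconds regardless of the record’s TTL; without it, Nginx uses the TTL from the response. This is application-level caching rather than true prefetching: a request that arrives after an entry has expired still has to wait for a fresh lookup.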

2. System-Level Prefetching (using systemd-resolved): systemd-resolved has no prefetch option. Its cache only helps names that have already been looked up, so on its own it is not a prefetcher; it simply keeps commonly accessed domains warm for as long as their TTLs allow. If you need real prefetching at the resolver level, unbound offers a prefetch option that refreshes popular records shortly before they expire.

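3. Custom Prefetching Scripts: You can write a script that periodically resolves a list of critical domains, so their answers are already in the local resolver’s cache when the application needs them. dig works for this only if /etc/resolv.conf points at the local caching resolver (for example, the systemd-resolved stub at 127.0.0.53), because dig sends its queries directly to the servers listed there.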

#!/bin/bash
DOMAINS_TO_PREFETCH=(
    "cdn.example.com"
    "api.internal.net"
    "analytics.thirdparty.org"
)

for domain in "${DOMAINS_TO_PREFETCH[@]}"; do
    # Triggers a lookup; this warms the local cache only if /etc/resolv.conf
    # points at the local caching resolver
    dig +short "$domain" > /dev/null
    echo "Prefetched DNS for: $domain"
done

You would then schedule this script with cron or a systemd timer, at an interval shorter than the TTLs of the records you want to keep warm; otherwise the entries expire between runs.
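A Python variant of the script above might look like the following sketch. The function names system_resolve and prefetch, the workers parameter, and the choice to map failures to None are all assumptions made for illustration. Unlike dig, socket.getaddrinfo goes through the same OS resolver path that most applications use, and the lookups run concurrently so one slow domain does not hold up the rest.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def system_resolve(name: str) -> List[str]:
    """Resolve `name` through the OS resolver and return its addresses."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def prefetch(names: Iterable[str],
             resolve: Callable[[str], List[str]] = system_resolve,
             workers: int = 8) -> Dict[str, Optional[List[str]]]:
    """Resolve every name once; failed lookups map to None instead of raising."""
    def attempt(name: str) -> Optional[List[str]]:
        try:
            return resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            return None

    name_list = list(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so names and results line up.
        return dict(zip(name_list, pool.map(attempt, name_list)))
```

Logging the names that come back as None is a cheap way to notice a misspelled or retired domain in the prefetch list.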

A subtle point about DNS caching: aggressive caching, especially with min-cache-ttl, can serve stale records. When a domain’s authoritative server changes a record, your resolver keeps returning the old answer until its cached copy expires, and with min-cache-ttl that can be later than the TTL the domain owner advertised. This is a trade-off between performance and how quickly DNS updates take effect. If the IP addresses behind your domains change frequently, reduce min-cache-ttl or max-cache-ttl, or clear the cache when records change.

The next step after optimizing DNS resolution is often understanding and tuning TCP connection establishment, particularly the TCP handshake and its associated latency.