CoreDNS, using its built-in forward plugin, can distribute DNS query load across multiple upstream DNS servers.

Imagine you have a single, powerful DNS server that’s doing all the heavy lifting. As your network grows, that server becomes a bottleneck. Queries slow down, and eventually it can’t keep up. This is where distributing the load comes in. Instead of one server, you have several, and you want to send traffic to them evenly. CoreDNS, acting as an intermediary, can do this by forwarding each query to one of several upstream servers, chosen according to a configurable policy.

Let’s see it in action. Suppose we have two upstream DNS servers, 192.168.1.10 and 192.168.1.11, both authoritative for our internal example.com domain. We want CoreDNS to balance queries between them.

Here’s a snippet from a CoreDNS Corefile that accomplishes this:

.:53 {
    errors
    cache 30
    forward . 192.168.1.10 192.168.1.11 {
        policy round_robin
    }
    prometheus :9153
    hosts {
        fallthrough
    }
    reload
}

In this configuration:

  • .:53: The . is the root zone, so this server block handles queries for every domain name, and 53 is the port CoreDNS listens on.
  • errors: This plugin logs errors encountered during DNS processing.
  • cache 30: This caches DNS responses for up to 30 seconds (records with a shorter TTL expire sooner), reducing the load on upstream servers for frequently requested records.
  • forward . 192.168.1.10 192.168.1.11 { ... }: This is the core of our load balancing.
    • .: Specifies that this forwarder applies to all domains. To forward only our internal zone, you would write example.com here instead.
    • 192.168.1.10 192.168.1.11: These are the IP addresses of our upstream DNS servers.
    • policy round_robin: This is the crucial part for load balancing. It tells CoreDNS to rotate through the upstream servers, so successive queries that reach the forwarder alternate between 192.168.1.10 and 192.168.1.11. Without this line, forward uses its default policy, random.
  • prometheus :9153: Exposes Prometheus metrics on port 9153, allowing you to monitor CoreDNS performance and query distribution.
  • hosts { fallthrough }: This allows CoreDNS to resolve entries from a local /etc/hosts file if present, falling through to the forwarder if an entry isn’t found.
  • reload: This plugin enables CoreDNS to automatically reload its configuration when the Corefile changes, without requiring a restart.

The fundamental problem this solves is preventing a single point of failure and performance degradation in your DNS infrastructure. By distributing queries, you improve resilience and responsiveness. If one upstream server becomes unavailable, CoreDNS can continue to serve requests using the other available servers.

Internally, the forward plugin’s round_robin policy keeps a counter. Each time a query reaches the forward plugin, CoreDNS increments the counter and picks the upstream at the counter’s value modulo the number of upstream servers. That upstream is tried first; if it is marked unhealthy or the attempt fails, the remaining upstreams are tried in turn. Over time, this spreads queries evenly across the list.
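The counter-and-modulo idea can be sketched in a few lines of Go. This is a minimal illustration, not CoreDNS’s actual source: the roundRobin type and its next method are hypothetical names, and the sketch starts the rotation at the first listed server.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin is a hypothetical selector used only for illustration;
// it is not the type CoreDNS uses internally.
type roundRobin struct {
	counter   uint32   // bumped once per forwarded query
	upstreams []string // addresses in the order they are listed
}

// next atomically increments the counter and maps the previous value
// onto the upstream list with a modulo, so callers on different
// goroutines still rotate through the servers.
func (r *roundRobin) next() string {
	n := atomic.AddUint32(&r.counter, 1) // returns the new value
	return r.upstreams[(n-1)%uint32(len(r.upstreams))]
}

func main() {
	rr := &roundRobin{upstreams: []string{"192.168.1.10", "192.168.1.11"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.next())
	}
}
```

The atomic increment matters because a DNS server handles many queries concurrently; a plain integer would need a lock to avoid lost updates.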

The levers you control are primarily the list of upstream servers and the policy directive. The forward plugin supports three policies: random (the default, which picks a server at random for each query), round_robin (which rotates through the list), and sequential (which always tries servers in the listed order, so it provides failover rather than load balancing). Health checking is built into forward: an upstream that starts failing is probed repeatedly and, after max_fails consecutive failed checks (2 by default), is taken out of rotation until it recovers. The health_check option sets the probing interval (0.5s by default).
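To make the difference between the policies concrete, here is a rough Go sketch of how each one might order the upstreams for a single query while skipping servers considered down. The upstream type, the tryOrder function, and the third address 192.168.1.12 are assumptions made for the example; they only mirror the ideas behind policy and max_fails and are not CoreDNS’s implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// upstream is a hypothetical record for one forward target.
type upstream struct {
	addr  string
	fails int // consecutive failed health checks
}

// down mirrors the idea behind max_fails: enough consecutive failures
// take the upstream out of consideration.
func (u upstream) down(maxFails int) bool { return u.fails >= maxFails }

// tryOrder returns the addresses to attempt, in order, for one query.
// counter is the per-query counter used by round_robin.
func tryOrder(policy string, ups []upstream, counter, maxFails int) []string {
	n := len(ups)
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	switch policy {
	case "round_robin":
		// Start at counter mod n and wrap around the list.
		for i := range idx {
			idx[i] = (counter + i) % n
		}
	case "random":
		rand.Shuffle(n, func(i, j int) { idx[i], idx[j] = idx[j], idx[i] })
	default:
		// "sequential": keep the listed order.
	}
	var out []string
	for _, i := range idx {
		if !ups[i].down(maxFails) {
			out = append(out, ups[i].addr)
		}
	}
	return out
}

func main() {
	ups := []upstream{
		{addr: "192.168.1.10"},
		{addr: "192.168.1.11", fails: 2}, // considered down
		{addr: "192.168.1.12"},           // hypothetical third server
	}
	fmt.Println(tryOrder("sequential", ups, 0, 2))  // listed order, minus the down server
	fmt.Println(tryOrder("round_robin", ups, 2, 2)) // rotation starts at the third server
}
```

Note that sequential always starts with the first healthy server, which is why it behaves as failover: the second server only sees traffic when the first one is out.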

With the round_robin policy, the order in which you list the upstream servers in your Corefile only affects which server is tried first; over a sustained period, the rotation evens out. Round-robin does assume the upstreams are interchangeable, though. If your upstream servers have vastly different capacities or network latencies, an even split might not be optimal. In such cases, you might need a more sophisticated load balancing strategy outside of CoreDNS itself, or custom logic if CoreDNS’s policies don’t meet your specific needs.

It’s also worth noting that the cache plugin runs before the forward plugin, regardless of the order in which the two appear in the Corefile, because plugin order is fixed when CoreDNS is built. If a record is served from the cache, the forward plugin (and its load balancing policy) is never invoked for that query. This is generally desirable for performance, but it means only cache misses are load-balanced.

The next concept to explore is how to implement weighted load balancing if your upstream servers have different capacities.
