The erratic plugin for CoreDNS lets you simulate an unreliable DNS server by dropping, delaying, or truncating responses, which is useful for testing the resilience of applications that rely on DNS.

Let’s see erratic in action. Imagine you have a simple CoreDNS configuration file, Corefile:

.:53 {
    erratic {
        drop 3
        truncate 5
        delay 2 2s
    }
}

In this setup, CoreDNS listens on port 53 and erratic answers every query itself. It never hands queries on to another plugin, so a forward line in the same server block would not be reached. Every A query gets the static answer 192.0.2.53, every AAAA query gets 2001:DB8::53, and any other query type gets SERVFAIL (AXFR is the exception and returns a small zone transfer).

Here’s what each erratic directive means. The amounts are ratios of the form "1 per AMOUNT queries", not percentages:

  • drop 3: Drops 1 out of every 3 queries. The client gets no response at all and eventually times out. If you omit the amount, it defaults to 2.
  • truncate 5: Marks 1 out of every 5 responses as truncated (TC bit set), which prompts most clients to retry over TCP. The default amount is also 2.
  • delay 2 2s: Delays 1 out of every 2 responses by 2 seconds. The defaults are an amount of 2 and a duration of 100ms. This simulates network latency or a slow-to-respond server.
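The "1 per AMOUNT" wording in the erratic documentation can be made concrete with a small simulation. The sketch below is not the plugin’s code: it assumes a query counter that starts at 1 and a rule that fires whenever the counter is a multiple of its amount, and the function name erratic_plan is made up for illustration.

```python
def erratic_plan(num_queries, drop=0, truncate=0, delay=0):
    """Return (query_number, actions) pairs for a counter-based '1 per AMOUNT' rule.

    Assumption: the counter starts at 1 and a rule fires when the counter is a
    multiple of its amount. An amount of 0 disables the rule. The real plugin
    may start counting at a different offset.
    """
    rules = {"drop": drop, "truncate": truncate, "delay": delay}
    plan = []
    for n in range(1, num_queries + 1):
        actions = {
            name for name, amount in rules.items()
            if amount > 0 and n % amount == 0
        }
        plan.append((n, actions))
    return plan


if __name__ == "__main__":
    for n, actions in erratic_plan(6, drop=3, truncate=5, delay=2):
        print(n, ", ".join(sorted(actions)) or "answered normally")
```

Under these assumptions, query 6 is selected by both the drop and the delay rule, which shows that amounts sharing a common multiple overlap on the same queries.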

Now, let’s simulate some DNS lookups using dig against this CoreDNS instance (assuming CoreDNS is running on localhost):

dig google.com @localhost

You might see output like this:

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> google.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		0	IN	A	192.0.2.53

;; Query time: 2003 msec  <-- Note the ~2 second delay
;; SERVER: 127.0.0.1#53 (127.0.0.1)
;; WHEN: Tue Feb 27 10:30:05 UTC 2024
;; MSG SIZE  rcvd: 55

Here, the response was held back for about 2 seconds, matching the configured delay, and it carries erratic’s static answer 192.0.2.53 rather than a real address for google.com. If you run dig google.com @localhost again, you might get a different result:

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> google.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 67890
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		0	IN	A	192.0.2.53

;; Query time: 5 msec
;; SERVER: 127.0.0.1#53 (127.0.0.1)
;; WHEN: Tue Feb 27 10:30:10 UTC 2024
;; MSG SIZE  rcvd: 55

This time, the same static answer came back with a low query time, demonstrating the intermittent nature of the simulated failures. A dropped query gets no response at all, so dig waits for its timeout and then retries.
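Running dig by hand shows one outcome at a time. To see the overall mix of outcomes, a short probe can send many queries and tally the results. The following is a minimal sketch using only the Python standard library. The function names, the default query count, and the 3-second timeout are illustrative choices, and it inspects only the 12-byte DNS header of each response.

```python
import socket
import struct
from collections import Counter


def build_query(name, query_id=0x1234, qtype=1):
    """Encode a minimal DNS query (qtype 1 = A, class IN) with the RD flag set."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def classify(response):
    """Summarize a raw DNS response from its header alone."""
    if len(response) < 12:
        return "malformed"
    _, flags, _, ancount, _, _ = struct.unpack("!HHHHHH", response[:12])
    if flags & 0x0200:  # TC (truncated) bit
        return "truncated"
    rcode = flags & 0x000F
    if rcode != 0:
        return f"rcode={rcode}"
    return "answered" if ancount else "empty"


def probe(server, name, count=20, timeout=3.0, port=53):
    """Send `count` UDP queries, one socket each, and tally the outcomes.

    A response slower than `timeout` is counted as "timed out", the same as a
    dropped query, because the client cannot tell the two apart.
    """
    tally = Counter()
    for i in range(count):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_query(name, query_id=i), (server, port))
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                tally["timed out"] += 1
                continue
            tally[classify(data)] += 1
    return tally

# Example (needs a DNS server listening on 127.0.0.1:53):
#   print(probe("127.0.0.1", "google.com"))
```

Unlike dig, this probe does not retry, so every dropped query shows up in the tally instead of being hidden by a successful second attempt.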

The erratic plugin is useful for building robust systems. It lets you test how your applications behave when DNS is not a reliable constant, including their retry mechanisms, timeout handling, and graceful degradation strategies. It supports three failure modes, each selecting 1 per AMOUNT queries:

  • drop [AMOUNT]: Silently drops 1 per AMOUNT queries, meaning no response is sent back to the client. This simulates packet loss. The default amount is 2.
  • truncate [AMOUNT]: Truncates 1 per AMOUNT responses, which exercises a client’s fallback to TCP. The default amount is 2.
  • delay [AMOUNT [DURATION]]: Delays 1 per AMOUNT responses by DURATION. This is useful for testing how clients handle slow DNS responses. The defaults are an amount of 2 and a duration of 100ms.

Response codes are not configurable: erratic returns SERVFAIL only for query types other than A, AAAA, and AXFR.

When configuring erratic, the order of directives does not matter. Each setting is evaluated independently for every query, so a single response can be both delayed and truncated. A query that is selected for dropping gets no response, whatever the other settings say.

One of the most useful aspects of erratic is that it simulates transient failures. Unlike simply blocking DNS traffic, erratic fails only some queries, which is the kind of condition applications should be able to recover from. This lets you test retry logic and circuit breaker patterns. For example, if an application retries after a dropped query times out, the second attempt may well succeed, mimicking a temporary network blip or a brief restart of an upstream DNS resolver.
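The retry behavior described above can be sketched as a small wrapper around a resolver call. This is one possible shape, not a recommended production design. The function name resolve_with_retry, the default of three attempts, and the 0.2-second base delay are all illustrative. The resolver and sleep parameters are injectable so the logic can be exercised without a network.

```python
import socket
import time


def resolve_with_retry(host, attempts=3, base_delay=0.2,
                       resolver=socket.gethostbyname, sleep=time.sleep):
    """Resolve `host`, retrying with exponential backoff when resolution fails.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

With the default resolver, this goes through the operating system’s resolver, so it only exercises erratic if the system is configured to send its DNS queries to the CoreDNS instance under test.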

By understanding and utilizing the erratic plugin, you can move beyond assuming perfect DNS and build applications that are resilient to the realities of network instability.

The next step after simulating flaky DNS is often to test how your applications handle caching when DNS records are intermittently unavailable.
