DNS SRV records are the unsung heroes of modern service discovery, letting clients find specific services running on arbitrary hosts without needing to know their exact IP addresses or ports.

Imagine a fleet of microservices, constantly spinning up and down, changing IP addresses like a chameleon changes colors. How does a web frontend know which IP address and port to talk to for the user-service? It doesn’t. Instead, it asks DNS for a special record: an SRV record.

Let’s say our frontend needs to find the user-service. It queries DNS for _user-service._tcp.internal.example.com. The DNS server, instead of returning an A record, returns something like this:

_user-service._tcp.internal.example.com. 300 IN SRV 10 5 8080 user-service-1.internal.example.com.
_user-service._tcp.internal.example.com. 300 IN SRV 20 5 8080 user-service-2.internal.example.com.

This tells the client:

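  • _user-service._tcp.internal.example.com: The name being queried, made up of the service label (_user-service), the protocol label (_tcp), and the domain (internal.example.com).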
  • 300: The Time To Live (TTL) for this record, in seconds.
  • IN SRV: The record class (IN, for Internet) and the record type (SRV).
  • 10 and 20: The priority. Lower numbers are preferred. If user-service-1 is available, use it. If not, try user-service-2.
  • 5: The weight. If multiple records have the same priority (e.g., two 10s), the weight sets each record’s relative probability of selection. Both records here have weight 5, so if they also shared priority 10, each would be chosen about half the time. A record with weight 10 would be chosen roughly twice as often as one with weight 5.
  • 8080: The port the service is listening on.
  • user-service-1.internal.example.com.: The hostname of the actual service instance.
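
The field layout above can be made concrete with a small parser. This is a minimal sketch, not a full zone-file parser: it assumes a fully specified line (name, TTL, class, type, then the four SRV fields), and the names SrvRecord and parse_srv_line are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    name: str      # owner name, e.g. _user-service._tcp.internal.example.com.
    ttl: int       # seconds the answer may be cached
    priority: int  # lower value is tried first
    weight: int    # relative share among records with equal priority
    port: int      # port the instance listens on
    target: str    # hostname of the instance, resolved separately


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one fully specified SRV line in presentation format."""
    fields = line.split()
    if len(fields) != 8 or fields[2].upper() != "IN" or fields[3].upper() != "SRV":
        raise ValueError(f"not a fully specified IN SRV record: {line!r}")
    name, ttl, _cls, _type, priority, weight, port, target = fields
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


record = parse_srv_line(
    "_user-service._tcp.internal.example.com. 300 IN SRV 10 5 8080 "
    "user-service-1.internal.example.com."
)
print(record.priority, record.weight, record.port, record.target)
```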

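The client then takes the returned hostname (user-service-1.internal.example.com.) and performs a separate A (or AAAA) record lookup to get its IP address, unless the server already included those address records in the additional section of the SRV response.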

The DNS server’s job in all this is simply to serve the records. For BIND, you’d define them in the zone file for example.com like so (this time with three instances, two of them sharing priority 10):

$TTL 3600
@       IN      SOA     ns1.example.com. admin.example.com. (
                        2023010101 ; serial
                        3600       ; refresh
                        1800       ; retry
                        604800     ; expire
                        86400      ; negative-caching TTL (historically "minimum TTL")
                        )

; Name servers
@       IN      NS      ns1.example.com.
@       IN      NS      ns2.example.com.

; Service records for user-service
_user-service._tcp.internal   IN SRV 10 5 8080 user-service-1.internal
_user-service._tcp.internal   IN SRV 10 5 8080 user-service-2.internal
_user-service._tcp.internal   IN SRV 20 5 8080 user-service-3.internal

; A records for the service instances
user-service-1.internal       IN A    192.168.1.10
user-service-2.internal       IN A    192.168.1.11
user-service-3.internal       IN A    192.168.1.12

Notice how the SRV record itself doesn’t contain the IP address, only the hostname and port. This is by design, decoupling the service endpoint from its underlying network address.
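
One detail of the zone file that is easy to miss: names without a trailing dot are relative to the zone origin, which is why user-service-1.internal in the file comes back as user-service-1.internal.example.com. in a query. The sketch below illustrates that expansion rule. It assumes the zone origin is example.com. (the zone file does not state it explicitly; it is inferred from the queried name), and qualify is a hypothetical helper, not part of BIND.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name to a fully qualified domain name."""
    if not origin.endswith("."):
        raise ValueError("origin must be fully qualified (end with a dot)")
    if name == "@":
        return origin            # "@" stands for the origin itself
    if name.endswith("."):
        return name              # already absolute, left untouched
    if origin == ".":
        return f"{name}."        # root origin: just add the trailing dot
    return f"{name}.{origin}"    # relative name: append the origin


ORIGIN = "example.com."  # assumed origin for the zone shown above

print(qualify("_user-service._tcp.internal", ORIGIN))
print(qualify("user-service-1.internal", ORIGIN))
print(qualify("ns1.example.com.", ORIGIN))
```

Forgetting the trailing dot on a name that is already absolute is a common mistake: under this rule, ns1.example.com without the dot would expand to ns1.example.com.example.com.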

The real power comes when this is integrated with dynamic environments. Tools like Kubernetes and Consul publish and withdraw SRV records automatically as pods or services come online and go offline. For instance, Kubernetes cluster DNS creates SRV records for every named port on a Service, making the port and its backing hosts discoverable by other workloads in the cluster (cluster DNS is typically not resolvable from outside the cluster unless you expose it deliberately).

This allows a client application to simply ask for a name of the form _port-name._tcp.my-api.my-namespace.svc.cluster.local, where my-api is the Service and port-name is the name of one of its ports. For a headless Service, the answer lists one record per ready pod. If an instance fails, its SRV record is removed, and clients that re-resolve once the cached answer’s TTL expires are directed to the remaining healthy instances.

A subtle but crucial aspect of SRV records is how clients should behave. A compliant client performs the SRV lookup, groups the returned records by priority, and starts with the lowest priority number. Among records that share a priority, it picks one at random with probability proportional to weight, so higher weights receive more traffic. If a connection attempt fails (e.g., connection refused, timeout), the client tries the remaining records at that priority before moving on to the next priority level. This load balancing and failover happens at the client level, based on the DNS response.
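
The client-side selection can be sketched as follows. This is an illustrative implementation of the ordering described in RFC 2782 (lowest priority number first, weighted random choice within a priority), not code from any particular client library. The names Srv, order_srv, and connect_with_failover are hypothetical, and try_connect stands in for whatever connection logic the application actually uses.

```python
import random
from typing import Callable, List, NamedTuple, Optional


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records: List[Srv], rng: Optional[random.Random] = None) -> List[Srv]:
    """Return the records in the order a client should try them."""
    rng = rng or random.Random()
    ordered: List[Srv] = []
    for priority in sorted({r.priority for r in records}):
        # Zero-weight records go first in the pool so they keep a small chance.
        pool = sorted((r for r in records if r.priority == priority),
                      key=lambda r: r.weight != 0)
        while pool:
            # Pick a point in the cumulative weight range, then take the
            # first record whose running sum reaches it.
            pick = rng.randint(0, sum(r.weight for r in pool))
            running = 0
            for i, rec in enumerate(pool):
                running += rec.weight
                if running >= pick:
                    ordered.append(pool.pop(i))
                    break
    return ordered


def connect_with_failover(records: List[Srv],
                          try_connect: Callable[[str, int], bool],
                          rng: Optional[random.Random] = None) -> Optional[Srv]:
    """Try each record in SRV order; return the first that connects, else None."""
    for rec in order_srv(records, rng):
        if try_connect(rec.target, rec.port):
            return rec
    return None


records = [
    Srv(10, 5, 8080, "user-service-1.internal.example.com."),
    Srv(10, 5, 8080, "user-service-2.internal.example.com."),
    Srv(20, 5, 8080, "user-service-3.internal.example.com."),
]

# Simulated outage: both priority-10 instances refuse connections.
down = {"user-service-1.internal.example.com.",
        "user-service-2.internal.example.com."}
chosen = connect_with_failover(records, lambda host, port: host not in down)
print(chosen.target)  # user-service-3.internal.example.com.
```

With both priority-10 instances unreachable, the client falls through to the priority-20 record regardless of which of the first two it happened to try first.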

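The two failures you are most likely to see for an SRV query mean different things. NXDOMAIN means the queried name does not exist, which usually points to a service that was never registered or a typo in the service or protocol label. SERVFAIL means the resolver could not obtain an answer at all, which usually points to trouble with the DNS server itself, such as an unreachable authoritative server, a zone that failed to load, or a DNSSEC validation failure.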
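
A client can turn the response code into a first troubleshooting hint. The sketch below uses the standard DNS response code values (0 for NOERROR, 2 for SERVFAIL, 3 for NXDOMAIN). The function diagnose_srv_response and its messages are hypothetical, and it assumes the caller has already extracted the response code and the number of SRV answers from whatever resolver library is in use.

```python
from enum import IntEnum


class Rcode(IntEnum):
    NOERROR = 0
    SERVFAIL = 2
    NXDOMAIN = 3


def diagnose_srv_response(rcode: int, answer_count: int) -> str:
    """Map a DNS response code and SRV answer count to a troubleshooting hint."""
    if rcode == Rcode.NXDOMAIN:
        return "name does not exist: check the labels and whether the service is registered"
    if rcode == Rcode.SERVFAIL:
        return "server failure: check the resolver and the authoritative servers"
    if rcode == Rcode.NOERROR and answer_count == 0:
        return "name exists but has no SRV records"
    if rcode == Rcode.NOERROR:
        return "ok"
    return f"unexpected response code {rcode}"


print(diagnose_srv_response(3, 0))
print(diagnose_srv_response(2, 0))
print(diagnose_srv_response(0, 2))
```

The NOERROR case with zero answers is worth handling separately: the name exists (for example, it has other record types) but no SRV records are published for it.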

The next step is exploring how to secure these service-to-service communications, often by leveraging the service discovery information to implement mutual TLS.
