Consul’s prepared queries behave less like simple DNS lookups and more like stored, named lookup rules that Consul evaluates on every request, which lets them manage service discovery across multiple datacenters.
Let’s see this in action. Imagine you have a web application front-end (webapp) in dc1 that needs to talk to an API service (api-service) which is deployed in both dc1 and dc2. You want webapp in dc1 to prefer api-service instances in dc1, but if none are healthy, it should automatically fail over to api-service instances in dc2.
First, ensure your api-service is registered in both datacenters with Consul. This is standard service registration, nothing fancy.
Now, you create a prepared query in dc1 that targets the api-service and specifies a failover datacenter.
{
  "Name": "api-service-failover",
  "Service": {
    "Service": "api-service",
    "OnlyPassing": true,
    "Failover": {
      "Datacenters": ["dc2"]
    }
  }
}
Prepared queries are managed through Consul’s HTTP API rather than a dedicated CLI command. To register this one in dc1, save the definition to a file (api-service-failover.json is just an example name) and POST it to an agent in dc1. Port 8500 is the default HTTP port:
curl --request POST --data @api-service-failover.json http://<consul_agent_ip>:8500/v1/query
Consul stores the query and returns its ID. You can then execute it by name through DNS from dc1, using the query namespace rather than the service namespace. Port 8600 is the default DNS port:
dig @<consul_agent_ip> -p 8600 api-service-failover.query.consul SRV
The magic happens in how Consul resolves this. If api-service instances are healthy in dc1, Consul returns those. If no healthy instances remain in dc1 (based on their Consul health checks), Consul consults the Failover block, forwards the query to dc2, and returns the healthy instances it finds there. Datacenters is an ordered list of remote datacenters to try; the datacenter where the query executes is always tried first and does not need to be listed. OnlyPassing: true restricts results to instances whose checks are all passing; with the default of false, instances in the warning state are returned as well, and instances with critical checks are excluded either way. With more than two datacenters, NearestN can replace or complement the fixed list: it tells Consul to try up to N remote datacenters ranked by estimated network round-trip time. The separate Near field (a node name, or _agent for the agent handling the request) only sorts the returned instances by round-trip time from that node; it does not select a datacenter.
The core problem this solves is providing resilient service discovery for distributed applications. Without prepared queries, you’d need to manually configure your clients to know about multiple datacenters and implement complex failover logic within your application code. This approach centralizes that intelligence within Consul, making your applications simpler and more robust.
Internally, Consul doesn’t just perform a simple DNS lookup. When a prepared query is executed, Consul actively evaluates the health of the target service in the specified datacenters. It consults its internal state for service health, not just DNS records. The Failover block is a declarative way to tell Consul about secondary locations and their priority.
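The evaluation described above can be modeled in a few lines of Python. This is an illustrative sketch, not Consul’s implementation: the in-memory catalog, the addresses, and the function names healthy_instances and execute_query are all hypothetical, and real Consul forwards the query to remote datacenters over the WAN rather than reading a local dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for Consul's catalog: datacenter -> (address, check status),
# where status is "passing", "warning", or "critical".
Catalog = Dict[str, List[Tuple[str, str]]]


def healthy_instances(catalog: Catalog, dc: str, only_passing: bool) -> List[str]:
    """Return the addresses in `dc` that survive health filtering."""
    allowed = {"passing"} if only_passing else {"passing", "warning"}
    return [addr for addr, status in catalog.get(dc, []) if status in allowed]


def execute_query(
    catalog: Catalog,
    local_dc: str,
    failover_dcs: List[str],
    only_passing: bool = True,
) -> Tuple[Optional[str], List[str]]:
    """Try the local datacenter first, then each failover datacenter in order."""
    for dc in [local_dc, *failover_dcs]:
        instances = healthy_instances(catalog, dc, only_passing)
        if instances:
            return dc, instances
    return None, []


# Example: the only dc1 instance is critical, so the query fails over to dc2.
catalog = {
    "dc1": [("10.0.1.10:8080", "critical")],
    "dc2": [("10.1.1.10:8080", "passing")],
}
print(execute_query(catalog, "dc1", ["dc2"]))  # ('dc2', ['10.1.1.10:8080'])
```

The point of the model is that failover is all-or-nothing per datacenter: a remote datacenter is consulted only when the previous one yields no healthy instances at all.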
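Fine-grained control over failover order comes from two fields in the Failover block rather than from a SolveOrder parameter, which prepared queries do not have. NearestN tries up to N remote datacenters, closest first by estimated network round-trip time. Datacenters is an explicit list that is tried in the order given. If both are set, the NearestN datacenters are tried first, followed by the listed ones, and any datacenter selected by both is queried only once. The datacenter where the query executes is always checked before any failover target.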
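As a rough model of how failover targets are ordered when the Failover block combines NearestN with an explicit Datacenters list, consider the sketch below. The function name failover_order and the round-trip figures are made up for illustration; Consul derives its own estimates from network coordinates.

```python
from typing import Dict, List


def failover_order(
    rtt_ms: Dict[str, float], nearest_n: int, datacenters: List[str]
) -> List[str]:
    """Order remote datacenters: the NearestN closest first, then the explicit list."""
    nearest = sorted(rtt_ms, key=lambda dc: rtt_ms[dc])[:nearest_n]
    order: List[str] = []
    for dc in [*nearest, *datacenters]:
        if dc not in order:  # each datacenter is tried at most once
            order.append(dc)
    return order


# Hypothetical round-trip estimates from dc1 to each remote datacenter.
rtt = {"dc2": 40.0, "dc3": 12.5, "dc4": 95.0}
print(failover_order(rtt, nearest_n=1, datacenters=["dc2", "dc3"]))  # ['dc3', 'dc2']
```

Here dc3 is the closest, so it is tried first; dc2 follows from the explicit list, and the second mention of dc3 is skipped.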
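A common point of confusion is that prepared queries are registered once and then executed by name. You don’t query the prepared query definition directly; you query a DNS name that Consul maps to an execution of that query. That name is [prepared_query_name].query.consul (the query ID also works in place of the name). Looking up [prepared_query_name].service.consul would instead search for an ordinary service with that name.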
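The same register-then-execute split exists in Consul’s HTTP API: a definition is stored with POST /v1/query and run with GET /v1/query/<name or ID>/execute. The sketch below only builds those two requests with Python’s standard library and does not send them; the helper names and the agent address http://127.0.0.1:8500 are assumptions for illustration.

```python
import json
import urllib.request


def build_create_request(base_url: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/query request that stores a query."""
    return urllib.request.Request(
        url=f"{base_url}/v1/query",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_execute_request(base_url: str, name_or_id: str) -> urllib.request.Request:
    """Build the GET request that executes a stored query by name or ID."""
    return urllib.request.Request(
        url=f"{base_url}/v1/query/{name_or_id}/execute", method="GET"
    )


definition = {
    "Name": "api-service-failover",
    "Service": {
        "Service": "api-service",
        "OnlyPassing": True,
        "Failover": {"Datacenters": ["dc2"]},
    },
}

# Assumed local agent address; adjust for your environment.
create = build_create_request("http://127.0.0.1:8500", definition)
execute = build_execute_request("http://127.0.0.1:8500", "api-service-failover")
print(create.get_method(), create.full_url)
print(execute.get_method(), execute.full_url)
```

Against a live agent, passing either request to urllib.request.urlopen would perform the call; that step is left out here so the example runs without a Consul cluster.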
The next concept you’ll likely encounter is how to manage health checks for services that span multiple datacenters, particularly ensuring that health checks themselves are resilient and don’t become a single point of failure.