Consul doesn't push metrics to Prometheus. It exposes them over its HTTP API, and Prometheus pulls (scrapes) them on a schedule.
Let's see Consul metrics in action. Imagine you have a Consul cluster running and configured to expose metrics. Querying the Prometheus UI for Consul-related series might return something like the following. The names are simplified for illustration, and exact metric names vary by Consul version.
```
consul_raft_state{state="leader"} 1
consul_serf_member_status{status="alive",member="consul-server-1"} 1
consul_acl_token_count 15
```
This shows Consul’s internal state (like who’s the Raft leader), the health of its peers, and even operational stats like the number of ACL tokens.
Consul's Prometheus integration gives you deep visibility into the health and performance of your service discovery and configuration system. Without it, troubleshooting Consul itself is a black box.

Internally, each Consul agent collects its own telemetry and serves it through the agent HTTP API at `/v1/agent/metrics`. Adding the query parameter `?format=prometheus` returns the Prometheus text format. Prometheus then scrapes this endpoint on its configured interval.
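To make the pull model concrete, here is a minimal Python sketch of what Prometheus does on each scrape: a plain HTTP GET against the agent's metrics endpoint. It makes a few assumptions:

- The agent is reachable on the default HTTP port 8500.
- Prometheus-format telemetry is enabled.
- ACLs are off.

The helper names `consul_metrics_url` and `scrape_once` are hypothetical.

```python
import urllib.parse
import urllib.request


def consul_metrics_url(host: str, port: int = 8500, scheme: str = "http") -> str:
    """Build the URL Prometheus scrapes: the agent metrics API in Prometheus format."""
    query = urllib.parse.urlencode({"format": "prometheus"})
    return f"{scheme}://{host}:{port}/v1/agent/metrics?{query}"


def scrape_once(host: str, port: int = 8500, timeout: float = 5.0) -> str:
    """Perform a single pull, the way a Prometheus scrape would.

    Consul never pushes; the caller decides when to ask for data.
    """
    with urllib.request.urlopen(consul_metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Against a local agent, `scrape_once("localhost")` returns the same text body that Prometheus ingests. Nothing is sent unless the client asks for it.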
Here’s the breakdown of how it works:
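- **Consul agent configuration:** Enable Prometheus-format telemetry on your Consul agents (both server and client nodes). You do this in the agent configuration file (e.g., `consul.json`). There is no `metrics_mode` option. Instead, Prometheus output is turned on by giving `telemetry.prometheus_retention_time` a value greater than zero:

  ```json
  {
    "server": true,
    "bootstrap_expect": 3,
    "datacenter": "dc1",
    "data_dir": "/opt/consul",
    "node_name": "consul-server-1",
    "bind_addr": "192.168.1.10",
    "client_addr": "192.168.1.10",
    "advertise_addr": "192.168.1.10",
    "retry_join": ["192.168.1.11", "192.168.1.12"],
    "ports": {
      "http": 8500,
      "serf_lan": 8301,
      "serf_wan": 8302,
      "server": 8300
    },
    "enable_syslog": true,
    "log_level": "INFO",
    "telemetry": {
      "prometheus_retention_time": "120s",
      "disable_hostname": true
    }
  }
  ```

  The key here is the `telemetry` block:

  - **`prometheus_retention_time`:** A non-zero value makes the agent serve metrics at `/v1/agent/metrics?format=prometheus` on its regular HTTP listener (`client_addr`, port 8500 by default). HashiCorp recommends a retention of at least twice your Prometheus scrape interval, and `120s` covers Prometheus's default 1-minute interval.
  - **`disable_hostname`:** Stops the agent from prepending the machine's hostname to gauge names. This is usually what you want, because Prometheus attaches its own `instance` label.

  A few other details matter in this file:

  - `bind_addr`, `client_addr`, and `advertise_addr` take bare IP addresses. Ports belong in the `ports` block.
  - `client_addr` defaults to `127.0.0.1`, so set it explicitly if Prometheus runs on another host.
  - `data_dir` is required for every agent (`/opt/consul` is just an example path).
  - `bootstrap_expect` lets the three servers elect a leader.

  There is no dedicated command-line flag for this setting. You can, however, pass an HCL fragment when starting the agent, e.g. `consul agent -hcl 'telemetry { prometheus_retention_time = "120s" }'`.
- **Prometheus scrape configuration:** Next, configure Prometheus to scrape this endpoint by adding a scrape job for Consul to your `prometheus.yml`. Consul serves metrics under its API path rather than at `/metrics`, so set `metrics_path` and pass the `format` query parameter:

  ```yaml
  scrape_configs:
    - job_name: 'consul'
      metrics_path: '/v1/agent/metrics'
      params:
        format: ['prometheus']
      static_configs:
        - targets: ['consul-server-1:8500', 'consul-server-2:8500', 'consul-server-3:8500']
  ```

  This tells Prometheus to poll `http://consul-server-X:8500/v1/agent/metrics?format=prometheus` for each server in your Consul cluster. If you also want to monitor Consul clients, add each client's `client_addr` with port 8500 to this list or create a separate job.
- **Metrics exposed:** Consul exposes a rich set of metrics. The names below follow Consul's telemetry reference, with dots converted to underscores in Prometheus format. Names change between releases, so check the reference for your Consul version.
  - Raft consensus: `consul_raft_state_leader`, `consul_raft_commitTime`, `consul_raft_leader_lastContact`. These are critical for understanding the health and leadership of your Consul servers.
  - Serf (gossip protocol): `consul_serf_member_join`, `consul_serf_member_failed`, `consul_serf_member_flap`. These track membership changes and the health of your Consul cluster.
  - RPC and HTTP API: `consul_rpc_request`, `consul_api_http`. These help you understand load and latency on your Consul servers and API.
  - Catalog and services: `consul_catalog_register`, `consul_catalog_deregister`. These time service registration and deregistration operations.
  - ACLs: `consul_acl_ResolveToken`. This measures the time spent resolving ACL tokens.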
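Every Consul metric family shares the `consul_` prefix followed by a subsystem name, so a small parser can sort a scrape by subsystem. The following Python sketch is a deliberately simplified reader for the Prometheus text format. It ignores comments and timestamps and assumes label values contain no commas, quotes, or braces. The function names are hypothetical. For real use, the official `prometheus_client` package provides a full parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
from collections import defaultdict


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple Prometheus text-format lines into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
            labels = {}
            for pair in filter(None, label_part.split(",")):
                key, val = pair.split("=", 1)
                labels[key.strip()] = val.strip().strip('"')
        else:
            name, value_part = line.split(None, 1)
            labels = {}
        # The first token after the name is the value; an optional timestamp follows.
        value = float(value_part.split()[0])
        samples.append((name.strip(), labels, value))
    return samples


def group_by_subsystem(samples):
    """Group consul_* samples by their second name segment, e.g. consul_raft_* -> 'raft'."""
    groups = defaultdict(list)
    for name, labels, value in samples:
        parts = name.split("_")
        if parts[0] == "consul" and len(parts) > 1:
            groups[parts[1]].append((name, labels, value))
    return dict(groups)
```

Feeding the text returned by a scrape into `group_by_subsystem(parse_exposition(text))` gives one bucket per subsystem, such as `raft`, `serf`, or `api`. That is a quick way to see which families your Consul version actually emits.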
Prometheus is not the only output. The same `telemetry` block can also forward metrics to other sinks, for example through `statsd_address`, `statsite_address`, or `dogstatsd_addr`. If you call `/v1/agent/metrics` without `?format=prometheus`, the agent returns a JSON snapshot of its in-memory metrics (gauges, counters, and timing samples) instead. Still, the Prometheus format is the most commonly used, because Prometheus can consume it out of the box.
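For ad-hoc checks without Prometheus, the JSON form of the endpoint is easy to read programmatically. This sketch assumes the response shape shown in Consul's agent API documentation: top-level `Gauges` and `Counters` lists whose entries carry a `Name`, with gauges holding a `Value` and counters a `Sum`. The function names are hypothetical.

```python
import json


def gauge_values(payload: str) -> dict[str, float]:
    """Return {name: value} for gauges in a /v1/agent/metrics JSON snapshot.

    Entries that share a name but differ in labels overwrite each other here;
    keep the "Labels" field if you need to tell them apart.
    """
    data = json.loads(payload)
    return {g["Name"]: float(g["Value"]) for g in data.get("Gauges") or []}


def counter_totals(payload: str) -> dict[str, float]:
    """Return {name: Sum} for counters in the same snapshot."""
    data = json.loads(payload)
    return {c["Name"]: float(c["Sum"]) for c in data.get("Counters") or []}
```

In the JSON output, metric names keep their dotted form (e.g. `consul.raft.state.leader`). The underscore form appears only in the Prometheus format.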
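A detail often missed is that the metrics endpoint is served by the Consul agent's HTTP API, so it is subject to ACLs when they are enabled. `/v1/agent/metrics` requires a token with `agent:read` permission. If Prometheus cannot scrape metrics, first check that the scrape job sends such a token. Prometheus has no `consul_token` option. Instead, add an `authorization` section to the job with either `credentials` (the token itself) or `credentials_file` (a path to a file containing it). Prometheus then sends the token as an `Authorization: Bearer` header, which Consul accepts alongside its own `X-Consul-Token` header.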
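When debugging a failing scrape, it helps to reproduce the request by hand with the same token. This Python sketch sends the token as a bearer header, as Prometheus does with an `authorization` section. It treats an HTTP 403 as a likely ACL problem. The base URL, token, and function names are placeholders.

```python
import urllib.error
import urllib.request

METRICS_PATH = "/v1/agent/metrics?format=prometheus"


def build_metrics_request(base_url: str, token: str) -> urllib.request.Request:
    """GET request for the agent metrics endpoint carrying an ACL token (needs agent:read)."""
    return urllib.request.Request(
        base_url.rstrip("/") + METRICS_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_with_token(base_url: str, token: str, timeout: float = 5.0) -> str:
    """Fetch metrics, turning a 403 into a clearer ACL-related error."""
    try:
        with urllib.request.urlopen(build_metrics_request(base_url, token), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise PermissionError("token is invalid or lacks agent:read") from err
        raise
```

If `fetch_with_token` succeeds where Prometheus fails, the token is fine and the problem lies in the scrape job's configuration.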
Once this is configured, you can build dashboards in Grafana, explore the data in Prometheus's expression browser, and define Prometheus alerting rules on Consul's health, performance, and operational status.
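Before writing alerting rules, it is worth checking a candidate expression against Prometheus's HTTP query API. This sketch uses the standard `up` series, which Prometheus records for every scrape target, so `up{job="consul"} == 0` lists Consul agents that failed their last scrape. The Prometheus address and function names are placeholders.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(prometheus_base: str, promql: str) -> str:
    """URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return prometheus_base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def vector_by_instance(response_body: str) -> dict[str, float]:
    """Map each series' `instance` label to its value in an instant-vector result."""
    body = json.loads(response_body)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return {
        r["metric"].get("instance", ""): float(r["value"][1])
        for r in body["data"]["result"]
    }


def query(prometheus_base: str, promql: str, timeout: float = 5.0) -> dict[str, float]:
    """Run an instant query and return {instance: value}."""
    with urllib.request.urlopen(instant_query_url(prometheus_base, promql), timeout=timeout) as resp:
        return vector_by_instance(resp.read().decode("utf-8"))
```

An empty result from `query("http://localhost:9090", 'up{job="consul"} == 0')` means every Consul target was scraped successfully. The same expression can then go into an alerting rule.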
The next step in leveraging Consul's observability is often to emit custom user events (`consul event`) that other systems can process, for example through Consul watches.