A Consul debug bundle isn’t just a collection of logs; it’s a snapshot of a Consul agent’s internal state, captured over a short window and designed to help you diagnose complex issues without disrupting your running services.
Let’s see what a debug bundle actually contains by generating one and peeking inside. Imagine you’ve got a Consul cluster running on three servers and a couple of clients.
First, you need to generate the bundle. On any Consul agent (server or client), you’ll run:
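consul debug -output=consul-debug-bundle
This command monitors the target agent for a capture window (two minutes by default, adjustable with -duration and -interval) and then packages what it collected into a gzipped tarball named consul-debug-bundle.tar.gz. If ACLs are enabled, it needs a token with operator:read permissions.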
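If you capture bundles from a runbook script, it helps to pin the capture window explicitly so every bundle covers the same span. The sketch below assumes Python 3.9+ and a consul binary on PATH; the helper names are hypothetical, and only the -output, -duration, and -interval flags come from consul debug itself. With ACLs enabled, the command reads its token from CONSUL_HTTP_TOKEN like other Consul CLI commands.

```python
import shutil
import subprocess
from pathlib import Path


def debug_command(output: str, duration: str = "2m", interval: str = "30s") -> list[str]:
    """Build the argv for `consul debug` with an explicit output name and capture window."""
    return [
        "consul", "debug",
        f"-output={output}",
        f"-duration={duration}",
        f"-interval={interval}",
    ]


def expected_archive(output: str) -> Path:
    """When archiving (the default), consul debug writes <output>.tar.gz."""
    return Path(output + ".tar.gz")


def capture_bundle(output: str = "consul-debug-bundle") -> Path:
    """Run consul debug and return the archive path; needs the consul binary on PATH."""
    if shutil.which("consul") is None:
        raise FileNotFoundError("consul binary not found on PATH")
    subprocess.run(debug_command(output), check=True)
    return expected_archive(output)
```

Keeping argv construction separate from execution makes the exact command easy to log or review before it runs.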
Once it’s generated, you can extract it to see its contents:
tar -xzvf consul-debug-bundle.tar.gz
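Before extracting a bundle that came from another machine, you may want to list it first, the way tar -tzf does, and refuse entries that would land outside the destination directory. A minimal sketch using only Python’s standard tarfile module (the function names are hypothetical):

```python
import tarfile
from pathlib import Path


def list_bundle(archive: str) -> list[str]:
    """Return member names without extracting anything (like `tar -tzf`)."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


def extract_bundle(archive: str, dest: str) -> Path:
    """Extract the bundle into dest, rejecting members that would escape it."""
    root = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the stricter "data" filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return root
```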
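Inside, you’ll find a directory structure. Its exact layout and file names depend on your Consul version and on what was captured, so treat the list below as a guide rather than a fixed schema. The most important parts are: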
- logs/: Logs from the agent the bundle was captured on, such as consul.log and potentially others depending on your Consul configuration. These are crucial for understanding the sequence of events leading up to an incident.
- state/: A treasure trove of Consul’s internal state.
  - state/raft/: On server agents, the Raft log and state. This tells you how consensus is being reached (or failing to be reached) for critical cluster operations. Files like raft.dat and stable.dat live here.
  - state/serf/: Serf event logs and member information. This is how Consul knows about other agents in the cluster. You’ll find events.json and members.json here.
  - state/sessions/: Information about active Consul sessions, which are vital for distributed locking and leader election.
  - state/kv/: A snapshot of the Consul KV store at the time the bundle was generated. This matters if your application relies heavily on the KV store.
- config/: The configuration files of the agent that produced the bundle. Comparing these across agents is essential for verifying that all of them run with consistent and correct settings. You’ll see files like consul.json or similar.
- sys/: System-level information from the host, such as netstat output, ifconfig or ip a results, and ps aux output. This helps correlate Consul issues with underlying network or process problems.

The true power of debug bundles lies in how they expose Consul’s distributed nature. When you have a problem like a service not registering or a leader election failing, one machine’s logs rarely tell the whole story. Each bundle captures a single agent’s view, so collecting bundles from the agents involved (for example, a server and the affected client) and comparing them lets you see how information flows, or fails to flow, between them.
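Because each bundle reflects one agent, lining up several agents’ logs on a single timeline is often the quickest way to see cause and effect. This sketch assumes each entry starts with a timestamp such as 2024-05-01T10:00:00.123Z (the style Consul’s default text logs typically use; check your own files) and that each agent’s log is already in time order. merge_logs and its helpers are hypothetical names; lines without a timestamp are treated as continuations of the previous entry.

```python
import heapq
from datetime import datetime
from typing import Iterator, Optional

# Assumed timestamp format at the start of each log entry, e.g. "2024-05-01T10:00:00.123Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_ts(line: str) -> Optional[datetime]:
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, TS_FORMAT)
    except ValueError:
        return None


def entries(agent: str, lines: list[str]) -> Iterator[tuple[datetime, str, str]]:
    """Yield (timestamp, agent, line), carrying the last timestamp onto continuation lines."""
    last = None
    for line in lines:
        ts = parse_ts(line) or last
        if ts is None:
            continue  # nothing to order this line by yet
        last = ts
        yield ts, agent, line.rstrip("\n")


def merge_logs(logs: dict[str, list[str]]) -> list[str]:
    """Interleave per-agent logs (each already time-ordered) into one labelled timeline."""
    streams = [entries(agent, lines) for agent, lines in logs.items()]
    merged = heapq.merge(*streams, key=lambda e: e[0])
    return [f"{agent}: {line}" for _, agent, line in merged]
```

heapq.merge streams through the inputs instead of sorting everything at once, which keeps memory flat even for large log files.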
For example, if you’re investigating why a new service isn’t showing up, you’d look for:
- Client Logs (logs/): Is the client agent even running and successfully joining the cluster? Are there any errors in its logs related to service registration?
- Server Raft State (state/raft/): Are the servers healthy and able to reach consensus? A stalled Raft log on a server might indicate network partitions or overloaded servers.
- Server Serf State (state/serf/): Are all agents (including the one trying to register the service) visible to the servers? If a client is marked as left or failed in members.json, that’s a strong clue.
- Client Configuration (config/): Is the client configured correctly to point to the Consul servers?
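The members.json check from the list above is easy to script. This sketch assumes the file holds a JSON list of member objects with Name and Status fields, shaped like the output of Consul’s /v1/agent/members endpoint, where Status is a Serf status code (1 = alive, 3 = left, 4 = failed). The function name is hypothetical, and string statuses are accepted in case your bundle stores them that way.

```python
import json
from pathlib import Path

# Serf member status codes as reported by /v1/agent/members.
STATUS_NAMES = {0: "none", 1: "alive", 2: "leaving", 3: "left", 4: "failed"}


def unhealthy_members(path: str) -> dict[str, str]:
    """Return {member name: status} for every member that is not alive."""
    members = json.loads(Path(path).read_text())
    result = {}
    for member in members:
        raw = member.get("Status")
        if isinstance(raw, int):
            status = STATUS_NAMES.get(raw, str(raw))
        else:
            status = str(raw).lower()  # assumed string form, e.g. "failed"
        if status != "alive":
            result[member.get("Name", "?")] = status
    return result
```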
The most surprising thing about a debug bundle is how much of Consul’s internal communication and state it captures. You’re not just getting a dump of files; you’re getting the raw data Consul uses to maintain its distributed state. For instance, the state/serf/events.json file contains a log of the membership events (joins, leaves, failures) that the Serf gossip protocol has processed. A sudden burst of failure events there, lining up in time with your service registration issue, strongly suggests a network problem or widespread agent instability.
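To spot a burst like that without scrolling through the whole file, you can bucket failure events by minute. The event schema here is entirely hypothetical (a JSON list of objects with Time and Type keys, where Time is ISO 8601); adjust the key names to whatever your bundle’s events.json actually contains.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def failure_bursts(events_json: str, threshold: int = 3) -> dict[str, int]:
    """Count failure events per minute and return the minutes at or above threshold."""
    per_minute: Counter = Counter()
    for event in json.loads(events_json):
        # Hypothetical keys: "Type" (e.g. "member-failed") and "Time" (ISO 8601).
        if "fail" not in str(event.get("Type", "")).lower():
            continue
        ts = datetime.fromisoformat(event["Time"].replace("Z", "+00:00"))
        if ts.tzinfo is not None:
            ts = ts.astimezone(timezone.utc)  # bucket everything in UTC
        per_minute[ts.strftime("%Y-%m-%dT%H:%M")] += 1
    return {minute: n for minute, n in sorted(per_minute.items()) if n >= threshold}
```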
Consider a scenario where a client cannot reach a Consul server. The client’s logs will show connection errors, while the bundle from a server might show that client marked as failed in its state/serf/members.json, meaning the server’s gossip probes to that client are not being answered. Together, the bundles let you trace the broken communication path from both ends at once.
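Tracing the path from both ends boils down to diffing two member views. A minimal sketch, assuming you have already parsed each agent’s members.json into a mapping of node name to status string; the function name is hypothetical.

```python
def view_mismatches(
    client_view: dict[str, str], server_view: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {node: (client status, server status)} for nodes the two agents disagree on.

    Nodes that appear in only one view are reported as "absent" on the other side.
    """
    names = sorted(set(client_view) | set(server_view))
    return {
        name: (client_view.get(name, "absent"), server_view.get(name, "absent"))
        for name in names
        if client_view.get(name, "absent") != server_view.get(name, "absent")
    }
```

A node that one side sees as alive and the other as failed is exactly the asymmetric, one-directional breakage this scenario describes.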
If you’re troubleshooting a leader election failure, you’ll examine the state/raft/ directory on the servers. The Raft log (raft.dat) is the sequence of operations the servers have agreed on. Leader elections themselves happen through vote requests rather than log entries, so repeated elections show up in the server logs as election timeouts, while a Raft log on one server that is truncated or lags far behind its peers points to a deeper consensus problem. Both are often rooted in network partitions or disk I/O issues on the servers. The iostat or diskutil output in sys/ can then help you confirm whether disk performance is the bottleneck.
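A quick way to check for repeated elections is to count election-related lines in each server’s logs. The phrases below are examples of messages from the Raft library Consul embeds; their exact wording changes between versions, so treat the list as a starting point rather than a complete catalogue. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Example Raft log phrases; wording varies across versions, so matching is case-insensitive.
ELECTION_SIGNALS = {
    "heartbeat timeout": re.compile(r"heartbeat timeout reached", re.I),
    "election timeout": re.compile(r"election timeout reached", re.I),
    "candidate": re.compile(r"entering candidate state", re.I),
    "lost quorum": re.compile(r"failed to contact quorum", re.I),
}


def election_signals(log_lines: Iterable[str]) -> Counter:
    """Count how many log lines match each election-related phrase."""
    counts: Counter = Counter()
    for line in log_lines:
        for label, pattern in ELECTION_SIGNALS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

A handful of these during a rolling restart is normal; a steady stream on a quiet cluster is the pattern worth chasing.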
Finally, once you’ve resolved your initial issue and run consul debug again, you might encounter a new error if the agent is configured with an old TLS certificate that has since expired, which prevents it from communicating with other agents.
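Checking the certificate’s expiry date before rerunning the capture can save a confusing round of errors. This sketch shells out to the openssl CLI to read notAfter from the PEM file your agent’s cert_file setting points at, then compares it with the current time using ssl.cert_time_to_seconds from Python’s standard library. The function names are hypothetical, and openssl must be installed for cert_not_after to work.

```python
import ssl
import subprocess
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds until the certificate's notAfter time; negative means already expired.

    not_after uses OpenSSL's text format, e.g. "Jun  1 12:00:00 2025 GMT".
    """
    now = time.time() if now is None else now
    return ssl.cert_time_to_seconds(not_after) - now


def cert_not_after(cert_path: str) -> str:
    """Read notAfter from a PEM certificate via the openssl CLI."""
    out = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # openssl prints a single line like "notAfter=Jun  1 12:00:00 2025 GMT"
    return out.split("=", 1)[1]
```

Usage might look like seconds_until_expiry(cert_not_after("/path/to/agent.pem")) < 0 as a pre-flight check, with the path adjusted to your configuration.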