Consul health checks do more than tell you whether a service is up; they are how Consul decides which instances should receive traffic and which should be avoided.

Let’s see a basic HTTP health check in action. Imagine a simple web service running on localhost:8080. We want Consul to check if /health returns a 200 OK.

{
  "service": {
    "name": "my-web-app",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}

Once this service is registered, the local Consul agent sends a GET request to http://localhost:8080/health every 10 seconds. A 2xx response marks the check as passing, a 429 Too Many Requests response marks it as warning, and any other status code, a timeout, or a failed connection marks it as critical.
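
To make the status rules concrete, here is a rough shell sketch of how a response could be classified. It assumes Consul's documented HTTP check behaviour (2xx is passing, 429 is warning, anything else is critical). The function names classify_http_status and probe_http are made up for illustration, and curl stands in for the agent's own HTTP client.

```shell
# Map an HTTP status code to the state an HTTP check would report.
# classify_http_status is a hypothetical helper, not part of Consul.
classify_http_status() {
  case "$1" in
    2[0-9][0-9]) echo "passing" ;;
    429)         echo "warning" ;;
    *)           echo "critical" ;;
  esac
}

# Probe a URL with a 1-second timeout, mirroring the check definition above.
# If curl fails (timeout, refused connection), the code falls back to 000,
# which is classified as critical.
probe_http() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 1 "$1") || code="000"
  classify_http_status "$code"
}
```

Running probe_http http://localhost:8080/health by hand is a quick way to see roughly what the agent would conclude about the endpoint.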

This system is built around the idea of declarative health: you tell Consul what constitutes health, and it actively verifies it. This is crucial for service discovery and load balancing. An instance with a critical check is left out of DNS answers and out of health API queries that ask for passing instances only.

HTTP Health Checks

These are the most common. Consul makes an HTTP GET request to a specified URL.

  • http: The full URL to check, including scheme, host, and port (e.g., http://localhost:8080/health).
  • interval: How often to perform the check, written as a duration such as 30s for 30 seconds or 1m for 1 minute.
  • timeout: How long Consul will wait for a response. 1s is a good starting point.
  • method: Defaults to GET. You can specify POST, PUT, etc., and provide header and body if needed. header maps each header name to a list of values.
  • tls_skip_verify: Set to true if your health check endpoint uses TLS but you want to skip certificate validation (use with caution in production).

Example: A check that needs to send a request body might POST to an endpoint:

{
  "http": "http://localhost:8080/api/v1/health",
  "method": "POST",
  "header": {
    "Content-Type": ["application/json"]
  },
  "body": "{\"status\": \"ping\"}",
  "interval": "15s",
  "timeout": "2s"
}

TCP Health Checks

Simpler than HTTP, TCP checks just establish a TCP connection to a given address and port. If the connection succeeds, the check passes. If it fails (e.g., connection refused, timeout), it fails.

  • tcp: The address and port to connect to, like localhost:5432 or 192.168.1.100:8000.
  • interval: Same as HTTP checks.
  • timeout: Same as HTTP checks.

Example: Checking if a database is listening:

{
  "service": {
    "name": "my-database",
    "port": 5432,
    "check": {
      "tcp": "localhost:5432",
      "interval": "20s",
      "timeout": "500ms"
    }
  }
}

This is useful for services where you don’t have a dedicated health endpoint or just need to confirm the listener is active.
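
The same connect-or-fail logic can be reproduced by hand when debugging a listener. The sketch below assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed. tcp_probe and tcp_status are hypothetical helper names, not Consul commands.

```shell
# Succeed (exit 0) if a TCP connection to host:port opens within the
# time limit in seconds, fail otherwise. Hypothetical helper.
tcp_probe() {
  host="$1"; port="$2"; limit="${3:-1}"
  # In bash, redirecting to /dev/tcp/HOST/PORT opens a TCP connection.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print the state a TCP check would report for the same target.
tcp_status() {
  if tcp_probe "$@"; then
    echo "passing"
  else
    echo "critical"
  fi
}
```

For the database example, tcp_status localhost 5432 0.5 approximates the 500ms timeout in the registration above.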

Script Health Checks

These allow you to run arbitrary scripts on the Consul agent node. This is the most powerful but also the most complex, as you’re responsible for the script’s success/failure exit codes.

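  • args: The command to execute, given as a list with the program first and its arguments after it. For example, ["/usr/local/bin/check_my_service.sh"]. Older Consul releases accepted a single script string instead, but that field was deprecated in favor of args and later removed.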
  • interval: Same as above.
  • timeout: Same as above. The script must exit within this time.

Example: A script that checks a custom metric:

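#!/bin/bash
# /usr/local/bin/check_my_service.sh

# Critical if the application process is not running.
# pgrep exits non-zero when nothing matches.
if ! pgrep my-app-process > /dev/null; then
  echo "My app process is not running."
  exit 2 # Critical failure
fi

# Simulate checking a queue depth
QUEUE_DEPTH=$(redis-cli -h localhost -p 6379 llen my_task_queue)

# Critical if Redis did not return a number. Without this guard the
# numeric test below would error out and fall through to "healthy".
if ! [[ "$QUEUE_DEPTH" =~ ^[0-9]+$ ]]; then
  echo "Could not read queue depth from Redis."
  exit 2 # Critical failure
fi

if [ "$QUEUE_DEPTH" -gt 100 ]; then
  echo "Queue depth is too high: $QUEUE_DEPTH"
  exit 1 # Warning
fi

echo "Service is healthy."
exit 0 # Success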

The exit code of the script determines the check’s status:

  • 0: Success
  • 1: Warning
  • 2: Critical
  • Any other non-zero exit code is treated as critical.
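
The mapping above is small enough to express directly. The following sketch is only an illustration of the rule; exit_code_to_status and run_script_check are hypothetical names, and unlike the real agent it throws away the command's output instead of storing it with the check.

```shell
# Translate a check script's exit code into a check state.
# Hypothetical helper for illustration.
exit_code_to_status() {
  case "$1" in
    0) echo "passing" ;;
    1) echo "warning" ;;
    *) echo "critical" ;;
  esac
}

# Run a command, discard its output, and print the state it would produce.
run_script_check() {
  rc=0
  "$@" > /dev/null 2>&1 || rc=$?
  exit_code_to_status "$rc"
}
```

Calling run_script_check /usr/local/bin/check_my_service.sh before registering the check lets you confirm the script reports the state you expect.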

Registration:

{
  "service": {
    "name": "my-critical-app",
    "port": 9000,
    "check": {
      "args": ["/usr/local/bin/check_my_service.sh"],
      "interval": "1m",
      "timeout": "5s"
    }
  }
}

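The Consul agent executes these scripts directly on its own host. Script checks are disabled by default, so set enable_local_script_checks (or enable_script_checks) to true in the agent configuration. Make sure the script has execute permissions and lives somewhere the Consul agent user can read.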

The real power comes from combining these checks. You can register multiple checks for a single service by replacing the single check object with a checks array. For instance, you might pair an HTTP check that verifies the web server is responding with a script check that ensures the background worker processes are active. Consul evaluates every check attached to the instance. If any of them is critical, Consul treats that instance as unhealthy for routing purposes.
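
As a rough illustration of how several check states combine, the sketch below reduces them to one by letting the worst state win. This is an assumption about how to summarise the states for a human, using a hypothetical function name; Consul itself stores each check separately and filters instances when it answers a query.

```shell
# Print the worst state among the arguments: critical > warning > passing.
# With no arguments it prints passing, since an instance with no checks
# is treated as healthy.
worst_status() {
  result="passing"
  for state in "$@"; do
    case "$state" in
      critical) result="critical" ;;
      warning)  [ "$result" = "critical" ] || result="warning" ;;
    esac
  done
  echo "$result"
}
```

For example, worst_status passing warning prints warning, while a single critical argument makes the whole instance critical.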

Each check’s state is tracked by the local agent, which syncs changes to the catalog held by the Consul servers. That state drives query results: by default, DNS answers leave out any instance with a critical check, and the health API (/v1/health/service/<name>) returns only instances whose checks are all passing when you add the passing query parameter.
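
To see the effect from the client side, you can ask the local agent which instances are currently passing. This sketch assumes an agent listening on Consul's default HTTP address, 127.0.0.1:8500, and reuses the my-web-app service name from earlier. The function names passing_url and list_passing are hypothetical.

```shell
# Build the health API URL that returns only instances whose checks all pass.
# The second argument overrides the assumed default agent address.
passing_url() {
  service="$1"
  agent="${2:-http://127.0.0.1:8500}"
  echo "$agent/v1/health/service/$service?passing"
}

# Fetch the passing instances as JSON (needs curl and a reachable agent).
list_passing() {
  curl -s --max-time 2 "$(passing_url "$@")"
}
```

list_passing my-web-app prints the JSON the agent returns. The DNS view of the same data is available with dig @127.0.0.1 -p 8600 my-web-app.service.consul SRV, assuming the default DNS port.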
