Circuit breakers aren’t really about preventing network failures. They exist to stop a struggling downstream service from causing cascading failures throughout your application.

Let’s say you have an API Gateway that routes requests to several microservices: users-service, orders-service, and payments-service.

```yaml
# Example API Gateway Configuration (simplified)
services:
  users:
    url: http://localhost:8081
    circuitBreaker:
      enabled: true
      failureThreshold: 5
      resetTimeout: 10s
      # ... other settings
  orders:
    url: http://localhost:8082
    circuitBreaker:
      enabled: true
      failureThreshold: 5
      resetTimeout: 10s
  payments:
    url: http://localhost:8083
    circuitBreaker:
      enabled: true
      failureThreshold: 5
      resetTimeout: 10s
```

When a client asks the gateway for, say, user data, the gateway forwards the request to users-service. If users-service becomes slow or unresponsive, a circuit breaker steps in so that the gateway doesn’t keep retrying endlessly and tying up more of its own resources.

Initially, the circuit breaker is in the CLOSED state and lets requests through to users-service. Every request that fails (a 5xx status code or a timeout, for example) increments a failure counter. When the counter reaches failureThreshold (5 in the configuration above), the breaker "trips" and moves to the OPEN state.
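
The CLOSED-state bookkeeping can be sketched in a few lines of Java. This is a minimal illustration, not how Resilience4j or Hystrix implement it. The class name FailureCountingBreaker is hypothetical, and the sketch assumes (the text does not say) that a successful request resets the counter, so only consecutive failures trip the breaker.

```java
// Hypothetical sketch of CLOSED-state failure counting (not a library API).
// Assumption: a success resets the counter, so only consecutive failures trip it.
class FailureCountingBreaker {
    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private int failureCount = 0;
    private State state = State.CLOSED;

    FailureCountingBreaker(int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
    }

    /** Called when a request fails, e.g. a 5xx response or a timeout. */
    void onFailure() {
        if (state != State.CLOSED) {
            return; // only the CLOSED state counts failures in this sketch
        }
        failureCount++;
        if (failureCount >= failureThreshold) {
            state = State.OPEN; // the breaker "trips"
        }
    }

    /** Called when a request succeeds; resetting here is an assumption. */
    void onSuccess() {
        if (state == State.CLOSED) {
            failureCount = 0;
        }
    }

    State state() { return state; }

    int failureCount() { return failureCount; }
}
```

Libraries such as Resilience4j take a different approach and trip on a failure *rate* measured over a sliding window rather than a raw count, which makes the breaker less sensitive to a handful of isolated errors.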

In the OPEN state, the circuit breaker immediately rejects any new requests destined for users-service without even attempting to send them. It returns an error to the client (often a 503 Service Unavailable error from the gateway itself). This prevents the gateway from further overloading the already struggling users-service and frees up its own resources.
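
To make the fail-fast behavior concrete, here is a hypothetical gateway route. GatewayRoute is an illustrative name, the downstream call is modeled as an IntSupplier that returns an HTTP status, and the open/closed flag is set by hand as a stand-in for a real breaker. While the circuit is open, the route answers 503 itself and never invokes the downstream call.

```java
import java.util.function.IntSupplier;

// Hypothetical gateway route (not a real gateway API). The circuitOpen flag
// stands in for a real breaker's OPEN state.
class GatewayRoute {
    static final int SERVICE_UNAVAILABLE = 503;

    private final IntSupplier downstream; // returns the downstream HTTP status
    private boolean circuitOpen = false;
    private int downstreamCalls = 0;

    GatewayRoute(IntSupplier downstream) {
        this.downstream = downstream;
    }

    void setCircuitOpen(boolean open) {
        this.circuitOpen = open;
    }

    /** Returns the status code the gateway sends back to the client. */
    int handle() {
        if (circuitOpen) {
            // Fail fast: no connection is opened and nothing waits on the service.
            return SERVICE_UNAVAILABLE;
        }
        downstreamCalls++;
        return downstream.getAsInt();
    }

    int downstreamCalls() {
        return downstreamCalls;
    }
}
```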

After a resetTimeout (e.g., 10 seconds) has passed, the circuit breaker transitions to a HALF-OPEN state. In this state, it allows a single request to pass through to users-service. If this request succeeds (returns a success status code), the circuit breaker assumes the downstream service has recovered and returns to the CLOSED state, allowing all subsequent requests. If the request fails again, the breaker immediately trips back to OPEN, starting the timeout period anew.
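
Putting the three states together, a minimal sketch might look like the class below. It assumes an injectable millisecond clock (so the timeout can be tested without sleeping), a consecutive-failure counter, and exactly one trial request in HALF-OPEN. ThreeStateBreaker and its tryAcquire/onSuccess/onFailure methods are illustrative names, not a library API.

```java
import java.util.function.LongSupplier;

// Hypothetical three-state circuit breaker (not a library API).
// Assumptions: consecutive failures trip it, time comes from an injectable
// millisecond clock, and HALF_OPEN admits exactly one trial request.
class ThreeStateBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long resetTimeoutMillis;
    private final LongSupplier clock;

    private State state = State.CLOSED;
    private int failureCount = 0;
    private long openedAt = 0L;
    private boolean trialInFlight = false;

    ThreeStateBreaker(int failureThreshold, long resetTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    /** Returns true if the caller may send the request downstream. */
    synchronized boolean tryAcquire() {
        // OPEN -> HALF_OPEN once resetTimeout has elapsed (checked lazily).
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
            trialInFlight = false;
        }
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.HALF_OPEN && !trialInFlight) {
            trialInFlight = true; // admit exactly one trial request
            return true;
        }
        return false; // OPEN, or HALF_OPEN with a trial already in flight
    }

    /** Reports a successful downstream call. */
    synchronized void onSuccess() {
        if (state == State.CLOSED || state == State.HALF_OPEN) {
            state = State.CLOSED; // a successful trial closes the circuit
            failureCount = 0;
            trialInFlight = false;
        }
    }

    /** Reports a failed downstream call (5xx, timeout, connection error). */
    synchronized void onFailure() {
        if (state == State.HALF_OPEN) {
            trip(); // trial failed: back to OPEN, timeout restarts
        } else if (state == State.CLOSED && ++failureCount >= failureThreshold) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        failureCount = 0;
        trialInFlight = false;
    }

    synchronized State state() {
        return state;
    }
}
```

A caller asks tryAcquire() before each request and reports the outcome with onSuccess() or onFailure(). In this sketch the OPEN to HALF-OPEN transition happens lazily, on the first request after the timeout, rather than on a background timer, which keeps the class free of threads.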

This cycle of CLOSED -> OPEN -> HALF-OPEN -> CLOSED (on success) or OPEN (on failure) is the core of how circuit breakers protect your system. They provide a graceful way to handle temporary service degradation, preventing a small problem from spiraling into a complete outage.
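
The same cycle can be written as a pure transition function, which is handy for reasoning about (or unit-testing) the state machine separately from timing and I/O. BreakerTransitions and the event names are hypothetical labels for the conditions described in the text.

```java
// Hypothetical pure transition table for the breaker cycle.
// Events that do not apply to the current state leave it unchanged.
class BreakerTransitions {
    enum State { CLOSED, OPEN, HALF_OPEN }

    enum Event { THRESHOLD_REACHED, RESET_TIMEOUT_ELAPSED, TRIAL_SUCCEEDED, TRIAL_FAILED }

    static State next(State current, Event event) {
        switch (current) {
            case CLOSED:
                return event == Event.THRESHOLD_REACHED ? State.OPEN : State.CLOSED;
            case OPEN:
                return event == Event.RESET_TIMEOUT_ELAPSED ? State.HALF_OPEN : State.OPEN;
            case HALF_OPEN:
                if (event == Event.TRIAL_SUCCEEDED) {
                    return State.CLOSED;
                }
                if (event == Event.TRIAL_FAILED) {
                    return State.OPEN;
                }
                return State.HALF_OPEN;
            default:
                throw new IllegalStateException("unknown state " + current);
        }
    }
}
```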

What counts toward failureThreshold isn’t limited to HTTP status codes; it usually includes connection errors, read timeouts, and other network-level failures that show the downstream service isn’t responding as expected. The resetTimeout is a critical tuning parameter: too short, and the breaker starts letting traffic back through before the service has truly recovered; too long, and clients keep receiving errors well after the service is healthy again.
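
A classifier for the network-level failures mentioned above might look like this. It is a sketch built on the JDK’s own exception hierarchy; FailureClassifier is a hypothetical name, and counting any IOException plus any 5xx response as a failure is one reasonable policy, not a universal rule.

```java
import java.io.IOException;

// Hypothetical classifier deciding which call outcomes count toward
// failureThreshold. Policy (an assumption): I/O errors and 5xx responses.
class FailureClassifier {
    /**
     * @param status HTTP status, or null if no response was received
     * @param error  exception thrown by the HTTP client, or null
     */
    static boolean countsAsFailure(Integer status, Throwable error) {
        if (error != null) {
            // java.net.ConnectException, java.net.SocketTimeoutException and
            // java.net.http.HttpTimeoutException all extend IOException, so this
            // one check covers refused connections and connect/read timeouts.
            return error instanceof IOException;
        }
        // A response arrived: only server errors (5xx) count.
        return status != null && status >= 500 && status <= 599;
    }
}
```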

Most circuit breaker libraries, such as Resilience4j or the older Netflix Hystrix (now in maintenance mode), let you configure what constitutes a "failure." This can include specific HTTP status codes, exceptions thrown by the underlying HTTP client, or custom predicates you define. For instance, you might decide that a 404 Not Found from a critical service is not a circuit-breaking event, but a 500 Internal Server Error absolutely is.
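
One way to make the failure definition configurable is to pass a predicate into the breaker. The sketch below is hypothetical (PredicateBreaker is not a library class) and assumes that a non-failure response resets the count. With the predicate `status -> status >= 500`, a 404 from the critical service is treated as a normal answer while a 500 counts toward the threshold.

```java
import java.util.function.IntPredicate;

// Hypothetical breaker whose notion of "failure" is a pluggable predicate
// over HTTP status codes. Assumption: a non-failure response resets the count.
class PredicateBreaker {
    private final IntPredicate isFailure;
    private final int failureThreshold;
    private int failures = 0;
    private boolean open = false;

    PredicateBreaker(IntPredicate isFailure, int failureThreshold) {
        this.isFailure = isFailure;
        this.failureThreshold = failureThreshold;
    }

    /** Records the status code returned by the downstream service. */
    void record(int status) {
        if (open) {
            return; // OPEN: nothing is counted (HALF-OPEN omitted for brevity)
        }
        if (isFailure.test(status)) {
            failures++;
            if (failures >= failureThreshold) {
                open = true;
            }
        } else {
            failures = 0; // the service answered as expected
        }
    }

    boolean isOpen() {
        return open;
    }
}
```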

The actual implementation involves wrapping the HTTP client calls made by the gateway with the circuit breaker logic. When the gateway needs to call users-service, it doesn’t directly use httpClient.get("http://localhost:8081/users"). Instead, it uses circuitBreaker.execute(() -> httpClient.get("http://localhost:8081/users")). The circuitBreaker object manages the state transitions and decides whether to let the httpClient.get call proceed.
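
The wrapping pattern can be sketched with a generic execute method that takes a Supplier. This is an illustration under stated assumptions, not Resilience4j’s API: GuardedBreaker and CircuitOpenException are hypothetical names, HALF-OPEN is omitted for brevity, the class is not thread-safe, and failures are signalled by unchecked exceptions.

```java
import java.util.function.Supplier;

// Hypothetical exception thrown when the breaker rejects a call.
class CircuitOpenException extends RuntimeException {
    CircuitOpenException(String message) {
        super(message);
    }
}

// Hypothetical "wrap the call" sketch: CLOSED and OPEN only, not thread-safe.
class GuardedBreaker {
    private final int failureThreshold;
    private int failures = 0;
    private boolean open = false;

    GuardedBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Runs the call only while closed; counts thrown exceptions as failures. */
    <T> T execute(Supplier<T> call) {
        if (open) {
            // The supplier is never invoked, so no HTTP request is sent.
            throw new CircuitOpenException("circuit open; call not attempted");
        }
        try {
            T result = call.get();
            failures = 0; // assumption: a success resets the count
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (failures >= failureThreshold) {
                open = true;
            }
            throw e; // the caller still sees the original failure
        }
    }

    boolean isOpen() {
        return open;
    }
}
```

Because Supplier cannot throw checked exceptions, a real call such as java.net.http.HttpClient.send (which declares IOException and InterruptedException) would need wrapping, for example by rethrowing IOException as java.io.UncheckedIOException inside the lambda.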

A common mistake is to think that the resetTimeout is when the service will be healthy again. It’s merely the time after which the gateway will test if the service is healthy. The actual recovery time of the downstream service is independent of the circuit breaker’s timeout.
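
A small timeline calculation shows the difference. Assuming requests arrive continuously (so a trial request goes out the moment each resetTimeout elapses), that the service answers successfully from its recovery time onward, and that each failed trial restarts the timer, the circuit closes at the first probe after the service recovers. RecoveryTimeline is a hypothetical helper.

```java
// Hypothetical helper (not from any library). Assumptions: a trial request is
// sent the instant each resetTimeout elapses; the service succeeds from
// serviceHealthyAt onward; every failed trial reopens the circuit and
// restarts the timer.
class RecoveryTimeline {
    /** Returns the time (ms) at which the circuit closes again. */
    static long closesAt(long openedAt, long resetTimeout, long serviceHealthyAt) {
        if (resetTimeout <= 0) {
            throw new IllegalArgumentException("resetTimeout must be positive");
        }
        long probeAt = openedAt + resetTimeout; // first HALF-OPEN trial
        while (probeAt < serviceHealthyAt) {
            probeAt += resetTimeout; // trial failed: OPEN again, wait another timeout
        }
        return probeAt; // the first successful trial closes the circuit
    }
}
```

With the 10-second resetTimeout from the configuration above, a circuit opened at t = 0 for a service that recovers at t = 25 s fails its probes at 10 s and 20 s and closes at 30 s, so the gateway keeps rejecting traffic for about 5 seconds after the service is healthy again.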

The next concept you’ll grapple with is how to implement fallbacks when a circuit breaker is open, allowing your API to return cached data or a default response instead of just an error.
