A circuit breaker doesn’t just wait out a failing dependency; it actively rejects calls to that dependency so the failure cannot cascade.
Imagine a service, let’s call it OrderService, that needs to call another service, PaymentService, to process a payment. If PaymentService becomes slow or unresponsive, OrderService will start hanging onto its threads, waiting for PaymentService to return. This can quickly exhaust OrderService’s resources, making it unable to serve any requests, even those not involving PaymentService. This is a cascading failure.
A circuit breaker, when implemented between OrderService and PaymentService, acts like an electrical circuit breaker. It monitors the calls from OrderService to PaymentService.
Here’s how it looks in practice, using a common library like Resilience4j in Java:
```java
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.time.Duration;
import java.util.function.Supplier;
import org.springframework.web.client.HttpServerErrorException;
// ConnectTimeoutException and ReadTimeoutException come from your HTTP client library.

CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
    .failureRateThreshold(50) // Trip the breaker when at least 50% of recorded calls fail
    .waitDurationInOpenState(Duration.ofSeconds(10)) // Stay open for 10 seconds
    .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED) // Use a count-based window
    .slidingWindowSize(100) // Look at the last 100 calls
    // Only exceptions of these types (or subtypes) count as failures. RestTemplate usually
    // wraps I/O timeouts in ResourceAccessException, so you may need to record that as well.
    .recordExceptions(
        ConnectTimeoutException.class,
        ReadTimeoutException.class,
        HttpServerErrorException.class // Thrown by RestTemplate for 5xx responses
    )
    .build();

CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.of(circuitBreakerConfig);

// Get the circuit breaker instance for the "paymentService"
CircuitBreaker paymentServiceCircuitBreaker = circuitBreakerRegistry.circuitBreaker("paymentService");

// Decorate the call to PaymentService
Supplier<PaymentResponse> paymentCall = () ->
    restTemplate.postForObject("http://payment-service/process", paymentRequest, PaymentResponse.class);

try {
    PaymentResponse response = CircuitBreaker.decorateSupplier(paymentServiceCircuitBreaker, paymentCall).get();
    // Process successful payment
} catch (CallNotPermittedException e) {
    // The breaker is open and rejected the call; handle gracefully (e.g., return cached data, inform user)
    log.warn("Payment service is unavailable. Circuit breaker is open.");
    throw new ServiceUnavailableException("Payment processing is temporarily unavailable.");
} catch (Exception e) {
    // Other exceptions (e.g., network issues, unexpected responses)
    log.error("Error processing payment", e);
    throw new PaymentProcessingException("Failed to process payment.");
}
```
In this example, OrderService is configured to call PaymentService. The paymentServiceCircuitBreaker is set up to:
- Trip (open) if 50% of the last 100 calls to PaymentService fail.
- Stay open for 10 seconds.
- Record specific exceptions like connection timeouts, read timeouts, and HTTP 5xx errors as failures.
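The first rule depends on tracking a failure rate over the last N calls. As a rough illustration of that bookkeeping, here is a minimal count-based window built on a ring buffer. The class name CountBasedWindow and its methods are hypothetical and are not part of Resilience4j; the sketch also assumes the breaker may only trip once the window is full, which is a simplification.

```java
// Minimal count-based sliding window; a hypothetical helper, not Resilience4j's implementation.
final class CountBasedWindow {
    private final boolean[] failed; // ring buffer: true means the call in this slot failed
    private int next = 0;           // slot that the next outcome overwrites
    private int recorded = 0;       // outcomes stored so far, capped at the window size
    private int failures = 0;       // failures currently inside the window

    CountBasedWindow(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.failed = new boolean[size];
    }

    // Record one call outcome, evicting the oldest outcome once the window is full.
    void record(boolean callFailed) {
        if (recorded == failed.length) {
            if (failed[next]) failures--; // the evicted outcome no longer counts
        } else {
            recorded++;
        }
        failed[next] = callFailed;
        if (callFailed) failures++;
        next = (next + 1) % failed.length;
    }

    // Failure rate in percent over the calls currently in the window.
    double failureRatePercent() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }

    // Assumption: trip only once the window is full and the rate has reached the threshold.
    boolean shouldTrip(double thresholdPercent) {
        return recorded == failed.length && failureRatePercent() >= thresholdPercent;
    }
}
```

With a window of 100 and a threshold of 50, shouldTrip(50) becomes true once at least 50 of the 100 most recent outcomes are failures. Each new outcome pushes the oldest one out, so old failures stop counting after 100 further calls.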
When the circuit breaker is open, any subsequent calls from OrderService to PaymentService will be immediately rejected by the circuit breaker itself, without even attempting to contact PaymentService. In current Resilience4j releases this rejection throws a CallNotPermittedException (older releases called it CircuitBreakerOpenException, and other libraries use similarly named exceptions).
This immediate rejection is crucial. Instead of OrderService threads getting blocked waiting for a failing PaymentService, they are freed up almost instantly. This allows OrderService to continue processing other requests that don’t depend on the failing PaymentService, preserving its overall health and availability.
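After the waitDurationInOpenState (10 seconds in this case), the circuit breaker enters a "half-open" state, in which it lets a limited number of trial requests through to PaymentService. In the simplest form of the pattern that is a single request: if it succeeds, the breaker closes again; if it fails, the breaker immediately re-opens for another full waitDurationInOpenState. Resilience4j generalizes this with permittedNumberOfCallsInHalfOpenState (10 by default) and closes or re-opens depending on whether the failure rate of those trial calls stays below the threshold. Either way, the service recovers automatically once the downstream dependency is healthy again.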
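The closed, open, and half-open transitions can be sketched as a small state machine. This is a simplified illustration of the single-trial-call form of the pattern, not how any particular library implements it. The class SimpleBreaker and its methods are hypothetical. Time is passed in explicitly to keep the example deterministic, and the sketch is not thread-safe.

```java
// Minimal three-state breaker with a single trial call; a sketch, not a library implementation.
final class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long waitMillis; // how long to stay OPEN before allowing a trial call
    private State state = State.CLOSED;
    private long openedAt = 0;     // time in ms at which the breaker last opened

    SimpleBreaker(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    State state() {
        return state;
    }

    // Called by whatever tracks the failure rate once the threshold is reached.
    void trip(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
    }

    // Returns true if a call may go through at time nowMillis.
    boolean tryAcquire(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= waitMillis) {
            state = State.HALF_OPEN; // wait elapsed: let exactly one trial call through
            return true;
        }
        // CLOSED permits calls; OPEN and HALF_OPEN (trial already in flight) reject them.
        return state == State.CLOSED;
    }

    // Report the outcome of a permitted call.
    void onResult(boolean success, long nowMillis) {
        if (state == State.HALF_OPEN) {
            if (success) {
                state = State.CLOSED; // dependency looks healthy again
            } else {
                trip(nowMillis);      // re-open for another full wait period
            }
        }
    }
}
```

A caller would check tryAcquire before each call to PaymentService, fail fast when it returns false, and report the outcome through onResult. Failure-rate tracking in the closed state is left out here and would invoke trip when the threshold is reached.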
The key parameters you tweak are the failureRateThreshold, waitDurationInOpenState, and slidingWindowSize. A lower failureRateThreshold (e.g., 30%) makes the breaker trip more easily, protecting your service faster from a struggling dependency. A longer waitDurationInOpenState means you’ll wait longer before attempting to re-establish the connection, which can be useful if the downstream service is expected to be down for a while. A smaller slidingWindowSize means the failure rate is calculated over a more recent, smaller set of calls, making the breaker more responsive to immediate issues.
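To make the trade-off concrete, it can help to work out how many consecutive failures it takes to trip the breaker from a healthy state. The helper below is a hypothetical back-of-the-envelope calculation. It assumes a count-based window that starts full of successes and a breaker that trips as soon as the failure rate reaches the threshold.

```java
// Hypothetical helper for reasoning about breaker settings; not part of any library.
final class TripMath {
    // Consecutive failures needed to reach the threshold, starting from a full window of successes.
    static int consecutiveFailuresToTrip(int windowSize, int thresholdPercent) {
        // Integer ceiling of windowSize * thresholdPercent / 100.
        return (windowSize * thresholdPercent + 99) / 100;
    }
}
```

For the configuration above (window of 100, threshold of 50), this gives 50 consecutive failures. Lowering the threshold to 30 gives 30, and shrinking the window to 10 at a 50% threshold gives 5, which is why a smaller window reacts faster but is also more easily tripped by a short burst of errors.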
The most surprising thing is that the circuit breaker itself can become a performance bottleneck if its internal state management or exception handling is inefficient, especially under extremely high load where it’s constantly opening and closing.
The next problem you’ll run into is how to gracefully handle the breaker’s rejection exception (CallNotPermittedException in current Resilience4j releases) in your calling service.