Circuit breakers in Java, when implemented with Resilience4j, don’t just prevent cascading failures; they actively manage the rate at which a service can recover its capacity.

Let’s watch this in action. Imagine a simple REST client that might occasionally fail. We’ll wrap it with a Resilience4j CircuitBreaker.

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.vavr.control.Try; // Vavr: pulled in transitively by Resilience4j 1.x, must be added explicitly with 2.x

import java.time.Duration;
import java.util.function.Supplier;

public class CircuitBreakerExample {

    // Simulate a service that might fail
    public static String callExternalService() {
        if (Math.random() < 0.6) { // 60% chance of failure
            throw new RuntimeException("External service unavailable");
        }
        return "Success from external service";
    }

    public static void main(String[] args) {
        // Configure the circuit breaker
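        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50) // Open when 50% or more of the recorded calls fail
            .waitDurationInOpenState(Duration.ofSeconds(5)) // Stay OPEN for 5 seconds before allowing trial calls
            .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
            .slidingWindowSize(10) // Look at the last 10 calls
            .minimumNumberOfCalls(10) // Don't compute a failure rate until 10 calls are recorded
            .permittedNumberOfCallsInHalfOpenState(3) // Trial calls allowed in HALF_OPEN (default is 10)
            .recordExceptions(RuntimeException.class) // Record RuntimeExceptions as failures
            .build();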

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        CircuitBreaker circuitBreaker = registry.circuitBreaker("myService");

        // Create a decorated supplier: every call now goes through the breaker
        Supplier<String> decoratedService = CircuitBreaker.decorateSupplier(circuitBreaker, CircuitBreakerExample::callExternalService);

        // Simulate calls
        System.out.println("--- Initializing ---");
        runCalls(decoratedService, circuitBreaker, 15);

        System.out.println("\n--- Waiting longer than waitDurationInOpenState ---");
        // If the breaker is OPEN, the first call after this wait moves it to HALF_OPEN
        sleep(6000);

        System.out.println("\n--- After waiting ---");
        runCalls(decoratedService, circuitBreaker, 5);
    }

    // Makes `count` calls through the breaker, printing each outcome and the breaker state afterwards
    private static void runCalls(Supplier<String> service, CircuitBreaker circuitBreaker, int count) {
        for (int i = 0; i < count; i++) {
            Try<String> result = Try.ofSupplier(service);
            // A CallNotPermittedException means the breaker rejected the call without invoking the service
            String outcome = result.isSuccess()
                ? result.get()
                : "FAILED - " + result.getCause().getClass().getSimpleName() + ": " + result.getCause().getMessage();
            System.out.println("Call " + (i + 1) + ": " + outcome + " (State: " + circuitBreaker.getState() + ")");
            sleep(500); // Small delay between calls
        }
    }

    // Sleeps without propagating InterruptedException, restoring the interrupt flag instead
    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

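When you run this, you’ll typically see calls failing and, once at least half of the last 10 calls have failed, the circuit breaker opening: further calls are rejected immediately with a CallNotPermittedException and never reach the service. After waitDurationInOpenState has elapsed, the next call moves the breaker to HALF_OPEN, and a limited number of trial calls (permittedNumberOfCallsInHalfOpenState) are let through to test the waters. If their failure rate stays below the threshold, the circuit closes again; otherwise it reopens. Because the failures are random, individual runs will differ.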
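A caller usually wants to treat a rejection differently from a genuine failure, for example by returning cached or default data. The sketch below assumes Resilience4j is on the classpath; FallbackOnOpen, callWithFallback and the "fallback" string are hypothetical names, and transitionToOpenState() is used only to force the OPEN state without waiting for real failures.

```java
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;

import java.util.function.Supplier;

public class FallbackOnOpen {

    // Hypothetical stand-in for the remote call; it always succeeds here
    static String callExternalService() {
        return "Success from external service";
    }

    // Runs the call through the breaker and substitutes a fallback when the breaker rejects it
    static String callWithFallback(CircuitBreaker circuitBreaker) {
        Supplier<String> decorated =
            CircuitBreaker.decorateSupplier(circuitBreaker, FallbackOnOpen::callExternalService);
        try {
            return decorated.get();
        } catch (CallNotPermittedException e) {
            // Breaker is OPEN (or HALF_OPEN with no trial permits left): the service was never invoked
            return "fallback";
        }
    }

    public static void main(String[] args) {
        CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("myService");
        System.out.println(callWithFallback(circuitBreaker)); // CLOSED: the real call runs
        circuitBreaker.transitionToOpenState();              // force OPEN for the demo
        System.out.println(callWithFallback(circuitBreaker)); // OPEN: the fallback is returned
    }
}
```

Catching only CallNotPermittedException keeps genuine service errors visible to the caller while still failing fast whenever the breaker refuses the call.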

The core problem circuit breakers solve is cascading failure. Without them, if a dependency (like a downstream service or database) starts failing, your service will keep trying to connect to it. Each of those failed attempts consumes resources (threads, network connections, memory) in your service. Eventually, your service runs out of resources and starts failing too, even for requests that don’t involve the problematic dependency. This failure can then cascade to its callers, and so on, bringing down an entire system. A circuit breaker acts as a protective shield, immediately failing fast when it detects a pattern of errors, preventing your service from drowning in its own failed attempts.

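Internally, Resilience4j’s CircuitBreaker maintains a state machine with three normal states: CLOSED, OPEN, and HALF_OPEN. (It also has special states such as DISABLED and FORCED_OPEN, which are only entered by an explicit transition.)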

  • CLOSED: This is the normal state. Calls are allowed through to the underlying service. For each call, the breaker records whether it succeeded or failed (based on the configured exceptions). It uses a sliding window (either count-based or time-based) to track recent outcomes. Once at least minimumNumberOfCalls outcomes have been recorded, if the failure rate within that window is equal to or greater than the failureRateThreshold, the breaker transitions to OPEN.
  • OPEN: In this state, all calls to the decorated service are immediately rejected with a CallNotPermittedException. No calls are made to the actual service. This state lasts for waitDurationInOpenState. The purpose here is to give the failing dependency time to recover and to prevent your service from wasting resources on a known-bad connection.
  • HALF_OPEN: After waitDurationInOpenState elapses, the breaker transitions to HALF_OPEN (by default on the next call attempt, unless automaticTransitionFromOpenToHalfOpenEnabled is set). In this state it lets a limited number of trial calls through, set by permittedNumberOfCallsInHalfOpenState (default 10), and rejects any further calls until it leaves HALF_OPEN. Once the trial calls have completed, the breaker computes their failure rate: below failureRateThreshold, it transitions back to CLOSED, assuming the dependency has recovered; at or above it, it transitions back to OPEN and the wait starts again. Limiting the trial calls prevents a sudden surge of traffic onto a service that may still be unstable.
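The transitions can be reproduced deterministically by scripting call outcomes instead of relying on random failures. This is a sketch assuming Resilience4j is on the classpath; StateWalk, walk and call are hypothetical names, and the 4-call window, 200 ms wait and 2 trial calls are chosen only to keep the run short. A listener registered through getEventPublisher().onStateTransition records each transition as it happens.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class StateWalk {

    // Runs a scripted sequence of outcomes through a breaker and returns the transitions it made
    public static List<String> walk() throws InterruptedException {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
            .slidingWindowSize(4)
            .minimumNumberOfCalls(4)
            .failureRateThreshold(50)
            .waitDurationInOpenState(Duration.ofMillis(200))
            .permittedNumberOfCallsInHalfOpenState(2)
            .build();
        CircuitBreaker cb = CircuitBreaker.of("demo", config);

        List<String> transitions = new ArrayList<>();
        cb.getEventPublisher()
            .onStateTransition(event -> transitions.add(event.getStateTransition().name()));

        // CLOSED: 2 failures out of 4 calls = 50%, which meets the threshold, so the breaker opens
        call(cb, true);
        call(cb, false);
        call(cb, true);
        call(cb, false);

        // OPEN: wait past waitDurationInOpenState so the next call moves the breaker to HALF_OPEN
        Thread.sleep(300);

        // HALF_OPEN: both permitted trial calls succeed (0% failures), so the breaker closes
        call(cb, true);
        call(cb, true);
        return transitions;
    }

    // Sends one call through the breaker; the failure is recorded by the breaker, then swallowed here
    private static void call(CircuitBreaker cb, boolean succeed) {
        Supplier<String> supplier = CircuitBreaker.decorateSupplier(cb, () -> {
            if (!succeed) {
                throw new IllegalStateException("simulated failure");
            }
            return "ok";
        });
        try {
            supplier.get();
        } catch (RuntimeException ignored) {
            // Expected for the scripted failures
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(walk());
    }
}
```

Run as-is, the listener should record CLOSED_TO_OPEN, OPEN_TO_HALF_OPEN and HALF_OPEN_TO_CLOSED, in that order. Changing either trial call to a failure would instead send the breaker from HALF_OPEN back to OPEN.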

The key levers you control are:

  • failureRateThreshold: The failure percentage within the sliding window at or above which the circuit opens. A value of 50 means that if half or more of the calls in the window fail, it opens. The rate is only evaluated once at least minimumNumberOfCalls calls have been recorded.
  • waitDurationInOpenState: How long the circuit stays OPEN before transitioning to HALF_OPEN. Duration.ofSeconds(5) means it waits 5 seconds.
  • slidingWindowType and slidingWindowSize: Define which recent calls are considered for the failure rate calculation. COUNT_BASED with a size of 10 means the last 10 calls are analyzed. With TIME_BASED, the size is interpreted in seconds, so a size of 10 analyzes the calls made in the last 10 seconds.
  • recordExceptions: The exception types that are treated as failures and contribute to the failure rate. If it is not set, every exception counts as a failure.
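To see the count-based window at work, you can replay a fixed sequence of outcomes and read the breaker's metrics. This is a sketch assuming Resilience4j is on the classpath; WindowMetrics, newBreaker and replay are hypothetical names. getMetrics().getFailureRate() reports -1 until minimumNumberOfCalls outcomes have been recorded.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

public class WindowMetrics {

    // Builds a breaker whose failure rate is computed over the last 10 calls
    static CircuitBreaker newBreaker() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
            .slidingWindowSize(10)
            .minimumNumberOfCalls(10)
            .failureRateThreshold(50)
            .recordExceptions(RuntimeException.class)
            .build();
        return CircuitBreaker.of("metricsDemo", config);
    }

    // Replays outcomes (true = success, false = RuntimeException) through the breaker
    static void replay(CircuitBreaker cb, boolean... outcomes) {
        for (boolean ok : outcomes) {
            try {
                cb.executeSupplier(() -> {
                    if (!ok) {
                        throw new RuntimeException("simulated failure");
                    }
                    return "ok";
                });
            } catch (RuntimeException ignored) {
                // Recorded as a failure by the breaker (or rejected once it is OPEN)
            }
        }
    }

    public static void main(String[] args) {
        CircuitBreaker cb = newBreaker();
        replay(cb, true, true, true, true, true, true, false, false, false);
        // Only 9 calls recorded, fewer than minimumNumberOfCalls, so no rate yet: -1.0
        System.out.println(cb.getMetrics().getFailureRate());

        replay(cb, false);
        // 4 of the last 10 calls failed: 40.0, still below the threshold, so CLOSED
        System.out.println(cb.getMetrics().getFailureRate() + " " + cb.getState());

        replay(cb, false);
        // The oldest success slid out of the window: 5 of 10 failed, meeting 50%, so OPEN
        System.out.println(cb.getState());
    }
}
```

The last step is the "sliding" part: the eleventh call pushes the oldest success out of the 10-call window, so the four earlier failures plus the new one now make up half the window.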

What most people don’t realize is that slidingWindowSize and waitDurationInOpenState together determine how quickly the circuit breaker adapts to changing conditions. A small slidingWindowSize makes the breaker sensitive to short bursts of errors, while a long waitDurationInOpenState keeps it rejecting calls for longer, even after the dependency may have recovered. Tuning these requires understanding the typical error patterns and recovery times of your dependencies.
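The window-size half of that trade-off can be checked with a few lines of plain Java. This is a simplified, hypothetical model rather than Resilience4j code: it assumes a COUNT_BASED window whose minimumNumberOfCalls equals its size, and it ignores slow calls, the OPEN wait and HALF_OPEN. It replays the same burst of three consecutive failures, surrounded by successes, against two window sizes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Simplified model of a COUNT_BASED sliding window; NOT Resilience4j's implementation
public class BurstSensitivity {

    // Returns true if a breaker with this window size and threshold would open on these outcomes
    public static boolean wouldOpen(int windowSize, float thresholdPercent, boolean[] outcomes) {
        Deque<Boolean> window = new ArrayDeque<>();
        int failures = 0;
        for (boolean success : outcomes) {
            window.addLast(success);
            if (!success) {
                failures++;
            }
            if (window.size() > windowSize) {
                boolean evictedSuccess = window.removeFirst(); // oldest outcome leaves the window
                if (!evictedSuccess) {
                    failures--;
                }
            }
            // Evaluate only once the window is full (minimumNumberOfCalls == windowSize in this model)
            if (window.size() == windowSize && failures * 100.0f / windowSize >= thresholdPercent) {
                return true;
            }
        }
        return false;
    }

    // Builds `before` successes, then `failures` consecutive failures, then `after` successes
    public static boolean[] burst(int before, int failures, int after) {
        boolean[] outcomes = new boolean[before + failures + after];
        Arrays.fill(outcomes, true);
        for (int i = before; i < before + failures; i++) {
            outcomes[i] = false;
        }
        return outcomes;
    }

    public static void main(String[] args) {
        boolean[] outcomes = burst(10, 3, 10);
        System.out.println("window 4:  opens = " + wouldOpen(4, 50, outcomes));
        System.out.println("window 10: opens = " + wouldOpen(10, 50, outcomes));
    }
}
```

With a 4-call window, two of the three burst failures already fill half the window, so the model opens; with a 10-call window the burst peaks at 30% and never reaches the threshold.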

The next concept you’ll likely encounter is how to combine circuit breakers with other resilience patterns, like retries, to create more robust failure handling strategies.
