Semaphore-based circuit breakers are the unsung heroes of distributed systems. They do not try to stop every error. They let a bounded number of failures through, then fail fast, so that one bad dependency cannot trigger cascading failures.

Let’s watch a simple circuit breaker in action. Imagine two services, client and server. The client calls the server repeatedly. We’ll set up a circuit breaker on the client that wraps the calls to server.

import requests
import time
from pybreaker import CircuitBreaker

# Assume server is running on http://localhost:5000
def call_server():
    try:
        # Always set a timeout: without one, a hung server blocks this call indefinitely
        response = requests.get("http://localhost:5000/data", timeout=2)
        response.raise_for_status() # Raise an exception for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error calling server: {e}")
        raise # Re-raise the exception for the circuit breaker

# Configure a circuit breaker:
# fail_max: consecutive failures allowed before the breaker opens
# reset_timeout: how long to stay open before moving to half-open (seconds)
# exclude: exception types (or predicates) that should NOT trip the breaker
cb = CircuitBreaker(
    fail_max=5,          # Number of consecutive failures before tripping
    reset_timeout=10,    # Seconds to wait before attempting to reset
    exclude=[KeyError]   # Don't trip on KeyError
)

# Decorate the function that makes the call
@cb
def protected_call_server():
    return call_server()

# Simulate calls. Nothing in this script stops or restarts the server;
# do that by hand when the markers below are printed.
for i in range(30):  # long enough to outlast reset_timeout and reach half-open
    try:
        print(f"Attempt {i+1}: Calling server...")
        result = protected_call_server()
        print(f"  Success: {result}")
    except Exception as e:
        print(f"  Call failed: {type(e).__name__}")
    time.sleep(1)

    if i == 6: # Marker: stop the server now (after 7 attempts)
        print("\n--- Stop the server now to simulate a failure ---")

    if i == 12: # Marker: restart the server now
        print("\n--- Restart the server now to simulate recovery ---")

If you ran this against a live server and stopped it partway through, you’d see the initial calls succeed. After five consecutive failures (fail_max), the circuit breaker would trip. Subsequent calls would immediately raise a CircuitBreakerError without even attempting to contact the server. Once reset_timeout (10 seconds) has elapsed, the next call finds the breaker half-open and is let through as a trial. If that call succeeds, the breaker closes; if it fails, it opens again.
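To try the example locally you need something listening on port 5000. The original does not show the server, so the following is only a stand-in built on Python's standard library. The DataHandler class, the make_demo_server helper, and the JSON payload are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class DataHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers GET /data with a small JSON body."""

    def do_GET(self):
        if self.path != "/data":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so the client output stays readable.
        pass


def make_demo_server(port=5000):
    """Build (but do not start) a server bound to localhost on the given port."""
    return HTTPServer(("127.0.0.1", port), DataHandler)


# To run it in a separate terminal:
#     make_demo_server().serve_forever()
# Stop it with Ctrl+C to simulate an outage; start it again to simulate recovery.
```

Stopping and restarting this process by hand is what actually produces the failures and the recovery that the client loop reports.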

The core problem circuit breakers solve is cascading failure. In a microservices architecture, a single service becoming slow or unavailable can exhaust resources (like connection pools or threads) on the services that depend on it. Those dependent services then become slow or unavailable, impacting their callers, and so on, creating a domino effect that can bring down the entire system. A circuit breaker acts as a protective layer, preventing a failing service from overwhelming its clients.

Internally, a circuit breaker operates in one of three states:

  • Closed: The default state. Requests are allowed through to the protected service. When the count of consecutive failures reaches fail_max, the breaker transitions to the Open state.
  • Open: Requests are immediately rejected with a CircuitBreakerError without invoking the protected service. After reset_timeout seconds, the breaker transitions to the Half-Open state.
  • Half-Open: A limited number of requests (typically one) are allowed through. If this request succeeds, the breaker transitions back to Closed. If it fails, it immediately returns to Open.
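The three states above can be sketched as a small state machine. This is a minimal illustration, not pybreaker's implementation: the SimpleBreaker and BreakerOpenError names are hypothetical, the clock is injectable only so the timeout can be exercised without sleeping, and the sketch is single-threaded (it does not limit concurrent trial calls in the half-open state).

```python
import time


class BreakerOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class SimpleBreaker:
    """Illustrative three-state breaker; not pybreaker's implementation."""

    def __init__(self, fail_max=5, reset_timeout=10.0, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.state = "closed"
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise BreakerOpenError("circuit is open")
            self.state = "half-open"   # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call reopens at once; otherwise open at fail_max.
        if self.state == "half-open" or self.failures >= self.fail_max:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        # Any success resets the failure count and closes the breaker.
        self.failures = 0
        self.state = "closed"
```

Usage would look like breaker.call(call_server): while the breaker is open, the wrapped function is never invoked, which is what makes the rejection cheap.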

The "semaphore-based" part usually refers to how calls are isolated: a counting semaphore caps the number of concurrent in-flight calls to a dependency, so a slow service cannot tie up every thread in its caller. That isolation is applied per dependency, and so are the breakers. You don’t just have one breaker for your whole application. Instead, you’d typically have a distinct circuit breaker instance for each critical dependency your service has. This granular approach ensures that a failure in one dependency doesn’t affect calls to other, healthy dependencies. For example, if your UserService calls both OrderService and PaymentService, you’d have one circuit breaker for OrderService calls and another for PaymentService calls.
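One common reading of "semaphore-based" is a cap on concurrent in-flight calls per dependency, enforced with a counting semaphore. The sketch below assumes that reading and uses only the standard library. SemaphoreGuard, DependencyBusyError, and the limit of 2 are hypothetical names and values chosen for illustration; in practice such a guard would sit alongside a breaker for the same dependency.

```python
import threading


class DependencyBusyError(Exception):
    """Raised when a dependency already has its maximum number of calls in flight."""


class SemaphoreGuard:
    """Hypothetical per-dependency guard that rejects calls instead of queueing them."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: fail fast rather than tie up another thread.
        if not self._slots.acquire(blocking=False):
            raise DependencyBusyError(f"{self.name}: too many calls in flight")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


# One guard per dependency, so a saturated OrderService
# cannot starve calls to PaymentService.
guards = {
    "OrderService": SemaphoreGuard("OrderService", max_concurrent=2),
    "PaymentService": SemaphoreGuard("PaymentService", max_concurrent=2),
}
```

Rejecting instead of waiting is the design choice that matters here: a blocked thread is exactly the resource a slow dependency would otherwise exhaust.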

When configuring a circuit breaker, the reset_timeout is crucial. If the value is too short, the breaker flaps between open and half-open, sending trial calls before the downstream service has had time to recover. If it is too long, clients keep getting rejected even though the service may already be healthy again. The optimal value is often found through load testing and observing system behavior under stress.

The most surprising thing about circuit breakers is that they are fundamentally about optimistic failure. Instead of trying to prevent every single error from happening (which is often impossible in distributed systems), they embrace the inevitability of failure and focus on gracefully degrading service and preventing small issues from becoming catastrophic ones. They allow a controlled amount of failure to occur, rather than a complete shutdown.

The subtle but powerful exclude parameter lets you fine-tune which exceptions contribute to tripping the circuit. In pybreaker it takes a list of exception types; there is no separate exclude_types argument. For instance, a KeyError on a local dictionary lookup within your service is a programming error, not a network or downstream service failure. You wouldn’t want this to cause your circuit breaker to open and block all subsequent calls. By excluding specific exception types, you ensure the breaker only reacts to genuine external service issues.
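To make the filtering concrete, here is a sketch of how a breaker might decide whether an exception counts as a failure. It illustrates the idea and is not pybreaker's own code: counts_as_failure and HttpError are hypothetical names, and supporting predicate functions next to exception classes is an assumption of this sketch.

```python
def counts_as_failure(exc, exclude=()):
    """Return False if `exc` matches an entry in `exclude`, True otherwise.

    Each entry is either an exception class (matched with isinstance, so
    subclasses are excluded too) or a predicate taking the exception.
    """
    for rule in exclude:
        if isinstance(rule, type) and issubclass(rule, BaseException):
            if isinstance(exc, rule):
                return False
        elif callable(rule) and rule(exc):
            return False
    return True


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Local bugs and client-side (4xx) errors say nothing about the server's health,
# so neither should push the breaker toward opening.
exclude = [KeyError, lambda e: isinstance(e, HttpError) and e.status < 500]
```

With these rules, a 404 or a KeyError is passed straight to the caller, while a 503 or a connection error still counts toward fail_max.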

Once you’ve mastered circuit breakers, the next logical step is to think about fallback mechanisms. What should your application do when a circuit breaker is open? Should it return a cached response, a default value, or simply an error message indicating temporary unavailability?
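As one possible answer to those questions, the sketch below serves the last good result while the breaker is open and falls back to a default once that result is too old. Everything here is an assumption for illustration: CachedFallback is a hypothetical wrapper, BreakerOpenError stands in for whatever your breaker raises when open (CircuitBreakerError in the example above), and the 60-second max_age is arbitrary.

```python
import time


class BreakerOpenError(Exception):
    """Stand-in for the error a breaker raises while it is open."""


class CachedFallback:
    """Hypothetical wrapper: serve the last good result while the breaker is open."""

    def __init__(self, func, default=None, max_age=60.0, clock=time.monotonic):
        self.func = func
        self.default = default
        self.max_age = max_age      # seconds a cached result stays usable
        self.clock = clock          # injectable for testing
        self._cached = None
        self._cached_at = None

    def __call__(self, *args, **kwargs):
        try:
            result = self.func(*args, **kwargs)
        except BreakerOpenError:
            fresh_enough = (
                self._cached_at is not None
                and self.clock() - self._cached_at <= self.max_age
            )
            # Prefer a recent cached value; otherwise use the default.
            return self._cached if fresh_enough else self.default
        # Remember every successful result for later fallbacks.
        self._cached = result
        self._cached_at = self.clock()
        return result
```

Only the breaker-open error triggers the fallback; other exceptions still propagate, so genuine bugs are not hidden behind stale data.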
