The circuitbreaker library in Python doesn’t just prevent repeated calls to failing services; after a cool-down period it lets a trial call through to probe for recovery, and that behavior can mask underlying issues if not understood.
Let’s see it in action. Imagine you have a flaky external API you need to call.
import random
import time
import circuitbreaker
import requests
# Simulate a flaky service that fails about 70% of the time
def flaky_service(fail_rate=0.7):
    if random.random() < fail_rate:
        raise requests.exceptions.ConnectionError("Service unavailable")
    return "Service responded successfully"
# Initialize a circuit breaker.
# The package defaults are failure_threshold=5 and recovery_timeout=30 (seconds);
# recovery_timeout is lowered to 10 here so the half-open state shows up within the loop.
cb = circuitbreaker.CircuitBreaker(failure_threshold=5, recovery_timeout=10)
@cb
def call_flaky_service():
    return flaky_service()
# Call the service repeatedly, one attempt per second
for i in range(15):
    try:
        print(f"Attempt {i+1}: {call_flaky_service()}")
    except circuitbreaker.CircuitBreakerError as e:
        # Raised by the breaker itself; the service was not contacted
        print(f"Attempt {i+1}: Circuit breaker open! {e}")
    except requests.exceptions.ConnectionError as e:
        # Raised by the service call while the breaker still lets requests through
        print(f"Attempt {i+1}: Service failed directly: {e}")
    time.sleep(1)
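If you run this, you’ll typically see CircuitBreakerError start to appear once five ConnectionErrors have occurred in a row. From then on, calls are rejected without reaching the service until the recovery timeout elapses (10 seconds in this walkthrough), at which point the breaker allows a trial call in the "half-open" state. If that succeeds, the breaker closes. If it fails, it opens again.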
The core problem circuitbreaker solves is the "fail fast" principle applied to distributed systems. When a service is down or consistently failing, repeatedly hammering it with requests is not only pointless but can exacerbate the problem. A circuit breaker acts like an electrical circuit breaker: if too much current (failed requests) is detected, it "trips" and stops all further current flow (requests) to that circuit (service) for a period. This gives the downstream service time to recover without being overwhelmed, and it prevents the upstream service from wasting resources on futile attempts.
Internally, the circuitbreaker library maintains a state for each decorated function: "closed," "open," or "half-open."
- Closed: Requests are allowed through. The breaker tracks the number of consecutive failures. If failure_threshold (default 5) is reached, it transitions to "open."
- Open: Requests are immediately rejected with a CircuitBreakerError. After recovery_timeout (default 30 seconds) has elapsed, it transitions to "half-open."
- Half-Open: A trial request is allowed through. If it succeeds, the breaker transitions back to "closed." If it fails, it transitions back to "open."
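To make the three states concrete, here is a minimal standard-library sketch of the same state machine. It is an illustration under stated assumptions, not the library's actual code: the names TinyBreaker and BreakerOpenError are hypothetical, and the clock is injectable only so the example can fake the passage of time.

```python
import time


class BreakerOpenError(Exception):
    """Hypothetical stand-in for the library's CircuitBreakerError."""


class TinyBreaker:
    """Illustrative closed/open/half-open state machine (not the library's code)."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock      # injectable so the example can fake time
        self._failures = 0       # consecutive failures seen so far
        self._opened_at = None   # None means the breaker is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise BreakerOpenError("circuit is open; call rejected without running")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # In the half-open state the count is already at the threshold,
            # so a failed trial call re-opens the breaker and restarts the timer.
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def always_fails():
    raise ConnectionError("down")


now = [0.0]  # fake clock value, advanced by hand below
breaker = TinyBreaker(failure_threshold=2, recovery_timeout=10.0, clock=lambda: now[0])

for _ in range(2):
    try:
        breaker.call(always_fails)
    except ConnectionError:
        pass

print(breaker.state)               # open
now[0] = 10.0                      # pretend the recovery timeout has elapsed
print(breaker.state)               # half-open
print(breaker.call(lambda: "ok"))  # ok
print(breaker.state)               # closed
```

Deriving the state from a timestamp, rather than running a background timer, keeps the sketch small: the breaker only needs to check the clock when a call arrives.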
The failure_threshold parameter controls how many consecutive failures trigger the open state. A lower number trips the breaker sooner, protecting the service more aggressively but producing more false positives when the service is only briefly flaky. The recovery_timeout parameter dictates how long the breaker stays open before it attempts a recovery. A shorter timeout means faster recovery once the service is back up, while a longer one gives the service more breathing room. (These are the names used by the circuitbreaker package; pybreaker, a separate library, calls the equivalent settings fail_max and reset_timeout.)
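A rough way to put numbers on the false-positive side of that trade-off: if each call failed independently with probability p, a run of n calls would all fail with probability p to the power n. Independence is a simplifying assumption (real outages are correlated), and the function name below is illustrative only.

```python
def consecutive_failure_probability(p_fail, threshold):
    """Chance that `threshold` independent calls in a row all fail."""
    return p_fail ** threshold


# With a 10% transient failure rate, compare a few threshold choices.
for threshold in (2, 5, 10):
    chance = consecutive_failure_probability(0.1, threshold)
    print(f"threshold={threshold}: {chance:.0e}")
```

Under these assumptions, a threshold of 2 would see a spurious run of failures about once in a hundred pairs of calls, while a threshold of 5 brings that down to about one in a hundred thousand.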
A common misconception is that the circuitbreaker library fixes the underlying service. It doesn’t. It merely manages the interaction with the service. Your application logic still needs to handle the CircuitBreakerError and decide what to do: display a friendly message, use cached data, or queue the request for later. The library provides the mechanism to detect and react to service unreliability, not the solution to the unreliability itself.
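One way to handle an open breaker is to fall back to cached data and, failing that, a friendly message. The sketch below is standard-library only: BreakerOpenError stands in for circuitbreaker.CircuitBreakerError, and fetch_profile, get_profile, and the cache contents are hypothetical names invented for illustration.

```python
class BreakerOpenError(Exception):
    """Stand-in for circuitbreaker.CircuitBreakerError so the sketch runs without the package."""


# Hypothetical last-known-good data, keyed by user.
_cache = {"user:42": {"name": "Ada", "stale": True}}


def fetch_profile(user_id):
    """Hypothetical protected call; here it always behaves as if the breaker were open."""
    raise BreakerOpenError("circuit open")


def get_profile(user_id, fetch=fetch_profile):
    try:
        return fetch(user_id)
    except BreakerOpenError:
        # Prefer stale data over no data; otherwise degrade to a message.
        cached = _cache.get(f"user:{user_id}")
        if cached is not None:
            return cached
        return {"error": "Profile temporarily unavailable, please try again shortly."}


print(get_profile(42))  # served from the cache
print(get_profile(7))   # no cache entry, so the friendly message is returned
```

In real code the except clause would catch circuitbreaker.CircuitBreakerError instead, and the choice between cache, message, or queueing depends on how stale a response the caller can tolerate.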
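The circuitbreaker library also allows you to customize the failure criteria. The expected_exception argument accepts an exception class, a collection of exception classes, or (in 2.x releases) a predicate function that receives the raised exception's type and value and returns True when the call should count as a failure. The breaker only sees exceptions, not return values, so if your API returns a 500 status code without raising, turn the response into an exception inside the decorated function (for example with response.raise_for_status()) and you can still trip the breaker.
def is_service_failure(thrown_type, thrown_value):
    # Count HTTP 5xx responses as failures; other HTTP errors (e.g. 404) do not trip the breaker
    if issubclass(thrown_type, requests.exceptions.HTTPError):
        response = thrown_value.response
        return response is not None and response.status_code >= 500
    # Any other requests-level error (timeouts, connection errors) counts as a failure
    return issubclass(thrown_type, requests.exceptions.RequestException)
# Assuming your service returns a requests.Response object
# cb_with_predicate = circuitbreaker.CircuitBreaker(expected_exception=is_service_failure)
# @cb_with_predicate
# def call_api_with_status_check():
#     response = requests.get("http://flaky.example.com")
#     response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
#     return response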
This allows for much finer-grained control over what constitutes a "failure" from the perspective of the circuit breaker, making it a powerful tool for building resilient systems.
The next logical step after implementing circuit breakers is to consider how to handle the CircuitBreakerError itself, perhaps by implementing a fallback mechanism or a robust retry strategy after the breaker has closed.