Express applications often become brittle in production, failing catastrophically under load or when downstream services misbehave.
Here’s how to add a circuit breaker to an Express app with the opossum library, so that one failing dependency does not cascade into a wider outage.
First, let’s imagine we have a module that fetches user data from a remote API. That API is the "downstream" service, and the function that calls it is what we will wrap.
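// user-service.js
// Uses node-fetch v2; node-fetch v3 is ESM-only and cannot be loaded with require().
const fetch = require('node-fetch');
// Fetch one user from the downstream API. Rejects on network errors and non-2xx responses.
async function getUserData(userId) {
  const response = await fetch(`https://api.example.com/users/${userId}`);
  if (!response.ok) {
    throw new Error(`Failed to fetch user ${userId}: ${response.statusText}`);
  }
  return response.json();
}
module.exports = { getUserData };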
Now, in our Express app, we’ll integrate opossum to wrap this potentially flaky function.
// app.js
const express = require('express');
const CircuitBreaker = require('opossum');
const { getUserData } = require('./user-service');
const app = express();
// Configure the circuit breaker
const userCircuitBreaker = new CircuitBreaker(getUserData, {
  // Treat a call as failed if it takes longer than this
  timeout: 3000, // 3 seconds
  // Open the circuit once this percentage of calls in the rolling window has failed
  errorThresholdPercentage: 50,
  // Require at least this many calls in the window before the circuit may open
  volumeThreshold: 5,
  // How long the circuit stays open before moving to the half-open state
  resetTimeout: 30000, // 30 seconds
});
// The fallback is registered on the breaker, not passed as an option.
// It runs when a call fails or the circuit is open, and receives the
// arguments passed to fire(), followed by the error.
userCircuitBreaker.fallback((userId) => {
  console.warn(`Request for user ${userId} failed or circuit is open. Falling back to default data.`);
  return { id: userId, name: 'Default User', status: 'fallback' };
});
// Optional: log state changes for debugging
userCircuitBreaker.on('open', () => console.error('CIRCUIT OPENED!'));
userCircuitBreaker.on('close', () => console.log('CIRCUIT CLOSED.'));
userCircuitBreaker.on('halfOpen', () => console.log('CIRCUIT HALF-OPEN. Testing connection.'));
userCircuitBreaker.on('fallback', (result) => console.warn('Fallback triggered, returning:', result));
app.get('/users/:id', async (req, res) => {
  const userId = req.params.id;
  try {
    // Execute the wrapped function
    const userData = await userCircuitBreaker.fire(userId);
    res.json(userData);
  } catch (error) {
    console.error('Error after circuit breaker:', error.message);
    res.status(500).send('An error occurred while fetching user data.');
  }
});
const port = 3000;
app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});
When getUserData throws or takes longer than timeout (3 seconds here), opossum counts the call as a failure. The circuit "opens" once the share of failed calls in its rolling statistics window (10 seconds by default) crosses errorThresholdPercentage (50% by default); if volumeThreshold is set, at least that many calls must also have been made in the window. Note that opossum trips on a failure percentage, not on a fixed count of failures. For the next 30 seconds (resetTimeout), any call to userCircuitBreaker.fire(userId) skips getUserData entirely and immediately invokes the fallback function, returning a predefined or cached response. This keeps the Express app from hammering a failing downstream service and gives that service time to recover. After 30 seconds, the circuit moves to a "half-open" state and lets a single request through. If that request succeeds, the circuit closes. If it fails, the circuit opens again.
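To make the trip condition concrete, here is a small sketch of the arithmetic, assuming the breaker opens on the failure percentage within a statistics window rather than on a raw count. The function shouldTrip and the shape of its stats argument are hypothetical names chosen for illustration; this is not opossum's internal code, and whether the exact boundary value trips is an implementation detail (the sketch uses a strict comparison).

```javascript
// Illustrative only: decide whether a breaker should open, given the
// call counts for the current statistics window. Names are hypothetical.
function shouldTrip(stats, options = {}) {
  const { failures, total } = stats;
  const { errorThresholdPercentage = 50, volumeThreshold = 0 } = options;

  // No calls yet, or too few calls to judge: stay closed.
  if (total === 0 || total < volumeThreshold) return false;

  const failurePercentage = (failures / total) * 100;
  // Assumption: strict comparison at the boundary.
  return failurePercentage > errorThresholdPercentage;
}

// 2 of 3 calls failed, but fewer than 5 calls were made: stays closed.
console.log(shouldTrip({ failures: 2, total: 3 }, { volumeThreshold: 5 }));
// 3 of 10 calls failed (30%): stays closed.
console.log(shouldTrip({ failures: 3, total: 10 }, { volumeThreshold: 5 }));
// 6 of 10 calls failed (60%): opens.
console.log(shouldTrip({ failures: 6, total: 10 }, { volumeThreshold: 5 }));
```

The practical consequence is that a low-traffic route can fail several times in a row without tripping the breaker if volumeThreshold is high, so the two settings are best tuned together.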
The key settings here are errorThresholdPercentage, resetTimeout, and the fallback. errorThresholdPercentage, together with volumeThreshold and the length of the rolling window, dictates how quickly the breaker trips. resetTimeout defines the cool-down period before the breaker tries the downstream service again. The fallback function is your safety net, providing graceful degradation when the primary path is unavailable. Without a fallback, calls to an open circuit still reject with an error, albeit much faster than waiting on the failing service.
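If you decide not to register a fallback, the route's catch block has to tell a fast rejection from an open circuit apart from other failures. The sketch below assumes opossum marks those errors with a code property of 'EOPENBREAKER' (circuit open) and 'ETIMEDOUT' (call exceeded timeout); verify both against the version you have installed. statusForBreakerError is a hypothetical helper, not part of opossum or Express, and the status codes are one reasonable choice rather than a rule.

```javascript
// Hypothetical helper: map an error rejected by breaker.fire() to an HTTP status.
// Assumes the error codes described above; anything else is treated as an upstream failure.
function statusForBreakerError(error) {
  switch (error && error.code) {
    case 'EOPENBREAKER':
      return 503; // circuit is open: service temporarily unavailable
    case 'ETIMEDOUT':
      return 504; // downstream call took too long
    default:
      return 502; // downstream call failed for another reason
  }
}

// Inside the route's catch block this could be used as:
//   res.status(statusForBreakerError(error)).send('User service unavailable.');
const openError = Object.assign(new Error('Breaker is open'), { code: 'EOPENBREAKER' });
console.log(statusForBreakerError(openError));
```

Returning 503 for an open circuit lets clients and load balancers distinguish "back off and retry later" from a genuine bug in the app.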
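The opossum library handles the state machine (closed, open, half-open) and the timing internally. You wrap your asynchronous function in a breaker and call .fire() instead of the original function. The fallback is registered with .fallback() and event listeners with .on(); neither is a constructor option. The fallback function receives the same arguments that were passed to .fire(), followed by the error that triggered it.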
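For intuition about what the library is doing on your behalf, here is a deliberately tiny sketch of the closed, open, and half-open transitions. It is not opossum's implementation: TinyBreaker, maxFailures, and the injectable now clock are hypothetical names, and for brevity it trips after a fixed number of consecutive failures, which is simpler than opossum's percentage-based rule.

```javascript
// Illustrative circuit breaker state machine. Not opossum's code.
class TinyBreaker {
  constructor(action, { maxFailures = 3, resetTimeout = 30000, now = Date.now } = {}) {
    this.action = action;
    this.maxFailures = maxFailures;   // consecutive failures before opening
    this.resetTimeout = resetTimeout; // ms to stay open before a trial call
    this.now = now;                   // injectable clock, handy for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async fire(...args) {
    // A trial call is already in flight: reject other callers.
    if (this.state === 'halfOpen') {
      throw new Error('Breaker is half-open');
    }
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error('Breaker is open'); // fail fast, skip the action
      }
      this.state = 'halfOpen'; // cool-down is over: allow one trial call
    }
    try {
      const result = await this.action(...args);
      // Any success closes the circuit and clears the failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures += 1;
      // A failed trial call, or too many failures in a row, opens the circuit.
      if (this.state === 'halfOpen' || this.failures >= this.maxFailures) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

A production breaker adds what this sketch leaves out: per-call timeouts, rolling statistics, fallbacks, and events, which is the main reason to reach for a library instead of writing your own.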
This setup is crucial for distributed systems where network latency, service outages, and resource contention are common. By isolating failures and providing a fallback, you build more resilient applications that don’t bring down the entire system when one component stutters.
The next thing you’ll want to think about is how to manage the circuit breaker’s configuration itself, especially if your downstream services have varying reliability or different performance characteristics.
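One way to approach this is to keep shared defaults in one place and override them per downstream service. The sketch below is only a starting point: the service names and numbers are invented for illustration, and breakerOptionsFor is a hypothetical helper. The option names (timeout, errorThresholdPercentage, resetTimeout) are ones opossum's constructor accepts.

```javascript
// Shared defaults applied to every breaker unless a service overrides them.
const defaultBreakerOptions = {
  timeout: 3000,                // ms before a call counts as failed
  errorThresholdPercentage: 50, // failure percentage that opens the circuit
  resetTimeout: 30000,          // ms to stay open before a trial call
};

// Hypothetical per-service overrides; names and values are examples only.
const serviceOverrides = {
  users: { timeout: 1000 },                         // fast lookup, fail quickly
  reports: { timeout: 10000, resetTimeout: 60000 }, // slow service, longer cool-down
};

// Merge defaults with any overrides registered for the given service.
function breakerOptionsFor(serviceName) {
  return { ...defaultBreakerOptions, ...(serviceOverrides[serviceName] || {}) };
}

console.log(breakerOptionsFor('reports'));
```

The merged object can be passed as the second argument when constructing the breaker for that service, which keeps tuning decisions out of the route handlers.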