The Saga pattern lets you manage data consistency across multiple services without resorting to the performance-sucking, blocking nature of two-phase commit (2PC).
Imagine an e-commerce order. When a customer places an order, it triggers a cascade of events: reserve inventory, process payment, and ship the order. In a microservices world, these are handled by separate services. If the payment service fails after inventory is reserved, how do you roll back the inventory reservation without a coordinated transaction? That’s where Sagas shine.
Here’s a simplified order placement saga. The order record starts out with every part still pending:
{
  "id": "order-123",
  "status": "PENDING",
  "items": [
    {"productId": "A1", "quantity": 2}
  ],
  "payment": {
    "status": "PENDING"
  },
  "inventory": {
    "status": "PENDING"
  }
}
- Order Service: Creates the order in PENDING state.
- Order Service: Sends a ReserveInventoryCommand to the Inventory Service.
- Inventory Service: Reserves inventory. Responds with InventoryReservedEvent.
- Order Service: Receives InventoryReservedEvent. Updates order to INVENTORY_RESERVED and sends ProcessPaymentCommand to Payment Service.
- Payment Service: Processes payment. Responds with PaymentProcessedEvent.
- Order Service: Receives PaymentProcessedEvent. Updates order to PAID and sends ShipOrderCommand to Shipping Service.
- Shipping Service: Ships the order. Responds with OrderShippedEvent.
- Order Service: Receives OrderShippedEvent. Updates order to COMPLETED.
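This is a choreography-style saga: services react to each other’s messages, and there is no separate orchestrator component. Note that the Order Service still relays commands here; in a strictly choreographed saga, each service would subscribe directly to the events it cares about (for example, the Payment Service would react to InventoryReservedEvent itself).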
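To make the "each service listens for events" idea concrete, here is a minimal in-memory sketch of a strictly choreographed happy path. It is not a real broker integration: InMemoryEventBus and ChoreographyDemo are hypothetical names, delivery is synchronous, and each "service" is just a lambda standing in for a separate process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in for a real broker such as Kafka or RabbitMQ
// (assumptions: synchronous delivery, no persistence, no retries).
final class InMemoryEventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        for (Consumer<String> handler : handlers.getOrDefault(eventType, List.of())) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    // Wires three hypothetical services to the bus and returns a log of
    // what each one did, in the order it happened.
    static List<String> placeOrder(String orderId) {
        InMemoryEventBus bus = new InMemoryEventBus();
        List<String> log = new ArrayList<>();

        // Inventory Service reacts to the order being created.
        bus.subscribe("OrderCreatedEvent", id -> {
            log.add("inventory reserved for " + id);
            bus.publish("InventoryReservedEvent", id);
        });
        // Payment Service reacts to the reservation event itself,
        // not to a command sent by the Order Service.
        bus.subscribe("InventoryReservedEvent", id -> {
            log.add("payment processed for " + id);
            bus.publish("PaymentProcessedEvent", id);
        });
        // Shipping Service reacts to the payment event.
        bus.subscribe("PaymentProcessedEvent", id -> {
            log.add("order shipped for " + id);
            bus.publish("OrderShippedEvent", id);
        });

        bus.publish("OrderCreatedEvent", orderId);
        return log;
    }
}
```

Calling ChoreographyDemo.placeOrder("order-123") runs the three reactions in sequence. No component in this sketch knows the whole flow, which is both the appeal of choreography and the reason it gets hard to follow as the number of steps grows.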
What if step 5 (Payment Service) fails?
- Payment Service: Fails to process payment. Responds with PaymentFailedEvent.
- Order Service: Receives PaymentFailedEvent. Updates order to PAYMENT_FAILED.
- Order Service: Sends ReleaseInventoryCommand to the Inventory Service.
- Inventory Service: Receives ReleaseInventoryCommand. Releases the previously reserved inventory. Responds with InventoryReleasedEvent.
- Order Service: Receives InventoryReleasedEvent. Updates order to CANCELLED.
The saga is now complete, with inventory correctly released.
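The status changes from both walkthroughs (happy path and payment failure) can be summarised as a small transition table. This is only a sketch: the OrderStatus enum and OrderStateMachine class are hypothetical, while the status and event names are taken from the steps above.

```java
import java.util.Map;

// Order statuses that appear in the walkthroughs.
enum OrderStatus { PENDING, INVENTORY_RESERVED, PAID, COMPLETED, PAYMENT_FAILED, CANCELLED }

final class OrderStateMachine {
    // Key: "<current status>:<event name>", value: the status the order moves to.
    private static final Map<String, OrderStatus> TRANSITIONS = Map.of(
        "PENDING:InventoryReservedEvent", OrderStatus.INVENTORY_RESERVED,
        "INVENTORY_RESERVED:PaymentProcessedEvent", OrderStatus.PAID,
        "PAID:OrderShippedEvent", OrderStatus.COMPLETED,
        // Failure path: payment fails, then inventory is released.
        "INVENTORY_RESERVED:PaymentFailedEvent", OrderStatus.PAYMENT_FAILED,
        "PAYMENT_FAILED:InventoryReleasedEvent", OrderStatus.CANCELLED);

    // Returns the next status, or throws if the event is not valid in the current status.
    static OrderStatus next(OrderStatus current, String eventName) {
        OrderStatus next = TRANSITIONS.get(current + ":" + eventName);
        if (next == null) {
            throw new IllegalStateException("Unexpected " + eventName + " in state " + current);
        }
        return next;
    }
}
```

Rejecting unexpected events explicitly (rather than ignoring them) makes out-of-order or duplicated messages visible early; whether to throw or to log and drop is a design choice that depends on your messaging guarantees.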
Alternatively, you can use an orchestration-based saga. A central orchestrator (often the service that initiated the transaction, like the Order Service) dictates the flow.
// Simplified Orchestrator Logic

public void handleOrderCreated(OrderCreatedEvent event) {
    String orderId = event.getOrderId();
    // 1. Reserve Inventory
    inventoryService.reserve(orderId, event.getItems());
}

public void handleInventoryReserved(InventoryReservedEvent event) {
    String orderId = event.getOrderId();
    // 2. Process Payment
    paymentService.process(orderId, event.getAmount());
}

public void handlePaymentProcessed(PaymentProcessedEvent event) {
    String orderId = event.getOrderId();
    // 3. Ship Order
    shippingService.ship(orderId);
}

public void handlePaymentFailed(PaymentFailedEvent event) {
    String orderId = event.getOrderId();
    // Compensation: Release Inventory
    inventoryService.release(orderId);
}

public void handleInventoryReleased(InventoryReleasedEvent event) {
    String orderId = event.getOrderId();
    // Final State: Cancel Order
    orderRepository.updateStatus(orderId, "CANCELLED");
}
The orchestrator explicitly tells each service what to do and handles compensation logic by invoking specific compensating actions.
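The most surprising thing about Sagas is how they flip the error-handling model. Each step commits in its own local transaction, so there is nothing left to "abort and roll back"; instead it’s "try the next step, and if that fails, run compensating actions for the steps that already succeeded." This is fundamentally different from ACID transactions: a saga gives up isolation, so other requests can observe intermediate states such as INVENTORY_RESERVED.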
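The "undo the steps that succeeded" model can be sketched as a tiny in-process runner. This is a deliberate simplification: SagaStep and SagaRunner are hypothetical names, and a real saga would drive these steps through asynchronous messages between services rather than direct method calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical step abstraction: a forward action plus the compensation that undoes it.
interface SagaStep {
    void execute() throws Exception;
    void compensate();
}

final class SagaRunner {
    // Runs the steps in order. If one fails, compensates the steps that
    // already completed, most recent first, and returns false.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                completed.push(step); // remember it so it can be undone later
            } catch (Exception e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate(); // LIFO: undo in reverse order
                }
                return false;
            }
        }
        return true;
    }
}
```

Note that the failed step itself is not compensated, only the ones that finished before it. The sketch also assumes compensations never fail; in practice they are usually retried until they succeed, which is one reason they need to be idempotent.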
Let’s look at the levers you control. In choreography, you control the events your service emits and the commands it listens for. You need a robust event bus (like Kafka or RabbitMQ) and clear event contracts. In orchestration, you control the state machine within your orchestrator, defining the sequence of commands and the logic for handling success and failure events from other services.
Consider the compensation step for releasing inventory. Handling ReleaseInventoryCommand isn’t just a matter of adjusting the stock count; the service needs to know how much was reserved in the first place. This implies that the Inventory Service must store the reservation details (e.g., reserved_quantity) alongside the order and product ID, so it can accurately reverse the action. If reserving only decremented a total stock figure, reversing it would be impossible without knowing the original reservation.
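A minimal sketch of that bookkeeping, assuming reservations are keyed by order ID and held in memory (a real Inventory Service would persist them). InventoryStore and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class InventoryStore {
    // productId -> units currently available
    private final Map<String, Integer> available = new HashMap<>();
    // orderId -> (productId -> reserved quantity); this is what makes release possible
    private final Map<String, Map<String, Integer>> reservations = new HashMap<>();

    void addStock(String productId, int quantity) {
        available.merge(productId, quantity, Integer::sum);
    }

    // Moves units from "available" into a reservation recorded against the order.
    boolean reserve(String orderId, String productId, int quantity) {
        int inStock = available.getOrDefault(productId, 0);
        if (inStock < quantity) {
            return false; // not enough stock; nothing is changed
        }
        available.put(productId, inStock - quantity);
        reservations.computeIfAbsent(orderId, k -> new HashMap<>())
                    .merge(productId, quantity, Integer::sum);
        return true;
    }

    // Compensation: returns exactly what was reserved for this order.
    void release(String orderId) {
        Map<String, Integer> reserved = reservations.remove(orderId);
        if (reserved == null) {
            return; // no reservation on record (never made, or already released)
        }
        reserved.forEach((productId, qty) -> available.merge(productId, qty, Integer::sum));
    }

    int availableUnits(String productId) {
        return available.getOrDefault(productId, 0);
    }
}
```

Because release removes the reservation record before restoring stock, calling it a second time for the same order finds nothing to restore and does no harm.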
A common pitfall is forgetting the "idempotency" requirement for both commands and compensating actions. If a ReserveInventoryCommand is delivered twice (for example, after a retry or a broker redelivery), the Inventory Service should only reserve the inventory once. Similarly, if ReleaseInventoryCommand is delivered twice, it should only release the inventory once. This is typically achieved by checking whether the action has already been performed for a given transaction ID.
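One way to sketch the "already performed?" check is to remember the IDs of commands that have been applied. The class below is a hypothetical, in-memory illustration; in a real service the set of processed IDs would need to be stored durably and updated in the same local transaction as the stock change.

```java
import java.util.HashSet;
import java.util.Set;

final class IdempotentInventoryHandler {
    // IDs of commands that have already been applied (in memory here; an assumption).
    private final Set<String> processedCommandIds = new HashSet<>();
    private int reservedUnits = 0;

    // Applies the reservation once per command ID.
    // Returns true if it was applied, false if it was a duplicate.
    boolean handleReserve(String commandId, int quantity) {
        if (!processedCommandIds.add(commandId)) {
            return false; // Set.add returns false when the ID was already present
        }
        reservedUnits += quantity;
        return true;
    }

    int reservedUnits() {
        return reservedUnits;
    }
}
```

With this in place, handling the same command ID twice changes reservedUnits only once, while a command with a new ID is applied normally.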
The next concept you’ll grapple with is how to handle long-running sagas where a step might take a very long time, potentially leading to resource locks or stale state.