The Saga pattern lets you maintain data consistency across multiple services without the blocking, throughput-limiting coordination of two-phase commit (2PC).

Imagine an e-commerce order. When a customer places an order, it triggers a sequence of steps: reserve inventory, process payment, and ship the order. In a microservices world, these are handled by separate services. If the payment service fails after inventory is reserved, how do you undo the inventory reservation without a distributed transaction? That’s where Sagas shine.

Here’s a simplified order placement saga. The order record starts out like this:

{
  "id": "order-123",
  "status": "PENDING",
  "items": [
    {"productId": "A1", "quantity": 2}
  ],
  "payment": {
    "status": "PENDING"
  },
  "inventory": {
    "status": "PENDING"
  }
}

The saga then moves the order through these steps:

  1. Order Service: Creates the order in PENDING state.
  2. Order Service: Sends a ReserveInventoryCommand to the Inventory Service.
  3. Inventory Service: Reserves inventory. Responds with InventoryReservedEvent.
  4. Order Service: Receives InventoryReservedEvent. Updates order to INVENTORY_RESERVED and sends ProcessPaymentCommand to Payment Service.
  5. Payment Service: Processes payment. Responds with PaymentProcessedEvent.
  6. Order Service: Receives PaymentProcessedEvent. Updates order to PAID and sends ShipOrderCommand to Shipping Service.
  7. Shipping Service: Ships the order. Responds with OrderShippedEvent.
  8. Order Service: Receives OrderShippedEvent. Updates order to COMPLETED.

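This is often described as a choreography-based saga: services communicate through messages and there is no dedicated orchestrator component. In a strictly choreographed version, each service would subscribe to the others’ events directly (the Payment Service reacting to InventoryReservedEvent, for example) instead of waiting for a command from the Order Service.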
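
To make the choreography idea concrete, here is a minimal in-memory sketch in which each service subscribes to an upstream event and publishes its own. InMemoryEventBus and ChoreographyDemo are hypothetical names, events are plain strings, and delivery is synchronous; a real system would sit on a broker and use typed event contracts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a message broker.
class InMemoryEventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> log = new ArrayList<>(); // every published event type, in order

    void subscribe(String eventType, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        log.add(eventType);
        // Synchronous delivery keeps the sketch simple; a broker would be asynchronous.
        for (Consumer<String> handler : subscribers.getOrDefault(eventType, List.of())) {
            handler.accept(orderId);
        }
    }
}

class ChoreographyDemo {
    // Wires the services together: each reacts to one event and emits the next.
    static InMemoryEventBus wire(boolean paymentSucceeds) {
        InMemoryEventBus bus = new InMemoryEventBus();
        // Inventory Service reserves stock when an order is created.
        bus.subscribe("OrderCreatedEvent", id -> bus.publish("InventoryReservedEvent", id));
        // Payment Service charges once inventory is reserved.
        bus.subscribe("InventoryReservedEvent", id ->
                bus.publish(paymentSucceeds ? "PaymentProcessedEvent" : "PaymentFailedEvent", id));
        // Shipping Service ships once payment succeeds.
        bus.subscribe("PaymentProcessedEvent", id -> bus.publish("OrderShippedEvent", id));
        // Inventory Service compensates when payment fails.
        bus.subscribe("PaymentFailedEvent", id -> bus.publish("InventoryReleasedEvent", id));
        return bus;
    }

    public static void main(String[] args) {
        InMemoryEventBus bus = wire(false);
        bus.publish("OrderCreatedEvent", "order-123");
        // Prints: [OrderCreatedEvent, InventoryReservedEvent, PaymentFailedEvent, InventoryReleasedEvent]
        System.out.println(bus.log);
    }
}
```

Notice that no single class contains the whole flow; you can only see it by reading every subscription, which is the usual trade-off of choreography.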

What if step 5 (Payment Service) fails?

  • Payment Service: Fails to process payment. Responds with PaymentFailedEvent.
  • Order Service: Receives PaymentFailedEvent. Updates order to PAYMENT_FAILED.
  • Order Service: Sends ReleaseInventoryCommand to the Inventory Service.
  • Inventory Service: Receives ReleaseInventoryCommand. Releases the previously reserved inventory. Responds with InventoryReleasedEvent.
  • Order Service: Receives InventoryReleasedEvent. Updates order to CANCELLED.

The saga is now complete, with inventory correctly released.

Alternatively, you can use an orchestration-based saga. A central orchestrator (often the service that initiated the transaction, like the Order Service) dictates the flow.

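// Simplified orchestrator logic. The collaborator types (InventoryService,
// PaymentService, ShippingService, OrderRepository) and the event classes are
// assumed to be defined elsewhere; only the saga flow is shown here.
public class OrderSagaOrchestrator {
    private final InventoryService inventoryService;
    private final PaymentService paymentService;
    private final ShippingService shippingService;
    private final OrderRepository orderRepository;

    public OrderSagaOrchestrator(InventoryService inventoryService,
                                 PaymentService paymentService,
                                 ShippingService shippingService,
                                 OrderRepository orderRepository) {
        this.inventoryService = inventoryService;
        this.paymentService = paymentService;
        this.shippingService = shippingService;
        this.orderRepository = orderRepository;
    }

    // 1. Order created: ask the Inventory Service to reserve stock.
    public void handleOrderCreated(OrderCreatedEvent event) {
        inventoryService.reserve(event.getOrderId(), event.getItems());
    }

    // 2. Inventory reserved: record progress, then request payment.
    public void handleInventoryReserved(InventoryReservedEvent event) {
        orderRepository.updateStatus(event.getOrderId(), "INVENTORY_RESERVED");
        paymentService.process(event.getOrderId(), event.getAmount());
    }

    // 3. Payment processed: record progress, then request shipping.
    public void handlePaymentProcessed(PaymentProcessedEvent event) {
        orderRepository.updateStatus(event.getOrderId(), "PAID");
        shippingService.ship(event.getOrderId());
    }

    // 4. Order shipped: the saga completed successfully.
    public void handleOrderShipped(OrderShippedEvent event) {
        orderRepository.updateStatus(event.getOrderId(), "COMPLETED");
    }

    // Compensation: payment failed, so release the reserved inventory.
    public void handlePaymentFailed(PaymentFailedEvent event) {
        orderRepository.updateStatus(event.getOrderId(), "PAYMENT_FAILED");
        inventoryService.release(event.getOrderId());
    }

    // Final state after compensation: cancel the order.
    public void handleInventoryReleased(InventoryReleasedEvent event) {
        orderRepository.updateStatus(event.getOrderId(), "CANCELLED");
    }
}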

The orchestrator explicitly tells each service what to do and handles compensation logic by invoking specific compensating actions.

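The most surprising thing about Sagas is how they flip the error-handling model: instead of "abort and roll back," each step commits locally, and if a later step fails, you run compensating actions to undo the steps that already succeeded. This is fundamentally different from ACID transactions: intermediate states are visible to other services while the saga is in flight, and a compensation is a new operation rather than a true rollback.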
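
The "undo what already succeeded" model can be sketched as a tiny runner that remembers completed steps and compensates them in reverse order. SagaStep and SagaRunner are hypothetical names for illustration, and the sketch assumes compensations themselves do not fail, which a real system cannot assume.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action plus the action that semantically undoes it.
final class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class SagaRunner {
    // Runs steps in order. On the first failure, compensates the steps that
    // already succeeded, most recent first, and returns false.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(
                new SagaStep("reserveInventory", () -> log.add("reserve"), () -> log.add("release")),
                new SagaStep("processPayment",
                        () -> { throw new IllegalStateException("card declined"); },
                        () -> log.add("refund"))));
        System.out.println(ok + " " + log); // prints: false [reserve, release]
    }
}
```

The failed step itself is not compensated, only the steps before it; that matches the payment-failure walkthrough, where only the inventory reservation is undone.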

Let’s look at the levers you control. In choreography, you control the events your service emits and the events it subscribes to. You need a robust event bus (like Kafka or RabbitMQ) and clear event contracts. In orchestration, you control the state machine within your orchestrator, defining the sequence of commands and the logic for handling success and failure events from other services.
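
For the orchestration side, the state machine can be as simple as a transition table. The sketch below is one possible shape, using the order statuses and event names from this article; OrderSagaStateMachine is a hypothetical class, and rejecting unexpected events with an exception is a design assumption rather than a requirement of the pattern.

```java
import java.util.Map;

class OrderSagaStateMachine {
    enum State { PENDING, INVENTORY_RESERVED, PAID, COMPLETED, PAYMENT_FAILED, CANCELLED }

    // current state -> (event name -> next state)
    private static final Map<State, Map<String, State>> TRANSITIONS = Map.of(
            State.PENDING, Map.of("InventoryReservedEvent", State.INVENTORY_RESERVED),
            State.INVENTORY_RESERVED, Map.of(
                    "PaymentProcessedEvent", State.PAID,
                    "PaymentFailedEvent", State.PAYMENT_FAILED),
            State.PAID, Map.of("OrderShippedEvent", State.COMPLETED),
            State.PAYMENT_FAILED, Map.of("InventoryReleasedEvent", State.CANCELLED));

    private State state = State.PENDING;

    State state() {
        return state;
    }

    // Applies an event and returns the new state. Events that are not valid
    // in the current state are rejected instead of being silently ignored.
    State on(String event) {
        State next = TRANSITIONS.getOrDefault(state, Map.of()).get(event);
        if (next == null) {
            throw new IllegalStateException("Unexpected " + event + " in state " + state);
        }
        state = next;
        return state;
    }
}
```

Keeping the transitions in one table makes the failure path as visible as the happy path: PAYMENT_FAILED is just another state with its own outgoing edge to CANCELLED.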

Consider the compensation step for releasing inventory. Handling ReleaseInventoryCommand isn’t just a matter of adding units back to the stock count; the Inventory Service needs to know how much was reserved in the first place. This implies that it must store the reservation details (e.g., reserved_quantity) keyed by order ID and product ID, so it can accurately reverse the action. If reserving only decremented a total stock figure, reversing it later would be impossible without knowing the original reservation.
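
A minimal sketch of that bookkeeping, assuming an in-memory store: ReservationLedger is a hypothetical class that records what each order reserved, so release can put back exactly that amount. A real Inventory Service would persist both maps in its own database.

```java
import java.util.HashMap;
import java.util.Map;

class ReservationLedger {
    // productId -> units currently available
    private final Map<String, Integer> stock = new HashMap<>();
    // orderId -> (productId -> reserved quantity)
    private final Map<String, Map<String, Integer>> reservations = new HashMap<>();

    void addStock(String productId, int quantity) {
        stock.merge(productId, quantity, Integer::sum);
    }

    int available(String productId) {
        return stock.getOrDefault(productId, 0);
    }

    // Reserves stock and remembers how much this order took.
    // Returns false if there is not enough stock.
    boolean reserve(String orderId, String productId, int quantity) {
        if (available(productId) < quantity) {
            return false;
        }
        stock.merge(productId, -quantity, Integer::sum);
        reservations.computeIfAbsent(orderId, k -> new HashMap<>())
                .merge(productId, quantity, Integer::sum);
        return true;
    }

    // Compensation: look up what the order reserved and return exactly that.
    void release(String orderId) {
        Map<String, Integer> reserved = reservations.remove(orderId);
        if (reserved == null) {
            return; // nothing recorded for this order, so nothing to undo
        }
        reserved.forEach((productId, quantity) -> stock.merge(productId, quantity, Integer::sum));
    }
}
```

For example, after addStock("A1", 10) and reserve("order-123", "A1", 2), available("A1") is 8; release("order-123") brings it back to 10 without the caller having to repeat the quantity.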

A common pitfall is forgetting that both commands and compensating actions must be idempotent. Brokers and retry logic typically give you at-least-once delivery, so duplicates are normal rather than exceptional. If a ReserveInventoryCommand is delivered twice, the Inventory Service should only reserve the inventory once. Similarly, if ReleaseInventoryCommand is delivered twice, it should only release the inventory once. This is typically achieved by checking if the action has already been performed for a given transaction ID.
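
One way to sketch that check is to record each (action, transaction ID) pair the first time it is applied and skip repeats. IdempotentInventoryHandler is a hypothetical class, and the in-memory set is an assumption for illustration; in practice the deduplication record and the state change would need to be committed together, or a crash between the two could still apply a command twice.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentInventoryHandler {
    // Keys of the form "<action>:<transactionId>" for work already applied.
    private final Set<String> applied = new HashSet<>();
    private int reservedUnits = 0;

    // Returns true if the reservation was applied, false if it was a duplicate.
    boolean reserve(String transactionId, int quantity) {
        if (!applied.add("reserve:" + transactionId)) {
            return false; // already handled for this transaction
        }
        reservedUnits += quantity;
        return true;
    }

    // Returns true if the release was applied, false if it was a duplicate.
    boolean release(String transactionId, int quantity) {
        if (!applied.add("release:" + transactionId)) {
            return false; // already handled for this transaction
        }
        reservedUnits -= quantity;
        return true;
    }

    int reservedUnits() {
        return reservedUnits;
    }
}
```

Calling reserve("tx-1", 2) twice leaves reservedUnits() at 2, and the second call returns false so the caller can still acknowledge the duplicate message.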

The next concept you’ll grapple with is how to handle long-running sagas where a step might take a very long time, potentially leading to resource locks or stale state.
