A distributed transaction is not a single atomic operation; it’s a series of independent, observable operations that appear atomic to the user.

Let’s say you have a microservices architecture: an Order service, a Payment service, and an Inventory service. When a user places an order, it’s not just a single database write. It involves:

  1. Order Service: Creates an order, marking it as PENDING.
  2. Payment Service: Authorizes the payment for the order.
  3. Inventory Service: Reserves the items for the order.

If any of these steps fail, the entire transaction must be rolled back. In a monolithic application, this is handled by ACID transactions. In a distributed system, ACID is impractical. This is where the Saga pattern comes in.

Saga choreography is a decentralized approach to managing these distributed transactions. Instead of a central orchestrator telling each service what to do, services communicate directly with each other by emitting and reacting to events.

Imagine this flow:

  • User places an order.
  • Order Service creates the order (status: PENDING) and publishes an OrderCreated event.
  • Payment Service listens for OrderCreated events. Upon receiving one, it attempts to authorize payment.
    • If successful, it publishes a PaymentAuthorized event.
    • If it fails, it publishes a PaymentFailed event.
  • Inventory Service also listens for OrderCreated events. Upon receiving one, it attempts to reserve inventory.
    • If successful, it publishes an InventoryReserved event.
    • If it fails, it publishes an InventoryReservationFailed event.

Now, for the rollback (compensation):

  • If Payment Service publishes PaymentAuthorized, Inventory Service (or whoever is next in line) might fail. If Inventory Service publishes InventoryReservationFailed, the Payment Service needs to know to cancel the authorization. So, Inventory Service would publish an InventoryReservationFailed event.
  • Payment Service listens for InventoryReservationFailed. Upon receiving it, it performs a compensation action: refunding the payment. It then publishes a PaymentRefunded event.
  • Order Service listens for PaymentFailed (from Payment Service) or InventoryReservationFailed (from Inventory Service). If it receives either, it marks the order as CANCELLED. If it receives PaymentRefunded (as a consequence of other failures), it also marks the order as CANCELLED.

This creates a chain reaction of events and corresponding compensation events.

Here’s a simplified Kafka-based example configuration for this:

Order Service (Producer/Consumer)

kafka:
  bootstrap-servers: kafka.example.com:9092
  producer:
    acks: all # Guarantees message durability
  consumer:
    group-id: order-service-group
    auto-offset-reset: earliest
    topics:
      - payment-events
      - inventory-events

When an order is created: kafkaTemplate.send("order-events", orderId, new OrderCreatedEvent(orderId, amount));

Payment Service (Producer/Consumer)

kafka:
  bootstrap-servers: kafka.example.com:9092
  producer:
    acks: all
  consumer:
    group-id: payment-service-group
    auto-offset-reset: earliest
    topics:
      - order-events # Listens for order creation

When OrderCreatedEvent is received: // Attempt payment if (paymentSuccessful) { kafkaTemplate.send("payment-events", orderId, new PaymentAuthorizedEvent(orderId)); } else { kafkaTemplate.send("payment-events", orderId, new PaymentFailedEvent(orderId)); }

When InventoryReservationFailedEvent is received: // Compensate payment kafkaTemplate.send("payment-events", orderId, new PaymentRefundedEvent(orderId));

Inventory Service (Producer/Consumer)

kafka:
  bootstrap-servers: kafka.example.com:9092
  producer:
    acks: all
  consumer:
    group-id: inventory-service-group
    auto-offset-reset: earliest
    topics:
      - order-events # Listens for order creation

When OrderCreatedEvent is received: // Attempt to reserve inventory if (reservationSuccessful) { kafkaTemplate.send("inventory-events", orderId, new InventoryReservedEvent(orderId)); } else { kafkaTemplate.send("inventory-events", orderId, new InventoryReservationFailedEvent(orderId)); }

This event-driven, decentralized approach is the essence of Saga choreography. Each service is only concerned with its own local transaction and the events it needs to publish or react to. The complexity of the overall distributed transaction is distributed across the services.

The biggest challenge with choreography is tracing the overall state of a transaction. When something goes wrong, it can be difficult to pinpoint which service is lagging or misbehaving without a clear, centralized view.

The next complexity you’ll run into is handling idempotent event consumers, ensuring that processing the same event multiple times doesn’t cause duplicate actions or incorrect state changes.

Want structured learning?

Take the full Event-driven course →