A distributed transaction is not a single atomic operation; it’s a series of independent, observable operations that appear atomic to the user.
Let’s say you have a microservices architecture: an Order service, a Payment service, and an Inventory service. When a user places an order, it’s not just a single database write. It involves:
- Order Service: Creates an order, marking it as
PENDING. - Payment Service: Authorizes the payment for the order.
- Inventory Service: Reserves the items for the order.
If any of these steps fail, the entire transaction must be rolled back. In a monolithic application, this is handled by ACID transactions. In a distributed system, ACID is impractical. This is where the Saga pattern comes in.
Saga choreography is a decentralized approach to managing these distributed transactions. Instead of a central orchestrator telling each service what to do, services communicate directly with each other by emitting and reacting to events.
Imagine this flow:
- User places an order.
- Order Service creates the order (
status: PENDING) and publishes anOrderCreatedevent. - Payment Service listens for
OrderCreatedevents. Upon receiving one, it attempts to authorize payment.- If successful, it publishes a
PaymentAuthorizedevent. - If it fails, it publishes a
PaymentFailedevent.
- If successful, it publishes a
- Inventory Service also listens for
OrderCreatedevents. Upon receiving one, it attempts to reserve inventory.- If successful, it publishes an
InventoryReservedevent. - If it fails, it publishes an
InventoryReservationFailedevent.
- If successful, it publishes an
Now, for the rollback (compensation):
- If Payment Service publishes
PaymentAuthorized, Inventory Service (or whoever is next in line) might fail. If Inventory Service publishesInventoryReservationFailed, the Payment Service needs to know to cancel the authorization. So, Inventory Service would publish anInventoryReservationFailedevent. - Payment Service listens for
InventoryReservationFailed. Upon receiving it, it performs a compensation action: refunding the payment. It then publishes aPaymentRefundedevent. - Order Service listens for
PaymentFailed(from Payment Service) orInventoryReservationFailed(from Inventory Service). If it receives either, it marks the order asCANCELLED. If it receivesPaymentRefunded(as a consequence of other failures), it also marks the order asCANCELLED.
This creates a chain reaction of events and corresponding compensation events.
Here’s a simplified Kafka-based example configuration for this:
Order Service (Producer/Consumer)
kafka:
bootstrap-servers: kafka.example.com:9092
producer:
acks: all # Guarantees message durability
consumer:
group-id: order-service-group
auto-offset-reset: earliest
topics:
- payment-events
- inventory-events
When an order is created:
kafkaTemplate.send("order-events", orderId, new OrderCreatedEvent(orderId, amount));
Payment Service (Producer/Consumer)
kafka:
bootstrap-servers: kafka.example.com:9092
producer:
acks: all
consumer:
group-id: payment-service-group
auto-offset-reset: earliest
topics:
- order-events # Listens for order creation
When OrderCreatedEvent is received:
// Attempt payment
if (paymentSuccessful) { kafkaTemplate.send("payment-events", orderId, new PaymentAuthorizedEvent(orderId)); } else { kafkaTemplate.send("payment-events", orderId, new PaymentFailedEvent(orderId)); }
When InventoryReservationFailedEvent is received:
// Compensate payment
kafkaTemplate.send("payment-events", orderId, new PaymentRefundedEvent(orderId));
Inventory Service (Producer/Consumer)
kafka:
bootstrap-servers: kafka.example.com:9092
producer:
acks: all
consumer:
group-id: inventory-service-group
auto-offset-reset: earliest
topics:
- order-events # Listens for order creation
When OrderCreatedEvent is received:
// Attempt to reserve inventory
if (reservationSuccessful) { kafkaTemplate.send("inventory-events", orderId, new InventoryReservedEvent(orderId)); } else { kafkaTemplate.send("inventory-events", orderId, new InventoryReservationFailedEvent(orderId)); }
This event-driven, decentralized approach is the essence of Saga choreography. Each service is only concerned with its own local transaction and the events it needs to publish or react to. The complexity of the overall distributed transaction is distributed across the services.
The biggest challenge with choreography is tracing the overall state of a transaction. When something goes wrong, it can be difficult to pinpoint which service is lagging or misbehaving without a clear, centralized view.
The next complexity you’ll run into is handling idempotent event consumers, ensuring that processing the same event multiple times doesn’t cause duplicate actions or incorrect state changes.