A surprising truth about event-driven systems is that, by design, they are inherently unreliable.
Let’s see what that means in practice. Imagine a simple e-commerce system where an OrderPlaced event triggers two downstream handlers: one to update inventory and another to send a confirmation email.
// Event Bus (e.g., Kafka, RabbitMQ)
{
"eventType": "OrderPlaced",
"payload": {
"orderId": "ORD12345",
"customerId": "CUST9876",
"items": [
{"productId": "PROD001", "quantity": 2},
{"productId": "PROD002", "quantity": 1}
]
}
}
The inventory handler receives this event and decrements stock for PROD001 by 2 and PROD002 by 1. The email handler receives it and sends an email to CUST9876.
Now, what happens if the event bus, due to a network glitch or a brief service hiccup, decides to deliver that same OrderPlaced event twice?
Without idempotency, the inventory handler would try to decrement stock again. This could lead to negative inventory, a serious problem. The email handler might send a duplicate confirmation email, annoying the customer.
The core problem is that event handlers, by default, execute their logic every time they receive an event. If the same event arrives multiple times, the logic runs multiple times, potentially causing unintended side effects.
To make handlers idempotent, we need a mechanism to ensure that even if an event is delivered multiple times, the handler’s core logic only executes once. The most common pattern is to leverage a unique identifier within the event itself.
Here’s how we can modify the OrderPlaced event to include a unique eventId:
// Event Bus
{
"eventId": "a1b2c3d4-e5f6-7890-1234-567890abcdef", // Unique ID for this specific event
"eventType": "OrderPlaced",
"payload": {
"orderId": "ORD12345",
"customerId": "CUST9876",
"items": [
{"productId": "PROD001", "quantity": 2},
{"productId": "PROD002", "quantity": 1}
]
}
}
The event handler then needs to track which eventIds it has already successfully processed. A common way to do this is by storing processed eventIds in a persistent store, like a database table or a dedicated cache (e.g., Redis).
The Idempotency Check:
- Receive Event: The handler receives the
OrderPlacedevent. - Extract
eventId: It pulls out theeventId(e.g.,a1b2c3d4-e5f6-7890-1234-567890abcdef). - Check History: It queries the persistent store to see if this
eventIdhas been processed before. - If Processed: If the
eventIdis found in the history, the handler immediately returns without executing its core logic. This is the idempotency in action – it effectively ignores the duplicate. - If Not Processed: If the
eventIdis not found, the handler proceeds to execute its core logic (update inventory, send email). - Record Success: After the core logic has successfully completed, the handler records the
eventIdin the persistent store. This ensures that any future duplicate deliveries of the same event will be ignored.
Implementation Details (Conceptual):
Let’s say we’re using PostgreSQL for our history store. We’d have a table like:
CREATE TABLE processed_events (
event_id UUID PRIMARY KEY,
processed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
The handler logic would look something like this (pseudo-code):
def handle_order_placed(event):
event_id = event['eventId']
order_payload = event['payload']
# 1. Check if already processed
if event_id_exists_in_db(event_id):
log(f"Event {event_id} already processed. Skipping.")
return
try:
# 2. Execute core business logic
update_inventory(order_payload['items'])
send_confirmation_email(order_payload['customerId'], order_payload['orderId'])
# 3. Record success IF logic completed without error
record_event_as_processed(event_id)
log(f"Successfully processed event {event_id}")
except Exception as e:
log(f"Error processing event {event_id}: {e}")
# Do NOT record as processed if an error occurred
# The event will be retried by the message broker
The key is the atomic check-and-insert (or a separate check and then insert). Many message queue clients offer built-in mechanisms for acknowledging messages. You only acknowledge a message after you’ve successfully processed it and recorded its eventId. If your handler crashes mid-processing, the message broker will eventually redeliver it, and your idempotency check will handle the retry.
One crucial aspect is how you handle the "record success" step. It must happen after the core logic completes successfully. If you record the eventId before executing the logic, and the logic fails, you’ll have incorrectly marked the event as processed, and it will never be retried. This is why the try...except block is essential, with the recording of success only happening after the try block’s code has run without raising an exception.
Another point of failure to consider is the idempotency store itself. If your processed_events table becomes unavailable, your handlers will start processing duplicate events. Ensure this store is highly available and performant. For very high throughput, using a distributed cache like Redis with its SETNX (set if not exists) command can be more performant than a traditional database for the eventId check.
The most subtle issue arises when event handlers depend on other events that might not have been processed yet, or if the order of operations between multiple handlers for the same event matters. If Handler A relies on a state updated by Handler B, and Handler B experiences a delay or failure, Handler A might proceed with stale data. This is less about the delivery of a single event and more about the consistency of the system state across multiple event handlers. For complex dependencies, you might need more sophisticated state management or consider patterns like sagas to coordinate multi-step transactions.