The most surprising thing about event-driven systems is how easily they can become a distributed monolith, locking you into tight coupling disguised as loose coupling.

Let’s look at a common scenario: a simple e-commerce order processing system.

Order Placed -> Payment Processed -> Inventory Updated -> Shipping Initiated

Here’s how it might look in code, with events flowing through a message broker like Kafka.

// Producer: Order Service
@Transactional
public void placeOrder(Order order) {
    orderRepository.save(order);
    eventPublisher.publishEvent(new OrderPlacedEvent(order.getId(), order.getCustomerId(), order.getTotalAmount()));
}

// Consumer: Payment Service
@KafkaListener(topics = "order-placed-events")
public void handleOrderPlaced(OrderPlacedEvent event) {
    paymentGateway.processPayment(event.getOrderId(), event.getCustomerId(), event.getTotalAmount());
    // Note: this reports success unconditionally; a real handler would publish the actual payment outcome.
    eventPublisher.publishEvent(new PaymentProcessedEvent(event.getOrderId(), true));
}

// Consumer: Inventory Service
@KafkaListener(topics = "payment-processed-events")
public void handlePaymentProcessed(PaymentProcessedEvent event) {
    inventoryManager.updateStock(event.getOrderId());
    eventPublisher.publishEvent(new InventoryUpdatedEvent(event.getOrderId()));
}

// Consumer: Shipping Service
@KafkaListener(topics = "inventory-updated-events")
public void handleInventoryUpdated(InventoryUpdatedEvent event) {
    shippingCarrier.initiateShipment(event.getOrderId());
    eventPublisher.publishEvent(new ShippingInitiatedEvent(event.getOrderId()));
}

This looks decoupled, right? Each service reacts to an event and publishes its own. But this is where the anti-patterns creep in.

The "Choreography of Death" (Circular Dependencies)

This happens when services create event loops: Service A publishes an event, Service B reacts and publishes its own, Service C reacts to that, and Service C’s event triggers Service A again. Nothing in the chain terminates, so the loop runs indefinitely.

Diagnosis: You’ll see logs showing the same sequence of events repeating endlessly for the same order. Because each pass through the loop publishes new messages, topic end offsets keep climbing even when no new orders are coming in; if your message broker supports it, you can confirm this by inspecting consumer group offsets.

Fix: Introduce a dedicated "workflow orchestrator" service. Instead of publishing events that directly trigger the next step, each service reports its result in an event that the orchestrator consumes. The orchestrator then decides what the next logical step is and either calls the relevant service directly or publishes a command for that service to handle. For example, OrderPlacedEvent goes to the orchestrator, which then calls PaymentService.processPayment() directly.

Why it works: The orchestrator becomes the single source of truth for the workflow’s state, breaking the direct, implicit dependency between services.
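The orchestrator idea can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: OrderWorkflowOrchestrator, Step, CommandSender, and the command names are hypothetical, and the transport behind CommandSender (direct call or command message) is deliberately left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical workflow steps, mirroring the order flow in this article.
enum Step { ORDER_PLACED, PAYMENT_PROCESSED, INVENTORY_UPDATED, SHIPPING_INITIATED }

// Hypothetical port for telling a service what to do next
// (a direct call or a command message; the transport is not specified here).
interface CommandSender {
    void send(String command, String orderId);
}

// Sketch of an orchestrator: it alone knows the order of the steps.
class OrderWorkflowOrchestrator {
    private final Map<String, Step> stateByOrder = new HashMap<>();
    private final CommandSender commands;

    OrderWorkflowOrchestrator(CommandSender commands) {
        this.commands = commands;
    }

    // Called whenever a service reports that a step finished.
    void onStepCompleted(String orderId, Step completed) {
        Step previous = stateByOrder.get(orderId);
        // Ignore stale or repeated reports so a replayed event cannot restart the flow.
        if (previous != null && completed.ordinal() <= previous.ordinal()) {
            return;
        }
        stateByOrder.put(orderId, completed);
        switch (completed) {
            case ORDER_PLACED:
                commands.send("ProcessPayment", orderId);
                break;
            case PAYMENT_PROCESSED:
                commands.send("UpdateInventory", orderId);
                break;
            case INVENTORY_UPDATED:
                commands.send("InitiateShipment", orderId);
                break;
            default:
                // SHIPPING_INITIATED: workflow finished, nothing left to trigger.
                break;
        }
    }

    Step currentStep(String orderId) {
        return stateByOrder.get(orderId);
    }
}
```

Because the workflow state lives in one place, a loop would have to be written into the orchestrator itself rather than emerge from services reacting to each other. A real orchestrator would persist this state instead of holding it in a map.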

The "Event Storm" (Unbounded Event Streams)

Services publish too much information in their events, or events contain data that isn’t strictly necessary for the immediate consumer. This leads to consumers having to parse and filter large, complex event payloads, and makes it hard to evolve events without breaking consumers.

Diagnosis: Consumers are complex, with lots of if statements checking event types or data fields. Schema evolution becomes a nightmare, requiring coordinated deployments across many services.

Fix: Define clear, concise "domain events" that represent a single, significant state change. Each event should carry only the data necessary for its direct consumers. If a consumer needs more data, it should fetch it directly from the source service’s API or a dedicated read model. For instance, instead of OrderPlacedEvent(orderId, customerId, totalAmount, items[]), use OrderSummaryCreatedEvent(orderId, customerId, totalAmount). If the Shipping Service needs item details, it calls the Order Service API.

Why it works: It enforces a stricter contract and reduces the blast radius of changes. Consumers are less coupled to the internal details of the producer.
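As a rough illustration of a lean event plus an on-demand lookup, the sketch below keeps item details out of the event and lets the Shipping Service ask for them. OrderQueryClient, ShippingPlanner, and the choice of cents for the amount are assumptions made for this example, not part of any real API.

```java
import java.util.List;

// Lean event: carries only the identifiers and total named in the article.
final class OrderSummaryCreatedEvent {
    final String orderId;
    final String customerId;
    final long totalAmountCents; // assumption: amount stored in minor units

    OrderSummaryCreatedEvent(String orderId, String customerId, long totalAmountCents) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.totalAmountCents = totalAmountCents;
    }
}

// Hypothetical read API exposed by the Order Service.
interface OrderQueryClient {
    List<String> itemSkus(String orderId);
}

// The Shipping Service fetches item details only when it needs them.
class ShippingPlanner {
    private final OrderQueryClient orders;

    ShippingPlanner(OrderQueryClient orders) {
        this.orders = orders;
    }

    // Returns the SKUs to pack for the order referenced by the event.
    List<String> itemsToPack(OrderSummaryCreatedEvent event) {
        return orders.itemSkus(event.orderId);
    }
}
```

The trade-off is an extra call at processing time, so the Order Service API (or a read model) has to be available when the Shipping Service handles the event.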

The "Eventual Consistency Illusion" (Lack of Idempotency and Transactional Guarantees)

Consumers assume that every event is delivered exactly once. But network glitches, broker restarts, or application crashes can lead to an event being delivered multiple times, or not at all.

Diagnosis: You see duplicate data being created (e.g., multiple shipments for one order), or data inconsistencies where an order is marked as paid but not updated in inventory. Debugging these issues is incredibly difficult because the exact sequence of failures is hard to reproduce.

Fix: Implement idempotency in all event consumers, so that processing an event multiple times has the same effect as processing it once. A common pattern is to give every event a unique ID, store processed event IDs in a database or cache, and check that store before processing. For critical operations, consider the transactional outbox pattern: the service writes the event to an outbox table in the same database transaction that updates its state, and a separate relay process publishes the outbox rows to the broker.

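Why it works: Idempotency ensures that even with duplicate deliveries, the system state remains consistent. A transactional outbox ensures that an event is recorded if and only if the state change commits, so events are neither published for failed business logic nor lost after a successful update.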
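Both ideas can be sketched in a few lines. The example below is an in-memory stand-in only: the sets and lists play the role of database tables, and the class names (IdempotentShippingConsumer, OrderServiceWithOutbox) are hypothetical. In a real service the writes inside each method would share one database transaction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of an idempotent consumer. The set stands in for a table or cache
// of processed event IDs (an assumption made for brevity).
class IdempotentShippingConsumer {
    private final Set<String> processedEventIds = new HashSet<>();
    private final List<String> shipments = new ArrayList<>();

    // Returns true if the event caused a shipment, false if it was a duplicate.
    synchronized boolean handle(String eventId, String orderId) {
        // Set.add returns false when the ID is already present.
        if (!processedEventIds.add(eventId)) {
            return false;
        }
        shipments.add(orderId);
        return true;
    }

    synchronized List<String> shipments() {
        return List.copyOf(shipments);
    }
}

// Sketch of a transactional outbox. Both lists stand in for tables in the
// same database; a real implementation would write them in one transaction.
class OrderServiceWithOutbox {
    private final List<String> orders = new ArrayList<>();
    private final List<String> outbox = new ArrayList<>();

    synchronized void placeOrder(String orderId) {
        orders.add(orderId);                  // state change
        outbox.add("OrderPlaced:" + orderId); // event recorded alongside it
    }

    // A separate relay would call this and publish the result to the broker.
    synchronized List<String> drainOutbox() {
        List<String> pending = new ArrayList<>(outbox);
        outbox.clear();
        return pending;
    }
}
```

A relay like this typically delivers at least once (it may crash after publishing but before marking rows as sent), which is one more reason the consumers on the other side need to be idempotent.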

The "Black Box Broker" (Over-reliance on Broker Features)

This anti-pattern means pushing business logic into broker-side tooling such as Kafka Streams topologies or ksqlDB queries instead of dedicated services. The streaming platform becomes a de facto application server, making the logic hard to debug, test, and scale independently.

Diagnosis: Business logic is embedded directly within Kafka Streams applications or complex ksqlDB queries. Tests often end up depending on a running Kafka cluster, making development slow and brittle.

Fix: Treat the message broker as a plumbing system for event delivery. Move complex business logic into dedicated microservices that consume events, perform actions, and publish new events. Use the broker for reliable event transport, not application logic.

Why it works: It keeps the broker focused on its core competency (message queuing) and allows services to be developed, tested, and scaled independently using familiar application development paradigms.
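One way to picture this separation is a plain domain class with no broker types in its signature, wrapped by a thin adapter that is the only code aware of events. The sketch below uses hypothetical names (StockReservation, InventoryEventAdapter) and a Consumer<String> as a stand-in for a real producer.

```java
import java.util.function.Consumer;

// Plain business rule: no broker or framework types involved,
// so it can be unit tested without any messaging infrastructure.
class StockReservation {
    static int remainingAfter(int available, int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested must be positive");
        }
        if (requested > available) {
            throw new IllegalStateException("insufficient stock");
        }
        return available - requested;
    }
}

// Thin adapter: the only place that knows events exist.
class InventoryEventAdapter {
    private final Consumer<String> publisher; // stand-in for a broker producer
    private int available;

    InventoryEventAdapter(int initialStock, Consumer<String> publisher) {
        this.available = initialStock;
        this.publisher = publisher;
    }

    // Would be invoked by a listener method in a real service.
    void onPaymentProcessed(String orderId, int quantity) {
        available = StockReservation.remainingAfter(available, quantity);
        publisher.accept("InventoryUpdated:" + orderId);
    }

    int available() {
        return available;
    }
}
```

With this split, the stock rule is exercised by ordinary unit tests, and only the adapter needs an integration test against the broker.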

The "Schema Spaghetti" (No Schema Management)

This is what you get without a centralized schema registry or a consistent schema evolution strategy. Different services use different versions of event schemas, leading to runtime errors when consumers expect one format and producers send another.

Diagnosis: You see DeserializationException errors in consumer logs, or unexpected null values for fields that should be present. It is also difficult to track which service produces which version of an event.

Fix: Implement a schema registry (like Confluent Schema Registry) and enforce schema compatibility rules (e.g., backward compatibility, forward compatibility). Define clear schemas using formats like Avro or Protobuf. All producers and consumers must interact with the registry to validate their schemas.

Why it works: It provides a single source of truth for event schemas, enabling safe schema evolution and preventing runtime deserialization failures.
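To make "backward compatibility" concrete, here is a deliberately simplified model of one rule: a consumer on the new schema can still read old data if every field it added has a default. This is a sketch of the idea only, not the algorithm a real schema registry runs (it ignores type changes, renames, and nesting), and SchemaCompatibility is a hypothetical helper.

```java
import java.util.Map;

class SchemaCompatibility {
    // Each schema is modeled as: field name -> whether the field has a default value.
    // Returns true if a reader using newSchema could read data written with oldSchema.
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema,
                                        Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean addedField = !oldSchema.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            // An added field without a default cannot be filled in for old records.
            if (addedField && !hasDefault) {
                return false;
            }
        }
        // Fields removed in newSchema are simply ignored when reading old data.
        return true;
    }
}
```

Under this model, adding an optional field or removing a field passes, while adding a required field fails, which is the kind of change a registry configured for backward compatibility is meant to reject before it reaches production.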

The "Event-Driven Monolith" (Implicit Dependencies via Event Naming/Structure)

Services rely on specific event names or structures that are not explicitly documented or governed, creating implicit coupling. Changing an event name or payload in one service breaks others, even if they aren’t directly aware of the dependency.

Diagnosis: A seemingly small change in one service causes widespread, unexplained failures in seemingly unrelated services. Debugging involves tracing event flows through logs and trying to infer dependencies.

Fix: Establish clear guidelines for event naming conventions and payload structures. Use a schema registry (as mentioned above) to enforce these standards. Document event contracts thoroughly and treat them as public APIs. Regularly review event usage to identify and break implicit dependencies.

Why it works: Formalizing event contracts as APIs makes dependencies explicit and manageable, reducing the risk of accidental coupling.

The next thing you’ll likely encounter is the challenge of distributed tracing, as understanding the full lifecycle of a request across multiple event-driven services becomes critical for debugging.
