Kafka is fundamentally a distributed commit log, not a message queue, and that difference is the key to its ability to handle event streams at scale.
Let’s see this in action. Imagine a simple e-commerce system. When a customer places an order, that’s an "order placed" event. In an event-driven architecture, this event doesn’t just update a database; it’s published to a Kafka topic, say orders.
// Producer sending an order event
{
"event_id": "abc-123",
"timestamp": "2023-10-27T10:00:00Z",
"event_type": "ORDER_PLACED",
"payload": {
"order_id": "ORD7890",
"customer_id": "CUST456",
"items": [
{"product_id": "PROD001", "quantity": 2},
{"product_id": "PROD005", "quantity": 1}
],
"total_amount": 150.75
}
}
Now, multiple downstream services can consume this event independently. An inventory service might decrement stock, a shipping service might start the fulfillment process, and a billing service might generate an invoice. Each of these is a Kafka consumer, reading from the orders topic.
// Consumer reading from the 'orders' topic (simplified Java example)
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Arrays.asList("orders"));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
System.printf("Received message: offset = %d, key = %s, value = %s%n",
record.offset(), record.key(), record.value());
// Process the order event...
}
}
The magic here is that Kafka doesn’t "consume" the message in the traditional sense. It appends events to a log. Consumers track their own position (offset) within that log. This allows:
- Multiple Consumers: Any number of consumers can read the same event stream without interfering with each other. They just read from their respective offsets.
- Replayability: If a new service needs to process historical orders, it can simply start reading from the beginning of the
orderstopic. - Durability: Kafka replicates partitions across multiple brokers, ensuring data is not lost even if a broker fails.
The core problem Kafka solves is decoupling event producers from event consumers. Instead of point-to-point communication where producers need to know about all consumers and manage delivery, producers simply write to Kafka topics. Consumers subscribe to topics they are interested in. This drastically simplifies complex systems, making them more resilient and scalable.
The mental model to hold is that Kafka is a distributed, append-only log. Topics are logs, partitions are ordered segments within those logs, and consumers read sequentially through partitions, managing their own progress via offsets. Brokers store these partitions, replicate them for fault tolerance, and serve read/write requests. Producers write messages (events) to specific partitions, and consumers read them.
What most people don’t realize is how Kafka’s consumer group mechanism works with partitions to provide both scalability and fault tolerance. When multiple consumers belong to the same group.id, Kafka ensures that each partition within a topic is consumed by only one consumer within that group at any given time. If a consumer crashes, Kafka rebalances the partitions among the remaining consumers in the group. This automatic distribution of work is what enables massive horizontal scaling for consumption.
The next concept to grapple with is exactly how Kafka guarantees ordering, and why it’s only guaranteed within a partition, not across partitions of the same topic.