The most counterintuitive truth about achieving exactly-once event processing is that it doesn’t actually guarantee zero duplicates; it guarantees zero unacknowledged duplicates on the consumer side.
Let’s see this in action. Imagine a simple Kafka producer sending a message and a Kafka consumer processing it.
// Producer
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all"); // Crucial for durability
props.put("retries", Integer.MAX_VALUE); // Keep retrying indefinitely
props.put("enable.idempotence", "true"); // Producer-side idempotence
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
try {
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");
RecordMetadata metadata = producer.send(record).get(); // .get() makes it synchronous for demonstration
System.out.println("Sent: " + record.value() + " to partition " + metadata.partition() + " offset " + metadata.offset());
} catch (Exception e) {
e.printStackTrace();
} finally {
producer.close();
}
// Consumer
Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("group.id", "my-group");
consumerProps.put("enable.auto.commit", "false"); // Manual commits are key
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(Arrays.asList("my-topic"));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
System.out.printf("Processing: offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
// Simulate processing logic
// If processing is successful, commit the offset
consumer.commitSync(); // Commit *after* successful processing
}
}
} catch (WakeupException e) {
// Ignore for graceful shutdown
} finally {
consumer.close();
}
This setup looks like it should achieve exactly-once, but the devil is in the distributed system’s failure modes. The core problem is that "exactly-once" is a promise made by the consumer to the system (and by extension, the end-user). The system (Kafka, in this case) can guarantee that a message is delivered at least once. The consumer’s job is to ensure that even if it receives the same message multiple times due to network glitches or crashes, it only acts on that message once.
The mental model hinges on two primary mechanisms: producer idempotence and consumer transactional processing (or equivalent deduplication logic).
Producer Idempotence: This prevents the producer from writing the same message multiple times to Kafka. When enable.idempotence=true and acks=all are set on the producer, Kafka assigns a unique Producer ID (PID) and a sequence number to each message batch. If the producer retries sending a batch after a transient error, Kafka will detect the duplicate batch (using the PID and sequence number) and simply ignore it, returning the same offset as the original successful write. This handles producer-side retries gracefully.
Consumer Deduplication: This is where the real "exactly-once" magic happens from the consumer’s perspective. The consumer must commit its offset after successfully processing a message. If the consumer crashes after processing a message but before committing the offset, it will re-read that message upon restart. To prevent reprocessing, the consumer needs a way to know if it has already "acted" on that specific message.
Common strategies include:
- Idempotent Consumers with Unique Message IDs: Each message produced has a globally unique ID. The consumer maintains a cache (e.g., in Redis, a database, or even in-memory for short-lived consumers) of processed message IDs. Before processing a message, it checks if its ID is in the cache. If yes, it skips processing and commits the offset. If no, it processes the message, adds its ID to the cache, and then commits the offset.
- Transactional Consumers (Kafka Streams, KSQL, Flink): These frameworks often provide built-in exactly-once semantics. They achieve this by coordinating with Kafka’s transactional producer API. When a consumer processes a batch of records, it might write results to an external system and then produce new records back to Kafka, all within a single Kafka transaction. The transaction is only committed to Kafka if all steps succeed. If any step fails, the entire transaction is aborted, and the offsets are rolled back. This ensures that either all parts of the operation complete, or none of them do.
The critical piece most people miss is that the consumer’s offset commit is the acknowledgment that processing is complete. If the consumer crashes after processing but before committing, Kafka has no way of knowing the processing succeeded. So, upon restart, Kafka will redeliver the message. The consumer’s deduplication logic is what prevents this redelivery from causing duplicate side effects. The offset commit is the signal to Kafka that it can forget about delivering that specific message again.
The next concept you’ll grapple with is managing the state required for deduplication across consumer restarts and scaling events.