Build a Distributed Queue: Design, Trade-offs, and Failure Modes (2026)

A Kafka-style distributed queue doesn’t remove messages as they are consumed; it’s a partitioned, replicated log, and the queue-like behavior comes from consumers tracking their own read positions.

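Let’s see it in action. Imagine we have a Kafka cluster with two brokers, broker-1:9092 and broker-2:9092, with broker IDs 1 and 2. We want to create a topic named orders with 3 partitions, each replicated across both brokers. The create command below does not choose leaders; Kafka assigns them when the topic is created. Suppose it makes broker 1 the leader for partitions 0 and 1, and broker 2 the leader for partition 2. (To pin a specific layout, kafka-topics accepts --replica-assignment in place of --partitions and --replication-factor.)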

# Create the topic with 3 partitions and a replication factor of 2
kafka-topics --bootstrap-server broker-1:9092 --create --topic orders --partitions 3 --replication-factor 2

# Verify the topic configuration
kafka-topics --bootstrap-server broker-1:9092 --describe --topic orders

The output would look something like this:

Topic: orders   PartitionCount: 3       ReplicationFactor: 2    Configs:
        Topic: orders   Partition: 0    Leader: 1       Replicas: 1,2     Isr: 1,2
        Topic: orders   Partition: 1    Leader: 1       Replicas: 1,2     Isr: 1,2
        Topic: orders   Partition: 2    Leader: 2       Replicas: 2,1     Isr: 2,1

Here, Leader is the broker currently serving write requests (and, by default, read requests) for that partition. Replicas lists every broker assigned a copy of the data. Isr (in-sync replicas) lists the replicas that are currently caught up with the leader. If the Isr list is shorter than the Replicas list, Kafka reports the partition as under-replicated. The partition stays available in that state; producers using acks=all are rejected only once the number of in-sync replicas falls below min.insync.replicas.
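As a small illustration of reading that output, the sketch below parses one per-partition line and flags under-replication. It assumes the line format shown above; the function names are made up for this example, and real output can differ between Kafka versions (for instance, extra fields or a partition with no leader).

```python
import re


def parse_partition_line(line: str) -> dict:
    """Parse one per-partition line of `kafka-topics --describe` output.

    Assumes the "Name: value" layout shown in this article.
    """
    fields = dict(re.findall(r"(\w+):\s*(\S+)", line))
    return {
        "topic": fields["Topic"],
        "partition": int(fields["Partition"]),
        "leader": int(fields["Leader"]),
        "replicas": [int(b) for b in fields["Replicas"].split(",")],
        "isr": [int(b) for b in fields["Isr"].split(",")],
    }


def is_under_replicated(info: dict) -> bool:
    """A partition is under-replicated when some assigned replica is not in sync."""
    return len(info["isr"]) < len(info["replicas"])


healthy = parse_partition_line(
    "Topic: orders   Partition: 0    Leader: 1       Replicas: 1,2     Isr: 1,2"
)
# Hypothetical line: broker 1 has fallen out of the ISR for partition 2.
degraded = parse_partition_line(
    "Topic: orders   Partition: 2    Leader: 2       Replicas: 2,1     Isr: 2"
)

print(is_under_replicated(healthy))   # False
print(is_under_replicated(degraded))  # True
```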

This ability to specify partitions and replication factors is key. It allows us to distribute the load of message processing (by having different consumers read from different partitions) and ensure durability (by having multiple copies of the data). The log structure means messages are appended sequentially, and consumers track their position (offset) within each partition. This is fundamentally different from traditional message queues where messages are typically removed once consumed.
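To make the log-versus-queue distinction concrete, here is a toy in-memory model, not Kafka client code. PartitionLog and Consumer are hypothetical names invented for this sketch; the point is only that reading advances a consumer's position and never deletes anything.

```python
class PartitionLog:
    """Append-only list of records; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset: int, max_records: int = 10) -> list:
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position; the log itself is unchanged by reads."""

    def __init__(self, log: PartitionLog):
        self.log = log
        self.position = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        self.position = offset


log = PartitionLog()
for order in ["order-a", "order-b", "order-c"]:
    log.append(order)

billing = Consumer(log)
audit = Consumer(log)

first_batch = billing.poll(max_records=2)  # billing has seen two records
audit_batch = audit.poll()                 # audit independently sees all three
billing.seek(0)                            # rewind and replay from the start
replayed = billing.poll()

print(first_batch, audit_batch, replayed)
```

Because each consumer owns its position, a second reader or a replay costs nothing on the write side, which is the flexibility a delete-on-consume queue cannot offer.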

The problem this solves is building a highly available, scalable message bus that can handle massive throughput and keep accepting and delivering messages when individual brokers fail. Traditional single-node queues become bottlenecks and single points of failure. Distributed queues, by leveraging distributed consensus and replication, overcome these limitations.

Internally, each partition is an append-only log. Producers write messages to the leader broker for a partition. The leader appends the message to its local log, and follower replicas copy it by fetching from the leader. Kafka does not use a majority quorum here: with acks=all, the leader confirms the write to the producer once every replica currently in the ISR has the message. Consumers read from the leader by default and commit their offsets as they process messages. If a leader fails, a new leader is elected from the in-sync replicas, allowing the system to continue operating with minimal disruption.
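The write path can be sketched as a toy model. This is a deliberate simplification and not how the broker is implemented: it assumes followers pull from the leader, treats a record as fully replicated once every in-sync replica holds it, and promotes the lowest-numbered surviving in-sync replica on failure. The class and method names are hypothetical.

```python
class Partition:
    """Toy model of one replicated partition."""

    def __init__(self, broker_ids):
        self.logs = {b: [] for b in broker_ids}  # one log per replica
        self.leader = broker_ids[0]
        self.isr = set(broker_ids)

    def produce(self, value) -> int:
        """Append to the leader's log only; followers catch up by fetching."""
        self.logs[self.leader].append(value)
        return len(self.logs[self.leader]) - 1

    def follower_fetch(self, broker_id) -> None:
        """The follower pulls whatever records it is missing from the leader."""
        leader_log = self.logs[self.leader]
        follower_log = self.logs[broker_id]
        follower_log.extend(leader_log[len(follower_log):])

    def high_watermark(self) -> int:
        """Count of records present on every in-sync replica."""
        return min(len(self.logs[b]) for b in self.isr)

    def fail_leader(self) -> None:
        """Drop the leader from the ISR and promote a surviving member."""
        self.isr.discard(self.leader)
        if not self.isr:
            raise RuntimeError("no in-sync replica left to elect")
        self.leader = min(self.isr)


p = Partition([1, 2])
p.produce("order-a")
hw_before = p.high_watermark()  # follower has not fetched yet
p.follower_fetch(2)
hw_after = p.high_watermark()   # both in-sync replicas now hold the record
p.fail_leader()
print(hw_before, hw_after, p.leader, p.logs[p.leader])
```

Had broker 1 failed before broker 2 fetched, the new leader's log would have been empty, which is exactly the window that producer acknowledgement settings are meant to close.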

The exact levers you control are partitions, replication-factor, acks for producers, and min.insync.replicas on the broker or topic. partitions determines parallelism. replication-factor dictates fault tolerance. acks (0, 1, all) controls the durability guarantee for producers: 0 means fire-and-forget (fastest, least durable), 1 means the leader acknowledges after its own append (a write can still be lost if the leader fails before followers copy it), and all means the leader waits for every in-sync replica (most durable, slowest). min.insync.replicas (often set to replication-factor - 1) can be set as a broker default or per topic; when fewer replicas than this are in sync, writes with acks=all are rejected, while reads continue.
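How acks and min.insync.replicas interact is easiest to see as a decision table. The function below is a hypothetical rule written for this article, not broker code; it assumes a leader exists and models only the outcome of a single produce request.

```python
def write_outcome(acks: str, isr_size: int, min_insync_replicas: int) -> str:
    """Toy decision rule for one produce request (not Kafka's actual code)."""
    if acks == "0":
        return "sent: producer does not wait for a broker response"
    if acks == "1":
        return "acked: leader appended the record"
    if acks == "all":
        # min.insync.replicas is only enforced for acks=all writes.
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acked: every in-sync replica has the record"
    raise ValueError(f"unknown acks value: {acks!r}")


# Replication factor 2, as in the orders topic, with one follower out of sync.
for acks in ("0", "1", "all"):
    for min_isr in (1, 2):
        outcome = write_outcome(acks, isr_size=1, min_insync_replicas=min_isr)
        print(f"acks={acks} min.insync.replicas={min_isr}: {outcome}")
```

With min.insync.replicas=1, an acks=all write is acknowledged even though only the leader holds it; raising the setting to 2 turns that situation into a rejected write, trading availability for durability.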

What most people don’t realize is that a partition leader handles all writes for its partition and, by default, all reads as well. Followers exist to replicate; when a leader fails, a follower serves clients only after it has been elected leader itself. Since Kafka 2.4, consumers can be configured to fetch from a nearby follower (which is rare for typical queue workloads). This simplifies consistency but means the leader can become a bottleneck if a single partition is extremely hot.

The next problem you’ll run into is message ordering: Kafka guarantees order only within a partition, never across partitions, so messages that must stay in order need to land on the same partition.
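The usual way to keep related messages in order is to give them the same key so they land on the same partition. The sketch below illustrates the idea with CRC32 as a stand-in hash; this is an assumption made for the example, since Kafka's default partitioner hashes keys with murmur2, and the customer keys and events are invented.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in, not Kafka's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("customer-17", "created"),
    ("customer-42", "created"),
    ("customer-17", "paid"),
    ("customer-42", "cancelled"),
    ("customer-17", "shipped"),
]

# Route each event to its partition, preserving arrival order per partition.
partitions = defaultdict(list)
for key, event in events:
    partitions[partition_for(key)].append((key, event))


def events_for(key: str) -> list:
    """Read one key's events back from the single partition that holds them."""
    return [e for k, e in partitions[partition_for(key)] if k == key]


print(events_for("customer-17"))  # ['created', 'paid', 'shipped']
print(events_for("customer-42"))  # ['created', 'cancelled']
```

Each customer's events come back in the order they were produced, but nothing here says whether customer-17's "paid" was written before or after customer-42's "cancelled" unless both keys happen to share a partition.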