RabbitMQ Streams, despite its name, is fundamentally a durable log, not a traditional message queue.
Let’s see it in action. Imagine we have a service producing events and another consuming them.
Producer side (Python example):
from rabbitmq_streams import Message, Producer
producer = Producer(
host="localhost",
port=5552,
virtual_host="/",
username="guest",
password="guest"
)
producer.connect()
for i in range(100):
message_data = f"My event data: {i}"
message = Message(body=message_data.encode(), properties={"content_type": "text/plain"})
producer.send(
stream="my-event-stream",
partition="0",
message=message
)
print(f"Sent message {i}")
producer.close()
Consumer side (Python example):
from rabbitmq_streams import Consumer
consumer = Consumer(
host="localhost",
port=5552,
virtual_host="/",
username="guest",
password="guest",
offset="beginning", # Start from the very first message
executor_threads=1
)
def message_handler(message):
print(f"Received: {message.body.decode()} | Offset: {message.offset}")
# In a real app, you'd process this message and potentially commit the offset
consumer.subscribe(
stream="my-event-stream",
partition="0",
handler=message_handler
)
print("Consumer started. Press Ctrl+C to stop.")
consumer.run()
This setup creates a stream named my-event-stream on partition 0. The producer sends 100 messages, and the consumer, starting from the beginning (offset 0), will dutifully print each one.
The core problem RabbitMQ Streams solves is enabling high-throughput, durable, and ordered messaging at scale, overcoming limitations of traditional message queues when dealing with massive event volumes. Unlike classic RabbitMQ queues which can struggle with disk I/O bottlenecks and message re-ordering under heavy load, Streams uses a segment-based, append-only log structure. Each stream is composed of multiple ordered segments on disk. When a segment fills up, a new one is created. Consumers track their progress by their offset within these segments. This design allows for sequential reads and writes, drastically improving throughput and predictability.
The key levers you control are:
- Stream Creation: Defining the stream name, number of partitions, and retention policies (how long data is kept).
- Partitioning Strategy: How you distribute messages across partitions. A single partition guarantees strict ordering for all messages within that stream, but limits throughput to a single writer and reader thread. Multiple partitions allow for parallel writes and reads, increasing throughput, but only guarantee ordering within a partition.
- Consumer Offsets: Where a consumer starts reading from (
beginning,end, or a specific offset). This is crucial for replayability and disaster recovery. - Durability Configuration: How data is written to disk and replicated. Streams offers different levels of durability, balancing performance with resilience.
- Producer/Consumer Behavior: How messages are sent (e.g., batching) and processed (e.g., committing offsets).
The most surprising aspect of RabbitMQ Streams, especially for users coming from traditional queues, is how it handles consumer acknowledgments. Instead of individual message acknowledgments, consumers typically commit their offset. This means the consumer tells the broker, "I have successfully processed all messages up to and including offset X." The broker then marks all messages up to that offset as "processed" for that specific consumer group. This is a more efficient way to manage progress in a log-based system, but it also means that if a consumer crashes after committing an offset but before fully processing the last message associated with that offset, that message is lost for that consumer group. The broker doesn’t "uncommit" offsets.
The next concept you’ll likely grapple with is implementing robust consumer groups and handling message processing failures without losing data, which often involves building retry mechanisms and dead-lettering strategies on top of the offset commitment model.