The most surprising thing about RabbitMQ consumer prefetch is that setting it too high can actually decrease your overall throughput, not increase it.

Let’s see what that looks like in practice. Imagine a scenario where you have a single producer sending messages to a queue, and a single consumer processing them.

# Producer (simplified)
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='my_queue')

for i in range(10000):
    channel.basic_publish(exchange='', routing_key='my_queue', body=f'message_{i}')

connection.close()

# Consumer (simplified, prefetch = 1)
import pika
import time

def callback(ch, method, properties, body):
    print(f" [x] Received {body.decode()}")
    # Simulate work
    time.sleep(0.1)
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='my_queue')

channel.basic_qos(prefetch_count=1) # Low prefetch

channel.basic_consume(queue='my_queue', on_message_callback=callback)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

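In this setup, with prefetch_count=1, the broker delivers one message, the consumer processes it (taking 0.1 seconds) and acknowledges it, and only then does the broker deliver the next one. The consumer sits idle for a network round trip between messages, so throughput is capped just below 10 messages per second.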
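To put a number on that ceiling, here is a back-of-the-envelope sketch. It assumes each message costs its processing time plus one full round trip to the broker; the function name and the round-trip values are illustrative, not measurements.

```python
def max_throughput(processing_s: float, round_trip_s: float) -> float:
    """Rough upper bound on messages/second for one consumer with prefetch_count=1.

    Assumption: per message, the consumer spends processing_s working and then
    waits round_trip_s for the ack to reach the broker and the next delivery
    to come back.
    """
    return 1.0 / (processing_s + round_trip_s)


# 0.1 s of work per message, as in the callback above; round trips are made up.
for rtt_ms in (0, 1, 50):
    rate = max_throughput(0.1, rtt_ms / 1000)
    print(f"round trip {rtt_ms:>2} ms -> at most {rate:.2f} msg/s")
```

The slower the link to the broker, the larger the share of each cycle the consumer spends waiting rather than working.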

Now, let’s bump that prefetch_count to 1000.

# Consumer (simplified, prefetch = 1000)
import pika
import time

def callback(ch, method, properties, body):
    print(f" [x] Received {body.decode()}")
    # Simulate work
    time.sleep(0.1)
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='my_queue')

channel.basic_qos(prefetch_count=1000) # High prefetch

channel.basic_consume(queue='my_queue', on_message_callback=callback)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

With prefetch_count=1000, the broker pushes up to 1000 messages to the consumer before it has acknowledged any of them. This looks faster at first because the consumer never waits for the next delivery: there is always a message in its local buffer. Note that this consumer still runs its callback on one message at a time; the rest simply wait in the buffer. However, if the consumer crashes or its channel closes, every message it has not yet acknowledged (up to 1000) is requeued and delivered again. Any work already done on those messages is lost, and while they sit in one consumer’s buffer, no other consumer can pick them up. This can lead to a lot of wasted work and retransmissions, especially if your processing time is variable or your consumers are prone to errors.

The problem RabbitMQ is solving here is how to balance the desire for high throughput (keeping the consumer busy) with fault tolerance and efficient resource utilization. basic_qos(prefetch_count=N) is the mechanism for this. It tells RabbitMQ how many unacknowledged messages a consumer can have outstanding.

When a consumer calls basic_qos(prefetch_count=N), it’s essentially saying, "RabbitMQ, I’m willing to hold up to N unacknowledged messages. Don’t send me more than that until I acknowledge some of the ones I’ve received."

The prefetch_count affects how many messages are "in flight" between the broker and the consumer. A prefetch of 1 means the consumer gets one message, processes it, acknowledges it, and only then receives the next. This keeps the amount of work at risk minimal but leaves the consumer idle between messages. A high prefetch means the broker keeps a buffer of messages at the consumer, so the next one is already local when the current one finishes. This can be much faster if the consumer is reliable and messages are processed quickly.

The optimal prefetch_count is highly dependent on your specific workload. It’s a tuning parameter. You need to consider:

  • Message processing time: If messages take a long time to process, a lower prefetch is better to avoid holding too many messages hostage if a consumer fails.
  • Consumer reliability: If your consumers are stable and rarely crash, you can afford a higher prefetch.
  • Network latency: High latency can make it beneficial to fetch more messages at once to keep the consumer fed.
  • Memory: A very high prefetch can consume significant memory on the consumer side if messages are large.
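As a starting point for tuning, one rule of thumb is to prefetch just enough messages to cover a network round trip, plus the one currently being processed. The sketch below encodes that heuristic; suggest_prefetch, its parameters, and the cap are hypothetical names chosen for illustration, and the result should be validated by measuring your own workload.

```python
import math


def suggest_prefetch(round_trip_s: float, processing_s: float, cap: int = 1000) -> int:
    """Hypothetical heuristic for a starting prefetch_count.

    Idea: while an ack travels to the broker and the next delivery travels
    back (round_trip_s), the consumer can finish round_trip_s / processing_s
    messages, so that many should already be buffered, plus the one in hand.
    The cap bounds memory use and the amount of work at risk on a crash.
    """
    if processing_s <= 0:
        return cap
    needed = math.ceil(round_trip_s / processing_s) + 1
    return max(1, min(cap, needed))


# Slow handler, fast network: a tiny prefetch already keeps the consumer busy.
print(suggest_prefetch(round_trip_s=0.05, processing_s=0.1))
# Fast handler, slow network: a much larger buffer is needed.
print(suggest_prefetch(round_trip_s=0.2, processing_s=0.003))
```

Treat the result as a first guess only: message size and consumer reliability, from the list above, may argue for a lower value.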

Crucially, prefetch_count is applied per consumer by default (that is, unless the global flag of basic_qos is set). If you have multiple consumers on the same queue, each one can have its own prefetch_count, and the queue can have up to the sum of those limits delivered but unacknowledged at any time. Those messages are unavailable to other consumers until they are acknowledged or requeued.

When you set prefetch_count to a value greater than 1, the broker delivers messages proactively. It doesn’t wait for an acknowledgement before sending the next message, up to the limit specified. Prefetch alone does not make the consumer process messages in parallel: the extra messages wait in the client’s buffer, and running them concurrently still requires worker threads or processes on the consumer side. The broker, in turn, will not deliver more than prefetch_count messages to a single consumer that have not yet been acknowledged. This is a form of flow control managed by RabbitMQ.
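The delivery window can be pictured with a toy model. The class below is a simplified simulation of one queue and one consumer, written for illustration only; it is not how RabbitMQ is implemented, and SimulatedBroker is a made-up name.

```python
from collections import deque


class SimulatedBroker:
    """Toy model of a queue feeding one consumer under a prefetch limit."""

    def __init__(self, messages, prefetch_count):
        self.ready = deque(messages)  # messages still waiting in the queue
        self.unacked = {}             # delivery_tag -> message currently in flight
        self.prefetch_count = prefetch_count
        self._next_tag = 1

    def deliver(self):
        """Push messages until the consumer holds prefetch_count unacked ones."""
        delivered = []
        while self.ready and len(self.unacked) < self.prefetch_count:
            tag = self._next_tag
            self._next_tag += 1
            self.unacked[tag] = self.ready.popleft()
            delivered.append(tag)
        return delivered

    def ack(self, delivery_tag):
        """An ack frees one slot in the consumer's prefetch window."""
        del self.unacked[delivery_tag]


broker = SimulatedBroker([f"message_{i}" for i in range(10)], prefetch_count=3)
first = broker.deliver()      # the window fills up immediately
blocked = broker.deliver()    # nothing more is sent while the window is full
broker.ack(first[0])
after_ack = broker.deliver()  # one ack releases exactly one more message
print(first, blocked, after_ack)
```

In this model the number of unacknowledged messages never exceeds prefetch_count, and each acknowledgement is what lets the next delivery through.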

The interaction between basic_qos and basic_ack is fundamental. With manual acknowledgements, as in the consumers above, a consumer must basic_ack a message to signal to RabbitMQ that it has been successfully processed. Until that basic_ack is received, the message remains "unacknowledged" from the broker’s perspective and counts against the prefetch limit. If a consumer disconnects or crashes before acknowledging a message, RabbitMQ requeues that message and redelivers it to whichever consumer is available, which may be the same one after it reconnects. Setting prefetch_count too high can lead to a situation where, upon a consumer failure, a large number of messages are requeued and redelivered at once, potentially overwhelming other consumers or causing a thundering herd problem if not handled carefully.
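To see how the cost of a crash scales with prefetch, here is a small sketch. It assumes a simplified model in which the broker always keeps the consumer's window full and the consumer acknowledges messages in order before dying; redelivered_on_crash is a hypothetical helper, not a RabbitMQ or pika function.

```python
def redelivered_on_crash(queue_depth: int, prefetch_count: int, acked_before_crash: int) -> int:
    """Count messages that return to the queue when a consumer dies.

    Toy model: the broker tops the prefetch window up after every ack, so at
    the moment of the crash it has delivered the acked messages plus a full
    window (or everything in the queue, whichever is smaller).
    """
    if acked_before_crash > queue_depth:
        raise ValueError("cannot ack more messages than the queue held")
    delivered = min(queue_depth, acked_before_crash + prefetch_count)
    return delivered - acked_before_crash


# 10000 messages in the queue, consumer crashes after acking 50 of them.
for prefetch in (1, 10, 1000):
    lost = redelivered_on_crash(10000, prefetch, acked_before_crash=50)
    print(f"prefetch_count={prefetch:>4}: {lost} messages requeued")
```

Under these assumptions, the number of requeued messages grows linearly with prefetch_count until the queue itself runs out.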

The next step after tuning prefetch is often optimizing message acknowledgments, specifically understanding the difference between basic_ack and basic_nack or basic_reject when failures do occur and you need to signal that a message could not be processed.
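As a preview, a callback might choose between the three calls along these lines. This is a minimal sketch: TransientError and process are hypothetical placeholders for your own error type and handler, and the policy shown (requeue on transient failures, discard otherwise) is only one possible choice.

```python
class TransientError(Exception):
    """Hypothetical: a failure worth retrying, e.g. a dependency timed out."""


def process(body: bytes) -> None:
    """Hypothetical placeholder for the real message handler."""
    if body == b"retry":
        raise TransientError("try again later")
    if body == b"bad":
        raise ValueError("malformed message")


def callback(ch, method, properties, body):
    try:
        process(body)
    except TransientError:
        # Put the message back on the queue so it can be delivered again.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    except Exception:
        # Give up on this message; unless the queue has a dead-letter
        # exchange configured, the broker discards it.
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
    else:
        # Success: the broker can forget the message and free a prefetch slot.
        ch.basic_ack(delivery_tag=method.delivery_tag)
```

This callback has the same signature as the ones above, so it can be passed to basic_consume as on_message_callback. Whichever call is used, the message stops counting against the consumer's prefetch limit.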
