Cassandra’s concurrent_reads and concurrent_writes settings are not about how many operations your application can send at once, but how many in-flight operations the coordinator node will handle for a given request type.
Let’s see this in action. Imagine a simple SELECT query.
{
"query": "SELECT * FROM my_keyspace.my_table WHERE id = 123",
"parameters": {
"consistency_level": "QUORUM",
"read_timeout_ms": 5000,
"serial_consistency_level": null
}
}
When this request hits a coordinator node, say node A, and the consistency_level is QUORUM, node A needs to talk to a subset of replicas for that data. If concurrent_reads is set to 32, node A can have up to 32 different read requests in flight to its replicas at the same time. It doesn’t mean it can process 32 requests from your application simultaneously. It means it can service one application request by talking to multiple replicas, and while those are in flight, it can start servicing another application request by talking to its replicas, up to the concurrent_reads limit.
The core problem these settings solve is preventing a single node from becoming a bottleneck when it’s acting as a coordinator for many requests, especially under high load or when network latency is a factor. If a coordinator has to wait for one replica to respond before it can even start talking to another replica for the same request, throughput plummets. concurrent_reads and concurrent_writes allow the coordinator to overlap these I/O operations.
Here’s how it works internally. When a coordinator receives a read request:
- It determines which replicas hold the requested data.
- It sends read requests to a subset of these replicas (based on consistency level and topology).
- It immediately returns to its event loop, ready to process the next incoming application request.
- It can then initiate read requests for this second application request.
- It keeps track of responses from replicas for all in-flight application requests, up to the
concurrent_readslimit. - Once enough replicas respond for a given application request (e.g.,
QUORUM), it aggregates the data and returns it to the client.
The same logic applies to concurrent_writes, but for mutations. The coordinator doesn’t wait for Write acknowledgments from all replicas before it can process the next incoming write request from your application.
The default values are often 32 for both concurrent_reads and concurrent_writes in cassandra.yaml. These are generally good starting points.
Tuning concurrent_reads and concurrent_writes:
- When to increase: If your application experiences read or write timeouts, and
nodetool tpstatsshows high numbers for theReadStageorMutationStage(specifically, the "Total blocked" or "Max blocked" counts), and the coordinator node’s CPU is not saturated, you might benefit from increasing these values. This indicates the coordinator is waiting for I/O to complete before it can start new I/O operations for other requests. - When to decrease: If your coordinator nodes are experiencing high CPU utilization, particularly on the threads handling these stages, or if you see a lot of
OutOfMemoryErrorrelated to thread pools, you might need to decrease these values. This means the coordinator is overwhelmed with managing too many in-flight operations. - Diagnosis: The primary tool is
nodetool tpstats. Look at theReadStageandMutationStage(for writes). Pay attention to the "Pending", "Active", "Completed", and "Blocked" counts. If "Blocked" is consistently high, or if "Pending" and "Active" are filling up and staying there, it’s a sign. Also, check the coordinator node’s system logs for anyOutOfMemoryErroror GC pauses. - The Fix (Increase):
Edit
cassandra.yamlon each node and increaseconcurrent_readsandconcurrent_writes. For example, to increase to64:
Then restart the node. This allows the coordinator to manage more I/O operations concurrently, potentially reducing latency and timeouts if the bottleneck was the coordinator’s ability to overlap I/O.concurrent_reads: 64 concurrent_writes: 64 - The Fix (Decrease):
Edit
cassandra.yamlon each node and decreaseconcurrent_readsandconcurrent_writes. For example, to decrease to16:
Then restart the node. This reduces the number of concurrent I/O operations the coordinator must manage, lowering CPU and memory pressure if the coordinator was overwhelmed.concurrent_reads: 16 concurrent_writes: 16
A common mistake is to confuse these settings with the number of requests your application can send. concurrent_reads and concurrent_writes are internal to the coordinator node, dictating how many in-flight I/O operations it can maintain for each request type. Increasing them does not magically make your application send more requests; it allows the coordinator to handle the existing load more efficiently by overlapping I/O, provided the node has enough CPU and memory to manage the increased concurrency.
The next thing you’ll likely run into when tuning these parameters is understanding how they interact with read_request_timeout_in_ms and write_request_timeout_in_ms.