The most surprising thing about tuning Cassandra for production is that the default cassandra.yaml settings are actively detrimental to performance under load.

Let’s look at a real-world setup. Imagine a 10-node cluster running on i3.xlarge AWS instances, with 16 vCPUs and 30.5 GiB RAM each. The data is primarily time-series, with writes happening every second and reads occurring in 5-minute intervals.

Here’s a snippet of a production-tuned cassandra.yaml and why each change matters:

# General
num_tokens: 256 # Default is 1, too few tokens mean imbalanced load. 256 is a common sweet spot.
concurrent_reads_multiplier: 2 # Default is 1. Allows more read threads to avoid blocking on slow disks or network.
concurrent_writes_multiplier: 2 # Default is 1. Similar to reads, prevents write contention.

# Memtables
memtable_flush_writers: 4 # Default is 2. More writers can parallelize flushing to disk, reducing memtable size build-up.
memtable_heap_space_in_mb: 1024 # Default is often too small. Larger memtables reduce flush frequency, but increase read latency during flushes. 1024MB is a good balance for 30GB RAM.
memtable_offheap_space_in_mb: 2048 # Default is often 0. Off-heap memtables avoid garbage collection pressure on the JVM.

# Compaction
compaction_throughput_mb_per_sec: 16 # Default is 16, but can be increased. Controls how aggressively compactions run. Too low, and SSTables pile up. Too high, and compactions hog disk I/O. 16 is a safe starting point.
min_compaction_threshold: 4 # Default is 4. Smaller threshold means compactions start sooner, keeping SSTable count lower.
max_compaction_threshold: 32 # Default is 32. Higher threshold means compactions run less often, potentially increasing read latency but reducing I/O.

# Caching
key_cache_size_in_mb: 256 # Default is often 0. Caches partition keys for faster reads. 256MB is a reasonable start.
row_cache_size_in_mb: 0 # Default is often 0. Row cache can be very effective for frequently read, small rows, but has high memory overhead. Often disabled in favor of key cache and application-level caching.

# Network
rpc_max_threads: 128 # Default is 100. Increases the pool for handling client requests.
rpc_keepalive: true # Default is false. Keeps connections open longer, reducing overhead for repeated client connections.

Let’s break down what’s happening under the hood. Cassandra uses memtables in memory to buffer writes. When a memtable is full, it’s flushed to disk as an SSTable. Compaction is the process of merging these SSTables to reduce the number of files and improve read performance.

The num_tokens setting is crucial for distributing data evenly across the cluster. Each node is responsible for a range of token values. With num_tokens: 1, a single node could be responsible for a massive chunk of the token ring, leading to uneven load. Increasing this to 256 (a common value) means each node has 256 virtual nodes, providing a much finer-grained distribution of data and load.

memtable_flush_writers directly impacts how quickly Cassandra can get data from memory to disk. When writes are heavy, the memtable can grow large, and a single flush writer can become a bottleneck. Increasing this to 4 allows multiple threads to concurrently flush memtables, preventing writes from backing up.

memtable_heap_space_in_mb and memtable_offheap_space_in_mb control how much memory is dedicated to memtables. Larger memtables mean fewer flushes, which can be good, but also mean larger SSTables being created and potentially longer read latency during flushes. The split between heap and off-heap is important: off-heap memory avoids garbage collection pauses on the JVM, which can be a significant source of latency. Allocating a good portion to off-heap (2048MB in this example) is a common best practice.

Compaction strategy is a constant balancing act. compaction_throughput_mb_per_sec throttles the rate at which compactions can read and write data. If this is too low, you’ll accumulate too many SSTables, leading to poor read performance. If it’s too high, compactions can saturate disk I/O, impacting write and read latency. min_compaction_threshold and max_compaction_threshold influence when compactions start and how many SSTables are merged at once. Lowering min_compaction_threshold means compactions start sooner, keeping the SSTable count lower.

Caching (key_cache_size_in_mb and row_cache_size_in_mb) can significantly speed up reads by keeping frequently accessed data in memory. The key cache stores partition key lookups, while the row cache stores entire rows. Row cache has a much higher memory footprint and can be tricky to tune, so it’s often disabled or used very selectively. Key cache is generally safer and more effective for many workloads.

The concurrent_reads_multiplier and concurrent_writes_multiplier are direct controls over the thread pools that handle read and write requests. Increasing these allows Cassandra to process more requests in parallel, which can be vital for throughput, but you need to ensure your underlying hardware (CPU, disk I/O, network) can keep up.

One of the most subtle yet impactful tuning parameters is commitlog_segment_size_in_mb. While not shown in the snippet above, the default segment size is often 32MB. For high-throughput write workloads, increasing this to 256MB or even 512MB can dramatically improve write performance. The commit log is where writes are durably stored before being written to memtables. Larger segments mean fewer commit log sync operations and less overhead per write. However, very large segments can also increase recovery time after a node restart, as Cassandra has to replay a larger log.

After applying these changes, the next immediate problem you’ll likely face is understanding the impact of compaction on read latency when you introduce a new node or experience a sudden surge in writes.

Want structured learning?

Take the full Cassandra course →