A disk I/O bottleneck in Cassandra means the database cannot read from or write to its storage fast enough, which leads to slow queries and write failures.

Common Causes and Fixes

1. Disk Saturation (High IOPS/Throughput)

  • Diagnosis: Monitor your disk’s IOPS (Input/Output Operations Per Second) and throughput. On Linux, use iostat -xz 5 and look at %util, r/s, w/s, rkB/s, wkB/s. If %util is consistently near 100%, you’re saturated.
  • Cause: The underlying physical or virtual disks can’t keep up with Cassandra’s demands. This is especially common with spinning disks or overloaded cloud instance storage.
  • Fix:
    • Upgrade Disk Type: Move to faster storage. For instance, if using HDDs, switch to SSDs. If using standard SSDs, consider NVMe SSDs for significantly higher IOPS and lower latency.
    • Add More Disks: Distribute data across more physical disks. The data_file_directories setting in cassandra.yaml accepts multiple paths, so Cassandra’s data can span multiple mount points. Ensure each disk is on a separate controller if possible.
    • Cloud Instance Storage: Choose volume types with provisioned IOPS (e.g., AWS EBS gp3/io1, GCP pd-ssd) or instance types with local NVMe SSDs, and provision sufficient IOPS and throughput for your workload.
  • Why it works: Faster disks or more disks increase the aggregate IOPS and throughput available to Cassandra, allowing it to service requests more quickly.
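
The saturation check from the diagnosis step can be automated. The following is a minimal sketch, assuming iostat -x output in which the device table has a header row containing %util; the function name, the 90% threshold, and the sample text are illustrative and not taken from any real host.

```python
def saturated_devices(iostat_text, threshold=90.0):
    """Return {device: %util} for devices at or above `threshold`.

    Column positions differ between sysstat versions, so the index of
    %util is read from the header row instead of being hard-coded.
    """
    result = {}
    util_idx = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_idx = None  # a blank line ends the current device table
            continue
        if "%util" in fields:
            util_idx = fields.index("%util")  # header row of a device table
            continue
        if util_idx is not None and len(fields) > util_idx:
            try:
                util = float(fields[util_idx])
            except ValueError:
                continue  # not a data row
            if util >= threshold:
                result[fields[0]] = util
    return result


# Hand-written sample with a reduced set of columns (hypothetical values).
SAMPLE = """\
Device   r/s    w/s    rkB/s    wkB/s   %util
sda      120.0  950.0  4800.0   61000.0 98.7
sdb      15.0   40.0   600.0    2100.0  12.3
"""

print(saturated_devices(SAMPLE))
```

If iostat prints several reports, as it does with an interval argument, later reports overwrite earlier ones, so the result reflects the most recent sample for each device.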

2. Incorrect Disk Configuration (e.g., RAID Level)

  • Diagnosis: Check your RAID configuration. For Cassandra, RAID 0 is often recommended for performance, but it offers no redundancy. RAID 10 provides a balance of performance and redundancy. Avoid RAID 5/6 due to their write penalty.
  • Cause: Using RAID levels with high write penalties (like RAID 5 or 6) can severely impact Cassandra’s write performance, as every write operation requires multiple reads and writes to parity blocks.
  • Fix:
    • Reconfigure RAID: If using RAID 5/6, plan to migrate data to a RAID 0 or RAID 10 array. This often involves backing up data, destroying the old array, creating the new one, and restoring data.
    • Use Single Disks (No RAID): For maximum performance and simplicity, especially with SSDs, consider using individual disks mounted directly, with Cassandra’s replication providing the fault tolerance.
  • Why it works: RAID 0 and RAID 10 have lower write penalties than RAID 5/6, allowing Cassandra’s writes (which are often sequential appends to commit logs and SSTables) to proceed with less overhead.
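
To make the write penalty concrete, here is a rough sketch using the common rule-of-thumb penalties for small random writes (1 for RAID 0, 2 for RAID 10, 4 for RAID 5, 6 for RAID 6). The disk count and per-disk IOPS are made-up inputs, and large sequential full-stripe writes are penalized less than these figures suggest.

```python
# Rule-of-thumb backend I/Os generated per small random write.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_write_iops(level, disks, iops_per_disk):
    """Estimate the write IOPS an array can sustain for a given RAID level."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Hypothetical array: 4 disks at 5,000 write IOPS each.
for level in WRITE_PENALTY:
    print(level, round(effective_write_iops(level, disks=4, iops_per_disk=5000)))
```

With the same four disks, the estimate for RAID 5 is a quarter of the RAID 0 figure, which is why parity RAID is a poor fit for write-heavy nodes.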

3. Frequent Compactions Leading to Disk Thrashing

  • Diagnosis: Monitor nodetool compactionstats. High numbers of running compactions, a large queue, and sustained high disk utilization during compactions indicate a problem. Look for SSTable count per table and average SSTable size.
  • Cause: Cassandra needs to merge smaller SSTables into larger ones to reclaim space and improve read performance. If compactions are happening too frequently or are too aggressive, they can consume all available disk I/O, starving read/write requests. This often happens with very high write loads or small, frequently updated records.
  • Fix:
    • Tune compaction_throughput_mb_per_sec: In cassandra.yaml, set compaction_throughput_mb_per_sec (renamed compaction_throughput in Cassandra 4.1) to a value that leaves enough I/O for client requests. Start with a conservative value like 16 or 32 and increase gradually while monitoring disk utilization and latency. nodetool setcompactionthroughput applies a new value without a restart.
    • Choose Appropriate Compaction Strategy: Ensure you’re using the right compaction strategy for your workload. SizeTieredCompactionStrategy (STCS) is common and suits write-heavy workloads, but it can leave many SSTables to consult per read. LeveledCompactionStrategy (LCS) keeps the number of SSTables per read low, which suits read-heavy workloads, but it rewrites data more often and so has a higher compaction I/O cost. TimeWindowCompactionStrategy (TWCS) is ideal for time-series data.
    • Tune Compaction Thresholds (STCS): For STCS, adjust min_threshold (default 4) and max_threshold (default 32) to control how many similarly sized SSTables trigger a compaction, and min_sstable_size to control how small SSTables are grouped. A higher min_threshold means fewer, larger compactions, at the cost of more SSTables per read.
  • Why it works: Limiting compaction throughput ensures that compactions don’t consume all disk resources. Selecting the right strategy and tuning thresholds balances the need for compaction with ongoing client request performance.
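
Compaction backlog can be tracked with a small script. This is a sketch that assumes the nodetool compactionstats output contains a "pending tasks: N" line; the function name and sample string are illustrative, and the exact output layout varies between Cassandra versions.

```python
import re


def pending_compactions(compactionstats_text):
    """Extract the pending task count, or None if the line is not found."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_text)
    return int(match.group(1)) if match else None


# Hand-written sample line, not captured from a real cluster.
print(pending_compactions("pending tasks: 42\n"))

# On a node, the text could be collected with the standard library, e.g.:
#   import subprocess
#   text = subprocess.run(["nodetool", "compactionstats"],
#                         capture_output=True, text=True).stdout
```

A count that keeps rising across samples suggests compaction cannot keep up at the current throughput limit.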

4. Inefficient Data Modeling (High Read/Write Amplification)

  • Diagnosis: Analyze your data model and query patterns. Frequent use of ALLOW FILTERING, large partitions, or wide rows can lead to read amplification. High write volumes without proper batching can lead to write amplification on disk. Use nodetool tablestats (cfstats in older versions) to check SSTable count and partition sizes.
  • Cause: A data model that doesn’t align with query patterns forces Cassandra to read more data than necessary (read amplification) or write redundant data (write amplification). This translates directly to more disk I/O.
  • Fix:
    • Denormalize Data: Create tables specifically for your queries. Avoid ALLOW FILTERING by designing query-specific tables.
    • Manage Partition Size: Keep partitions reasonably sized (e.g., under 100MB). If a logical entity exceeds this, add a bucket component (such as a date or a hash) to the partition key so the entity is split across several partitions.
    • Batch Writes Appropriately: Use UNLOGGED batches only for writes that target the same partition; batches spanning many partitions add coordinator load instead of reducing it. Avoid excessively large batches, which can cause timeouts.
  • Why it works: An optimized data model reduces the amount of data Cassandra needs to read from disk for a given query and ensures writes are as efficient as possible, thereby lowering overall disk I/O.
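
One way to keep partitions bounded is to add a time bucket to the partition key. The sketch below assumes a hypothetical sensor table with one 200-byte reading per second; the names and numbers are illustrative only, and the size estimate ignores compression and storage overhead.

```python
from datetime import datetime, timezone


def day_bucket(ts):
    """Return a UTC day string to use as an extra partition key component."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def estimated_partition_mb(rows, avg_row_bytes):
    """Rough uncompressed partition size in MB."""
    return rows * avg_row_bytes / (1024 * 1024)


ts = datetime(2024, 3, 15, 23, 30, tzinfo=timezone.utc)
partition_key = ("sensor-42", day_bucket(ts))  # (sensor_id, day)
print(partition_key)

rows_per_day = 24 * 60 * 60  # one reading per second (assumed)
print(estimated_partition_mb(rows_per_day, 200))        # one day: about 16 MB
print(estimated_partition_mb(rows_per_day * 365, 200))  # one year, unbucketed: about 6 GB
```

With a daily bucket each partition stays well under the 100MB guideline, whereas a single partition per sensor would grow without bound.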

5. Commit Log Not Flushing Fast Enough

  • Diagnosis: Monitor the commit log disk. High disk utilization on the commit log device, especially during write spikes, can indicate a bottleneck. Check the CommitLog metrics in JMX, such as PendingTasks and WaitingOnCommit.
  • Cause: Cassandra appends every incoming mutation to the commit log sequentially and also writes it to an in-memory memtable, which is later flushed to an SSTable. If the commit log disk is slow, or if there’s a massive spike in writes, the commit log can become a bottleneck.
  • Fix:
    • Separate Commit Log Device: Move the commit log directory (commitlog_directory in cassandra.yaml) to a fast, dedicated disk (ideally an SSD or NVMe).
    • Increase commitlog_sync_period_in_ms (with caution): This setting applies when commitlog_sync is set to periodic. Increasing it (e.g., from 10000 to 60000) reduces the frequency of commit log syncs, which can help if syncs are causing I/O contention. However, this increases the potential data loss window in a crash scenario.
  • Why it works: A faster commit log device or less frequent syncing allows Cassandra to write mutations to disk more quickly, preventing it from falling behind on writes.
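
The trade-off behind a longer sync period can be estimated with simple arithmetic. This sketch assumes periodic sync mode and a steady write rate, both simplifications; the function name and the 10 MB/s rate are hypothetical.

```python
def unsynced_window_mb(write_mb_per_sec, sync_period_ms):
    """Approximate data written between two syncs, i.e. at risk on a crash."""
    return write_mb_per_sec * sync_period_ms / 1000


# Hypothetical node ingesting 10 MB/s of mutations.
for period_ms in (10000, 60000):
    print(period_ms, unsynced_window_mb(10, period_ms))
```

Raising the period from 10000 to 60000 multiplies the exposed data on that node by six, although replicas on other nodes may still hold the same mutations.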

6. Insufficiently Sized JVM Heap

  • Diagnosis: Monitor JVM heap usage. High GC activity (frequent, long pauses) and low heap availability can indirectly cause disk I/O issues. Use nodetool gcstats and JMX metrics.
  • Cause: While not a direct disk I/O component, a severely undersized heap leads to excessive garbage collection. Frequent and long GC pauses can cause Cassandra to briefly stall or slow down processing, which can indirectly lead to I/O queuing and increased latency if the system struggles to catch up.
  • Fix:
    • Increase JVM Heap Size: In jvm-server.options (jvm.options in versions before 4.0), set -Xmx and -Xms to the same, appropriate value. For modern Cassandra, heaps from 16GB up to about 31GB are common, but never exceed 50% of system RAM, and stay below 32GB if possible, because larger heaps lose the JVM’s compressed object pointers (compressed oops).
    • Tune GC Settings: Consider using G1GC (default in recent versions) and tune its parameters for your specific workload if GC pauses are still an issue.
  • Why it works: A sufficiently sized heap reduces GC frequency and duration, allowing Cassandra to process data more smoothly and consistently, which in turn helps manage disk I/O more effectively.
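
The sizing rule above (at most half of system RAM, and below 32GB) can be written as a small helper. This is only a sketch of that rule of thumb; the function name and the 31GB cap are assumptions, and the right heap size still depends on the workload.

```python
def suggested_max_heap_gb(system_ram_gb, cap_gb=31):
    """Half of system RAM, capped below 32GB to keep compressed oops."""
    return min(system_ram_gb // 2, cap_gb)


for ram_gb in (16, 32, 64, 128):
    heap = suggested_max_heap_gb(ram_gb)
    # -Xms and -Xmx are set to the same value to avoid heap resizing.
    print(f"{ram_gb}GB RAM: -Xms{heap}G -Xmx{heap}G")
```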

After fixing these, your next likely issue will be network latency between nodes, especially during repairs or cross-datacenter replication.
