Cassandra’s distributed nature means data can get out of sync between nodes, and nodetool repair is the primary tool for fixing those inconsistencies.

The core issue is that Cassandra’s eventual consistency model acknowledges a write as soon as the number of replicas required by the write’s consistency level (ONE, QUORUM, and so on) have responded, not necessarily all replicas. If a node is down or the network is unreliable, the other replicas receive updates that the unavailable node misses, so replicas end up holding different versions of the same data. nodetool repair reconciles these differences by comparing data across replicas and streaming the missing or newer versions to the nodes that are behind.

Here are the common causes of data inconsistencies and how to fix them:

Node Downtime

Diagnosis: The most common reason for inconsistencies is a node being offline for an extended period. While it’s down, other nodes continue to accept writes, and the offline node misses these updates.

Check: You can see if nodes have been down recently by checking the system logs of your Cassandra cluster or by using monitoring tools. A simple check is to look at the nodetool status output. If a node is marked as "DN" (Down), it’s a prime suspect.
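As a rough sketch of that check, the helper below filters nodetool status output for nodes whose state code starts with "D". The function name down_nodes is made up for this example, and it assumes the usual layout in which each node row begins with a two-letter state code followed by the address.

```shell
# Print the address of every node that `nodetool status` reports as down.
# Reads the status output on stdin, so it also works on saved output.
# NOTE: down_nodes is a hypothetical helper, not part of nodetool.
down_nodes() {
    # Node rows start with a two-letter code: U/D (up/down) followed by
    # N/L/J/M (normal/leaving/joining/moving). Column 2 is the address.
    awk '$1 ~ /^D[NLJM]$/ { print $2 }'
}

# Usage (assumes nodetool is on PATH and can reach the local node):
#   nodetool status | down_nodes
```

This only reports nodes that are down right now; a node that was down last week and has since recovered still needs the log or monitoring check described above.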

Fix: The primary fix is to bring the node back online and then run nodetool repair on that node.

nodetool repair <keyspace_name> <table_name>

Why it works: Once the node is back online, nodetool repair will compare its data with other replicas. It identifies which data is missing or outdated on the repaired node and streams it from other nodes that have the newer version.

Network Partitions

Diagnosis: Network issues can cause parts of your cluster to become temporarily isolated from others. During a partition, nodes in different segments of the partition can’t communicate, leading to divergent data.

Check: nodetool netstats can sometimes show active connections and dropped messages, indicating network problems. You might also see alerts from your network monitoring systems. Check Cassandra system logs for ERROR messages related to connection timeouts or STREAM failures.

Fix: Resolve the underlying network connectivity issues first. Once the network is stable, run nodetool repair on the affected nodes.

nodetool repair --full <keyspace_name> <table_name>

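Why it works: The --full flag runs a full repair instead of an incremental one. A full repair compares all data in the requested ranges across replicas, including SSTables already marked as repaired by an earlier incremental run (incremental has been the default since Cassandra 2.2), so it does not rely on that repaired/unrepaired bookkeeping being accurate. That makes it the more thorough choice after a partition. Conflicting versions are still resolved by write timestamp; --full changes how much data is compared, not how conflicts are settled.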

Disk Failures or Corruption

Diagnosis: If a disk holding Cassandra data files on a node fails or becomes corrupted, that node will have incomplete or incorrect data.

Check: System logs (dmesg, /var/log/syslog) will often show disk errors. nodetool info might also reveal issues if it can’t access certain data directories. Running fsck on the relevant partitions can confirm disk-level corruption.

Fix: This is a more severe issue. Ideally, you’d replace the failed disk and restore the node from a backup. If the corruption is localized and you’re confident other nodes are healthy, you might attempt a repair.

nodetool repair -pr <keyspace_name>

Why it works: The -pr (--partitioner-range) option repairs only the token ranges for which the node is the primary replica, not every range the node stores. It is intended to be run on each node in turn so that the whole ring is covered exactly once. If this node alone is missing data, run repair on it without -pr so that every range it replicates is compared and streamed from healthy replicas. Either way, repair only streams data the node should have; it does not fix corrupted SSTable files on disk, so restoring from backup (or replacing the node) is often the safest bet.
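Because -pr covers only a node's primary token ranges, a primary-range repair is normally run on every node, one after another. The sketch below prints one command per host rather than executing anything, so the plan can be reviewed first. The function name pr_repair_plan and the host addresses are placeholders for this example, not anything from Cassandra itself.

```shell
# Print (do not run) one primary-range repair command per host.
# Usage: pr_repair_plan <keyspace> <host>...
# NOTE: pr_repair_plan is a hypothetical helper used for illustration.
pr_repair_plan() {
    keyspace=$1
    shift
    for host in "$@"; do
        # -h selects the node that nodetool connects to over JMX.
        echo "nodetool -h $host repair -pr $keyspace"
    done
}

# Usage (hosts are placeholders):
#   pr_repair_plan my_keyspace 10.0.0.1 10.0.0.2 10.0.0.3
```

Running the printed commands sequentially, rather than in parallel, keeps the streaming and validation load on the cluster bounded.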

Inefficient Compaction Strategy

Diagnosis: If your compaction strategy isn’t keeping up with the write load, SSTables accumulate faster than they are merged. The direct effects are slower reads and higher disk usage. A compaction backlog does not by itself make replicas diverge, but it makes repair slower (more SSTables to validate, and the data repair streams in adds more SSTables to compact), so delayed or abandoned repairs become more likely.

Check: Monitor your cluster’s compaction statistics using nodetool compactionstats. If the pending tasks count is consistently high or growing, your compactions are falling behind.
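One way to track that number over time is to pull it out of the command output. The helper below is a minimal sketch that assumes the output contains a line of the form "pending tasks: N"; the name pending_compactions is invented for this example.

```shell
# Extract the pending compaction count from `nodetool compactionstats` output.
# Reads the output on stdin and prints just the number.
# NOTE: pending_compactions is a hypothetical helper, not a nodetool command.
pending_compactions() {
    # Split on ": " and print the value from the first "pending tasks" line.
    awk -F': *' '/^pending tasks:/ { print $2; exit }'
}

# Usage (assumes nodetool is on PATH):
#   nodetool compactionstats | pending_compactions
```

Sampling this value every few minutes and graphing it shows whether the backlog is draining or growing.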

Fix: Adjust your compaction strategy or perform manual compactions.

nodetool compact <keyspace_name> <table_name>

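Why it works: A manual compact forces Cassandra to merge the table’s SSTables into new, consolidated ones, discarding overwritten values in the process. Compaction is local to the node and does not compare data with other replicas, so it does not repair inconsistencies on its own; its benefit here is clearing the backlog so that reads touch fewer SSTables and subsequent repairs have less to process. Be aware that a major compaction under size-tiered compaction typically leaves one very large SSTable, so adjusting the strategy or compaction throughput is usually the better long-term fix.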

Large Data Volumes and Long-Running Repairs

Diagnosis: For very large tables or clusters, a full nodetool repair can take days or even weeks. If repairs are not completed before new data is written, or if nodes go down during a repair, inconsistencies can persist.

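Check: Monitor the progress of your nodetool repair jobs. If they consistently fail to finish within the repair interval you have scheduled (e.g., 7 days), you have a problem. The hard deadline is the table’s gc_grace_seconds (10 days by default): every replica should be repaired within that window, otherwise deleted data can reappear.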

Fix: Implement incremental repairs and schedule them more frequently.

nodetool repair -inc <keyspace_name> <table_name>

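Why it works: Incremental repair only validates SSTables that have not yet been marked as repaired, so data covered by an earlier repair is skipped. This dramatically reduces the time and resources required, making it feasible to run repairs much more frequently (e.g., daily or even hourly) and keep data more consistently synchronized across the cluster. Note that the -inc flag belongs to Cassandra 2.1; from 2.2 onward incremental repair is the default, so a plain nodetool repair runs incrementally and --full requests a full repair.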

Clock Skew Between Nodes

Diagnosis: Cassandra relies on the timestamp of data to determine which version is newer. If the clocks on your nodes are significantly out of sync, Cassandra might incorrectly identify older data as newer, leading to data loss or inconsistencies.

Check: Use ntpdate -q <your_ntp_server> or timedatectl status on each node to check for clock drift. A difference of more than a few seconds can be problematic.
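To compare nodes against each other rather than against the NTP server, you can collect each node's epoch time and look at the spread. The sketch below assumes input lines of the form "host epoch_seconds"; the function name max_clock_skew and the way the timestamps are gathered are assumptions for this example.

```shell
# Print the largest clock difference, in seconds, across a set of nodes.
# Reads lines of the form "<host> <epoch_seconds>" on stdin.
# NOTE: max_clock_skew is a hypothetical helper used for illustration.
max_clock_skew() {
    awk 'NR == 1 { min = max = $2 }
         $2 < min { min = $2 }
         $2 > max { max = $2 }
         END { if (NR > 0) print max - min }'
}

# Usage (hosts are placeholders; assumes passwordless ssh to each node):
#   for h in 10.0.0.1 10.0.0.2 10.0.0.3; do
#       echo "$h $(ssh "$h" date +%s)"
#   done | max_clock_skew
```

Collecting timestamps over ssh adds its own latency, so treat the result as a coarse indicator; a reading of a second or two may be measurement noise, while anything larger deserves a closer look with the NTP tools above.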

Fix: Configure all nodes in your cluster to synchronize their clocks with a reliable NTP (Network Time Protocol) server.

# Example for systemd-based systems
sudo timedatectl set-ntp true
sudo systemctl restart systemd-timesyncd

Why it works: Cassandra resolves conflicting writes with last-write-wins on the write timestamp, so keeping clocks synchronized keeps those timestamps comparable across the cluster and prevents a stale write from overwriting a newer one.

After resolving these, the next common error you might encounter is related to schema disagreements if schema changes were applied during periods of node unavailability.
