Cassandra’s consistency levels are less about guaranteeing data availability and more about controlling the trade-off between read latency and the likelihood of reading stale data.
Let’s see it in action. Imagine a simple three-node cluster with replication factor 3.
nodetool status
Output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load    Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.10  100 GB  256     100.0%            a1b2c3d4-e5f6-7890-1234-567890abcdef  rack1
UN  192.168.1.11  100 GB  256     100.0%            b2c3d4e5-f6a7-8901-2345-67890abcdef1  rack1
UN  192.168.1.12  100 GB  256     100.0%            c3d4e5f6-a7b8-9012-3456-7890abcdef12  rack1
We’ll create a keyspace with replication = {'class': 'SimpleStrategy', 'replication_factor': 3}.
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
USE my_keyspace;
CREATE TABLE my_table (id UUID PRIMARY KEY, value text);
Now, let’s write a value using the ONE consistency level. In cqlsh, the CONSISTENCY command sets the level for the statements that follow:
CONSISTENCY ONE;
INSERT INTO my_table (id, value) VALUES (uuid(), 'hello');
This write only needs to be acknowledged by one replica. The coordinator still sends the write to all three replicas, but it returns success to the client as soon as the first one acknowledges it. The other replicas might not have applied it yet.
Next, let’s read that value using QUORUM:
CONSISTENCY QUORUM;
SELECT * FROM my_table WHERE id = <the_uuid_you_just_inserted>;
For QUORUM, the coordinator needs to hear back from a majority of replicas. With a replication factor of 3, a quorum is 2. If the two replicas that respond include one that has applied the ONE write, the coordinator compares write timestamps and returns 'hello'. However, if the two that respond are the ones that haven’t yet received the write, the read still succeeds but returns stale data (here, no row at all). The read fails only if fewer than two replicas are available or respond in time.
If we used ALL for the read:
CONSISTENCY ALL;
SELECT * FROM my_table WHERE id = <the_uuid_you_just_inserted>;
The coordinator would need to hear back from all 3 replicas. If even one replica is down or too slow to respond, the read fails. A replica that hasn’t received the write doesn’t cause a failure: the coordinator returns the value with the newest timestamp and can repair the out-of-date replicas. This guarantees you won’t read stale data, but it’s the slowest and least available option.
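The read behavior described above can be sketched as a toy model in Python. This is only an illustration, not Cassandra's actual read path: the replica names, the `Cell` type, and the prior value 'old' are hypothetical, and the model assumes the coordinator simply keeps the response with the newest write timestamp.

```python
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    value: str
    timestamp: int  # write timestamp; larger means newer


def reconcile(responses: List[Optional[Cell]]) -> Optional[Cell]:
    """Keep the newest cell among the replica responses (last write wins)."""
    cells = [c for c in responses if c is not None]
    return max(cells, key=lambda c: c.timestamp) if cells else None


# Hypothetical state right after a write at ONE: only replica A has applied it.
replicas = {
    "A": Cell("hello", timestamp=2),
    "B": Cell("old", timestamp=1),
    "C": Cell("old", timestamp=1),
}


def read(contacted: List[str]) -> Optional[str]:
    """Simulate a read that hears back from the given replicas."""
    cell = reconcile([replicas[name] for name in contacted])
    return cell.value if cell else None


quorum_without_a = read(["B", "C"])   # quorum that misses the fresh replica
quorum_with_a = read(["A", "B"])      # quorum that includes it
all_replicas = read(["A", "B", "C"])  # ALL always includes it

print(quorum_without_a, quorum_with_a, all_replicas)
```

In this model a QUORUM read is only fresh when the responding pair happens to include replica A, while a read at ALL always is.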
The core problem Cassandra solves is distributed data storage and retrieval at massive scale, but it has to make compromises. Traditional relational databases often prioritize strong consistency (ACID properties), meaning every read sees the absolute latest committed write. Cassandra, designed for high availability and partition tolerance (AP in CAP theorem terms), allows for eventual consistency.
Consistency levels are how you tune this trade-off. ONE offers low latency and high availability but risks reading stale data. ALL offers the highest consistency but sacrifices latency and availability. QUORUM (and LOCAL_QUORUM) sits in the middle, requiring a majority of replicas to respond, providing a balance.
The key is that write_consistency + read_consistency > replication_factor is required for strong consistency. For example, with RF=3, QUORUM (2) + QUORUM (2) = 4, which is > 3. This ensures that at least one replica involved in the read must have been involved in the write, thus seeing the latest data.
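As a quick check of the write + read > RF rule, here is a minimal Python sketch. The helper names `required_replicas` and `overlaps` are made up for illustration, and only the levels discussed in this article are covered.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must respond at a given level (hypothetical helper)."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def overlaps(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must share a replica with every write set."""
    needed = required_replicas(write_level, rf) + required_replicas(read_level, rf)
    return needed > rf


RF = 3
combos = [("ONE", "ONE"), ("ONE", "QUORUM"), ("QUORUM", "QUORUM"),
          ("ONE", "ALL"), ("ALL", "ONE")]
for w, r in combos:
    print(f"write={w:<6} read={r:<6} strong={overlaps(w, r, RF)}")
```

With RF=3, writing at ONE and reading at QUORUM gives 1 + 2 = 3, which is not greater than 3, so that pairing can still return stale data.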
The actual number of replicas a coordinator waits for with QUORUM is floor(replication_factor / 2) + 1; the division is rounded down before adding 1. So for RF=3, it’s floor(1.5) + 1 = 1 + 1 = 2. For RF=5, it’s floor(2.5) + 1 = 2 + 1 = 3. This is the number of replicas the coordinator must hear from.
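The quorum arithmetic is easy to verify with integer division. The sketch below also derives how many replicas can be down while QUORUM requests still succeed; the function names are illustrative only.

```python
def quorum(rf: int) -> int:
    """floor(rf / 2) + 1, using integer division for the floor."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be unavailable while a quorum is still reachable."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5, 7):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Note that RF=2 needs both replicas for a quorum, so it tolerates no failures at QUORUM, which is one reason odd replication factors are the common choice.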
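If you have a multi-datacenter cluster and use QUORUM, it applies globally: the quorum is computed over the sum of the replication factors of all datacenters, so the coordinator may have to wait on replicas in remote datacenters. LOCAL_QUORUM is often preferred as it requires a quorum only within the coordinator’s local datacenter, improving latency and availability if other datacenters are unreachable.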
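To make the difference concrete, here is a small Python sketch comparing the two levels. It assumes a hypothetical keyspace replicated 3 times in each of two datacenters, named dc1 and dc2 for illustration.

```python
from typing import Dict


def quorum(n: int) -> int:
    return n // 2 + 1


def global_quorum(rf_by_dc: Dict[str, int]) -> int:
    """QUORUM: majority of all replicas across every datacenter."""
    return quorum(sum(rf_by_dc.values()))


def local_quorum(rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """LOCAL_QUORUM: majority of the replicas in the coordinator's datacenter."""
    return quorum(rf_by_dc[local_dc])


# Hypothetical replication settings: 3 replicas in each datacenter.
rf_by_dc = {"dc1": 3, "dc2": 3}

print("QUORUM needs", global_quorum(rf_by_dc), "replicas")
print("LOCAL_QUORUM in dc1 needs", local_quorum(rf_by_dc, "dc1"), "replicas")
```

Under these assumptions QUORUM needs 4 responses, which is more than the 3 replicas in one datacenter, so every QUORUM request has to reach the remote datacenter. LOCAL_QUORUM needs only 2 local responses.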
The next concept to grapple with is how Cassandra handles tombstones and their impact on read performance.