Cassandra’s lightweight transactions (LWTs), powered by Paxos, are not transactions in the traditional ACID sense. They are conditional writes: a linearizable compare-and-set on a single partition.
Let’s see it in action. Imagine we have a users table:
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    username text,
    email text,
    created_at timestamp
);
We want to insert a new user only if no row with that user_id already exists. A plain INSERT in Cassandra is an upsert, so without a condition it would silently overwrite an existing row with the same user_id. This is where IF NOT EXISTS shines.
Consider this statement:
INSERT INTO users (user_id, username, email, created_at)
VALUES (uuid(), 'alice', 'alice@example.com', toTimestamp(now()))
IF NOT EXISTS;
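If a row with that user_id already exists, the INSERT is not applied. Cassandra does not raise an error; it returns a result row whose [applied] column is false, along with the current values of the existing row. If the user_id is new, the row is inserted and [applied] is true. Note that uuid() generates a fresh random ID on every call, so this particular statement will practically always be applied. The condition starts to matter when the client chooses the ID, as in the statements that follow.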
Now, what if we want to update a user’s email, but only if that user actually exists?
UPDATE users
SET email = 'alice.updated@example.com'
WHERE user_id = aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
IF EXISTS;
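This UPDATE is applied only if a row with the specified user_id exists. Without the condition, an UPDATE in Cassandra is also an upsert and would create a row containing just the email for a missing user_id. With IF EXISTS, a missing row makes the statement a no-op, and Cassandra returns [applied] = false rather than an error.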
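To make the result semantics of the two conditions concrete, here is a minimal Python sketch. It is an in-memory toy, not a Cassandra client: the class ToyUsersTable and its methods are hypothetical names invented for illustration. It only mimics how a conditional write reports its outcome through an [applied] flag, and how a failed IF NOT EXISTS also hands back the existing row.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


class ToyUsersTable:
    """In-memory stand-in for the users table (hypothetical, not a real driver)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}

    def insert_if_not_exists(self, user_id: str, **columns: Any) -> Row:
        existing: Optional[Row] = self._rows.get(user_id)
        if existing is not None:
            # Condition failed: no error, [applied] is False and the
            # current row is returned so the caller can inspect it.
            return {"[applied]": False, "user_id": user_id, **existing}
        self._rows[user_id] = dict(columns)
        return {"[applied]": True}

    def update_if_exists(self, user_id: str, **changes: Any) -> Row:
        if user_id not in self._rows:
            # Nothing to update: a no-op reported as [applied] = False.
            return {"[applied]": False}
        self._rows[user_id].update(changes)
        return {"[applied]": True}


table = ToyUsersTable()
uid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

first = table.insert_if_not_exists(uid, username="alice", email="alice@example.com")
second = table.insert_if_not_exists(uid, username="mallory", email="m@example.com")
missing = table.update_if_exists("00000000-0000-0000-0000-000000000000", email="x@example.com")
updated = table.update_if_exists(uid, email="alice.updated@example.com")

print(first, second, missing, updated, sep="\n")
```

The second insert leaves the stored row untouched and returns alice’s data, which is how a client can tell what it collided with.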
The real power shows up when several clients race. Say two clients try to create a user with the same user_id, and each needs to know whether its own insert was the one that succeeded. This is the classic "create if not exists" pattern.
INSERT INTO users (user_id, username, email, created_at)
VALUES (aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee, 'bob', 'bob@example.com', toTimestamp(now()))
IF NOT EXISTS;
If the result has [applied] = true, you know your statement created the user. If [applied] is false, a row with that user_id already existed, typically because another client inserted it first, and the result includes that existing row.
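The race can be sketched in a few lines of Python. This is a toy under stated assumptions: ToyPartition is a hypothetical in-memory class, and a threading.Lock stands in for the serialization that Paxos provides in a real cluster. What it shows is the guarantee itself: however many clients compete for the same user_id, exactly one sees its insert applied.

```python
import threading
from typing import Dict, List, Tuple


class ToyPartition:
    """Hypothetical single-partition store; the lock stands in for Paxos."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, user_id: str, username: str) -> bool:
        # Check and write happen as one indivisible step.
        with self._lock:
            if user_id in self._rows:
                return False  # someone else created the row first
            self._rows[user_id] = username
            return True

    def username(self, user_id: str) -> str:
        return self._rows[user_id]


def race(n_clients: int) -> Tuple[List[bool], str]:
    partition = ToyPartition()
    uid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
    results: List[bool] = [False] * n_clients

    def client(i: int) -> None:
        results[i] = partition.insert_if_not_exists(uid, f"client-{i}")

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, partition.username(uid)


results, stored = race(8)
print("applied flags:", results)
print("row belongs to:", stored)
```

Which client wins depends on thread scheduling, but the count of winners is always one, and the stored row always belongs to that winner.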
Internally, IF NOT EXISTS and IF EXISTS trigger Cassandra’s lightweight transaction mechanism, which is based on Paxos. The coordinator runs the operation against a quorum of the replicas that own the affected partition, at the serial consistency level (SERIAL or LOCAL_SERIAL). In the classic implementation this takes four phases: the coordinator asks the replicas to promise to honor its ballot (prepare), reads the current state of the row, proposes the write if the condition holds, and finally commits it. For IF NOT EXISTS, the condition holds when the read finds no row; for IF EXISTS, it holds when the read finds one. If the condition fails, nothing is written and the statement returns [applied] = false. Because a coordinator must win a quorum of promises before it can read and propose, two clients cannot both observe "no row" and both insert. The check and the write behave as one atomic step, which is what prevents race conditions.
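The phases of a conditional insert (prepare, read, propose, commit) can be illustrated with a deliberately simplified Python model. Everything here is an assumption made for illustration: Replica, quorum, and insert_if_not_exists are hypothetical names, all replicas live in one process, and the sketch omits what a real Paxos implementation must handle, such as finishing another coordinator’s in-flight proposal, timeouts, and retries. It is not Cassandra’s code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    promised: int = 0  # highest ballot this replica has promised to honor
    rows: Dict[str, dict] = field(default_factory=dict)

    def prepare(self, ballot: int) -> bool:
        # Promise only to ballots newer than anything seen so far.
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, ballot: int) -> bool:
        # Accept unless a newer ballot has been promised in the meantime.
        return ballot >= self.promised


def quorum(n_replicas: int) -> int:
    return n_replicas // 2 + 1


def insert_if_not_exists(replicas: List[Replica], ballot: int,
                         key: str, row: dict) -> bool:
    needed = quorum(len(replicas))

    # Phase 1: prepare/promise.
    promised = [r for r in replicas if r.prepare(ballot)]
    if len(promised) < needed:
        return False  # stale ballot; a real coordinator would retry

    # Phase 2: read the current state and evaluate the condition.
    if any(key in r.rows for r in promised):
        return False  # row exists, so IF NOT EXISTS does not apply

    # Phase 3: propose/accept.
    accepted = [r for r in promised if r.accept(ballot)]
    if len(accepted) < needed:
        return False

    # Phase 4: commit the row on the accepting replicas.
    for r in accepted:
        r.rows[key] = dict(row)
    return True


replicas = [Replica() for _ in range(3)]
uid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

created = insert_if_not_exists(replicas, 1, uid, {"username": "bob"})
duplicate = insert_if_not_exists(replicas, 2, uid, {"username": "eve"})
stale = insert_if_not_exists(replicas, 1, "other-id", {"username": "carol"})
print(created, duplicate, stale)
```

The third call fails for a different reason than the second: its ballot is older than one the replicas have already promised, so it never gets as far as reading the row.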
The key difference from traditional ACID transactions is scope: a lightweight transaction is a single conditional write on one partition, not a multi-statement transaction with rollback. It is also expensive compared to a normal Cassandra write. The Paxos exchange needs several network round trips between the coordinator and the replicas (four in the classic implementation), where an ordinary write needs one, and contention on the same partition slows it further. This makes LWTs unsuitable for high-volume, frequent operations. They are best reserved for writes where the guarantee of "only if this condition is met" is critical, such as creating a record exactly once, making critical writes idempotent, or preventing two clients from claiming the same primary key.
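A back-of-the-envelope calculation shows why the cost matters. The numbers are assumptions for illustration only: four round trips for a lightweight transaction versus one for a plain write, and a hypothetical 2 ms round trip between coordinator and replicas. Real latencies depend on the cluster, the Cassandra version, and contention.

```python
PLAIN_WRITE_ROUND_TRIPS = 1  # assumption: one coordinator-to-replica exchange
LWT_ROUND_TRIPS = 4          # assumption: prepare, read, propose, commit


def estimated_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Lower bound on latency if each round trip costs rtt_ms."""
    return round_trips * rtt_ms


rtt_ms = 2.0  # hypothetical network round-trip time

plain_ms = estimated_latency_ms(PLAIN_WRITE_ROUND_TRIPS, rtt_ms)
lwt_ms = estimated_latency_ms(LWT_ROUND_TRIPS, rtt_ms)

print(f"plain write: at least {plain_ms} ms")
print(f"LWT:         at least {lwt_ms} ms ({lwt_ms / plain_ms:.0f}x)")
```

This is a floor, not a prediction: it ignores disk I/O and any retries caused by competing ballots, both of which widen the gap.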
A common pitfall is attaching IF EXISTS or IF NOT EXISTS to a statement that does not pin down a single row. Cassandra has no unique indexes; the only uniqueness it enforces is on the primary key. A conditional statement must therefore identify its row by the full primary key (the partition key plus any clustering columns). If the WHERE clause does not, the condition cannot be evaluated against one specific row, and Cassandra will typically reject the statement as invalid rather than apply it.
The next logical step after mastering conditional writes is understanding how Cassandra handles schema changes and their propagation, and the implications of eventual consistency in non-transactional operations.