Cassandra doesn’t write your data to its final place on disk at the moment you think it does. The real write path is smarter, and more complex, than that.

Let’s watch a write happen. Imagine you’re writing a record to a Cassandra table.

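// Simplified write, in the style of the DataStax Java driver 3.x.
// CassandraClient is a placeholder for however the application obtains a connected Session.
Session session = CassandraClient.getSession();
// Prepare once so the statement is parsed a single time and can be reused across writes.
String query = "INSERT INTO users (user_id, name, email) VALUES (?, ?, ?)";
PreparedStatement preparedStatement = session.prepare(query);
// Bind values to the placeholders in order; 123 assumes user_id is an int column.
BoundStatement boundStatement = preparedStatement.bind(123, "Alice", "alice@example.com");
// Send the write; this returns once the requested consistency level is satisfied.
session.execute(boundStatement);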

When that session.execute(boundStatement) line runs, Cassandra doesn’t immediately go looking for the right place on disk to put Alice’s email. Instead, it does two things, almost simultaneously:

  1. Writes to the Commit Log: An append-only log on disk, split into segment files (commitlog/CommitLog-xxxxxxxxxxxx.log). This log is a safety net. If Cassandra crashes after acknowledging your write but before the data has been flushed to an SSTable, it replays the commit log on restart to recover the write. Appends are sequential, so they avoid the cost of random disk seeks.
  2. Writes to the Memtable: An in-memory data structure (like a sorted map). This is where the data conceptually lives once it’s "written." Each table has one active Memtable that receives new writes; older Memtables that are still being flushed can exist alongside it until their data is on disk.

Think of the CommitLog as a scratchpad that guarantees you won’t lose the write, and the Memtable as a temporary holding pen where the data is organized for quick reads.
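The two steps can be pictured with a toy model. This is a minimal sketch, not Cassandra's actual code: the class WritePathSketch, its fields, and the "key=value" record format are all invented for illustration, with a list standing in for the on-disk log and a sorted map standing in for the Memtable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model of the write path; these are not Cassandra's real classes.
class WritePathSketch {
    // Stand-in for the on-disk commit log: records are only ever appended.
    final List<String> commitLog = new ArrayList<>();
    // Stand-in for the Memtable: an in-memory map kept sorted by key.
    final ConcurrentSkipListMap<Integer, String> memtable = new ConcurrentSkipListMap<>();

    // A write lands in both structures before it is acknowledged.
    synchronized void write(int userId, String value) {
        commitLog.add(userId + "=" + value); // sequential append, nothing is edited in place
        memtable.put(userId, value);         // sorted insert in memory
    }

    // After a crash the Memtable is gone; rebuild it by replaying the log in order.
    static ConcurrentSkipListMap<Integer, String> replay(List<String> log) {
        ConcurrentSkipListMap<Integer, String> rebuilt = new ConcurrentSkipListMap<>();
        for (String record : log) {
            int sep = record.indexOf('=');
            rebuilt.put(Integer.parseInt(record.substring(0, sep)), record.substring(sep + 1));
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch();
        node.write(123, "alice@example.com");
        node.write(42, "bob@example.com");
        node.write(123, "alice@new.example.com"); // an overwrite is appended as a new record
        System.out.println(node.commitLog.size()); // 3
        System.out.println(node.memtable.firstKey()); // 42, because the map is sorted by key
        System.out.println(replay(node.commitLog).equals(node.memtable)); // true
    }
}
```

Replaying the log in order reproduces the Memtable exactly, which is why the log alone is enough to recover writes that were acknowledged but never flushed.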

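Now, these Memtables can’t grow forever. Cassandra initiates a "flush" when any of several things happens: total Memtable memory crosses a threshold (a fraction, memtable_cleanup_threshold, of the space allowed by memtable_heap_space_in_mb and memtable_offheap_space_in_mb in cassandra.yaml), the commit log reaches its size limit (commitlog_total_space_in_mb), a per-table timer expires (the memtable_flush_period_in_ms table option), or an operator runs nodetool flush. Note that memtable_flush_writers is not a trigger; it sets how many threads write Memtables to disk.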
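To get a feel for the memory-based trigger, here is a rough arithmetic sketch. It assumes memtable_cleanup_threshold is left at its documented default of 1 / (memtable_flush_writers + 1) and models a single memory pool in isolation; the class name, method names, and the numbers in main are hypothetical, and the real accounting inside Cassandra is more involved.

```java
// Simplified arithmetic model of the memory-based flush trigger.
// The settings used in main are illustrative values, not recommendations or defaults.
class FlushThresholdSketch {
    // Default cleanup threshold when memtable_cleanup_threshold is not set explicitly.
    static double cleanupThreshold(int memtableFlushWriters) {
        return 1.0 / (memtableFlushWriters + 1);
    }

    // Memory (in MB) that Memtables may occupy in one pool before the largest one is flushed.
    static double flushTriggerMb(int poolSpaceMb, int memtableFlushWriters) {
        return poolSpaceMb * cleanupThreshold(memtableFlushWriters);
    }

    public static void main(String[] args) {
        // Hypothetical node: a 2048 MB Memtable pool and 3 flush writers.
        System.out.println(cleanupThreshold(3));     // 0.25
        System.out.println(flushTriggerMb(2048, 3)); // 512.0
    }
}
```

Under these assumptions, adding flush writers lowers the threshold, so flushes start earlier and each one tends to be smaller.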

A flush is where the data in the Memtable is actually written to disk, but not as a single, monolithic file. It’s written into sorted files called SSTables (Sorted String Tables). During a flush:

  • The full Memtable is switched out: it stops accepting writes, and a new, empty Memtable takes its place.
  • The data from the old Memtable, which is already sorted, is written in order to a new SSTable file in the data/<keyspace>/<table>/<...> directory.
  • A Bloom filter, index file, and other metadata are also created for this SSTable.
  • Once the SSTable is safely written to disk, the old Memtable is discarded.
  • Crucially, the CommitLog entries corresponding to the flushed data are marked as no longer needed for recovery, and a commit log segment can be deleted or reused once all the data it covers has been flushed.
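The flush steps can be sketched in a few lines. This is a toy model under stated assumptions: FlushSketch and its nested SSTable class are invented names, an in-memory sorted map stands in for the on-disk file, a plain key set plays the role of the Bloom filter, and a counter stands in for commit log bookkeeping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a flush; a real SSTable is a set of on-disk files, not a Java object.
class FlushSketch {
    // Immutable stand-in for an SSTable: sorted rows plus a key set acting as the filter.
    static final class SSTable {
        final SortedMap<Integer, String> rows;
        final Set<Integer> keyFilter; // a real Bloom filter is probabilistic and much smaller
        SSTable(SortedMap<Integer, String> frozen) {
            this.rows = Collections.unmodifiableSortedMap(new TreeMap<Integer, String>(frozen));
            this.keyFilter = Collections.unmodifiableSet(new HashSet<Integer>(frozen.keySet()));
        }
    }

    TreeMap<Integer, String> activeMemtable = new TreeMap<>();
    final List<SSTable> sstables = new ArrayList<>();
    int unflushedCommitLogRecords = 0; // records still needed for crash recovery

    synchronized void write(int key, String value) {
        unflushedCommitLogRecords++;
        activeMemtable.put(key, value);
    }

    synchronized void flush() {
        if (activeMemtable.isEmpty()) return;
        TreeMap<Integer, String> flushing = activeMemtable; // set the full Memtable aside
        activeMemtable = new TreeMap<>();                    // new writes land in a fresh one
        sstables.add(new SSTable(flushing));                 // write sorted data and build the filter
        unflushedCommitLogRecords = 0;                       // log records are no longer needed
        // The old Memtable is unreachable after this method returns and is garbage collected.
    }

    public static void main(String[] args) {
        FlushSketch table = new FlushSketch();
        table.write(123, "alice@example.com");
        table.write(42, "bob@example.com");
        table.flush();
        System.out.println(table.sstables.get(0).rows.firstKey()); // 42
        System.out.println(table.activeMemtable.isEmpty());        // true
    }
}
```

The SSTable is never modified after it is created, which is what lets later writes to the same key go into a new Memtable and, eventually, a new SSTable.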

Reads have to account for the fact that a row’s data can be spread across the Memtable (plus any Memtable still being flushed) and multiple SSTables for a given table. When you read data, Cassandra checks the Memtables and consults the Bloom filter of each SSTable to quickly rule out SSTables that definitely don’t contain the partition, then reads from the remaining ones. It merges what it finds, and for each column the value with the newest write timestamp wins. Because SSTables are sorted and indexed, finding a partition within one is efficient.
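A read can be sketched as a merge across these sources. The following is a simplified illustration, assuming one value per key and reconciliation by newest write timestamp; ReadPathSketch, Cell, and the diskReads counter are hypothetical names, and an exact key set again stands in for the probabilistic Bloom filter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the read path: merge the Memtable with every SSTable that may hold the key.
class ReadPathSketch {
    static final class Cell {
        final String value;
        final long writeTimestamp;
        Cell(String value, long writeTimestamp) {
            this.value = value;
            this.writeTimestamp = writeTimestamp;
        }
    }

    static final class SSTable {
        final TreeMap<Integer, Cell> rows = new TreeMap<>();
        final Set<Integer> keyFilter = new HashSet<>(); // stand-in for a Bloom filter
        int diskReads = 0;                              // counts simulated disk lookups

        void put(int key, Cell cell) { rows.put(key, cell); keyFilter.add(key); }
        Cell read(int key) { diskReads++; return rows.get(key); }
    }

    final TreeMap<Integer, Cell> memtable = new TreeMap<>();
    final List<SSTable> sstables = new ArrayList<>();

    String read(int key) {
        Cell newest = memtable.get(key);
        for (SSTable sstable : sstables) {
            if (!sstable.keyFilter.contains(key)) continue; // filter rules it out: skip the disk read
            Cell candidate = sstable.read(key);
            if (candidate != null && (newest == null || candidate.writeTimestamp > newest.writeTimestamp)) {
                newest = candidate; // the newest write timestamp wins
            }
        }
        return newest == null ? null : newest.value;
    }

    public static void main(String[] args) {
        ReadPathSketch table = new ReadPathSketch();
        SSTable older = new SSTable();
        older.put(123, new Cell("alice@old.example.com", 1L));
        SSTable unrelated = new SSTable();
        unrelated.put(7, new Cell("carol@example.com", 1L));
        table.sstables.add(older);
        table.sstables.add(unrelated);
        table.memtable.put(123, new Cell("alice@example.com", 2L));

        System.out.println(table.read(123));     // alice@example.com
        System.out.println(unrelated.diskReads); // 0, the filter excluded this SSTable
    }
}
```

The filter check is what keeps reads cheap as SSTables accumulate: most of them are skipped without touching disk.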

The most surprising thing about this process is how Cassandra handles concurrent writes and reads during a flush. A flush doesn’t block incoming writes entirely. New writes will go to a new Memtable, and reads will consult the new Memtable, the old Memtable (being flushed), and any existing SSTables. This is a key part of why Cassandra can maintain high write throughput even as data is being persisted. The system is designed to keep accepting writes while background processes write data out.
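One way to picture this is an atomic swap of the active Memtable. The sketch below is an assumption-laden illustration, not Cassandra's mechanism: NonBlockingFlushSketch and its methods are invented, the flush is split into two explicit steps so the in-between state can be observed, and the coordination a real system needs for writes that are in flight at the moment of the swap is omitted.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Toy model: writes keep flowing while an older Memtable is being written out.
class NonBlockingFlushSketch {
    final AtomicReference<ConcurrentSkipListMap<Integer, String>> active =
            new AtomicReference<>(new ConcurrentSkipListMap<Integer, String>());
    volatile ConcurrentSkipListMap<Integer, String> flushing; // null when no flush is in progress
    final List<SortedMap<Integer, String>> sstables = new CopyOnWriteArrayList<>();

    void write(int key, String value) {
        active.get().put(key, value); // always goes to whichever Memtable is currently active
    }

    // Step 1: swap in an empty Memtable; writers are not paused for the disk write.
    void beginFlush() {
        flushing = active.getAndSet(new ConcurrentSkipListMap<Integer, String>());
    }

    // Step 2: once the data is "on disk", publish the SSTable and drop the old Memtable.
    void finishFlush() {
        sstables.add(new TreeMap<Integer, String>(flushing));
        flushing = null;
    }

    // Sources are checked newest first, so the first hit is the most recent value in this model.
    String read(int key) {
        String value = active.get().get(key);
        ConcurrentSkipListMap<Integer, String> inFlight = flushing;
        if (value == null && inFlight != null) value = inFlight.get(key);
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key);
        }
        return value;
    }

    public static void main(String[] args) {
        NonBlockingFlushSketch table = new NonBlockingFlushSketch();
        table.write(123, "alice@example.com");
        table.beginFlush();                  // flush in progress
        table.write(42, "bob@example.com");  // accepted by the new Memtable
        System.out.println(table.read(123)); // alice@example.com, served from the flushing Memtable
        table.finishFlush();
        System.out.println(table.read(123)); // alice@example.com, now served from the SSTable
    }
}
```

The value for key 123 stays readable at every point: first from the active Memtable, then from the one being flushed, then from the SSTable.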

The next concept you’ll encounter is how Cassandra merges these SSTables over time to keep read performance optimal and reclaim disk space.
