Cassandra’s nodetool snapshot command is your go-to for creating point-in-time backups of your data, but it’s not a full system restore solution by itself.
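Let’s see it in action. Imagine you have a keyspace named my_keyspace and a table users within it. To snapshot only this table, name it with the --table option and pass the keyspace as the final argument:
nodetool snapshot --table users -t my_snapshot_timestamp my_keyspace
If you want to snapshot the entire keyspace, you omit the table option:
nodetool snapshot -t my_snapshot_timestamp my_keyspace
If you leave out the keyspace as well, every keyspace on the node is snapshotted.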
These commands don’t actually move data. Instead, they create hard links in a snapshots directory inside each affected table’s data directory. This means the snapshot shares the same underlying SSTable files as your live data, so it takes almost no extra space at the moment of creation.
Here’s what’s happening under the hood: when nodetool snapshot is executed, Cassandra first flushes any in-memory data (memtables) to disk, ensuring all current data is written to SSTable files. Then it creates a new subdirectory under each affected table’s directory on that node (e.g., /var/lib/cassandra/data/my_keyspace/users/snapshots/my_snapshot_timestamp; recent Cassandra versions append a table ID to the table directory name). For each SSTable file belonging to the target keyspace/table, a hard link is created in this new snapshot directory. A hard link is a directory entry that points to the same inode (and thus the same data blocks on disk) as the original file. This is why snapshots are nearly instantaneous and don’t consume significant extra disk space immediately.
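To make the hard-link step concrete, here is a minimal sketch that imitates it with throwaway files rather than real SSTables. The scratch directory and the demo-Data.db file name are assumptions made for illustration; nothing here talks to Cassandra.

```shell
#!/bin/sh
# Illustration only: scratch files stand in for SSTables (assumed names).
workdir=$(mktemp -d)
mkdir -p "$workdir/users/snapshots/my_snapshot_timestamp"

# Stand-in for a live SSTable component.
printf 'row data\n' > "$workdir/users/demo-Data.db"

# The snapshot entry is a second name for the same inode; no bytes are copied.
ln "$workdir/users/demo-Data.db" \
   "$workdir/users/snapshots/my_snapshot_timestamp/demo-Data.db"

# Both paths report the same inode number.
live_inode=$(ls -i "$workdir/users/demo-Data.db" | awk '{print $1}')
snap_inode=$(ls -i "$workdir/users/snapshots/my_snapshot_timestamp/demo-Data.db" | awk '{print $1}')
echo "live inode: $live_inode, snapshot inode: $snap_inode"

# Removing the live name leaves the snapshot name, and the data, in place.
rm "$workdir/users/demo-Data.db"
snap_content=$(cat "$workdir/users/snapshots/my_snapshot_timestamp/demo-Data.db")
echo "snapshot still contains: $snap_content"
```

The two inode numbers match, and the file is still readable through the snapshot path after the live name is removed. The blocks on disk are only freed once the last link is gone.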
The primary problem this solves is enabling quick backups of your Cassandra data without stopping writes. Each invocation captures only the data held by the node it runs against, so a cluster-wide backup means running it on every node. You can then copy these snapshot directories off the nodes for safekeeping. Restoring involves stopping Cassandra, clearing out the live data directories, and then copying the snapshot files back.
The most surprising true thing about nodetool snapshot is that it doesn’t actually copy any data. It leverages the filesystem’s hard link feature to create a second reference to the existing data files. This is brilliant for speed and storage efficiency but has critical implications for how you manage the snapshot lifecycle. A hard link is not a pointer that can break: the data stays on disk until every link to it is removed. When compaction later deletes the live SSTables, the snapshot’s links keep the old files alive, so snapshots you never clear gradually turn into real disk usage.
When you execute nodetool snapshot, it creates a snapshot directory named after the tag under every table it covers (if you omit -t, Cassandra picks a timestamp-based name). Inside each of these directories, you’ll find the SSTable data files and their associated index files. For example, on a node, you might see:
/var/lib/cassandra/data/my_keyspace/users/snapshots/my_snapshot_timestamp/la-1-big-Data.db
/var/lib/cassandra/data/my_keyspace/users/snapshots/my_snapshot_timestamp/la-1-big-Index.db
…and so on for all SSTables and related files.
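Because each table carries its own snapshots directory, collecting a snapshot means finding every directory that ends in snapshots/ followed by the tag. The sketch below shows one way to do that with find. It builds a mock layout in a scratch directory so it can run anywhere; DATA_DIR, the orders table, and the empty files are assumptions for illustration. On a real node DATA_DIR would be the Cassandra data directory, such as /var/lib/cassandra/data in the paths above.

```shell
#!/bin/sh
# Assumption: a scratch tree stands in for the real data directory.
DATA_DIR=$(mktemp -d)
TAG=my_snapshot_timestamp

# Mock layout: two tables (orders is hypothetical), each with its own snapshot.
mkdir -p "$DATA_DIR/my_keyspace/users/snapshots/$TAG"
mkdir -p "$DATA_DIR/my_keyspace/orders/snapshots/$TAG"
: > "$DATA_DIR/my_keyspace/users/snapshots/$TAG/la-1-big-Data.db"
: > "$DATA_DIR/my_keyspace/orders/snapshots/$TAG/la-1-big-Data.db"

# Match every per-table directory whose path ends in snapshots/<tag>.
snapshot_dirs=$(find "$DATA_DIR" -type d -path "*/snapshots/$TAG" | sort)
echo "$snapshot_dirs"

# Count the matches (one line per directory).
snapshot_count=$(echo "$snapshot_dirs" | wc -l | tr -d ' ')
echo "found $snapshot_count snapshot directories for tag $TAG"
```

The resulting list is what you would hand to your copy or archive tool of choice.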
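The nodetool clearsnapshot command is the counterpart for removing them. Like snapshot, it acts only on the node you run it against, so repeat it on every node. To remove a specific snapshot:
nodetool clearsnapshot -t my_snapshot_timestamp my_keyspace
Or to clear all snapshots for a keyspace:
nodetool clearsnapshot my_keyspace
And to clear all snapshots for all keyspaces on a node:
nodetool clearsnapshot
Newer Cassandra releases (4.0 and later, as far as I know) refuse to run unless you pass either -t or --all, so on those versions add --all to the last two commands. Check nodetool help clearsnapshot for your version.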
The real power comes from coordinating this across your cluster. You’d typically run nodetool snapshot on all nodes simultaneously (or within a very tight window) using the same tag. Cassandra has no atomic cluster-wide snapshot, so this approximates a consistent point-in-time view rather than guaranteeing one. Then, you’d copy the snapshot directories from each node to a central backup location.
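The coordination could be sketched roughly as follows. Everything site-specific here is an assumption: the node names, the BACKUP_ROOT path, passwordless ssh access, and rsync being installed. The table directory name follows the example path used in this article, and on a real cluster it may carry a table ID suffix. The script defaults to a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Hypothetical hosts and paths; replace with your own.
NODES="cass-node-1 cass-node-2 cass-node-3"
TAG=my_snapshot_timestamp
KEYSPACE=my_keyspace
TABLE_DIR=users                      # table directory name as in the example path
DATA_DIR=/var/lib/cassandra/data
BACKUP_ROOT=/backups/cassandra       # assumed path on the machine running this script
DRY_RUN=${DRY_RUN:-1}                # set DRY_RUN=0 to really execute

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_all_nodes() {
    # Start the snapshot on every node in the background, then wait for all of them.
    for node in $NODES; do
        run ssh "$node" nodetool snapshot -t "$TAG" "$KEYSPACE" &
    done
    wait
}

copy_snapshots() {
    # Pull each node's snapshot directory into a per-node folder.
    for node in $NODES; do
        run mkdir -p "$BACKUP_ROOT/$node/$KEYSPACE/$TABLE_DIR"
        run rsync -a "$node:$DATA_DIR/$KEYSPACE/$TABLE_DIR/snapshots/$TAG/" \
            "$BACKUP_ROOT/$node/$KEYSPACE/$TABLE_DIR/"
    done
}

snapshot_all_nodes
copy_snapshots
```

Keeping one folder per node matters because each node holds a different slice of the data, and a restore has to put each node’s files back where they came from.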
A common misconception is that nodetool snapshot creates an independent backup of your data. It doesn’t. It creates a set of hard links on the same disk as the live data. Compaction and repair won’t invalidate it, because SSTables are immutable and the links keep the files alive after the live copies are removed. But a snapshot that lives only on the node does nothing for you if that disk or node is lost. Therefore, the critical post-snapshot step is always to copy the snapshot data off the Cassandra nodes to a separate, safe location, and then clear the local snapshot so it stops holding disk space.
The next logical step after mastering snapshots is understanding how to use them for restoration, which involves more than just copying files back.