Cassandra’s rolling upgrade process is designed to let you update your cluster node by node, minimizing or eliminating downtime.
Let’s watch it happen. Imagine we have a small, three-node Cassandra cluster running version 3.11.2. Our goal is to upgrade it to 4.0.1.
Here’s how a typical rolling upgrade looks, focusing on the commands and the underlying mechanics.
Preparing for the Upgrade
First, ensure your Cassandra version is compatible with the target version. For major version upgrades (like 3.x to 4.x), Cassandra has specific upgrade paths. You usually can’t jump multiple major versions at once. Always check the official Cassandra documentation for the exact upgrade path and any potential caveats for your specific versions.
Crucial Pre-Upgrade Checks:
- Backup: This is non-negotiable. Back up your data using nodetool snapshot or a similar method.
- Node Health: Ensure all nodes are healthy. Run nodetool status; you want to see UN (Up/Normal) for all nodes. If any node is DN (Down/Normal) or UL (Up/Leaving), address that before starting the upgrade.
- Disk Space: Verify sufficient disk space on all nodes for data, logs, and commit logs.
- Configuration: Review your cassandra.yaml for any deprecated or changed settings between versions. Pay close attention to any custom configurations.
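The node-health check in the list above can be scripted. What follows is a minimal sketch, assuming nodetool status prints one line per node that begins with a two-letter state code (UN, DN, UL and so on). The function name all_nodes_un is hypothetical, not part of Cassandra.

```shell
# all_nodes_un: reads `nodetool status` output on stdin.
# Exits 0 only if at least one node line is present and every node is UN.
# Assumption: node lines start with a two-letter code (U/D, then N/L/J/M).
all_nodes_un() {
  awk '
    $1 ~ /^[UD][NLJM]$/ { total++; if ($1 != "UN") bad++ }
    END { exit ((total > 0 && bad == 0) ? 0 : 1) }
  '
}

# Typical use on a live node (commented out so the sketch has no side effects):
#   nodetool status | all_nodes_un && echo "all nodes UN, safe to proceed"
```

Treating an empty result as a failure is deliberate: if nodetool cannot reach the node, the check should not pass by accident.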
The Rolling Upgrade Steps
The core idea is to upgrade one node at a time, ensuring the cluster remains operational throughout the process.
Step 1: Upgrade the First Node
- Drain and stop Cassandra:
    nodetool drain
    sudo systemctl stop cassandra
  Draining flushes memtables to disk and stops the node from accepting writes, so it shuts down cleanly. Once stopped, the node handles no reads or writes while you upgrade it.
- Upgrade Cassandra Binaries: This varies by installation method.
  - Debian/Ubuntu (apt):
      sudo apt update
      sudo apt install cassandra=4.0.1
  - RPM (yum/dnf):
      sudo yum update cassandra --enablerepo=datastax-cassandra-4.0
      # or
      sudo dnf update cassandra --enablerepo=datastax-cassandra-4.0
  - Tarball: Replace the old cassandra directory with the new one.
- Start Cassandra:
    sudo systemctl start cassandra
- Monitor Node Status:
    nodetool status
  The node should rejoin the cluster as UN. Check logs for any errors:
    sudo journalctl -u cassandra -f
- Run nodetool upgradesstables: This command rewrites SSTables into the new version’s format. nodetool talks to a running Cassandra process, so run it once the node is back up:
    nodetool upgradesstables
  This can take a long time, depending on the amount of data on the node. The command blocks until the rewrite finishes; from a second terminal you can check its progress:
    nodetool compactionstats
  SSTable upgrades are listed there. Wait until the command completes and nodetool compactionstats shows no ongoing SSTable upgrades.
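The per-node sequence can be captured in a small script. The sketch below assumes an apt-based install and a systemd unit named cassandra, as in the commands above; the helper names run and upgrade_this_node and the DRY_RUN variable are hypothetical. Because nodetool needs a running node, nodetool upgradesstables is left out here and is meant to run after the node is back up. By default the script only prints what it would do.

```shell
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# upgrade_this_node VERSION: drain, stop, upgrade the package, and restart.
# Each step runs only if the previous one succeeded.
upgrade_this_node() {
  target_version="$1"
  run nodetool drain &&
  run sudo systemctl stop cassandra &&
  run sudo apt update &&
  run sudo apt install -y "cassandra=${target_version}" &&
  run sudo systemctl start cassandra
}

# Dry run: prints the five commands without touching the node.
upgrade_this_node 4.0.1
```

Set DRY_RUN=0 only after reviewing the printed plan on a test node.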
Step 2: Upgrade Subsequent Nodes
Repeat Step 1 for each remaining node in the cluster, one by one.
- Important: Always wait for the node you just upgraded to fully rejoin the cluster and stabilize (UN status, no errors in logs) before proceeding to the next node.
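Waiting for a node to come back as UN is easy to automate. This is a sketch only, assuming the node's address appears in the second column of nodetool status output; the function names, the retry count, and the delay are hypothetical choices.

```shell
# node_is_un ADDRESS: succeeds if `nodetool status` lists ADDRESS in state UN.
node_is_un() {
  nodetool status |
    awk -v addr="$1" '$2 == addr && $1 == "UN" { found = 1 } END { exit (found ? 0 : 1) }'
}

# wait_for_un ADDRESS [ATTEMPTS] [DELAY_SECONDS]:
# polls until the node is UN, or gives up after ATTEMPTS tries.
wait_for_un() {
  addr="$1"
  attempts="${2:-60}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if node_is_un "$addr"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (commented out; 10.0.0.1 is a placeholder address):
#   wait_for_un 10.0.0.1 && echo "node is back, move on to the next one"
```

A UN status does not replace a look at the logs; the loop only covers the first half of the check.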
Step 3: Final Verification
Once all nodes are upgraded and healthy:
- Check Cluster Status:
    nodetool status
  All nodes should be UN. nodetool status does not show the release, so confirm each node reports the new version (e.g., 4.0.1) with nodetool version.
- Run nodetool upgradesstables on ALL nodes: Even though you ran it during the upgrade, it’s good practice to run it on all nodes after the entire cluster is upgraded. This ensures any SSTables that were written during the upgrade process (e.g., from compactions or repairs) are also converted.
    nodetool upgradesstables
  Again, monitor with nodetool compactionstats and wait for it to finish on each node.
- Test Application Connectivity: Ensure your applications can connect and perform read/write operations successfully.
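The version and SSTable checks above can be wrapped in two small helpers. This sketch assumes nodetool version prints a line of the form "ReleaseVersion: 4.0.1" and that running SSTable rewrites show up in nodetool compactionstats with the label "Upgrade sstables". Verify both against your own output; the function names are hypothetical.

```shell
# release_version: extracts the version from `nodetool version` output on stdin.
# Assumed input format: "ReleaseVersion: 4.0.1".
release_version() {
  awk -F': *' '$1 == "ReleaseVersion" { print $2 }'
}

# sstable_upgrade_running: reads `nodetool compactionstats` output on stdin and
# succeeds if an SSTable upgrade task is listed (label is an assumption).
sstable_upgrade_running() {
  grep -qi 'upgrade sstables'
}

# Example use on each node (commented out so the sketch has no side effects):
#   [ "$(nodetool version | release_version)" = "4.0.1" ] || echo "wrong version"
#   nodetool compactionstats | sstable_upgrade_running && echo "still rewriting"
```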
The Mental Model: Why This Works
Cassandra’s architecture is designed for distributed operations. When you stop one node, the remaining nodes continue to serve requests. The gossip protocol ensures nodes are aware of each other’s status. By upgrading one node at a time, you’re essentially taking a single replica out of commission temporarily. The other replicas can still serve data while the stopped node serves no client requests. Once it’s back up, it catches up on any missed writes (for example, through hinted handoff) and rejoins the ring.
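A little arithmetic shows why losing one replica is tolerable. The sketch assumes a replication factor of 3 and clients using QUORUM consistency; neither is stated above, so substitute your own values. A quorum is floor(RF / 2) + 1 replicas.

```shell
# quorum RF: number of replicas that must respond at QUORUM consistency.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# tolerable_down RF: replicas that can be offline while QUORUM still succeeds.
tolerable_down() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

quorum 3          # prints 2
tolerable_down 3  # prints 1
```

With RF=3, one node can be down and quorum reads and writes still succeed, which is exactly the situation a one-node-at-a-time upgrade creates. With RF=1 the result is 0, so data on the stopped node is unavailable during its upgrade.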
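The upgradesstables command is what completes the transition on disk. It rewrites the on-disk data files (SSTables) from the old version’s format to the new version’s format. Cassandra 4.0 can still read SSTables written by 3.x, so a node is not locked out of its own data before the rewrite. The conversion matters because a release only reads the formats of the previous major version: old-format files left behind will block the next major upgrade.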
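One way to see whether the rewrite is complete is to look at the data file names. The sketch below assumes SSTable data files are named like mc-1-big-Data.db, where the leading two letters identify the format version (m-prefixed for 3.x, n-prefixed for 4.0). Treat that naming as an assumption to confirm on your own nodes; the function name and the data directory path are hypothetical.

```shell
# count_old_format DATA_DIR: counts live *-Data.db files whose format version
# does not start with "n". Snapshot copies are skipped, since snapshots keep
# whatever format they were taken in.
count_old_format() {
  find "$1" -type f -name '*-Data.db' \
    ! -name 'n[a-z]-*' \
    ! -path '*/snapshots/*' | wc -l | tr -d ' '
}

# Example (commented out; the path is a common default, not a given):
#   count_old_format /var/lib/cassandra/data
```

A result of 0 suggests every live SSTable on the node is already in the new format.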
The most surprising thing about Cassandra’s rolling upgrade is how resilient it is to minor configuration drift during the upgrade process. As long as the core schema is compatible and the upgradesstables command runs successfully on each node, the data format will be correctly transitioned. However, this resilience is not a license for laxness; deviations from best practices will eventually catch up to you.
The next challenge you’ll likely encounter is dealing with schema changes that might have been introduced between versions, or optimizing performance on the new version.