Adding nodes to a Cassandra cluster without downtime is surprisingly straightforward, but the reason it works relies on a fundamental misunderstanding of how Cassandra achieves consistency.

Here’s a look at a live cluster and how we’ll add a node.

Imagine we have a 3-node cluster: node1, node2, node3. Each node is running Cassandra 4.1. We’ll add node4.

# On node1 (or any existing node)
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.1.10  100.00 MiB  256     100.0%            a1b2c3d4-e5f6-7890-1234-567890abcdef  rack1
UN  10.0.1.11  100.00 MiB  256     100.0%            b2c3d4e5-f6a7-8901-2345-67890abcdef1  rack1
UN  10.0.1.12  100.00 MiB  256     100.0%            c3d4e5f6-a7b8-9012-3456-7890abcdef12  rack1

This shows our healthy 3-node cluster. Now, let’s set up node4 with the same Cassandra version and configuration as the existing nodes (e.g., same cassandra.yaml, same cassandra-rackdc.properties). The most crucial part is ensuring node4 can communicate with the other nodes and has the correct cluster_name and seeds defined in its cassandra.yaml.

For node4, cassandra.yaml would have:

cluster_name: 'MyCassandraCluster'
seeds: "10.0.1.10,10.0.1.11" # Point to existing seeds

Once node4 is configured and its Cassandra service is started, it will initially appear as "Joining" in nodetool status.

# On node1 (or any existing node)
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.1.10  100.00 MiB  256     100.0%            a1b2c3d4-e5f6-7890-1234-567890abcdef  rack1
UN  10.0.1.11  100.00 MiB  256     100.0%            b2c3d4e5-f6a7-8901-2345-67890abcdef1  rack1
UN  10.0.1.12  100.00 MiB  256     100.0%            c3d4e5f6-a7b8-9012-3456-7890abcdef12  rack1
UJ  10.0.1.13  0.00 B      256     0.0%              d4e5f6a7-b8c9-0123-4567-890abcdef123  rack1

Notice node4 (10.0.1.13) is marked as UJ (Up/Joining). Cassandra’s gossip protocol informs existing nodes about node4. The key to zero downtime is that node4 doesn’t immediately own any data. Instead, it signals its intent to join.

Cassandra’s distributed nature means data is replicated across multiple nodes. When node4 joins, it initiates a process called bootstrapping. It contacts its peers and requests copies of the data that now falls within its token range. This data transfer happens in the background.

During bootstrapping, reads and writes continue to flow to the existing nodes. For any given piece of data, if the coordinator node (the one receiving the client request) knows that node4 is now part of the cluster and is responsible for some of that data’s token range, it will include node4 in its consistency-aware request. If node4 hasn’t finished streaming the data yet, the read request might fail if the consistency level requires it, but writes are generally safe because the data is still available on the original nodes.

Once node4 has successfully streamed its assigned data from its peers, it transitions from "Joining" to "Normal."

# On node1 (or any existing node)
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.1.10  150.00 MiB  256     66.7%             a1b2c3d4-e5f6-7890-1234-567890abcdef  rack1
UN  10.0.1.11  150.00 MiB  256     66.7%             b2c3d4e5-f6a7-8901-2345-67890abcdef1  rack1
UN  10.0.1.12  150.00 MiB  256     66.7%             c3d4e5f6-a7b8-9012-3456-7890abcdef12  rack1
UN  10.0.1.13  150.00 MiB  256     33.3%             d4e5f6a7-b8c9-0123-4567-890abcdef123  rack1

Notice how the "Owns (effective)" percentage has shifted. Each node now owns roughly 33.3% of the token range, and the load has increased on the original nodes as they transferred data. This indicates that node4 is fully participating and serving requests for its assigned data.

The critical insight is that Cassandra doesn’t rebalance data immediately when a node joins. It streams copies of the data to the new node. The "owns" percentage is a reflection of the token distribution, not a statement about where the data physically resides until the streaming is complete. Until then, the old nodes still hold their copies, ensuring availability.

The real work is done by the streaming subsystem. When node4 joins, it tells the existing nodes, "Hey, I’m going to own tokens X through Y. Do you have any data for those tokens? Please send it to me." The existing nodes then initiate streaming sessions to node4. This is a peer-to-peer transfer, not a central coordination event.

After this, you’ll likely see a spike in disk I/O and network traffic on node4 and its peers as data is transferred.

The next challenge you’ll face is scaling your write capacity as your cluster grows, which might involve tuning commit log segment sizes or examining compaction strategies.

Want structured learning?

Take the full Cassandra course →