CockroachDB supports rolling upgrades without downtime because data is replicated across nodes, so no single node is critical for cluster availability.

Here’s a look at a rolling upgrade in action, using a hypothetical cluster with three nodes: crdb-1, crdb-2, and crdb-3.

# Start with an older version (e.g., v22.1.0); each command runs on its own host
# Assume the cluster has already been initialized (cockroach init) and is healthy
cockroach start --insecure --join=crdb-1:26257,crdb-2:26257,crdb-3:26257 --store=node1 --http-port=8080 --listen-addr=crdb-1:26257 --advertise-addr=crdb-1:26257
cockroach start --insecure --join=crdb-1:26257,crdb-2:26257,crdb-3:26257 --store=node2 --http-port=8081 --listen-addr=crdb-2:26257 --advertise-addr=crdb-2:26257
cockroach start --insecure --join=crdb-1:26257,crdb-2:26257,crdb-3:26257 --store=node3 --http-port=8082 --listen-addr=crdb-3:26257 --advertise-addr=crdb-3:26257

# Verify cluster health
cockroach node status --insecure --host=crdb-1:26257
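
The health check above can be turned into a pass/fail gate for scripts. The following is a minimal sketch, not an official tool: it assumes `cockroach node status --format=tsv` prints a header row containing an `is_live` column whose values are `true` or `false`. The function name `all_nodes_live` is made up for this example.

```shell
# Reads `cockroach node status --format=tsv` output on stdin.
# Succeeds only if at least one node row exists and every row has
# is_live == true. The column is located by name in the header row
# (assumption: the header contains a column called "is_live").
all_nodes_live() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "is_live") col = i; next }
    col && NF >= col { rows++; if ($col != "true") bad++ }
    END { if (!col || rows == 0 || bad > 0) exit 1 }
  '
}
```

A possible use, with the host taken from the walkthrough: `cockroach node status --insecure --host=crdb-1:26257 --format=tsv | all_nodes_live && echo "all nodes live"`.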

Now, let’s upgrade crdb-1 to a newer version (e.g., v22.2.0).

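# Drain the first node (--self targets the node named by --host)
cockroach node drain --insecure --host=crdb-1:26257 --self
# Then stop the cockroach process on crdb-1 (for example, send it SIGTERM or use your process manager)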

# Replace the binary with the new version (e.g., copy v22.2.0 binary to /usr/local/bin/cockroach)

# Restart the first node with the new binary
cockroach start --insecure --join=crdb-1:26257,crdb-2:26257,crdb-3:26257 --store=node1 --http-port=8080 --listen-addr=crdb-1:26257 --advertise-addr=crdb-1:26257

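# Confirm the installed binary (run locally on crdb-1; this command does not connect to a node)
cockroach version
# Confirm the running node reports the new build (see the "build" column)
cockroach node status --insecure --host=crdb-1:26257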

You’ll notice that crdb-2 and crdb-3 are still running the old version. The cluster keeps serving traffic because each range of data is replicated (three ways by default), so the two remaining nodes still hold a quorum of every range. When the upgraded node rejoins, its replicas catch up on any writes they missed through Raft. Wait until the node is reported as live again before moving on to the next one.
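
To see the mixed-version state at a glance, the per-node builds can be tallied. This is a small sketch under the assumption that `cockroach node status --format=tsv` includes a header row with a `build` column; `build_counts` is a hypothetical helper name.

```shell
# Reads `cockroach node status --format=tsv` output on stdin and prints
# one line per distinct build: "<build> <number of nodes>".
# Assumption: the header row contains a column called "build".
build_counts() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "build") col = i; next }
    col && NF >= col { n[$col]++ }
    END { for (b in n) print b, n[b] }
  ' | sort
}
```

Midway through the rollout this would list two builds; once the last node has been restarted, a single line should remain.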

Next, repeat the process for crdb-2:

# Drain the second node, then stop its cockroach process
cockroach node drain --insecure --host=crdb-2:26257 --self

# Replace the binary with the new version

# Restart the second node
cockroach start --insecure --join=crdb-1:26257,crdb-2:26257,crdb-3:26257 --store=node2 --http-port=8081 --listen-addr=crdb-2:26257 --advertise-addr=crdb-2:26257

# Check the build reported by the restarted node (see the "build" column)
cockroach node status --insecure --host=crdb-2:26257

Finally, upgrade crdb-3:

# Drain the third node, then stop its cockroach process
cockroach node drain --insecure --host=crdb-3:26257 --self

# Replace the binary with the new version

# Restart the third node
cockroach start --insecure --join=crdb-1:26257,crdb-2:26257,crdb-3:26257 --store=node3 --http-port=8082 --listen-addr=crdb-3:26257 --advertise-addr=crdb-3:26257

# Verify all nodes are on the new version
cockroach node status --insecure --host=crdb-1:26257
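
Each pass above should only begin once the previously restarted node is serving again. One way to script that gate is a small retry helper; the sketch below is generic shell and not part of CockroachDB, and the name `wait_until` and its argument order are invented for this example.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For instance, `wait_until 30 2 cockroach node status --insecure --host=crdb-1:26257` would poll the restarted node for up to about a minute and fail if it never answers, which is a reasonable point to halt the rollout instead of touching the next node.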

The core problem this process solves is the "all-or-nothing" upgrade that plagues traditional single-instance databases, where upgrading means taking the whole database offline. By upgrading one node at a time, you leverage CockroachDB’s fault tolerance. Each node is a peer, and data is split into ranges that are replicated across these peers. When a node restarts with the new binary, it rejoins the cluster and its replicas catch up through Raft, the consensus protocol that keeps each range’s replicas in agreement. Raft replicates data; it does not negotiate versions. New-version features stay gated behind the cluster version, which advances only after every node runs the new binary. Availability holds throughout as long as only one node is down at a time: with three replicas per range, the two remaining replicas form a quorum and keep serving reads and writes.
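
The quorum argument comes down to simple arithmetic: a majority of n replicas is floor(n/2) + 1, so a range survives the loss of floor((n-1)/2) replicas. The helper names below are illustrative only.

```shell
# Majority needed among n replicas.
quorum() { echo $(( $1 / 2 + 1 )); }

# How many replicas can be unavailable while a majority remains.
tolerable_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 5; do
  echo "replicas=$n quorum=$(quorum "$n") can_lose=$(tolerable_failures "$n")"
done
# prints:
# replicas=3 quorum=2 can_lose=1
# replicas=5 quorum=3 can_lose=2
```

With three replicas, exactly one node can be offline at a time, which is why the walkthrough restarts nodes strictly one by one.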

The most surprising thing about CockroachDB’s rolling upgrade is that version coordination needs no separate orchestration layer. You (or your tooling) still restart the nodes one by one, but the cluster itself tracks which binary version each node runs. While binaries are mixed, nodes on the new binary keep operating at the old cluster version, so they stay compatible with their peers. Once every node runs the new binary, the cluster finalizes the upgrade automatically, unless you have set cluster.preserve_downgrade_option to hold it back. After finalization, rolling back to the old binary is no longer possible.
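
CockroachDB exposes the active cluster version as a cluster setting, and the `cluster.preserve_downgrade_option` setting holds back finalization. The wrappers below are a sketch for inspecting and controlling that from the shell. The function names and the `HOST` variable are assumptions made for this example; the SQL statements are the standard `SHOW`, `SET`, and `RESET CLUSTER SETTING` forms.

```shell
# Assumption: any live node from the walkthrough can be used as the SQL endpoint.
HOST="${HOST:-crdb-1:26257}"

# Print the active cluster version. During a rolling upgrade this stays at
# the old version until the upgrade is finalized.
show_cluster_version() {
  cockroach sql --insecure --host="$HOST" --format=tsv \
    --execute="SHOW CLUSTER SETTING version;"
}

# Before the rollout: hold back finalization so a rollback stays possible.
# $1 is the version currently running, e.g. 22.1
preserve_downgrade() {
  cockroach sql --insecure --host="$HOST" \
    --execute="SET CLUSTER SETTING cluster.preserve_downgrade_option = '$1';"
}

# After validating the new binaries: let the cluster finalize the upgrade.
allow_finalization() {
  cockroach sql --insecure --host="$HOST" \
    --execute="RESET CLUSTER SETTING cluster.preserve_downgrade_option;"
}
```

A cautious rollout might call `preserve_downgrade 22.1` first, upgrade all three nodes, watch the cluster for a while, and only then call `allow_finalization`.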

The next concept to explore is schema changes during or immediately after a rolling upgrade, as this can introduce subtle compatibility issues if not handled carefully.
