You can upgrade ClickHouse clusters without dropping queries by performing a rolling upgrade, where you update nodes one by one, ensuring at least one replica remains available and functional at all times.

Let’s see this in action. Imagine we have a two-replica test_table in a test_db across two ClickHouse servers, ch01 and ch02. Both are currently running version 23.3.10.

On ch01:

# Connect to ch01 (shell)
clickhouse-client -h ch01

-- Create a database and a replicated table
CREATE DATABASE IF NOT EXISTS test_db;
CREATE TABLE IF NOT EXISTS test_db.test_table (id UInt64, value String)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_table', '{replica}')
ORDER BY id;

-- Insert some data
INSERT INTO test_db.test_table VALUES (1, 'hello'), (2, 'world');

On ch02:

# Connect to ch02 (shell)
clickhouse-client -h ch02

-- Create the same database and table. The ZooKeeper path must match ch01's,
-- while the {replica} macro must expand to a different value on each server.
-- This assumes ZooKeeper (or ClickHouse Keeper) and the {shard}/{replica}
-- macros are already configured on both servers.
CREATE DATABASE IF NOT EXISTS test_db;
CREATE TABLE IF NOT EXISTS test_db.test_table (id UInt64, value String)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_table', '{replica}')
ORDER BY id;
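The `{shard}` and `{replica}` placeholders in the engine arguments are macros that each server substitutes from its own configuration, so every replica has to define them. The helper below is a minimal sketch that prints a `config.d` override for one host; the function name `make_macros_xml` and the shard value `01` are illustrative assumptions, and the output would go in a file such as `/etc/clickhouse-server/config.d/macros.xml` on that host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a config.d override that defines the {shard}
# and {replica} macros for one server. Replicas of the same shard share the
# shard value but must use distinct replica names.
make_macros_xml() {
  local shard="$1" replica="$2"
  cat <<EOF
<clickhouse>
    <macros>
        <shard>${shard}</shard>
        <replica>${replica}</replica>
    </macros>
</clickhouse>
EOF
}

# Example: ch01 and ch02 are the two replicas of shard 01.
make_macros_xml 01 ch01
```

Running it once per host with a different replica name (`make_macros_xml 01 ch02` on ch02) gives both servers the same ZooKeeper path for the table but distinct replica identities.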

Now, while ch01 is running, we can query it:

-- On ch01
SELECT * FROM test_db.test_table;
-- Output:
-- 1 hello
-- 2 world

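Because the table is replicated, running the same SELECT on ch02 returns the two rows inserted on ch01. Replication also generally keeps working while the servers briefly run different versions, which is exactly the state a rolling upgrade passes through.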

The core problem rolling upgrades solve is maintaining service availability. Traditional upgrades often require taking the entire cluster offline, which is a non-starter for critical applications. By upgrading nodes incrementally, you minimize the blast radius of any potential issue and ensure that a healthy subset of nodes can continue serving traffic. This strategy relies heavily on ClickHouse’s replication and fault tolerance mechanisms.

Here’s how the process works conceptually:

  1. Isolate a Node: Take a single node out of the load balancer’s rotation and stop its ClickHouse service.
  2. Upgrade the Node: Install the new ClickHouse version on this node.
  3. Start and Verify: Start the upgraded ClickHouse service. Crucially, verify that it can connect to ZooKeeper (if using replicated tables) and that it’s healthy.
  4. Reintroduce to Rotation: Add the upgraded node back into the load balancer’s rotation.
  5. Repeat: Move to the next node and repeat the process.
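The five steps translate into a loop that refuses to move on until the current node passes verification. This is a sketch under stated assumptions: `lb_remove` and `lb_add` are hypothetical placeholders for your load balancer's API, the ssh commands assume passwordless sudo and an apt-based install, and `DRY_RUN=1` (the default here) only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical hooks: replace these bodies with calls to your load balancer.
lb_remove() { echo "[lb] remove $1"; }
lb_add()    { echo "[lb] add $1"; }

# Stop, upgrade, and start one node. Without a pinned version, apt-get
# installs whatever the configured repository currently offers.
upgrade_node() {
  run ssh "$1" "sudo systemctl stop clickhouse-server && sudo apt-get install -y clickhouse-server clickhouse-client && sudo systemctl start clickhouse-server"
}

# Minimal liveness check; a real check should also inspect system.replicas.
verify_node() {
  run ssh "$1" "clickhouse-client --query 'SELECT 1'"
}

rolling_upgrade() {
  local node
  for node in "$@"; do
    lb_remove "$node"
    upgrade_node "$node"
    if ! verify_node "$node"; then
      # Leave the failed node out of rotation and do not touch the others.
      echo "verification failed on $node; aborting" >&2
      return 1
    fi
    lb_add "$node"
  done
}

rolling_upgrade ch01 ch02 ch03
```

Aborting on the first failure is deliberate: the failed node stays out of rotation while the remaining replicas, still on the old version, keep serving traffic.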

The key is that during this entire operation at least one replica remains functional and able to serve queries. For replicated tables, the ReplicatedMergeTree engine keeps replicas consistent even while one of them is down for an upgrade: inserts, merges, and mutations are recorded in a replication log in ZooKeeper (or ClickHouse Keeper), and a replica that was offline replays the entries it missed once it reconnects.

Let’s walk through the practical steps. Assume you have a cluster with ch01, ch02, and ch03, all on version 23.3.10, and you want to upgrade to 24.1.2.

Step 1: Upgrade ch01

First, ensure ch01 is not receiving new queries. If you have a load balancer, remove it from the pool. Then, stop the ClickHouse service:

# On ch01
sudo systemctl stop clickhouse-server

Next, upgrade the package. The exact command depends on your installation method (apt, yum, Docker, etc.). For apt:

# On ch01
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client
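As written, `apt install` pulls whatever version is newest in the configured repository, which is not necessarily 24.1.2. To make every node land on the same release, the version can be pinned with apt's `package=version` syntax. The sketch below only builds the command; the version string is a placeholder, since the exact build number should be copied from `apt-cache madison clickhouse-server` on your system.

```shell
#!/usr/bin/env bash
# Build an apt-get command that installs one exact ClickHouse version.
# clickhouse-common-static is pinned too, because the server and client
# packages depend on a matching version of it.
pinned_install_cmd() {
  local version="$1"
  local pkgs=() p
  for p in clickhouse-common-static clickhouse-server clickhouse-client; do
    pkgs+=("${p}=${version}")
  done
  echo "sudo apt-get install -y ${pkgs[*]}"
}

# Placeholder: replace with the exact string listed by apt-cache madison.
VERSION="${VERSION:-24.1.2.x}"
pinned_install_cmd "$VERSION"
```

Generating the command once and running it on every node avoids the case where nodes upgraded on different days pick up different patch releases.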

After installation, start the service:

# On ch01
sudo systemctl start clickhouse-server
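`systemctl start` can return before the server is ready to accept connections, so it helps to poll before verifying. A minimal sketch, assuming the default HTTP port 8123, whose `/ping` endpoint answers `Ok.` once the server is up; the probe is passed in as a command so the loop can be tested or pointed at a different check, and the 60-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Poll a probe command once per second until it succeeds or the timeout
# (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_until_ready() {
  local timeout="$1"; shift
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Probe: ClickHouse's HTTP /ping endpoint on the default port 8123.
ping_clickhouse() { curl -fsS "http://${1:-localhost}:8123/ping" >/dev/null; }

# On ch01, right after systemctl start:
# wait_until_ready 60 ping_clickhouse localhost || echo "ch01 did not come up" >&2
```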

Now for the critical verification step. Check the server log (by default /var/log/clickhouse-server/clickhouse-server.err.log) for errors and, more importantly, verify replication status. Connect to the upgraded ch01 and run:

-- On ch01
SELECT * FROM system.replicas WHERE database = 'test_db' AND table = 'test_table';

You want is_readonly = 0 (a replica that cannot reach ZooKeeper switches its tables to read-only), queue_size at or near zero and falling as the node catches up, and active_replicas equal to total_replicas. To confirm the ZooKeeper session directly, query system.zookeeper, which requires a path filter, for example: SELECT name FROM system.zookeeper WHERE path = '/';
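Reading system.replicas by eye works for one table; with many tables it is easier to script the check. The sketch below assumes you select a fixed set of columns as tab-separated output, for example `clickhouse-client --query "SELECT database, table, is_readonly, is_session_expired, queue_size, absolute_delay, active_replicas, total_replicas FROM system.replicas FORMAT TSV"`, and it flags any row that looks unhealthy. The thresholds (queue size 100, delay 300 seconds) are illustrative assumptions, not ClickHouse defaults.

```shell
#!/usr/bin/env bash
# Illustrative thresholds; tune them for your workload.
MAX_QUEUE="${MAX_QUEUE:-100}"
MAX_DELAY="${MAX_DELAY:-300}"   # seconds

# Read TSV rows on stdin with the columns: database, table, is_readonly,
# is_session_expired, queue_size, absolute_delay, active_replicas,
# total_replicas. Print one line per unhealthy table; return 1 if any.
check_replicas() {
  local bad=0 db tbl ro expired queue delay active total
  while IFS=$'\t' read -r db tbl ro expired queue delay active total; do
    [[ -z "$db" ]] && continue
    local problems=()
    (( ro == 1 ))           && problems+=("read-only")
    (( expired == 1 ))      && problems+=("session-expired")
    (( queue > MAX_QUEUE )) && problems+=("queue_size=$queue")
    (( delay > MAX_DELAY )) && problems+=("absolute_delay=${delay}s")
    (( active < total ))    && problems+=("active_replicas=$active/$total")
    if (( ${#problems[@]} > 0 )); then
      echo "$db.$tbl: ${problems[*]}"
      bad=1
    fi
  done
  return "$bad"
}
```

Piping the query output into `check_replicas` gives a single pass/fail exit status, which is convenient as the verification hook of an upgrade script.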

Once verified, add ch01 back to your load balancer.

Step 2: Upgrade ch02

Repeat the process for ch02: remove it from the load balancer, then stop, upgrade, start, and verify it:

# On ch02
sudo systemctl stop clickhouse-server
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client
sudo systemctl start clickhouse-server
# Verify status and replication as above

Once ch02 checks out, add it back to the load balancer.

Step 3: Upgrade ch03

Repeat for ch03.

The beauty of this approach is that while ch01 is down for its upgrade, ch02 and ch03 continue to serve queries and replicate data. When ch01 comes back up, it fetches any data written while it was offline.
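Once the last node is done, it is worth confirming that every server reports the release you intended, since a node where the package step silently installed a different version would otherwise go unnoticed. A sketch, assuming each host answers `clickhouse-client -h HOST --query 'SELECT version()'`; the query runner is passed in as a function name so the comparison logic can be checked without a cluster.

```shell
#!/usr/bin/env bash
# Compare each host's reported version with the expected one.
# $1: expected version (e.g. 24.1.2), $2: runner called as "<runner> <host>",
# remaining args: hosts. Matches the exact string or "<expected>.<build>".
check_versions() {
  local expected="$1" runner="$2"; shift 2
  local host v rc=0
  for host in "$@"; do
    if ! v="$("$runner" "$host")"; then
      echo "$host: unreachable"; rc=1; continue
    fi
    if [[ "$v" != "$expected" && "$v" != "$expected".* ]]; then
      echo "$host: $v (expected $expected)"; rc=1
    fi
  done
  return "$rc"
}

# Real runner: ask each server for its own version.
ch_version() { clickhouse-client -h "$1" --query 'SELECT version()'; }

# check_versions 24.1.2 ch_version ch01 ch02 ch03
```

Matching on `<expected>.` rather than a bare prefix keeps 24.1.2 from accidentally matching a 24.1.20 build.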

A common pitfall is not adequately checking replication status after an upgrade. If an upgraded node cannot reach ZooKeeper, its replicated tables become read-only and reject inserts; if it lags behind the other replicas, queries routed to it can return stale data. Always check system.replicas and system.zookeeper on the upgraded node before returning it to the load balancer.

The next challenge you’ll likely face is managing configuration changes across the cluster during or after the upgrade, ensuring all nodes have consistent config.xml settings.
