ReplicatedMergeTree tables don’t actually replicate data between nodes; they replicate metadata about data parts and coordinate replication using ZooKeeper.

Let’s see this in action. Imagine you have a ClickHouse cluster with two nodes, ch1 and ch2, and you want to set up a ReplicatedMergeTree table.

First, you need ZooKeeper running and accessible. ClickHouse uses ZooKeeper for leader election, metadata synchronization, and ensuring consistency across replicas.

On each ClickHouse node, you’ll configure the config.xml file to point to your ZooKeeper ensemble. For example, in /etc/clickhouse-server/config.xml on ch1:

<clickhouse>
    <zookeeper>
        <node>
            <host>zookeeper1.example.com</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper2.example.com</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper3.example.com</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>

You’d have a similar configuration on ch2, pointing to the same ZooKeeper ensemble.

Now, let’s create the ReplicatedMergeTree table. On one of the nodes (say, ch1), you’d execute:

CREATE TABLE my_replicated_table (
    event_date Date,
    id UInt64,
    value String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/my_replicated_table', '{replica}')
ORDER BY id;

The crucial parts here are:

  • /clickhouse/tables/{shard}/my_replicated_table: This is the ZooKeeper path where the table’s metadata will be stored. {shard} and {replica} are macros that ClickHouse resolves. For ch1, it might become /clickhouse/tables/01/my_replicated_table, and for ch2, it might become /clickhouse/tables/01/my_replicated_table as well if they are in the same shard.
  • {replica}: This macro resolves to a unique identifier for each replica. You’d typically set this in the users.xml or config.xml for each ClickHouse server. For ch1, you might set <macros><replica>ch1</replica></macros>, and for ch2, <macros><replica>ch2</replica></macros>.

After creating the table on ch1, you would then execute the exact same CREATE TABLE statement on ch2. ClickHouse detects that a table with the same ZooKeeper path already exists and registers ch2 as another replica for that table.

Now, if you insert data into my_replicated_table on ch1:

INSERT INTO my_replicated_table (event_date, id, value) VALUES ('2023-10-27', 123, 'hello');

ClickHouse on ch1 writes this data part locally. It then registers this new data part in ZooKeeper under the table’s path. ch2 watches ZooKeeper for changes. It sees the new data part metadata and knows it needs to fetch it. ClickHouse’s replication mechanism then handles the background download of that data part from ch1 to ch2.

The ORDER BY clause is critical. It defines the sort order of data within each part. This is not directly related to replication but is a core aspect of MergeTree performance.

The system in action means that data written to any replica is eventually available on all other replicas. If ch1 goes down, ch2 can still serve queries and accept writes. New writes will be coordinated through ZooKeeper, and when ch1 comes back online, it will fetch any data parts it missed.

The true power of ReplicatedMergeTree lies in its eventual consistency and fault tolerance. It doesn’t try to be synchronously consistent like a traditional RDBMS cluster; instead, it prioritizes availability and allows for eventual convergence of data.

One often overlooked aspect is how ClickHouse handles schema evolution with ReplicatedMergeTree. If you ALTER a table on one replica (e.g., ALTER TABLE my_replicated_table ADD COLUMN new_col String), ClickHouse writes this schema change command to ZooKeeper. All other replicas will then apply this ALTER command locally, ensuring schema consistency across the cluster. This process is also asynchronous and happens in the background.

The next concept to grasp is how to manage data sharding alongside replication for a truly distributed and highly available ClickHouse setup.

Want structured learning?

Take the full Clickhouse course →