Cross-cluster replication (CCR) in Elasticsearch is a powerful feature that allows you to automatically replicate indices from a source cluster to one or more destination clusters. This is crucial for disaster recovery, high availability, and geo-distributed search scenarios.

Let’s see CCR in action. Imagine you have a primary cluster es-primary and you want to replicate an index named my-app-logs to a secondary cluster es-secondary.

First, you need to configure the remote cluster connection. On your es-secondary cluster, you’d add an entry to its elasticsearch.yml file:

# elasticsearch.yml on es-secondary
cluster.remote.es-primary:
  seeds: "es-primary-node1:9300,es-primary-node2:9300"

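Restart es-secondary for this change to take effect. Remote cluster settings are also dynamic, so you could instead apply the same seeds through the cluster settings API without a restart. Either way, you can then verify the connection using curl: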

curl -X GET "es-secondary-node1:9200/_remote/info"

The response should list es-primary with "connected": true, which indicates that es-secondary can reach it.
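If you want to check this from a script rather than by eye, a minimal sketch could look like the following. It assumes the _remote/info response is a JSON object keyed by remote cluster alias, each with a "connected" flag; the sample body is hypothetical and trimmed for illustration, and connected_remotes is a helper name invented here.

```python
import json


def connected_remotes(remote_info: dict) -> dict:
    """Map each remote cluster alias to its 'connected' flag (False if absent)."""
    return {
        alias: bool(info.get("connected", False))
        for alias, info in remote_info.items()
    }


# Hypothetical, trimmed response body; a real response carries more fields.
sample = json.loads("""
{
  "es-primary": {
    "connected": true,
    "num_nodes_connected": 2,
    "seeds": ["es-primary-node1:9300", "es-primary-node2:9300"]
  }
}
""")

status = connected_remotes(sample)
print(status)
```

In practice you would feed the function the parsed body of the curl call above instead of the hard-coded sample.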

The next step spans both clusters. The leader index is the one you want to replicate, and it lives on es-primary. The follower index receives the replicated data, and it is created on es-secondary.

On es-primary, create the leader index:

PUT /my-app-logs
{
  "settings": {
    "index.soft_deletes.enabled": true
  }
}

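Notice the index.soft_deletes.enabled: true. This is a prerequisite for CCR. Since Elasticsearch 7.0, soft deletes are enabled by default on newly created indices, so setting it explicitly mainly matters on older versions. With soft deletes, Elasticsearch retains a history of updates and deletes in the index instead of discarding it immediately, which lets the follower replay the changes it has not yet seen.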

Next, on es-secondary, you create the follower index, specifying the leader index on es-primary and the replication configuration:

PUT /my-app-logs/_ccr/follow
{
  "remote_cluster": "es-primary",
  "leader_index": "my-app-logs",
  "settings": {
    "index.number_of_replicas": 0
  }
}

Here, remote_cluster points to the alias you defined in es-secondary’s elasticsearch.yml. leader_index is the name of the index on the source cluster. You can also override some settings for the follower index, such as index.number_of_replicas. Not every setting can be overridden: index.number_of_shards, for example, must match the leader. Some teams set index.number_of_replicas to 0 on the follower when its main purpose is disaster recovery, since that saves resources, at the cost of having no redundancy inside the follower cluster itself.

Once this PUT request is sent, Elasticsearch on es-secondary initiates the replication process. It first creates the my-app-logs index with the specified settings, copies the leader’s existing data through a remote recovery, and then starts fetching new operations from the leader index’s shards.

You can monitor the replication status:

GET /my-app-logs/_ccr/stats

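The response reports per-shard counters rather than a single lag figure. Comparing leader_global_checkpoint with follower_global_checkpoint shows how many operations the follower is behind; when the two are equal, the follower has caught up with everything it knows about on the leader. time_since_last_read_millis shows how long ago the follower last sent a read request to the leader.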
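One way to turn the stats response into a lag figure is to subtract checkpoints per shard. The sketch below assumes the response has an "indices" list whose entries contain a "shards" list with leader_global_checkpoint and follower_global_checkpoint fields; the sample values are made up for illustration, and checkpoint_lag is a hypothetical helper, not part of any client library.

```python
def checkpoint_lag(follow_stats: dict) -> dict:
    """Return {(index, shard_id): operations the follower is behind}."""
    lag = {}
    for index in follow_stats.get("indices", []):
        for shard in index.get("shards", []):
            key = (index["index"], shard["shard_id"])
            lag[key] = (
                shard["leader_global_checkpoint"]
                - shard["follower_global_checkpoint"]
            )
    return lag


# Hypothetical, trimmed stats body; real responses carry many more counters.
sample_stats = {
    "indices": [
        {
            "index": "my-app-logs",
            "shards": [
                {"shard_id": 0, "leader_global_checkpoint": 1024,
                 "follower_global_checkpoint": 1024},
                {"shard_id": 1, "leader_global_checkpoint": 980,
                 "follower_global_checkpoint": 900},
            ],
        }
    ]
}

for (index_name, shard_id), behind in checkpoint_lag(sample_stats).items():
    print(f"{index_name}[{shard_id}] is {behind} operations behind")
```

A lag that stays at zero or keeps returning to zero is healthy; a lag that only grows suggests the follower cannot keep up or has stopped reading.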

The problem CCR solves is the manual, complex, and error-prone process of synchronizing data between Elasticsearch clusters. Traditionally, you might have scripts to export data from one cluster and import it into another, or rely on application-level logic to write to multiple clusters. CCR automates this, ensuring data consistency and availability with minimal operational overhead.

Internally, CCR uses a pull model. Each follower shard sends read requests to the corresponding leader shard, asking for operations newer than the ones it has already applied, and then replays them against its own copy of the index. This is handled by Elasticsearch’s built-in CCR functionality, which orchestrates the communication and data transfer. The leader index must have soft deletes enabled because the follower reads this history of operations from the leader shards, and soft deletes are what retain updates and deletes long enough to be replayed. While it is following, the follower index is read-only and mirrors the leader’s state; it becomes a regular writable index only after you pause following and unfollow it.
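The pull loop can be pictured with a toy model. This is not how Elasticsearch implements CCR (the real follower issues concurrent, size-limited read requests and handles retries); it is only a sketch of the idea that a follower asks for operations after its own checkpoint and applies them in sequence-number order. LeaderShard, FollowerShard, and max_ops are names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderShard:
    # history[seq_no] holds the operation with that sequence number
    history: list = field(default_factory=list)

    def index(self, op: str) -> int:
        """Record an operation and return its sequence number."""
        self.history.append(op)
        return len(self.history) - 1

    def read_ops(self, from_seq_no: int, max_ops: int) -> list:
        """Return up to max_ops (seq_no, op) pairs starting at from_seq_no."""
        end = min(from_seq_no + max_ops, len(self.history))
        return [(s, self.history[s]) for s in range(from_seq_no, end)]


@dataclass
class FollowerShard:
    applied: list = field(default_factory=list)
    checkpoint: int = -1  # highest sequence number applied so far

    def poll(self, leader: LeaderShard, max_ops: int = 2) -> int:
        """Fetch and apply one batch; return how many operations were applied."""
        batch = leader.read_ops(self.checkpoint + 1, max_ops)
        for seq_no, op in batch:
            self.applied.append(op)
            self.checkpoint = seq_no
        return len(batch)


leader = LeaderShard()
for doc in ["index a", "index b", "update a", "delete b", "index c"]:
    leader.index(doc)

follower = FollowerShard()
while follower.poll(leader):
    pass  # keep polling until a read returns no new operations

print(follower.checkpoint, follower.applied == leader.history)
```

Because the follower only ever asks for operations after its own checkpoint, it can stop and restart without re-reading what it has already applied, as long as the leader still retains that part of the history.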

The exact levers you control are primarily the replication settings when creating the follower index: remote_cluster, leader_index, and any index settings you wish to override on the follower. You can also control the replication process itself through the pause_follow, resume_follow, and unfollow APIs. For more advanced scenarios, the follow request accepts tuning parameters such as max_retry_delay and read_poll_timeout, which govern how the follower retries and how long it waits for new operations.
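As a sketch of how these APIs fit together, the helper below lists the requests typically used to turn a follower into a regular index: pause following, close the index, unfollow, and reopen it. It only builds the method and path pairs and sends nothing; promote_follower_requests is a hypothetical name, and you would issue the requests with whatever client you already use.

```python
def promote_follower_requests(index: str) -> list:
    """Return the ordered (method, path) pairs to convert a follower index
    into a regular, writable index."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow"),  # stop fetching from the leader
        ("POST", f"/{index}/_close"),             # unfollow needs a closed index
        ("POST", f"/{index}/_ccr/unfollow"),      # drop the follower metadata
        ("POST", f"/{index}/_open"),              # reopen as a normal index
    ]


for method, path in promote_follower_requests("my-app-logs"):
    print(method, path)
```

Note that unfollowing is one-way: to make the index a follower again, it has to be recreated through the follow API.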

One thing many people miss is that CCR does not treat all index settings the same way. The follower picks up mappings and most index settings from the leader, but some settings, index.number_of_replicas among them, are not replicated, and a few, such as index.number_of_shards, must match the leader and cannot be overridden. If you want a non-replicated setting to have a particular value on the follower, define it explicitly when creating the follower index. This allows for flexibility, such as running the follower with fewer replicas for cost savings, though for DR you’d typically want the follower to be able to take over with minimal reconfiguration.

The next concept you’ll likely encounter is how to manage failover and failback scenarios with CCR.
