Cross-cluster search (CCS) lets you query multiple Elasticsearch clusters as if they were one. Its main appeal is that it presents data held in separate clusters as a single, unified view without requiring you to replicate it.
Let’s see it in action. Imagine you have two clusters: cluster_one, which will act as the coordinating cluster, and cluster_two, which will be the remote.
First, you need to configure the remote cluster connection from the coordinating cluster (the one you’ll be sending queries to). In elasticsearch.yml on your coordinating cluster’s nodes, you’d add something like this:
cluster.remote:
  cluster_two:
    seeds: "host1:9300,host2:9300"
Here, cluster_two is the alias you’ll use in your search requests to refer to the remote cluster. seeds lists the transport addresses (the transport port, 9300 by default, not the HTTP port) of one or more nodes in the remote cluster.
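Remote clusters can also be registered at runtime through the cluster settings API (PUT _cluster/settings) instead of elasticsearch.yml. Here is a minimal Python sketch that only builds the request body for that call. The helper name remote_cluster_settings is hypothetical, and actually sending the request (with whatever authentication your cluster needs) is left out.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Build a cluster settings body that registers one remote cluster.

    Hypothetical helper for illustration; it only produces the JSON body
    you would send with PUT _cluster/settings.
    """
    if not seeds:
        raise ValueError("at least one seed address is required")
    return {
        "persistent": {
            "cluster": {
                "remote": {
                    # The alias is the prefix used later in search requests.
                    alias: {"seeds": list(seeds)}
                }
            }
        }
    }


body = remote_cluster_settings("cluster_two", ["host1:9300", "host2:9300"])
print(json.dumps(body, indent=2))
```

Setting the connection this way avoids a node restart, at the cost of the configuration living in cluster state rather than in a file you can version.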
Now, from your coordinating cluster, you can search cluster_two by prefixing the index name with the remote cluster alias:
GET cluster_two:my_remote_index/_search
{
  "query": {
    "match": {
      "message": "example search"
    }
  }
}
You send this query to cluster_one, which forwards the search request to cluster_two for my_remote_index. cluster_two sends its results back to cluster_one, which returns them to the client. Because the request names only a remote index, there are no local results to merge here; merging comes into play once a request targets more than one cluster.
To search local indices and several remote clusters in one request, use a comma-separated list. Local indices take no prefix; each remote index is prefixed with its cluster alias (this example assumes cluster_three has also been configured as a remote):
GET local_index,cluster_two:remote_index,cluster_three:another_remote_index/_search
{
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
Or, to search all indices on a remote cluster:
GET cluster_two:*/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2023-01-01"
      }
    }
  }
}
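If you build these index expressions in application code, a small helper keeps the prefixing rules in one place. The sketch below is an illustration only: ccs_target is a hypothetical name, and it assumes local indices are written without a prefix while remote ones use alias:index, as Elasticsearch expects.

```python
def ccs_target(local_indices=(), remote_indices=None):
    """Join local and remote index names into one CCS index expression.

    local_indices: index names on the coordinating cluster (no prefix).
    remote_indices: mapping of remote cluster alias -> list of index names
    or patterns on that cluster.
    """
    parts = list(local_indices)
    for alias, indices in (remote_indices or {}).items():
        for index in indices:
            # Remote targets are written as "<alias>:<index>".
            parts.append(f"{alias}:{index}")
    if not parts:
        raise ValueError("no indices given")
    return ",".join(parts)


target = ccs_target(
    ["local_index"],
    {"cluster_two": ["remote_index"], "cluster_three": ["another_remote_index"]},
)
print(f"GET {target}/_search")
```

Passing ["*"] for a remote alias yields the wildcard form, for example cluster_two:*.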
CCS is fundamentally about establishing trusted connections between Elasticsearch clusters. When a coordinating cluster initiates a search to a remote cluster, it communicates over the configured transport connection. The remote cluster receives the search request, executes it against its own indices, and returns its results to the coordinating node. This all happens transparently to the client application, which only interacts with the coordinating cluster. The coordinating node acts as a proxy, merging results from all queried clusters before returning a single response.
The real power comes from how it orchestrates these distributed searches. When you query cluster_two:my_remote_index/_search, the coordinating cluster parses the index expression and groups the targets by cluster alias. The cluster_two: prefix scopes my_remote_index to that remote cluster only, so no local index is searched, even if a local index with the same name exists; local indices are included only when you list them without a prefix. The coordinating cluster then sends the search request (with the same query, size, sort order, etc.) to each cluster named in the expression. The coordinating node merges the returned results so that the final response adheres to the requested sort order and pagination. This means you can, for example, retrieve the top 10 most relevant documents across several clusters without explicitly knowing which cluster holds which document.
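To make the merge step concrete, here is a simplified Python model of it. This is not Elasticsearch's actual implementation; it assumes each cluster has already returned its hits sorted by descending _score, and the function name and sample hits are made up for illustration.

```python
import heapq
from itertools import islice


def merge_top_hits(hits_by_cluster, size):
    """Merge per-cluster hit lists (each sorted by descending _score)
    and keep the first `size` hits of the combined ordering."""
    merged = heapq.merge(
        *hits_by_cluster.values(),
        key=lambda hit: -hit["_score"],  # negate so higher scores come first
    )
    return list(islice(merged, size))


# Sample data only; hits from a remote cluster carry the alias in _index.
hits_by_cluster = {
    "(local)": [
        {"_index": "local_index", "_id": "a", "_score": 9.1},
        {"_index": "local_index", "_id": "b", "_score": 4.0},
    ],
    "cluster_two": [
        {"_index": "cluster_two:remote_index", "_id": "c", "_score": 7.5},
        {"_index": "cluster_two:remote_index", "_id": "d", "_score": 5.2},
    ],
}

top = merge_top_hits(hits_by_cluster, size=3)
for hit in top:
    print(hit["_index"], hit["_id"], hit["_score"])
```

The client sees one ranked list; the _index value on each hit is the only sign of which cluster it came from.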
A common misconception is that CCS requires identical index mappings or settings across clusters. This is not true. Each remote cluster handles the search for its own indices independently based on its local configuration. The coordinating cluster only cares about receiving and merging the results. However, if you sort or aggregate on a field, you’ll want that field to have compatible types across clusters to avoid unexpected behavior or errors when results are merged. For example, if you’re sorting by a timestamp field, every cluster you search should map timestamp to a type that sorts consistently, such as date.
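One way to catch such mismatches before they surface in a query is to compare field types across clusters ahead of time. The Python sketch below works on hand-written sample data; in practice you might gather the types from each cluster's mappings or from the field capabilities API (_field_caps), which is an assumption about your workflow rather than something this article prescribes. The function name is hypothetical.

```python
def conflicting_fields(types_by_cluster):
    """Return fields whose mapped type differs between clusters.

    types_by_cluster: mapping of cluster name -> {field name: field type}.
    The result maps each conflicting field to its type on every cluster
    that defines it.
    """
    seen = {}
    for cluster, fields in types_by_cluster.items():
        for field, field_type in fields.items():
            seen.setdefault(field, {})[cluster] = field_type
    return {
        field: per_cluster
        for field, per_cluster in seen.items()
        if len(set(per_cluster.values())) > 1
    }


# Sample data only: timestamp is a date locally but a keyword remotely.
types_by_cluster = {
    "(local)": {"timestamp": "date", "user.id": "keyword"},
    "cluster_two": {"timestamp": "keyword", "user.id": "keyword"},
}

conflicts = conflicting_fields(types_by_cluster)
print(conflicts)
```

A field reported here is a candidate for trouble only if you sort or aggregate on it; fields you merely return in hits can differ freely.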
The next logical step after mastering CCS is understanding how to manage index lifecycle policies across these federated clusters, especially when dealing with large datasets and retention requirements.