Choosing the right number of Elasticsearch replicas is a delicate balancing act between ensuring your data is highly available and preventing your cluster from grinding to a halt under load.

Let’s see this in action. Imagine a simple Elasticsearch cluster with a single node, hosting a single index named my_index.

GET _cat/indices/my_index?v

Output:

health status index    uuid                   pri rep docs.count store.size
yellow open     my_index d0fXJ...            1   0    1000000   1.5gb

Notice the rep column: it’s 0. This means there are no replicas. If this node dies, all 1,000,000 documents are gone. Now, let’s add one replica:

PUT my_index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

Wait a few seconds, and check again:

GET _cat/indices/my_index?v

Output:

health status index    uuid                   pri rep docs.count store.size
green  open     my_index d0fXJ...            1   1    1000000   1.5gb

The rep is now 1. The index is green because Elasticsearch has successfully created a copy of all primary shards on a different node (or, in a single-node cluster, it will try to, and you’ll see it as yellow until a second node is available). If the primary node fails, Elasticsearch can promote one of the replicas to become the new primary, and your data remains accessible. This is the core of availability.

But replicas aren’t free. Each replica is a full copy of the primary shard, meaning it consumes disk space and requires resources to keep itself synchronized. When you index data, Elasticsearch writes it to the primary shard, and then that change needs to be replicated to all replica shards. This replication process adds overhead to your indexing operations. For read operations, Elasticsearch can distribute search requests across primary and replica shards, which can improve query performance by parallelizing the work. However, if you have too many replicas, the cost of keeping them all synchronized during indexing can outweigh the benefits of read distribution.

The most surprising thing about replica counts is that number_of_replicas: 1 is often the sweet spot for many applications, even those that seem to demand extreme availability. The reason is that Elasticsearch’s failover mechanism is incredibly fast. When a node goes down, the master node quickly detects the failure and initiates the promotion of a replica shard to a primary. The downtime during this promotion is typically measured in seconds, not minutes or hours, and for most services, this is perfectly acceptable. Many engineers over-provision replicas out of a misunderstanding of this rapid failover capability, leading to unnecessary resource consumption and indexing latency.

The key levers you control are number_of_shards (which determines how your data is partitioned) and number_of_replicas (which determines how many copies of each partition exist). For a given index, the total number of shard copies will be number_of_shards * (1 + number_of_replicas). So, if you have 5 primary shards and 2 replicas, you have a total of 5 * (1 + 2) = 15 shard copies distributed across your cluster.

When deciding on your replica count, consider your acceptable downtime. If you can tolerate a few seconds of unavailability during a node failure, 1 replica is usually sufficient. If you have extremely critical systems where even seconds of downtime are unacceptable, you might consider 2 replicas, but be acutely aware of the performance implications on indexing. For read-heavy workloads, more replicas can help distribute search traffic. For write-heavy workloads, fewer replicas are generally better. You can also dynamically adjust the replica count for an index without downtime using the _settings API, allowing you to tune it as your workload evolves.

The next challenge you’ll face is understanding how shard allocation awareness can further improve availability in multi-data center or availability zone deployments.

Want structured learning?

Take the full Elasticsearch course →