Autoscaling Elasticsearch nodes is less about adding more nodes and more about surgically extracting or inserting them based on actual load.

Let’s look at a typical scenario: an Elasticsearch cluster with a few nodes, serving a web application’s search queries.

{
  "cluster_name": "my-es-cluster",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 10,
  "active_shards": 30,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "long_node_ids": [
    "node-1-abc",
    "node-2-def",
    "node-3-ghi"
  ],
  "nodes": {
    "node-1-abc": {
      "name": "node-1-abc",
      "transport_address": "10.0.1.10:9300",
      "host": "10.0.1.10",
      "ip": "10.0.1.10",
      "version": "7.17.9",
      "build_flavor": "default",
      "build_type": "deb",
      "build_hash": "a0e26122",
      "total_indexing_buffer": 2147483647,
      "total_busy_thread_count": 12,
      "total_thread_pool_search_threads": 8,
      "total_thread_pool_index_threads": 8,
      "total_thread_pool_merge_threads": 8,
      "total_thread_pool_refresh_threads": 8,
      "jvm_version": "11.0.17",
      "mem": {
        "heap_used_in_bytes": 2147483648,
        "heap_max_in_bytes": 2147483648,
        "heap_percent": 100,
        "non_heap_used_in_bytes": 145889152,
        "non_heap_max_in_bytes": 145889152,
        "mapped_memory_in_bytes": 1073741824
      },
      "fs": {
        "total_in_bytes": 107374182400,
        "free_in_bytes": 53687091200,
        "available_in_bytes": 53687091200
      },
      "os": {
        "name": "Linux",
        "version": "5.15.0-76-generic",
        "arch": "amd64",
        "available_processors": 4
      },
      "process": {
        "cpu": 50,
        "memory": {
          "total_virtual_in_bytes": 10737418240
        }
      },
      "roles": [
        "master",
        "data",
        "ingest",
        "ml"
      ],
      "attributes": {
        "ml.machine_memory": 8589934592,
        "ml.max_open_jobs": 50
      }
    },
    // ... other nodes ...
  }
}

This output shows a cluster named my-es-cluster with 3 data nodes. Node node-1-abc is running at 100% heap usage, which is a clear sign of distress. The system is configured with a fixed number of nodes, meaning if load spikes, performance will degrade, and if load drops, resources are being wasted.

The Problem: Static Node Count

Elasticsearch’s performance is directly tied to the resources available to its nodes. When you have a static number of nodes, you face two problems:

  1. Under-provisioning: During peak traffic, nodes get overloaded. This leads to slow query responses, indexing backlogs, and potentially cluster instability. The JVM heap often becomes a bottleneck, as seen with heap_percent: 100.
  2. Over-provisioning: During off-peak hours, nodes sit idle, consuming CPU, memory, and disk space unnecessarily. This increases operational costs.

The Solution: Deployment Policies

Deployment policies allow you to define rules for automatically scaling your Elasticsearch cluster up or down. These policies are typically managed by a cloud provider’s orchestration layer (like AWS EC2 Auto Scaling Groups, Kubernetes Deployments with Horizontal Pod Autoscalers, or a managed Elasticsearch service). The core idea is to monitor key metrics and adjust the node count accordingly.

Key Metrics for Scaling

The most common metrics to trigger scaling actions are:

  • CPU Utilization: High CPU on data nodes indicates that processing power is a bottleneck.
  • JVM Heap Usage: When nodes are running out of heap memory, performance suffers drastically. This is often the most sensitive indicator.
  • Indexing/Search Latency: Increased latency suggests the cluster is struggling to keep up with requests.
  • Disk I/O: If nodes are constantly waiting for disk operations, it can slow down the entire cluster.

How Scaling Works (Conceptual)

  1. Monitoring: A monitoring agent (or the orchestration platform) continuously collects metrics from your Elasticsearch nodes.
  2. Policy Evaluation: These metrics are compared against predefined thresholds in your deployment policy.
  3. Scaling Action:
    • Scale Up: If metrics exceed upper thresholds (e.g., average CPU > 80%, average heap > 90%) for a sustained period, the system adds new nodes.
    • Scale Down: If metrics fall below lower thresholds (e.g., average CPU < 30%, average heap < 50%) for a sustained period, the system removes nodes.

Practical Implementation Example (Kubernetes)

Let’s imagine you’re running Elasticsearch on Kubernetes using the Elastic Cloud on Kubernetes (ECK) operator. You’d define a Deployment for your Elasticsearch cluster and then a PodDisruptionBudget and potentially an HorizontalPodAutoscaler (though ECK often manages scaling more directly via its own CRDs).

A simplified ECK Elasticsearch custom resource might look like this:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-es-cluster
spec:
  version: 7.17.9
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false # Example config
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: "8Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # This is the volume claim for data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard # Or your specific storage class

In this ECK configuration, count: 3 sets the initial number of nodes. For autoscaling, ECK doesn’t directly use a Kubernetes HorizontalPodAutoscaler on the Elasticsearch resource itself. Instead, ECK’s own reconciliation loop monitors the cluster’s health and resource usage. When the cluster is unhealthy or resource-constrained, ECK can be configured to adjust the count or trigger scaling of the underlying Kubernetes nodes if they are managed by an autoscaler.

For true autoscaling of Elasticsearch pods based on metrics, you’d typically integrate with a Kubernetes HorizontalPodAutoscaler targeting the Elasticsearch stateful set managed by ECK, or use a custom operator that monitors Elasticsearch metrics and adjusts the count field in the Elasticsearch CR.

Example of a conceptual HPA (this is not directly applied to the Elasticsearch CR but to the underlying Pods/StatefulSet if ECK exposes that):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-es-cluster-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet # Assuming ECK manages an underlying StatefulSet
    name: my-es-cluster-es-http # This name will vary based on ECK's naming conventions
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85

This HPA would attempt to scale the pods of the Elasticsearch service up to 10 replicas if their CPU utilization averaged above 80% or memory above 85%. ECK would then need to be aware of these changes and potentially adjust its internal node counts or configurations.

The "Why It Works" Nuance

Elasticsearch is designed for distributed operation. Adding a node isn’t just about more raw power; it’s about distributing the shard load. When you add a node, Elasticsearch’s shard allocation mechanisms will automatically start moving shards to the new node, balancing the load across the cluster. This offloads work from the overloaded nodes, allowing them to recover and improving overall cluster responsiveness. Conversely, when removing a node, shards are rebalanced before the node is terminated, ensuring data availability and preventing a sudden spike in load on the remaining nodes.

The Next Step: Shard Rebalancing Strategies

Once you have autoscaling in place, the next challenge is managing how shards are distributed. If your autoscaling adds and removes nodes frequently, you might find shards constantly moving, impacting performance. Understanding and configuring shard allocation awareness, rerouting, and balancing settings becomes crucial for a stable, autoscaled cluster.

Want structured learning?

Take the full Elasticsearch course →