Elasticsearch nodes are crashing with OutOfMemoryError: Java heap space or experiencing severe performance degradation due to excessive garbage collection (GC).

Here’s why this happens and how to fix it:

Common Causes and Fixes

1. Heap Size Too Small for Workload

The most common reason is simply not allocating enough memory for Elasticsearch to manage its indices, caches, and ongoing operations. Elasticsearch is memory-hungry.

  • Diagnosis: Check the current JVM heap settings in your jvm.options file (usually located at /etc/elasticsearch/jvm.options or within the config directory of your Elasticsearch installation). Look for lines starting with -Xms and -Xmx. On Elasticsearch 7.7 and later, also check any custom files in the jvm.options.d directory next to it.
    grep -E '^-Xm[sx]' /etc/elasticsearch/jvm.options
    
  • Fix: Increase both -Xms (initial heap size) and -Xmx (maximum heap size) to the same value. A common recommendation is 50% of system RAM, capped at roughly 30GB so the JVM can keep using compressed ordinary object pointers ("compressed oops"). The exact threshold varies by JVM and platform but sits a little below 32GB. For example, if you have 16GB of RAM, set both to 8g:
    -Xms8g
    -Xmx8g
    
    Restart Elasticsearch for the change to take effect.
  • Why it works: A larger heap gives Elasticsearch more room for its in-heap data structures and caches, so the garbage collector runs less often and is less likely to fall behind. Setting -Xms and -Xmx to the same value prevents the JVM from resizing the heap at runtime, which can cause performance hiccups.
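
The sizing rule above can be sketched as a small shell helper. This is an illustrative sketch rather than an official tool: the function name suggest_heap_mb is hypothetical, and the 30GB cap is an assumption taken from the conservative end of the range mentioned above.

```shell
# Hypothetical helper: suggest a heap size in MB for a given amount of
# system RAM (also in MB). Rule of thumb from the text: half of RAM,
# capped below the compressed oops threshold.
suggest_heap_mb() {
  local total_mb=$1
  local cap_mb=$((30 * 1024))   # assumed cap: 30GB, the low end of the 30-32GB range
  local half_mb=$((total_mb / 2))
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}
```

On Linux, total RAM can be read from /proc/meminfo, for example: suggest_heap_mb "$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)". Treat the result as a starting point and confirm it against actual heap usage.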

2. Heap Size Too Large, Causing Swapping

Conversely, allocating too much heap can be just as bad. If you allocate more than 50% of system RAM, or more than the OS can comfortably manage alongside its own needs, the system might start swapping memory to disk. Swapping is catastrophic for Elasticsearch performance.

  • Diagnosis: Monitor system memory usage with free -m or top and look for significant swap usage. Nonzero si and so columns in vmstat 1 show that pages are actively being swapped in and out. On systems with systemd, systemctl status elasticsearch might show related errors or warnings if the service is struggling due to resource contention.
  • Fix: Reduce the heap size in jvm.options to be no more than 50% of total system RAM, and ensure it doesn’t exceed the 30-32GB compressed oops limit. For instance, on a 32GB machine, a 12g heap is often a good starting point.
    -Xms12g
    -Xmx12g
    
    Restart Elasticsearch.
  • Why it works: By staying within reasonable memory limits, you prevent the OS from resorting to slow disk swapping, ensuring Elasticsearch’s data and operations reside in fast RAM.
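
To put a number on swap usage, one option is to read the SwapTotal and SwapFree fields from /proc/meminfo. The sketch below assumes a Linux host; the function name swap_used_kb is hypothetical.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the amount of swap in use, in kB (SwapTotal minus SwapFree).
swap_used_kb() {
  awk '
    /^SwapTotal:/ { total = $2 }
    /^SwapFree:/  { free  = $2 }
    END { printf "%d\n", total - free }
  '
}
```

Run it as swap_used_kb < /proc/meminfo. A value that keeps growing while Elasticsearch is under load suggests the heap, or the node's overall memory footprint, is too large for the machine.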

3. Inadequate System Memory (Overall)

Even with a correctly sized JVM heap, if the overall system doesn’t have enough RAM for the OS, file system cache, and Elasticsearch’s JVM heap combined, you’ll still face issues. Elasticsearch relies heavily on the OS’s file system cache for performance.

  • Diagnosis: Use free -m to check total RAM and available memory. A system with 32GB RAM where Elasticsearch’s heap is set to 16GB leaves only 16GB for the OS, file system cache, and other processes.
  • Fix: Add more RAM to the server or reduce the overall memory footprint of other processes running on the same node. If possible, dedicate nodes solely to Elasticsearch.
  • Why it works: Sufficient system RAM allows the OS to effectively cache index files, and provides ample space for the JVM heap without contention.
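
As a rough check of how much RAM each node leaves for the file system cache, the cat nodes API can report the maximum heap and total memory per node. The sketch below assumes the heap.max and ram.max columns with bytes=b so the values are plain byte counts; the function name heap_share is hypothetical.

```shell
# Hypothetical helper: read "name heap_bytes ram_bytes" lines on stdin and
# print each node name with the percentage of RAM taken by the JVM heap.
heap_share() {
  awk 'NF >= 3 && $3 > 0 { printf "%s %d\n", $1, ($2 * 100) / $3 }'
}

# Fetch the columns from a local node (assumes Elasticsearch on localhost:9200).
fetch_heap_and_ram() {
  curl -s "localhost:9200/_cat/nodes?h=name,heap.max,ram.max&bytes=b"
}
```

Calling fetch_heap_and_ram | heap_share prints one line per node. Values well above 50 mean the heap is crowding out the file system cache on that node.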

4. Incorrect bootstrap.memory_lock Setting

If bootstrap.memory_lock is not enabled, the JVM might not be able to lock its heap memory, potentially leading to swapping even if you’ve allocated a reasonable heap size.

  • Diagnosis: Check your elasticsearch.yml file for the bootstrap.memory_lock setting.
    grep bootstrap.memory_lock /etc/elasticsearch/elasticsearch.yml
    
  • Fix: Set bootstrap.memory_lock: true in elasticsearch.yml, then allow the elasticsearch user to lock memory at the OS level. For installs started from an init script or a shell, raise the memlock limit in limits.conf. For package installs managed by systemd, limits.conf is ignored, so set LimitMEMLOCK=infinity in a systemd override instead (sudo systemctl edit elasticsearch). No mlockall flag is needed in jvm.options: bootstrap.memory_lock is the current name of the old bootstrap.mlockall setting.
    # elasticsearch.yml
    bootstrap.memory_lock: true
    
    # /etc/security/limits.d/elasticsearch.conf (or limits.conf)
    elasticsearch soft memlock unlimited
    elasticsearch hard memlock unlimited
    
    # systemd override (/etc/systemd/system/elasticsearch.service.d/override.conf)
    [Service]
    LimitMEMLOCK=infinity
    
    # jvm.options
    # CMS settings apply only to JDK 8-13; CMS was removed in JDK 14
    8-13:-XX:+UseConcMarkSweepGC
    8-13:-XX:CMSInitiatingOccupancyFraction=75
    8-13:-XX:+UseCMSInitiatingOccupancyOnly
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/var/lib/elasticsearch/heapdumps
    -XX:+ExitOnOutOfMemoryError
    -Xms4g
    -Xmx4g
    # Touch every heap page at startup
    -XX:+AlwaysPreTouch
    
    Restart Elasticsearch and potentially the OS or the user session for limit changes to apply.
  • Why it works: bootstrap.memory_lock: true makes Elasticsearch call mlockall at startup, which prevents the operating system from swapping the JVM heap to disk and keeps it in physical RAM. The -XX:+AlwaysPreTouch JVM option touches every heap page at startup, so the full heap is resident from the beginning rather than being faulted in later.
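
After the restart, it is worth confirming that the lock actually took effect. The nodes info API reports an mlockall flag per node; the sketch below wraps that check in a hypothetical helper named mlock_status, assuming Elasticsearch is reachable on localhost:9200 without authentication.

```shell
# Hypothetical helper: read the nodes info JSON on stdin and print
# "locked" if every node reports mlockall true, "unlocked" if any node
# reports false, or "unknown" if the field is missing.
mlock_status() {
  local body
  body=$(cat)
  if printf '%s' "$body" | grep -Eq '"mlockall"[[:space:]]*:[[:space:]]*false'; then
    echo "unlocked"
  elif printf '%s' "$body" | grep -Eq '"mlockall"[[:space:]]*:[[:space:]]*true'; then
    echo "locked"
  else
    echo "unknown"
  fi
}

# Query only the mlockall field for all nodes.
fetch_mlockall() {
  curl -s "localhost:9200/_nodes?filter_path=**.mlockall"
}
```

Running fetch_mlockall | mlock_status should print locked. If it prints unlocked, the Elasticsearch log usually contains a line explaining that memory locking was requested but failed, which typically points back at the memlock limit.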

5. Too Many Shards Per Node

Each shard consumes memory for its data structures, file handles, and thread pools. A high number of shards, especially on nodes with limited RAM, can lead to excessive memory pressure.

  • Diagnosis: Use the cat allocation API to see how many shards each node holds, and the Nodes Stats API for per-node segment statistics.
    curl -X GET "localhost:9200/_cat/allocation?v"
    # or
    curl -X GET "localhost:9200/_nodes/stats/indices/segments?pretty"
    
  • Fix: Reduce the number of shards per node. This can be achieved by:
    • Consolidating indices (e.g., using Index Lifecycle Management (ILM) rollover and shrink actions, or reindexing many small indices into fewer larger ones).
    • Increasing the number of nodes in your cluster.
    • Re-evaluating your indexing strategy to use fewer shards per index.
  • Why it works: Fewer shards per node means less memory overhead per node, distributing the load more evenly and reducing the chance of any single node running out of memory.
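
For a per-node shard count that can be scripted, the cat shards API can be limited to the node column and tallied. This is a minimal sketch: the function name shards_per_node is hypothetical, and it assumes node names contain no spaces.

```shell
# Hypothetical helper: read one node name per line on stdin and print
# "count node" pairs, busiest node first. Blank lines (unassigned shards)
# are skipped; only the first word of each line is used.
shards_per_node() {
  awk 'NF { count[$1]++ } END { for (n in count) print count[n], n }' \
    | sort -k1,1nr -k2
}

# Fetch the node column for every shard (assumes localhost:9200).
fetch_shard_nodes() {
  curl -s "localhost:9200/_cat/shards?h=node"
}
```

Running fetch_shard_nodes | shards_per_node lists each node with its shard count, which makes an uneven distribution easy to spot.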

6. Inefficient Indexing or Querying

Certain indexing patterns or complex queries can temporarily consume large amounts of heap memory. For example, large bulk requests, complex aggregations, or queries that require loading significant portions of an index into memory.

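  • Diagnosis: Use the Nodes Hot Threads API to see what the busiest threads on each node are doing when heap pressure spikes. It reports CPU activity rather than heap usage, so use it to identify the operations involved.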
    curl -X GET "localhost:9200/_nodes/hot_threads?pretty"
    
    Analyze slow logs for problematic queries.
  • Fix:
    • Break down large bulk requests into smaller batches.
    • Optimize complex aggregations or queries.
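    • Use search_after (ideally with a point in time) for deep pagination instead of from/size. The scroll API also works but is no longer the recommended approach.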
    • Ensure your mapping is efficient and doesn’t use dynamic mapping excessively.
  • Why it works: By making indexing and querying operations more efficient, you reduce the peak memory demands placed on the JVM heap.
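
Breaking a large bulk request into smaller batches can be sketched with split and curl. The sketch assumes a bulk file in NDJSON format where every action line is followed by a source line (index or create actions only), so each document occupies exactly two lines. The function names split_bulk and send_chunks are hypothetical.

```shell
# Hypothetical helper: split a bulk NDJSON file into chunks of
# docs_per_chunk documents. Assumes two lines per document, so the chunk
# size in lines is always even and action/source pairs stay together.
split_bulk() {
  local file=$1 docs_per_chunk=$2 out_prefix=$3
  split -l "$((docs_per_chunk * 2))" "$file" "$out_prefix"
}

# Send each chunk to the bulk API in turn (assumes localhost:9200).
send_chunks() {
  local out_prefix=$1 chunk
  for chunk in "$out_prefix"*; do
    curl -s -H 'Content-Type: application/x-ndjson' \
      -X POST "localhost:9200/_bulk" --data-binary "@$chunk" > /dev/null
  done
}
```

For example, split_bulk docs.ndjson 1000 /tmp/chunk_ followed by send_chunks /tmp/chunk_ sends 1000 documents per request. The right batch size depends on document size, so it is worth measuring rather than assuming.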

After ensuring your JVM heap is correctly set and system resources are adequate, the next common issue you’ll encounter is CircuitBreakingException: [parent] Data too large..., which means a circuit breaker rejected a request because serving it would have pushed estimated heap usage past the configured limit.
