Tune Elastic APM Server to Handle High Memory Pressure (2026)

Elastic APM Server is choking on memory and dropping requests under load.

The APM Server is failing to keep up with the volume of trace data being sent by your applications, leading to excessive memory consumption and eventual request rejections. This typically happens when the server’s internal queues fill up because the processing rate can’t match the ingestion rate.

Here are the common culprits and how to fix them:

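1. Insufficient Memory Headroom: APM Server is a Go binary, so it has no JVM heap to size. ES_JAVA_OPTS and the -Xms/-Xmx flags configure Elasticsearch, not APM Server. APM Server's memory is managed by the Go runtime and is bounded in practice by the host, container, or cgroup limit. If that limit is too low for the event volume, the Go garbage collector runs constantly, or the kernel OOM-kills the process.

Diagnosis: Check the kernel log (dmesg or journalctl -k) and your container runtime for OOM kills of the apm-server process, and watch its resident memory (RSS) under load. If your version exposes the monitoring HTTP endpoint and it is enabled (http.enabled: true, by default on localhost:5066), GET /stats reports Go runtime memory figures under beat.memstats.
Fix: Raise the memory limit of the host or container that runs APM Server. You can also give the Go runtime a soft memory limit through the GOMEMLIMIT environment variable (honored by binaries built with Go 1.19 or later), set somewhat below the hard limit. For example, for a container limited to 4 GiB:
```
export GOMEMLIMIT=3500MiB
```
Restart the APM Server for this to take effect. With a soft limit in place, the garbage collector works harder as usage approaches the limit instead of letting the process grow until it is killed.
Why it works: More memory headroom lets APM Server buffer more incoming data and finish its internal processing without being OOM-killed or spending most of its CPU time on garbage collection.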

2. Overloaded Ingest Pipelines: APM events pass through Elasticsearch ingest pipelines, which process and enrich each document at index time. These pipelines run on Elasticsearch ingest nodes, not inside APM Server, but a slow pipeline delays every bulk response. Events then wait longer in APM Server's memory, and its queue backs up.

Diagnosis: Examine the APM ingest pipelines in Kibana under Stack Management -> Ingest Pipelines. Look for pipelines with many processors, especially expensive ones like geoip or user_agent on high-volume data. The Elasticsearch node stats API (GET _nodes/stats/ingest) reports document counts and time spent per pipeline. Also check APM Server logs for slow or timed-out bulk requests.
Fix: Simplify or disable unnecessary processors in your APM pipelines. For instance, if you don’t need geographic information for every single transaction, remove the geoip processor.
```
{
  "description": "Simplified example: geoip removed, other essential processors kept",
  "processors": [
    {
      "set": {
        "field": "agent.name",
        "value": "{{agent.name}}"
      }
    }
  ]
}
```
Save the modified pipeline so that it applies to your APM data streams. This reduces the CPU and memory overhead per document in Elasticsearch, so bulk requests complete sooner and APM Server can drain its queue faster.
Why it works: Each processor in a pipeline adds overhead to every document. Removing or optimizing slow processors shortens the time each bulk request takes, which reduces how long events sit in APM Server's memory.
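
To make the pipeline review less manual, a short script can flag the processor types called out above. This is a minimal sketch, not an official tool: the EXPENSIVE set, the expensive_processors helper, and the sample pipeline body are all assumptions for illustration, and it only inspects top-level processors (not on_failure handlers or nested pipelines).

```python
import json

# Processor types treated as expensive; this set is an assumption for illustration.
EXPENSIVE = {"geoip", "user_agent"}


def expensive_processors(pipeline: dict) -> list[str]:
    """Return the types of top-level processors that appear in EXPENSIVE, in order."""
    found = []
    for processor in pipeline.get("processors", []):
        # Each processor is a single-key object: {"<type>": {...options...}}
        for processor_type in processor:
            if processor_type in EXPENSIVE:
                found.append(processor_type)
    return found


# Hypothetical pipeline body, shaped like the JSON shown in Kibana.
sample = json.loads("""
{
  "processors": [
    {"geoip": {"field": "client.ip", "target_field": "client.geo", "ignore_missing": true}},
    {"user_agent": {"field": "user_agent.original", "ignore_missing": true}},
    {"set": {"field": "agent.name", "value": "{{agent.name}}"}}
  ]
}
""")

print(expensive_processors(sample))
```

Feed it the JSON body of each pipeline you exported and review any pipeline for which the returned list is not empty.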

3. Too Many Concurrent Requests to Elasticsearch: APM Server sends processed data to Elasticsearch in bulk requests. If the output concurrency is set too high, it can overwhelm Elasticsearch, and every in-flight bulk request holds a batch of events in APM Server's memory.

Diagnosis: Check your apm-server.yml configuration file for the output concurrency setting. In releases that use the libbeat-style Elasticsearch output this is output.elasticsearch.worker, the number of concurrent bulk workers per host. Option names vary between releases, so confirm the name against the configuration reference for your version. If it’s not set, it defaults to a reasonable value, but if it’s been manually increased, that could be the issue.
Fix: Reduce the concurrency setting in apm-server.yml. Start by lowering it to 4 or 8 and observing performance.
```
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  worker: 8
```
Restart APM Server. This limits the number of simultaneous bulk requests APM Server makes to Elasticsearch, which keeps it from overwhelming the cluster and caps the memory held by in-flight batches.
Why it works: This parameter directly controls the concurrency of APM Server’s requests to Elasticsearch. Lowering it reduces the load APM Server places on Elasticsearch and on its own memory and network stack.
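
A back-of-the-envelope estimate shows why more concurrency is not always faster. The sketch below assumes a simple model in which throughput is concurrent requests times events per bulk request, divided by bulk latency; the function name and all figures are hypothetical, and real clusters will not follow the model exactly.

```python
def estimated_events_per_second(
    concurrent_requests: int, events_per_bulk: int, bulk_latency_s: float
) -> float:
    """Estimate steady-state throughput as concurrency * batch size / latency."""
    if bulk_latency_s <= 0:
        raise ValueError("bulk_latency_s must be positive")
    return concurrent_requests * events_per_bulk / bulk_latency_s


# Hypothetical measurements: at 8 concurrent requests a bulk call takes 0.25 s;
# at 32 the cluster is saturated and each call takes 1.25 s.
moderate = estimated_events_per_second(8, 1000, 0.25)
saturated = estimated_events_per_second(32, 1000, 1.25)

print(f"8 concurrent requests:  {moderate:.0f} events/s")
print(f"32 concurrent requests: {saturated:.0f} events/s")
```

In this made-up case, quadrupling concurrency lowers throughput because latency grows fivefold, while four times as many batches sit in memory waiting for a response.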

4. Inadequate Queue Size for Incoming Events: APM Server uses an in-memory queue to buffer incoming events before they are sent to Elasticsearch. If this queue is too small, it will fill up quickly under high load.

Diagnosis: Look for HTTP 503 responses with a "queue is full" message in your agent logs and in the APM Server logs. If they appear during traffic bursts but not at steady load, the queue is too small to absorb the bursts.
Fix: Increase the queue size in apm-server.yml. In libbeat-based releases the setting is queue.mem.events; confirm the name against the configuration reference for your version. A common starting point for high-volume systems is 4096.
```
queue.mem.events: 4096
```
Restart APM Server. This allows APM Server to buffer more incoming events before it starts rejecting them, giving the output more time to catch up. A larger queue also uses more memory, so raise it together with the memory limit rather than instead of it.
Why it works: A larger queue provides a buffer, smoothing out bursts of incoming traffic and preventing APM Server from immediately dropping requests when the processing rate momentarily lags behind the ingestion rate. It does not help when the ingestion rate stays above the processing rate; in that case the queue only delays the rejections.
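
The buffering effect is easy to see in a toy model. The sketch below is a simplified simulation of a bounded queue, not APM Server's actual implementation; simulate_queue, the tick-based timing, and the traffic figures are assumptions chosen for illustration.

```python
def simulate_queue(arrivals, capacity, drain_per_tick):
    """Simulate a bounded queue and return (dropped_events, peak_depth).

    On each tick, arriving events are enqueued until the queue is full
    (the rest are dropped), then up to drain_per_tick events are processed.
    """
    depth = dropped = peak = 0
    for arriving in arrivals:
        accepted = min(arriving, capacity - depth)
        dropped += arriving - accepted
        depth += accepted
        peak = max(peak, depth)
        depth -= min(depth, drain_per_tick)
    return dropped, peak


# A short burst: two ticks of 900 events against a drain rate of 300 per tick.
burst = [100, 100, 900, 900, 100, 100, 100, 100]
print("burst, capacity 1000:", simulate_queue(burst, 1000, 300))
print("burst, capacity 4096:", simulate_queue(burst, 4096, 300))

# Sustained overload: arrivals stay above the drain rate on every tick.
sustained = [500] * 20
print("sustained, capacity 4096:", simulate_queue(sustained, 4096, 300))
```

The larger queue absorbs the burst without dropping anything, but under sustained overload it still fills up and drops events, which is the point at which scaling or reducing load is the real fix.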

5. Network Latency or Bandwidth Issues: High latency or insufficient bandwidth between APM Server and your agents/clients, or between APM Server and Elasticsearch, can cause requests to pile up.

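Diagnosis: Use ping and traceroute from the APM Server to your client IPs and Elasticsearch IPs to measure latency and packet loss. Check network interface throughput and error counters on the APM Server with ip -s link or sar -n DEV 1; ifconfig and ip a show addresses and cumulative counters, not current utilization.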
Fix: Address underlying network infrastructure problems. This might involve optimizing routing, increasing bandwidth, or moving APM Server and Elasticsearch closer in the network topology. Ensure there are no firewalls or network devices introducing excessive latency or packet loss.
Why it works: Reliable and fast network communication is crucial. Slowdowns here directly translate to longer processing times for requests and responses, leading to backlogs.
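
Because ping measures ICMP round trips, it can miss delays added by firewalls or proxies on the actual service port. A minimal sketch for timing TCP connections is shown below; tcp_connect_ms and median_connect_ms are hypothetical helpers, and the demo connects to a throwaway local listener so that it runs anywhere. Point it at your Elasticsearch host and port to get a real measurement.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host: str, port: int, samples: int = 5) -> float:
    """Return the median connect time over several attempts."""
    return statistics.median(tcp_connect_ms(host, port) for _ in range(samples))


# Demo against a temporary local listener on an ephemeral port.
with socket.create_server(("127.0.0.1", 0)) as listener:
    demo_port = listener.getsockname()[1]
    result = median_connect_ms("127.0.0.1", demo_port)
    print(f"median connect time: {result:.3f} ms")
```

Run it from the APM Server host against Elasticsearch, and from an application host against APM Server, then compare the medians to find the slow hop.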

6. Too Many Agents Sending Data: While not a configuration issue, if you have an unexpectedly high number of agents (e.g., due to a misconfiguration causing agents to restart rapidly) sending data, the aggregate load might exceed the APM Server’s capacity.

Diagnosis: Check the number of active agents reporting to APM Server. You can often see this in Kibana’s APM UI or by querying APM Server metrics for agent counts.
Fix: Investigate why so many agents are active. If it’s a misconfiguration, correct it on the client side. If it’s legitimate, you may need to scale out your APM Server horizontally by running multiple instances behind a load balancer.
Why it works: Horizontal scaling distributes the load across multiple APM Server instances, each handling a subset of the incoming traffic.
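
For a first sizing estimate when scaling out, you can divide the expected load by what one instance handles, leaving some headroom. This is a rough sketch under stated assumptions: instances_needed is a hypothetical helper, and the per-instance capacity must come from your own load tests, not from this example.

```python
import math


def instances_needed(
    total_events_per_s: float, per_instance_events_per_s: float, headroom: float = 0.3
) -> int:
    """Return how many instances are needed if each runs at (1 - headroom) of capacity."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if per_instance_events_per_s <= 0:
        raise ValueError("per_instance_events_per_s must be positive")
    usable = per_instance_events_per_s * (1 - headroom)
    return math.ceil(total_events_per_s / usable)


# Hypothetical figures: 50,000 events/s in total, 10,000 events/s per instance,
# and 30% headroom kept free for bursts.
print(instances_needed(50_000, 10_000, headroom=0.3))
```

Keeping headroom means a burst or the loss of one instance does not immediately push the remaining instances back into rejecting requests.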

After applying these fixes, you might encounter the next common issue: Request timeout: context deadline exceeded.