Enable and Analyze Elasticsearch Slow Query Logs (2026)

Elasticsearch’s search slow log doesn’t time a request as a whole. Instead, it records, shard by shard, each search phase that takes longer than a threshold you set, giving you granular insight into where your search latency is hiding.

Let’s see this in action. Imagine you’ve got a cluster and you’re seeing some sluggishness. You want to know why. You’re not just looking for a slow query, but which part of the search is slow.

Here’s a typical request hitting your Elasticsearch cluster:

GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } },
        { "term": { "status": "published" } }
      ],
      "filter": [
        { "range": { "publish_date": { "gte": "2023-01-01" } } }
      ]
    }
  },
  "aggs": {
    "articles_by_category": {
      "terms": { "field": "category.keyword" }
    }
  }
}

This query does a few things:

Querying: It finds documents matching "elasticsearch" in the title and having status as "published."
Filtering: It further restricts results to those published on or after January 1, 2023.
Aggregating: It then groups the results by category.keyword and counts them.

Elasticsearch runs such a request on each shard of the index in two phases, and these are the only phases the search slow log records:

query: This is the core search phase where each shard traverses its inverted index to find matching documents, scores them, and returns the IDs of its top hits. Query rewriting (for example, expanding wildcard or regexp queries into the terms they match), filter caching, field collapsing, and aggregations all run as part of this phase, so their cost shows up in the query-phase time.
fetch: Once the top hits are identified, this phase retrieves the actual _source content for those documents, which can be expensive if you’re fetching many large documents.

There are no separate slow log entries for rewriting, caching, collapsing, or aggregations. A slow aggregation appears as a slow query phase.

Enabling Slow Search Logging

To capture these slow phases, you set thresholds on the index. They are dynamic index-level settings, so you change them through the index settings API; they cannot be placed in elasticsearch.yml, and the cluster settings API rejects them. Each threshold is defined per phase and per log level, in the form index.search.slowlog.threshold.<phase>.<level>, where the phase is query or fetch and the level is warn, info, debug, or trace.

Here’s how you’d log, at WARN level, any query phase taking longer than 2 seconds (2000ms) and any fetch phase taking longer than 5 seconds (5000ms) on my_index:

PUT /my_index/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.fetch.warn": "5s"
}

index.search.slowlog.threshold.query.warn: This is the most common one. It logs when the index traversal and scoring part of a search takes longer than the specified time on a shard.
index.search.slowlog.threshold.fetch.warn: Logs when retrieving the actual document source (_source) for the matching hits takes too long.

There are no aggregation or suggest thresholds. Aggregations and suggesters run inside the query phase and are covered by the query threshold.
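If you manage thresholds from a script, it can help to build the settings body programmatically and reject names Elasticsearch would not accept. The following is a minimal sketch, assuming the per-level setting names of the form index.search.slowlog.threshold.<phase>.<level>, with query and fetch as the only phases; the helper name slowlog_settings is hypothetical, and the result is just a dictionary you would send as the body of PUT /my_index/_settings.

```python
import json
import re

# Phases that have slow log thresholds, and the log levels each one supports.
PHASES = ("query", "fetch")
LEVELS = ("warn", "info", "debug", "trace")

# A time value is an integer plus a unit, or -1 to disable the threshold.
TIME_VALUE = re.compile(r"^(-1|\d+(nanos|micros|ms|s|m|h|d))$")


def slowlog_settings(thresholds):
    """Build an index-settings body from {(phase, level): time_value}."""
    body = {}
    for (phase, level), value in thresholds.items():
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        if not TIME_VALUE.match(value):
            raise ValueError(f"not a time value: {value!r}")
        body[f"index.search.slowlog.threshold.{phase}.{level}"] = value
    return body


body = slowlog_settings({("query", "warn"): "2s", ("fetch", "warn"): "5s"})
print(json.dumps(body, indent=2))
```

Validating up front means a typo such as an "aggregation" phase fails in your script rather than as a rejected settings request.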

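There is no separate switch to enable the slow log. Every threshold defaults to -1, which disables it, and setting a threshold to a time value is what turns logging on for that index. Indices created later do not inherit these values; put the same settings in an index template to cover them. To apply thresholds to all existing indices, target them in one request:

PUT /_all/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.fetch.warn": "5s"
}

Or for a specific index, optionally with a lower threshold at a less severe level:

PUT my_index/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.query.info": "500ms"
}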
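Thresholds can be set separately for each log level (warn, info, debug, trace) by appending the level to the setting name, which lets you log moderately slow phases at INFO and very slow ones at WARN. The sketch below models the selection rule as I understand it: the most severe level whose threshold is exceeded wins, and a threshold of -1 (or a missing one) is treated as disabled. The function name and the millisecond-based dictionary are assumptions for illustration, not Elasticsearch code.

```python
# Levels are checked from most to least severe.
LEVELS = ("warn", "info", "debug", "trace")


def slowlog_level(took_ms, thresholds_ms):
    """Return the level a phase would be logged at, or None if it is not slow.

    thresholds_ms maps a level name to its threshold in milliseconds;
    -1 or a missing key means that level is disabled.
    """
    for level in LEVELS:
        threshold = thresholds_ms.get(level, -1)
        if threshold >= 0 and took_ms > threshold:
            return level
    return None


thresholds = {"warn": 2000, "info": 500}
for took in (3150, 800, 100):
    print(took, slowlog_level(took, thresholds))
```

With these thresholds, a 3150 ms phase is logged at warn, an 800 ms phase at info, and a 100 ms phase is not logged at all.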

Analyzing the Logs

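Once configured, slow search phases are written to a dedicated slow log file in the Elasticsearch log directory (with the default logging configuration, <cluster_name>_index_search_slowlog.log or its .json counterpart) rather than to the main server log. The exact layout depends on your version and log format, but the entries are quite detailed. You’ll see something like this: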

[2023-10-27T10:30:00,123][WARN ][o.e.search.slowlog          ] [node-1] [my_index][0] took [3150ms] on phase [query], user [elastic], id [abcdef123456], request [GET /my_index/_search?pretty { "query": { ... }}]

This tells you:

The node where it occurred (node-1).
The index and shard (my_index, 0).
The time taken (3150ms).
The phase that was slow (query).
The user making the request (elastic).
The unique request ID (abcdef123456).
The actual request body.
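To analyze more than a handful of entries, you can pull these fields out with a small parser. The sketch below assumes a plain-text entry containing "[index][shard]" followed by "took" and a bracketed duration, with or without a space before the bracket; the regular expression, the unit table, and the function name are assumptions, so adjust them to the layout your version actually emits (JSON-formatted slow logs are easier to read with a JSON parser instead).

```python
import re

# Assumed pattern: "[index][shard] took[<number><unit>]", tolerating an
# optional space between "took" and the bracket.
SLOWLOG_RE = re.compile(
    r"\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\]\s+took\s*"
    r"\[(?P<value>\d+(?:\.\d+)?)(?P<unit>nanos|micros|ms|s|m|h)\]"
)

# Conversion factors from each duration unit to milliseconds.
UNIT_TO_MS = {
    "nanos": 1e-6,
    "micros": 1e-3,
    "ms": 1,
    "s": 1000,
    "m": 60_000,
    "h": 3_600_000,
}


def parse_slowlog_line(line):
    """Return index, shard, and duration in ms, or None if the line does not match."""
    match = SLOWLOG_RE.search(line)
    if match is None:
        return None
    took_ms = float(match["value"]) * UNIT_TO_MS[match["unit"]]
    return {"index": match["index"], "shard": int(match["shard"]), "took_ms": took_ms}


LINE = (
    '[2023-10-27T10:30:00,123][WARN ][o.e.search.slowlog          ] [node-1] '
    '[my_index][0] took [3150ms] on phase [query], user [elastic], '
    'id [abcdef123456], request [GET /my_index/_search?pretty { "query": { ... }}]'
)

print(parse_slowlog_line(LINE))
# {'index': 'my_index', 'shard': 0, 'took_ms': 3150.0}
```

Returning None for lines that do not match lets you feed the whole log file through the function and simply skip anything that is not a slow log entry.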

Common Causes and Fixes

Inefficient Query Structure:
- Diagnosis: Analyze the request part of the slow log. Look for wildcard queries, especially ones with a leading wildcard or ones run against large text fields, and for overly broad regexp queries.
- Fix: Use the cheapest query that expresses what you need. If you actually want an exact value, replace {"wildcard": {"user.name": "joh*"}} with {"term": {"user.name.keyword": "john"}}; note that the two are not equivalent, since the term query matches only "john". If you do need pattern matching, keep the pattern anchored at the start ("joh*" rather than "*ohn"). For regexp, try to make the pattern more specific or consider alternative indexing strategies.
- Why it works: wildcard and regexp queries often require iterating through a large portion of the index terms, leading to high CPU and I/O. term and match on keyword fields use the efficient inverted index directly.
Large Number of Shards:
- Diagnosis: The slow log might show the slowness occurring across many shards, or you might notice high CPU/I/O on multiple nodes for a single query.
- Fix: Reduce the number of shards per index. For example, if you have an index with 100 shards and a query is slow, consider reindexing into a new index with fewer shards, say 10. The reindex API does not accept index settings in dest, so create the destination first with PUT new_index { "settings": { "index.number_of_shards": 10 }} and then copy the data with POST _reindex { "source": { "index": "old_index" }, "dest": { "index": "new_index" }}.
- Why it works: Each shard requires overhead for query execution. Spreading a query across too many shards amplifies this overhead and network latency.
Too Many Documents per Shard:
- Diagnosis: Slow logs consistently point to query or fetch phases on specific shards. Monitoring tools show high disk I/O or CPU on nodes hosting these shards.
- Fix: Increase the number of shards for new indices or reindex into an index with more shards. For instance, create the destination index with {"index.number_of_shards": 20} in its settings and then reindex into it.
- Why it works: If a shard contains too many documents, the inverted index for that shard becomes very large, making traversal and document retrieval slower. More shards distribute the data and the workload.
Fetching Large _source Fields:
- Diagnosis: The slow log shows high latency specifically in the fetch phase. The request might set a large size, or the documents it returns might carry large _source bodies.
- Fix: Use _source filtering to retrieve only necessary fields: "_source": ["field1", "field2"]. If you don’t need the _source at all, turn it off for the request with "_source": false, optionally requesting individual values through stored_fields or docvalue_fields. Lower size if you are fetching more hits than you actually use.
- Why it works: Retrieving and serializing the entire _source for many documents is I/O and network intensive. Fetching only specific fields reduces this burden.
Complex Aggregations:
- Diagnosis: The slow requests carry an aggs section, and the slow log points to the query phase, which is where aggregations run. Aggregations might involve many terms, deep nesting, or complex pipeline aggregations.
- Fix: Optimize aggregations. For terms aggregations, keyword fields use execution_hint: global_ordinals by default; consider execution_hint: map only when the query matches very few documents. Reduce the size parameter for terms aggregations if you don’t need all buckets. Use composite aggregations to page through all buckets instead of requesting them in one very large response.
- Why it works: terms aggregations on high-cardinality fields require significant memory and CPU to build ordinals, count, and sort buckets. map execution skips global ordinals and aggregates on the field values directly, which pays off only when few documents are involved.
Insufficient Hardware Resources:
- Diagnosis: Slow logs appear across many queries and phases, accompanied by high CPU, memory, or I/O utilization metrics on Elasticsearch nodes.
- Fix: Scale up or out your Elasticsearch cluster. This might mean adding more nodes, increasing RAM, or upgrading CPU.
- Why it works: The cluster simply doesn’t have enough processing power or memory bandwidth to handle the workload within acceptable timeframes.
Mapping Issues (e.g., text fields for filtering/aggregations):
- Diagnosis: Queries involving match on text fields are slow, especially when combined with filters or aggregations. Slow logs might highlight the query phase.
- Fix: Ensure that fields used for exact matching, filtering, or aggregations are mapped as keyword. For example, if you have a tags field of type text, and you want to filter by exact tag, you should also have a tags.keyword field of type keyword in your mapping and query that. Change your query to {"term": {"tags.keyword": "important"}}.
- Why it works: text fields are analyzed (tokenized, lowercased, etc.), creating an inverted index optimized for full-text search. keyword fields are not analyzed and store exact values, making them efficient for exact matching and aggregations.
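Several of the diagnoses above depend on whether slowness is spread across many shards or concentrated on a few. One way to check is to group slow log entries by index and shard. The sketch below assumes you have already extracted (index, shard, took_ms) tuples from the log; the function name, the field names in the result, and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize_by_shard(entries):
    """Group (index, shard, took_ms) tuples and rank shards by total slow time."""
    buckets = defaultdict(list)
    for index, shard, took_ms in entries:
        buckets[(index, shard)].append(took_ms)
    rows = [
        {
            "index": index,
            "shard": shard,
            "count": len(times),
            "total_ms": sum(times),
            "max_ms": max(times),
        }
        for (index, shard), times in buckets.items()
    ]
    # Shards that account for the most slow time come first.
    return sorted(rows, key=lambda row: row["total_ms"], reverse=True)


# Illustrative sample data, not taken from a real cluster.
entries = [
    ("my_index", 0, 3150),
    ("my_index", 3, 2100),
    ("my_index", 0, 2900),
]

for row in summarize_by_shard(entries):
    print(row)
```

If one or two shards dominate the ranking, look at document counts and the nodes hosting those shards; if the entries are spread evenly across all shards, the shard count or the query itself is the more likely culprit.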

After addressing these, the errors you’ll most likely run into next are a circuit_breaking_exception, if a request would need more memory than the circuit breakers allow, or a too_many_buckets_exception, if your aggregations still create more buckets than search.max_buckets permits.