Run Long Elasticsearch Queries Asynchronously (2026)

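By default, Elasticsearch doesn’t run searches asynchronously. The client waits on an open connection for the full response, and search threads on the nodes holding the data stay busy until their share of the query completes.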

Let’s see what that looks like in practice. Imagine you have a standard Elasticsearch setup and you want to run a query that’s going to take a while – say, analyzing a year’s worth of logs for a complex aggregation. You might fire off a request to your Elasticsearch cluster:

curl -X POST "localhost:9200/my-logs-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1y/y",
        "lt": "now/y"
      }
    }
  },
  "aggs": {
    "complex_analysis": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1d",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "now-1y/y",
          "max": "now/y"
        }
      },
      "aggs": {
        "terms_by_level": {
          "terms": {
            "field": "log_level",
            "size": 10
          }
        }
      }
    }
  }
}
'

While this query churns through potentially billions of documents, the nodes holding the matching shards are tied up, because each shard-level search runs on a thread from that node’s search thread pool. If your cluster is busy with many such long-running queries, or even a moderate number of them, these threads get exhausted. New searches, even trivial ones, wait in the search queue. If that queue fills up, Elasticsearch starts rejecting requests with 429 Too Many Requests errors, and clients that time out and retry only add more load. The system becomes unresponsive not because it can’t do the work, but because the work it is doing is monopolizing the resources every other search needs.
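One way to see this saturation is to watch the search thread pool on each node. The sketch below reads the `_cat/thread_pool/search` API in JSON form and flags nodes with a long queue or any rejections. The threshold of 100 queued tasks is an arbitrary assumption for illustration. Note also that `rejected` is a cumulative counter since node start, so in practice you would compare successive samples rather than test for a non-zero value.

```python
import json
import urllib.request

# Columns requested from the _cat thread pool API; _cat returns every value as a string.
CAT_URL = "/_cat/thread_pool/search?format=json&h=node_name,name,active,queue,rejected"


def fetch_search_pool_stats(base_url="http://localhost:9200"):
    """Fetch per-node search thread pool rows from a live cluster."""
    with urllib.request.urlopen(base_url + CAT_URL, timeout=10) as resp:
        return json.loads(resp.read())


def saturated_nodes(rows, queue_threshold=100):
    """Return (node, active, queue, rejected) for nodes whose search pool looks saturated.

    A node is flagged if its queue exceeds queue_threshold or it has any
    rejections ('rejected' is cumulative since the node started).
    """
    flagged = []
    for row in rows:
        active, queue, rejected = (int(row[k]) for k in ("active", "queue", "rejected"))
        if queue > queue_threshold or rejected > 0:
            flagged.append((row["node_name"], active, queue, rejected))
    return flagged


if __name__ == "__main__":
    # Sample rows in the shape _cat/thread_pool returns; the values are made up.
    sample = [
        {"node_name": "es-data-1", "name": "search", "active": "13", "queue": "0", "rejected": "0"},
        {"node_name": "es-data-2", "name": "search", "active": "13", "queue": "850", "rejected": "42"},
    ]
    print(saturated_nodes(sample))
```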

The core problem is that Elasticsearch’s default search model is synchronous. Each shard-level search holds its search thread until that shard’s work is done, and the client holds its connection until the whole response is assembled. A plain _search has no way to say, "Hey, this query is going to take too long, let me park it and hand you the results later." It completes, it fails, or, if you set a timeout, it returns whatever the shards collected before the timeout expired.
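A plain _search does have one partial escape hatch: the `timeout` body parameter. With it, each shard stops collecting after the given period and the response is flagged with `timed_out: true` (best effort, with default partial-results settings). The sketch below adds that parameter and reports whether the results are partial. The `send` argument is a hypothetical injection point so the logic can be exercised without a cluster. Nothing here frees the search threads any sooner than the timeout allows.

```python
import json
import urllib.request


def _post_json(url, payload):
    """POST a JSON payload with the standard library and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Client-side timeout must be longer than the server-side search timeout.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())


def search_with_timeout(base_url, index, body, timeout="30s", send=_post_json):
    """Run a synchronous _search with a shard-level timeout.

    Returns (response, partial) where partial is True if any shard timed out.
    The caller's body is copied, not mutated.
    """
    payload = dict(body, timeout=timeout)
    response = send(f"{base_url}/{index}/_search", payload)
    return response, bool(response.get("timed_out", False))
```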

To handle long-running queries without making clients wait on them, you need to decouple submission from retrieval. Elasticsearch offers the Async Search API (_async_search) for exactly this. You submit a search and get back an ID. Elasticsearch keeps running the search in the background, and you poll with that ID to check its progress and retrieve the results when it’s done.

Here’s how you’d submit the same query using the Async Search API:

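First, submit the query to the _async_search endpoint of the same index pattern. The wait_for_completion_timeout parameter (1s by default) controls how long the request waits before returning an ID. Setting keep_on_completion=true stores the results even if the search happens to finish within that window: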

curl -X POST "localhost:9200/my-logs-*/_async_search?wait_for_completion_timeout=1s&keep_on_completion=true&pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1y/y",
        "lt": "now/y"
      }
    }
  },
  "aggs": {
    "complex_analysis": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1d",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "now-1y/y",
          "max": "now/y"
        }
      },
      "aggs": {
        "terms_by_level": {
          "terms": {
            "field": "log_level",
            "size": 10
          }
        }
      }
    }
  }
}
'
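The same submission can be made from application code. The sketch below uses only the Python standard library. The `send` parameter is a hypothetical hook for testing, and `keep_alive="1d"` is an illustrative choice (the server default is 5d). It URL-encodes the query parameters and leaves wildcards in the index pattern readable.

```python
import json
import urllib.parse
import urllib.request


def http_json(method, url, body=None):
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


def submit_async_search(base_url, index, body, wait="1s", keep_alive="1d", send=http_json):
    """POST a search body to <index>/_async_search and return the decoded response.

    keep_on_completion=true makes Elasticsearch store the results even if the
    search finishes within `wait`, so the caller can always fetch them by id.
    """
    params = urllib.parse.urlencode({
        "wait_for_completion_timeout": wait,
        "keep_on_completion": "true",
        "keep_alive": keep_alive,
    })
    # Keep wildcards and commas readable in index patterns such as my-logs-*.
    path = urllib.parse.quote(index, safe="*,")
    return send("POST", f"{base_url}/{path}/_async_search?{params}", body)
```

Usage: `submit_async_search("http://localhost:9200", "my-logs-*", query_body)["id"]` gives the ID to poll, where `query_body` is the dict form of the JSON above.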

This returns a response like the following (abridged; the values are illustrative):

{
  "id": "aGVsbG8gd29ybGQ6MTEwMDMxMjM0NQ==",
  "is_partial": true,
  "is_running": true,
  "start_time_in_millis": 1678886400000,
  "expiration_time_in_millis": 1679318400000,
  "response": {
    "timed_out": false,
    "_shards": {
      "total": 1000,
      "successful": 150,
      "skipped": 0,
      "failed": 0
    }
  }
}
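A poller typically needs only a few of these fields. The sketch below condenses a submit or GET response into the running and partial flags, a rough shard-progress ratio, and the time left before expiry. It assumes the documented field names `id`, `is_running`, `is_partial`, `expiration_time_in_millis` and `response._shards`. It uses `.get("id")` because, without keep_on_completion, a search that finishes inside the wait window may come back without a stored ID.

```python
def summarize_async_response(resp, now_ms):
    """Condense an async search response into the fields a poller cares about."""
    shards = resp.get("response", {}).get("_shards", {})
    total = shards.get("total", 0)
    # Shards that have finished one way or another; a rough progress indicator only.
    done = shards.get("successful", 0) + shards.get("skipped", 0) + shards.get("failed", 0)
    return {
        "id": resp.get("id"),
        "running": resp.get("is_running", False),
        "partial": resp.get("is_partial", False),
        "shard_progress": done / total if total else None,
        "seconds_until_expiry": (resp["expiration_time_in_millis"] - now_ms) / 1000,
    }
```

With the response above and `now_ms` equal to its start time, this sketch would report 15% shard progress and 432000 seconds (5 days) until expiry.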

The id field identifies the stored search. You then poll it to check progress and fetch the results:

curl -X GET "localhost:9200/_async_search/aGVsbG8gd29ybGQ6MTEwMDMxMjM0NQ==?pretty"

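Eventually, is_running changes to false, is_partial changes to false if every shard succeeded, and the response field contains the complete results, aggregations included. If you only want to check progress without transferring partial results, GET /_async_search/status/<id> returns the same flags and shard counts without the hits or aggregations.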
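In application code, this polling is usually a loop with a growing delay. The sketch below calls GET /_async_search/<id> and also passes `wait_for_completion_timeout=2s`, so the server can hold each poll briefly. It backs off from 1s to 30s between polls and gives up after `max_wait_s`. The delays, the 2s server wait, and the injectable `get`/`sleep` hooks are illustrative assumptions, not recommended values.

```python
import json
import time
import urllib.parse
import urllib.request


def http_get_json(url):
    """GET a URL and decode the JSON reply."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.loads(resp.read())


def wait_for_async_search(base_url, search_id, max_wait_s=600.0,
                          get=http_get_json, sleep=time.sleep):
    """Poll GET /_async_search/<id> until is_running is false; return the final body."""
    url = f"{base_url}/_async_search/{urllib.parse.quote(search_id, safe='=')}"
    delay, waited = 1.0, 0.0
    while True:
        body = get(url + "?wait_for_completion_timeout=2s")
        if not body.get("is_running", False):
            return body
        if waited >= max_wait_s:
            raise TimeoutError(f"async search {search_id} still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, 30.0)  # exponential backoff, capped at 30s
```

The returned body still carries is_partial, so a caller should check it before trusting the aggregations as complete.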

The critical difference is what happens to the client. With _async_search, the coordinating node registers the search as a task and waits at most wait_for_completion_timeout. It then returns an ID while the search keeps running, and the results are stored in a system index so any client can fetch them later. The shard-level work itself still runs on the same search thread pool as a regular _search, so async search does not add execution capacity. What it buys you:

- Clients, HTTP connections, and proxy timeouts are no longer held hostage by a multi-minute query.
- You can read partial aggregation results as shards finish.
- You can cancel a search you no longer need.

Polling and retrieving the results is also a much lighter operation than executing the original query. This decouples the long-running computation from the request-response cycle, which keeps your applications responsive even while the underlying search is slow.

Even when using _async_search, it’s crucial to understand that Elasticsearch isn’t truly asynchronous in the way a typical message queue is. It’s more accurately described as background processing. The shard-level searches still consume search threads, JVM heap, CPU, and disk I/O on the same cluster, so if you submit far too many concurrent async searches you can still overwhelm it. Stored results also take up space until they expire. The expiration_time_in_millis field shows how long the search and its results will be retained. This is controlled by the keep_alive parameter, which defaults to 5d. If you don’t retrieve the results before then, they are deleted. You can also delete them yourself, or cancel a search that is still running, with DELETE /_async_search/<id>.
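One simple lifecycle policy is to delete stored results as soon as you have read them, rather than waiting for keep_alive to expire. The sketch below fetches a search by ID and, only if it has finished, issues DELETE /_async_search/<id>. This matters because the same DELETE cancels a search that is still running. The `send` hook is a hypothetical testing seam, and the policy itself is an illustrative choice, not the only reasonable one.

```python
import json
import urllib.request


def http_json(method, url):
    """Issue a bodiless request and decode the JSON reply."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


def fetch_then_delete(base_url, search_id, send=http_json):
    """Fetch an async search; if it has finished, delete its stored results.

    Returns (result, deleted). A still-running search is left alone, because
    DELETE /_async_search/<id> would cancel it.
    """
    url = f"{base_url}/_async_search/{search_id}"
    result = send("GET", url)
    if result.get("is_running", False):
        return result, False  # not done yet; keep it stored and poll again later
    send("DELETE", url)
    return result, True
```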

The next hurdle you’ll likely encounter is managing the lifecycle of these asynchronous tasks, specifically how to automatically clean them up or set appropriate expiration times to avoid resource exhaustion if results are never retrieved.