The Elasticsearch percolator is a reverse search engine, letting you store queries and then efficiently match incoming documents against them.

Imagine you’re running an alert system for new product releases. You have a list of keywords and criteria that users care about. Instead of running every user’s query against your product index each time a new product arrives, you "percolate" the product description against your stored user queries. Elasticsearch then tells you which stored queries the product matched.

Here’s a simplified view of the process:

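  1. Index the Queries: You don’t index documents into the percolator. Instead, you index your queries into an index whose mapping declares a field of type percolator (here, the query field). The index name is arbitrary; percolator or queries are common choices. The same mapping must also define the fields that the queries refer to (title, category, and price below). Each document in this index represents a stored query.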

    PUT /my_percolator_index/_doc/1
    {
      "query": {
        "match": {
          "title": "wireless mouse"
        }
      }
    }
    
    PUT /my_percolator_index/_doc/2
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "category": "electronics" } },
            { "range": { "price": { "lt": 50 } } }
          ]
        }
      }
    }
    

    In this setup, my_percolator_index is where your queries live. Document 1 is a simple query for "wireless mouse." Document 2 is a more complex query requiring the category to be electronics AND the price to be less than 50.

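  2. Percolate a Document: When a new document arrives (e.g., a new product description), you run a search against the index that holds your queries, using the percolate query and passing the new document inline. The document does not need to be indexed anywhere first. (The standalone _percolate endpoint from early versions was deprecated in Elasticsearch 5.0 and removed in 6.0.)

    GET /my_percolator_index/_search
    {
      "query": {
        "percolate": {
          "field": "query",
          "document": {
            "title": "Logitech Wireless Mouse MX Master 3",
            "category": "electronics",
            "price": 99.99
          }
        }
      }
    }
    

    Elasticsearch then takes the provided document and runs it against the queries stored in my_percolator_index.

  3. Get the Matches: The response is an ordinary search response in which each hit is a stored query that the incoming document matched. Abbreviated, and in the shape used by Elasticsearch 7 and later, it looks like this:

    {
      "hits": {
        "total": { "value": 1, "relation": "eq" },
        "hits": [
          {
            "_index": "my_percolator_index",
            "_id": "1",
            "_source": {
              "query": { "match": { "title": "wireless mouse" } }
            }
          }
        ]
      }
    }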
    

    In this case, the "Logitech Wireless Mouse MX Master 3" document matched query 1 (the "wireless mouse" query) but not query 2 (because the price was 99.99, not less than 50).
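The three steps above only work if the index was created with a percolator mapping before the first query was stored; otherwise Elasticsearch dynamically maps query as a plain object. A minimal sketch of such a mapping follows, using the typeless mapping syntax of Elasticsearch 7 and later. The field types chosen for title, category, and price are assumptions for illustration, as are the two metadata fields (user_id, alert_name) that later examples in this article rely on.

```console
# Run once, before indexing any stored query.
# Field types other than "percolator" are assumptions for this example.
PUT /my_percolator_index
{
  "mappings": {
    "properties": {
      "query":      { "type": "percolator" },
      "title":      { "type": "text" },
      "category":   { "type": "keyword" },
      "price":      { "type": "double" },
      "user_id":    { "type": "keyword" },
      "alert_name": { "type": "text" }
    }
  }
}
```

Mapping user_id as keyword keeps exact-match term filters on it predictable; a text field would be analyzed and could behave differently.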

This setup is incredibly powerful for scenarios like:

  • Alerting: Users define what they want to be notified about. New content matching their criteria triggers an alert.
  • Content Recommendation: Suggesting articles or products to users by storing each user’s interests as a query and percolating new content against those queries.
  • Spam Filtering: Users define what constitutes spam; incoming emails are percolated against these definitions.
  • Moderation: Defining rules for inappropriate content and filtering incoming user-generated text.

The core idea is that you’re turning the search logic on its head. Instead of searching for documents that match a query, you’re asking "which stored queries does this document match?"

The real benefit shows up in performance. If you have millions of users with complex alert criteria, running every stored query against every new piece of content would be prohibitively expensive. The percolator avoids most of that work. When you index a query, Elasticsearch parses it once and, where it can, extracts the terms the query depends on and indexes them alongside the query. When a document is percolated, Elasticsearch indexes it into a temporary in-memory index, uses the extracted terms to select only the stored queries that could possibly match, and then evaluates just those candidates against the in-memory document. Stored queries that share no terms with the document are skipped without being executed.

A percolate query can also be combined with other query clauses to restrict which stored queries are considered. This is useful if you have a massive number of stored queries but only want to match against a subset.

GET /my_percolator_index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "user_id": "user_abc" } },
        {
          "percolate": {
            "field": "query",
            "document": {
              "title": "New Gaming Laptop",
              "category": "electronics",
              "price": 1200.00
            }
          }
        }
      ]
    }
  }
}

In this example, only queries in my_percolator_index that are associated with user_abc (assuming you’ve indexed a user_id field within your percolator documents) will be evaluated against the incoming "New Gaming Laptop" document.

A common pitfall is forgetting that the percolator index needs its own mappings. The percolated document is parsed using the mapping of the index that holds the queries, so every field a stored query references must be mapped there, with the same field types and analyzers as in your main document index. Otherwise a stored query can match differently than it would in a normal search. You might also want to store metadata alongside your queries in the percolator index, such as the ID of the user who owns the query or a name for the alert rule.

PUT /my_percolator_index/_doc/3
{
  "user_id": "user_abc",
  "alert_name": "High-end gaming deals",
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "gaming laptop" } },
        { "range": { "price": { "gt": 1000 } } }
      ]
    }
  }
}

When percolating, you can then target specific users or alert types by combining the percolate query with a filter on these metadata fields.
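The metadata is also what a notification job typically needs back from a match. As a sketch, assuming the query field is mapped as type percolator and the stored queries carry user_id and alert_name as in document 3, the standard _source option of the search API can limit each hit to those two fields:

```console
# Assumes "query" is mapped as type percolator and document 3 has been indexed.
GET /my_percolator_index/_search
{
  "_source": ["user_id", "alert_name"],
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "title": "New Gaming Laptop",
        "category": "electronics",
        "price": 1200.00
      }
    }
  }
}
```

Of the three stored queries shown in this article, only document 3 should come back for this product, with its owner and alert name but without the query body.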

The real power of the percolator lies in its ability to handle a large number of queries against a high volume of incoming documents without requiring expensive client-side logic or repeated searches. It’s a core Elasticsearch feature that enables sophisticated real-time filtering and notification systems.

What many users don’t realize is that the percolator index itself can be searched like any other index. This allows you to retrieve, update, or delete stored queries using standard Elasticsearch APIs, making management of your percolator rules straightforward.
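For instance, the following requests sketch how rules might be listed, inspected, and removed. They use only the standard search, get, and delete document APIs, and assume user_id is mapped as a keyword field so that the term query matches exactly.

```console
# List the stored queries owned by one user
GET /my_percolator_index/_search
{
  "query": { "term": { "user_id": "user_abc" } }
}

# Fetch a single stored query by ID
GET /my_percolator_index/_doc/3

# Delete a rule that is no longer needed
DELETE /my_percolator_index/_doc/3
```

Updating a rule is a matter of indexing a new version under the same ID with PUT, as in the earlier examples.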

After successfully setting up and using the percolator, the next logical step is often to optimize the performance of the percolator itself by tuning its index settings or exploring more advanced query structures.
