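By default, Elasticsearch doesn’t keep the objects in an array as separate units. Under the default object mapping, it flattens them into multi-value fields on the parent document, which loses the association between the fields of each individual object.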

Let’s say you have a document representing a blog post with a list of comments, and each comment has an author and text:

PUT /my-blog/_doc/1
{
  "title": "My First Post",
  "author": "Alice",
  "comments": [
    {
      "author": "Bob",
      "text": "Great post!"
    },
    {
      "author": "Charlie",
      "text": "Thanks for sharing."
    }
  ]
}

If you try to search for a post where a single comment has both a given author and a given text, a query on comments.author and comments.text won’t work as you’d expect. With the default object mapping, Elasticsearch flattens the document into something like:

title: My First Post
author: Alice
comments.author: Bob
comments.text: Great post!
comments.author: Charlie
comments.text: Thanks for sharing.

Notice how comments.author appears twice. The values from all comments are pooled per field, so Elasticsearch no longer knows which author belongs to which text. A match query for comments.author: Bob still finds the post, but so does a query for author "Bob" AND text "Thanks for sharing.", even though Bob never wrote that comment. Each field value is treated independently.
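The loss of association can be reproduced with a small toy model. This is a sketch in plain Python, not Elasticsearch code: the flatten and flat_match helpers are hypothetical names, and values are compared exactly rather than analyzed into tokens as a real text field would be.

```python
# Toy model of the default "object" mapping: every leaf value is
# collected under its dotted field path, across all array elements.

def flatten(doc, prefix=""):
    """Return {dotted_path: [values...]} for a JSON-like dict."""
    fields = {}
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the object and merge its leaves.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


def flat_match(fields, conditions):
    """True if each field holds the requested value somewhere."""
    return all(value in fields.get(path, []) for path, value in conditions.items())


post = {
    "title": "My First Post",
    "author": "Alice",
    "comments": [
        {"author": "Bob", "text": "Great post!"},
        {"author": "Charlie", "text": "Thanks for sharing."},
    ],
}

flat = flatten(post)
print(flat["comments.author"])  # ['Bob', 'Charlie']

# Bob never wrote "Thanks for sharing.", yet the flattened document matches.
false_positive = flat_match(
    flat, {"comments.author": "Bob", "comments.text": "Thanks for sharing."}
)
print(false_positive)  # True
```

The second print shows the false positive: both values exist somewhere in the document, just not in the same comment.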

This is where the nested data type comes in. When you map a field as nested, Elasticsearch indexes each object in the array as a separate, hidden "document" within the main document. This preserves the relationship between the fields within each individual comment.

Here’s how you’d map the comments field as nested:

PUT /my-blog
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "author": { "type": "keyword" },
          "text": { "type": "text" }
        }
      }
    }
  }
}

Now, when you index the same document:

PUT /my-blog/_doc/1
{
  "title": "My First Post",
  "author": "Alice",
  "comments": [
    {
      "author": "Bob",
      "text": "Great post!"
    },
    {
      "author": "Charlie",
      "text": "Thanks for sharing."
    }
  ]
}

Elasticsearch internally indexes this as three separate documents (conceptually): one hidden document per comment, plus the root document.

  1. Hidden nested document: comments.author: "Bob" comments.text: "Great post!"
  2. Hidden nested document: comments.author: "Charlie" comments.text: "Thanks for sharing."
  3. Root document: title: "My First Post" author: "Alice"

To query this, you use the nested query. To find posts where a comment author is "Bob" AND the comment text is "Great post!", you’d do:

GET /my-blog/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "comments.author": "Bob"
              }
            },
            {
              "match": {
                "comments.text": "Great post!"
              }
            }
          ]
        }
      }
    }
  }
}

This query specifically looks for documents where, within the comments array, there exists at least one element that satisfies both conditions: comments.author is "Bob" AND comments.text is "Great post!".

The path parameter in the nested query tells Elasticsearch which nested field to operate on. The query inside the nested query is then applied to each of the hidden documents created for that nested field.
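The per-object evaluation described above can be modeled in a few lines. Again this is only a sketch in plain Python with exact value comparison: nested_match is a hypothetical helper, and it assumes every condition key starts with the given path.

```python
# Toy model of a nested query: the conditions are checked against each
# object under `path` separately, and the parent matches if any single
# object satisfies all of them.

def nested_match(doc, path, conditions):
    """True if at least one object under `path` meets every condition."""
    prefix = path + "."
    for obj in doc.get(path, []):
        # Assumption: every condition key has the form "<path>.<field>".
        if all(obj.get(field[len(prefix):]) == value
               for field, value in conditions.items()):
            return True
    return False


post = {
    "title": "My First Post",
    "author": "Alice",
    "comments": [
        {"author": "Bob", "text": "Great post!"},
        {"author": "Charlie", "text": "Thanks for sharing."},
    ],
}

same_comment = nested_match(
    post, "comments",
    {"comments.author": "Bob", "comments.text": "Great post!"},
)
cross_comment = nested_match(
    post, "comments",
    {"comments.author": "Bob", "comments.text": "Thanks for sharing."},
)
print(same_comment)   # True
print(cross_comment)  # False
```

The cross-comment combination no longer matches, because no single comment carries both values.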

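When you want to see which nested objects matched, you can use inner_hits. This allows you to retrieve the specific nested objects that matched the query. (How the scores of the matching nested objects contribute to the parent document's score is controlled separately, by the nested query's score_mode parameter.)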

GET /my-blog/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match": {
          "comments.author": "Bob"
        }
      },
      "inner_hits": {
        "name": "matching_comments",
        "highlight": {
          "fields": {
            "comments.text": {}
          }
        }
      }
    }
  }
}

This will return the main document and, under hits.hits[0].inner_hits.matching_comments, it will show you the specific comment(s) authored by "Bob".
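On the client side, pulling those comments out of the response is a matter of walking that path. The sketch below assumes the response has already been parsed into a Python dict; matching_comments is a hypothetical helper, and sample_response is a hand-written, heavily abbreviated illustration of the response shape, not output captured from a real cluster.

```python
def matching_comments(response, name="matching_comments"):
    """Yield (parent_id, offset, comment) for every inner hit."""
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            # _nested.offset is the position of the object in the array.
            yield hit["_id"], inner_hit["_nested"]["offset"], inner_hit["_source"]


# Hand-written and abbreviated; real responses carry more metadata.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "my-blog",
                "_id": "1",
                "inner_hits": {
                    "matching_comments": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "comments", "offset": 0},
                                    "_source": {"author": "Bob", "text": "Great post!"},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

results = list(matching_comments(sample_response))
for parent_id, offset, comment in results:
    print(parent_id, offset, comment["author"], comment["text"])
```

Using .get with defaults keeps the helper from failing on hits that carry no inner_hits section.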

One subtle but critical aspect is how Elasticsearch handles updates to nested fields. Elasticsearch cannot modify a single nested object in place. Any update to the document, even a change to one comment, re-indexes the root document together with all of its nested objects. If you perform frequent, small updates to individual items within a large nested array, each update pays the cost of re-indexing the entire structure, which can be less efficient than you might expect.

Aggregating on nested fields requires wrapping the aggregation in a nested aggregation that specifies the same path, so that the sub-aggregations run against the hidden nested documents rather than the root document.
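As a rough illustration, the request body for counting comments per author could be built like this. The sketch only constructs and prints the JSON body, which would be sent to GET /my-blog/_search; the function name and the aggregation names "comments" and "by_author" are arbitrary labels chosen for this example.

```python
import json


def comments_by_author_body(path="comments", field="comments.author", size=10):
    """Build a search body with a terms aggregation scoped to a nested path."""
    return {
        "size": 0,  # skip the hits, return only aggregation results
        "aggs": {
            "comments": {
                # Switch the aggregation context to the nested documents.
                "nested": {"path": path},
                "aggs": {
                    # Runs once per nested comment, not once per post.
                    "by_author": {"terms": {"field": field, "size": size}}
                },
            }
        },
    }


body = comments_by_author_body()
print(json.dumps(body, indent=2))
```

The terms aggregation works here because comments.author is mapped as keyword in the mapping shown earlier.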
