The most surprising thing about Elasticsearch’s join field is that it’s not actually a relational database join at all.

Let’s see it in action. Imagine we’re building a system to manage software projects and their associated tasks. A project can have many tasks, but a task belongs to only one project.

First, we define our index mapping. Notice the field of type join, here named parent_task, whose relations object declares the possible parent-child relationships. Here, project is the parent and task is the child.

PUT my-index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "project_id": { "type": "keyword" },
      "assignee": { "type": "keyword" },
      "description": { "type": "text" },
      "parent_task": {
        "type": "join",
        "relations": {
          "project": "task"
        }
      }
    }
  }
}

Now, let’s index some documents. We’ll add a parent project document first. Parent documents must also set the join field, naming their relation (project); they just have no parent to reference.

POST my-index/_doc/1
{
  "title": "Elasticsearch Optimization Project",
  "project_id": "proj-123",
  "parent_task": {
    "name": "project"
  }
}

Next, we add child task documents. These documents must include the parent_task field, specifying the child relation name (task) and the ID of the parent document (1). A child must live on the same shard as its parent, so each request also passes the parent’s ID as the routing value.

POST my-index/_doc/2?routing=1
{
  "title": "Implement new indexing strategy",
  "assignee": "alice",
  "description": "Refactor index mapping for better performance.",
  "parent_task": {
    "name": "task",
    "parent": "1"
  }
}

POST my-index/_doc/3?routing=1
{
  "title": "Develop query optimization",
  "assignee": "bob",
  "description": "Analyze slow queries and propose solutions.",
  "parent_task": {
    "name": "task",
    "parent": "1"
  }
}
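
Because a child is routed by its parent’s ID rather than its own, fetching it by ID generally needs the same routing value. A minimal sketch, assuming document 2 was indexed with routing=1 as the join field requires for children:

```console
GET my-index/_doc/2?routing=1
```

On an index with several shards, leaving out the routing value can send the request to the wrong shard, where the document will not be found.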

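To find projects that have tasks, we use the has_child query. This query returns parent documents whose children match a given query. The optional min_children and max_children parameters restrict the match to parents with a number of matching children in that range, here between 1 and 10.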

GET my-index/_search
{
  "query": {
    "has_child": {
      "type": "task",
      "query": {
        "match_all": {}
      },
      "min_children": 1,
      "max_children": 10
    }
  }
}

This query returns the parent project document (ID 1) because it has two child task documents (IDs 2 and 3), which is within the min_children and max_children bounds. To retrieve the child documents themselves, use a has_parent query or a parent_id query instead; has_child only finds parents based on their children.
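
For the opposite direction, a has_parent query returns children whose parent matches a query. The following is a sketch that reuses the index and field names from this article and assumes the project was indexed with project_id "proj-123" as above:

```console
GET my-index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "project",
      "query": {
        "term": { "project_id": "proj-123" }
      }
    }
  }
}
```

Unlike parent_id, this form does not need the parent’s document ID up front; any query over the parent’s fields can select the parents.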

The join field records parent-child relationships at index time, not at query time: each child stores the ID of its parent, and because parents and children are routed to the same shard, Elasticsearch can resolve the join locally on that shard. Both directions are supported. has_child finds parents by their children’s attributes, while has_parent and parent_id find children by their parent. If children are only ever read and updated together with their parent, nested fields are usually the simpler and faster choice. The join field fits cases where parents and children need to be indexed and updated independently, without denormalizing all data.

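If you want to retrieve the child documents of one specific parent, use a parent_id query, which takes the child relation name and the parent’s document ID. Each returned child carries its parent reference in the parent_task field of its _source.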

GET my-index/_search
{
  "query": {
    "parent_id": {
      "type": "task",
      "id": "1"
    }
  }
}

This query will return documents 2 and 3, the tasks associated with project 1. The parent_id query matches against the parent ID that the join field indexes for every child. Elasticsearch keeps that ID in an internal field named after the join field and the parent relation (here, parent_task#project), so the lookup does not need to run a query against the parent documents.

The real power of the join field comes into play when you need to perform aggregations or searches that span both parent and child documents, or when you need to filter parents based on complex criteria applied to their children. For example, you could find projects that have at least one task that is assigned to "alice" and whose description contains "performance".

GET my-index/_search
{
  "query": {
    "has_child": {
      "type": "task",
      "query": {
        "bool": {
          "must": [
            { "match": { "assignee": "alice" } },
            { "match": { "description": "performance" } }
          ]
        }
      }
    }
  }
}

This query demonstrates how you can push filtering logic down to the child documents to find specific parent documents. The has_child query is a parent-centric search; it returns the parent documents that satisfy the condition.
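
When you also need to see which children caused a parent to match, has_child accepts inner_hits, and a children aggregation can summarize the tasks of the matched projects. The sketch below reuses the index and field names from this article; the aggregation names tasks and by_assignee are arbitrary labels chosen for illustration.

```console
GET my-index/_search
{
  "query": {
    "has_child": {
      "type": "task",
      "query": { "match": { "assignee": "alice" } },
      "inner_hits": {}
    }
  },
  "aggs": {
    "tasks": {
      "children": { "type": "task" },
      "aggs": {
        "by_assignee": {
          "terms": { "field": "assignee" }
        }
      }
    }
  }
}
```

Note that the children aggregation covers all tasks of the matched projects, not only the tasks that satisfied the has_child query; the matching tasks themselves appear under inner_hits on each hit.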

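What most people don’t realize is that there is no _parent metadata field anymore; instead, the join field indexes an additional hidden field for each parent relation, named after the join field and the parent relation (here, parent_task#project). In a child document, this field holds the _id of the parent document, not an internal Lucene document ID. The has_child and has_parent queries use global ordinals built on this field to perform the join, and parent_id matches against it directly. This is distinct from a parent ID keyword field you maintain yourself, because Elasticsearch manages it and builds the join structures from it.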
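
The per-relation field can be referenced by name in aggregations. As a rough sketch, assuming the mapping above (so the field is called parent_task#project) and using tasks_per_project as an arbitrary aggregation name, the following groups the tasks of project 1 by the parent ID stored on each child:

```console
GET my-index/_search
{
  "size": 0,
  "query": {
    "parent_id": {
      "type": "task",
      "id": "1"
    }
  },
  "aggs": {
    "tasks_per_project": {
      "terms": { "field": "parent_task#project" }
    }
  }
}
```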

The next hurdle you’ll encounter is understanding the performance implications of join fields, particularly with large numbers of parent-child documents and complex queries.
