Elasticsearch’s Index Lifecycle Management (ILM) is your automated way to handle time-series data, making sure old data gets cleaned up and new indices are created without you lifting a finger.

Let’s say you’re ingesting logs, metrics, or security events. They come in fast, and you don’t want to manually create new indices every day or week, or delete old ones. ILM is the answer.

Imagine you want a fresh logs index roughly every day. You want to keep data for 30 days, then delete it. After a week, you also want to shrink each index to a single primary shard and force-merge it to cut per-index overhead. After 14 days, you want to move it to cheaper storage. ILM lets you define this as a policy and attach it to your indices through an index template.

Here’s that policy as an API request. You can run it from Kibana Dev Tools, or build the same policy in Kibana’s Index Lifecycle Policies UI:

PUT _ilm/policy/my_log_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "14d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_s3_repository"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

In this policy:

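  • hot phase: This is where new data lands. The rollover action starts a new index as soon as either condition is met. A max_age of 1d rolls over once the current index is a day old. A max_primary_shard_size of 50gb rolls over as soon as any primary shard reaches 50 GB, even if the index is less than a day old. The size condition is crucial for keeping shards at a manageable size. ILM evaluates these conditions periodically (every 10 minutes by default, controlled by indices.lifecycle.poll_interval), so a rollover happens shortly after a condition is met rather than at that exact moment.
  • warm phase: 7 days after an index rolls over (min_age: "7d"), it moves here. When rollover is used, every later min_age is measured from the rollover time, not from index creation. We shrink the index to 1 primary shard to reduce per-shard overhead and forcemerge it down to a single segment to optimize for read performance.
  • cold phase: 14 days after rollover, the index moves to cold. The searchable_snapshot action snapshots it into the repository named my_s3_repository (backed by S3 here) and replaces it with an index mounted from that snapshot. Because the snapshot already provides redundancy, the mounted index needs no replica shards. It can also live on cold-tier nodes rather than your hot nodes, which can save significant costs. Searchable snapshots require an Enterprise subscription.
  • delete phase: 30 days after rollover, the index is deleted automatically.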
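
To see where each index actually sits in this lifecycle, ILM's explain API reports the current phase, action, and step for every managed index. Here is a minimal check, assuming your indices match logs-*:

```
GET logs-*/_ilm/explain
```

For each index, the response includes fields such as phase, action, step, and age. The age is measured from the index's lifecycle date, which becomes its rollover time once it has rolled over, so it is the value the min_age thresholds are compared against. If a step fails, the index reports step: ERROR along with a step_info object describing the failure.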

To make this work, you need to:

  1. Create the policy: Use the PUT _ilm/policy/my_log_policy command as shown above.
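  2. Configure a snapshot repository: If you’re using searchable_snapshot, the repository must exist before any index reaches the cold phase. For S3 on Elasticsearch 7.x, this means installing the repository-s3 plugin on every node; from 8.0 onward, S3 support ships built in. Either way, you then register the repository and store the S3 credentials in the Elasticsearch keystore.
  3. Apply the policy to an index template: This is where the policy gets wired in. Whenever a new index whose name matches the template is created, the policy is attached to it automatically.
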
PUT _index_template/my_log_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "my_log_policy",
      "index.lifecycle.rollover_alias": "logs-write"
    }
  }
}
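
For step 2, registering the repository that the cold phase refers to looks roughly like the sketch below. It assumes that S3 support is available on every node and that a bucket named my-ilm-snapshots already exists; that bucket name is a placeholder. It also assumes S3 credentials have been added to each node's keystore as s3.client.default.access_key and s3.client.default.secret_key.

```
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-ilm-snapshots"
  }
}

POST _snapshot/my_s3_repository/_verify
```

The _verify call asks the nodes to confirm they can access the repository. This surfaces credential or permission problems before ILM ever reaches the cold phase.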

In my_log_template, index.lifecycle.name points to your ILM policy, and index.lifecycle.rollover_alias is critical. When you index data, you always write to the alias logs-write. The alias has exactly one write index at any time, and that is where new documents go. When a rollover condition (age or size) is met, ILM rolls the alias over. Elasticsearch creates a new index, applies the matching template to it, and atomically makes the new index the write index for logs-write. Your applications keep writing to logs-write without noticing the switch.
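
One step the setup list leaves out is bootstrapping. The logs-write alias must exist, pointing at a first index, before anything is written to it. Otherwise, the first write auto-creates a plain index literally named logs-write, and ILM's rollover step errors out on it. The minimal bootstrap below assumes the first index is called logs-000001. Rollover derives the next name by incrementing the trailing number, so the name should end in a hyphen followed by a number.

```
PUT logs-000001
{
  "aliases": {
    "logs-write": {
      "is_write_index": true
    }
  }
}
```

Because logs-000001 matches logs-*, it picks up my_log_template and therefore my_log_policy. Its first rollover creates logs-000002, and so on.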

The most surprising thing about rollover is how it uses a write alias to decouple your application from specific indices. Your application is configured to write to logs-write, and Elasticsearch manages the alias. When a rollover happens, a new index is created (for example, if the first index is logs-000001, the next is logs-000002), and the write index of logs-write switches to this new index. If your application queries with a wildcard pattern like logs-*, it sees all the indices, while the alias ensures writes always go to the active one.
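
To watch the alias mechanics yourself, you can ask which index currently receives writes and preview a rollover without performing it. The dry_run flag evaluates the conditions and reports the index names involved, but it creates nothing. The conditions below simply mirror the hot phase of my_log_policy.

```
GET _alias/logs-write

POST logs-write/_rollover?dry_run=true
{
  "conditions": {
    "max_age": "1d",
    "max_primary_shard_size": "50gb"
  }
}
```

The alias lookup lists every index that carries logs-write and marks the current one with is_write_index: true. The rollover response includes old_index, new_index, and a conditions map showing which conditions are currently met. It reports rolled_over: false because this is only a dry run.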

The forcemerge action in the warm phase isn’t just cosmetic housekeeping. Each segment is essentially a set of files on disk, and a search has to visit every segment of every shard it touches. Merging many small segments into fewer, larger ones (here, a single segment per shard) reduces the number of open files and the per-segment memory and bookkeeping Elasticsearch maintains for that index. It also physically removes documents that were only marked as deleted. That is why older, rarely written indices can become noticeably cheaper to search after a forcemerge, even on slower storage. Because merged segments should not be written to again, ILM makes the index read-only before merging.
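
You can see the effect of the merge by listing segments per shard with the cat segments API. The leading * in the sketch below is there because ILM gives shrunk indices a shrink- prefix, so this pattern assumes only that your index names contain logs-.

```
GET _cat/segments/*logs-*?v&h=index,shard,prirep,segment,docs.count,size
```

Before the warm phase, each shard will typically show many segments. After ILM's forcemerge with max_num_segments: 1, each shard of the shrunk index should show a single segment.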

Once you have ILM set up, the next thing you’ll want to explore is how to manage your snapshot repositories for searchable snapshots and understand the different storage tiers available in Elasticsearch.
