By default, Elastic APM data lands in indices matching apm-<version>-*. The defaults work, but as data volume grows you’re likely to hit a wall with retention, storage costs, or query performance. Customizing index templates and Index Lifecycle Management (ILM) policies is how you address all three.

Let’s see APM data in action, and not just raw logs. We’re looking at a transaction trace, which shows the flow of a request through your services. Here, a single user request (trace ID a1b2c3d4e5f6g7h8) arrived as a web request, which in turn ran a database query.

{
  "@timestamp": "2023-10-27T10:30:00.123Z",
  "trace": {
    "id": "a1b2c3d4e5f6g7h8",
    "stack_trace": [
      {
        "id": "0a1b2c3d4e5f6g7h",
        "parent_id": null,
        "name": "GET /api/users",
        "type": "request",
        "timestamp": "2023-10-27T10:30:00.123Z",
        "duration_ms": 150
      },
      {
        "id": "1b2c3d4e5f6g7h8i",
        "parent_id": "0a1b2c3d4e5f6g7h",
        "name": "SELECT * FROM users WHERE id = ?",
        "type": "db",
        "timestamp": "2023-10-27T10:30:00.130Z",
        "duration_ms": 50
      }
    ]
  },
  "service": {
    "name": "user-api",
    "version": "1.2.0"
  },
  "url": {
    "original": "/api/users",
    "path": "/api/users",
    "scheme": "http",
    "host": "localhost",
    "port": 8080
  },
  "user": {
    "id": "user123"
  }
}

This simplified JSON document represents a piece of an APM trace. trace.id links it to everything else recorded for the same request. trace.stack_trace lists the individual operations, here the HTTP request and the database query it triggered, each with its own duration and a parent_id pointing at the operation that started it. APM builds its performance visualizations from this granular data.
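To make the parent/child structure concrete, here is a minimal Python sketch that works on the two entries from the sample document. The function names (self_times, print_tree) are hypothetical, and the self-time calculation assumes that child operations run one after another rather than in parallel.

```python
# Illustrative only: this list mirrors trace.stack_trace from the sample document.
spans = [
    {"id": "0a1b2c3d4e5f6g7h", "parent_id": None,
     "name": "GET /api/users", "duration_ms": 150},
    {"id": "1b2c3d4e5f6g7h8i", "parent_id": "0a1b2c3d4e5f6g7h",
     "name": "SELECT * FROM users WHERE id = ?", "duration_ms": 50},
]


def self_times(spans):
    """Return {span id: own duration minus the summed duration of direct children}.

    Assumes children do not overlap in time.
    """
    child_total = {}
    for span in spans:
        parent = span["parent_id"]
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0) + span["duration_ms"]
    return {s["id"]: s["duration_ms"] - child_total.get(s["id"], 0) for s in spans}


def print_tree(spans, parent_id=None, depth=0):
    """Print the operations as an indented tree, following parent_id links."""
    for span in spans:
        if span["parent_id"] == parent_id:
            print("  " * depth + f'{span["name"]} ({span["duration_ms"]} ms)')
            print_tree(spans, span["id"], depth + 1)


if __name__ == "__main__":
    print_tree(spans)
    print(self_times(spans))
```

Under that assumption, the HTTP request spent 100 ms in its own code and 50 ms waiting on the database query.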

The problem APM data solves is observability into distributed systems. Before APM, debugging a slow request that hopped between microservices was a nightmare. You’d have logs scattered everywhere, and correlating them was manual and error-prone. APM centralizes this, providing a unified view of request flows, errors, and performance bottlenecks.

Internally, APM agents send data to the APM Server, which then indexes it into Elasticsearch. The APM Server is configured to use specific index templates. These templates define the mappings (data types for fields) and settings for the APM indices. Without customization, you get the default mappings, which might not be optimal for your specific use cases or Elasticsearch version.

You control two main levers:

  1. Index Templates: Elasticsearch applies these when a new index is created, before any data is written to it. They define the schema (mappings) and settings for indices whose names match a specific pattern (e.g., apm-*). You can customize field types, add custom fields, and tune settings like the number of shards.
  2. Index Lifecycle Management (ILM) Policies: These automate the management of your indices over time. You define phases (Hot, Warm, Cold, Delete) and actions within each phase. This is how you control data retention, move data to cheaper storage tiers, and eventually delete old data.
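As a rough illustration of how a template pattern selects indices, the Python sketch below approximates the * wildcard with the standard library's fnmatch module. The index names are hypothetical, matches_template is not an Elasticsearch API, and fnmatch also gives special meaning to ? and [], which this sketch assumes do not appear in the patterns.

```python
from fnmatch import fnmatchcase


def matches_template(index_name, index_patterns):
    """Return True if index_name matches at least one wildcard pattern."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


# Hypothetical index names, for illustration only.
candidates = [
    "apm-7.17.0-transaction-000001",
    "apm-7.17.0-span-000001",
    "apm-custom",
    "filebeat-7.17.0-2023.10.27",
]

if __name__ == "__main__":
    for name in candidates:
        print(name, matches_template(name, ["apm-*-*"]))
```

Note that apm-custom matches apm-* but not apm-*-*, so the choice of pattern decides which indices pick up your mappings and settings.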

Here’s how you’d typically set up a custom index template for APM data. First, you need to know the index pattern APM Server uses. By default, it’s apm-<version>-*. You’d create a template that targets this.

PUT _index_template/apm_custom_template
{
  "index_patterns": ["apm-*-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "trace": {
          "properties": {
            "id": { "type": "keyword" },
            "stack_trace": {
              "properties": {
                "id": { "type": "keyword" },
                "parent_id": { "type": "keyword" },
                "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                "type": { "type": "keyword" },
                "timestamp": { "type": "date" },
                "duration_ms": { "type": "long" }
              }
            }
          }
        },
        "service": {
          "properties": {
            "name": { "type": "keyword" },
            "version": { "type": "keyword" }
          }
        },
        "url": {
          "properties": {
            "original": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 1024 } } },
            "path": { "type": "keyword" },
            "scheme": { "type": "keyword" },
            "host": { "type": "keyword" },
            "port": { "type": "integer" }
          }
        },
        "user": {
          "properties": {
            "id": { "type": "keyword" }
          }
        }
      }
    }
  }
}

This template ensures trace.id, service.name, and url.path are mapped as keyword for exact matching and aggregations, which is far more efficient than text for these fields. duration_ms is mapped as long for numerical operations. We’ve also set the initial number of shards and replicas.
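A nested mapping is easier to review when flattened into dotted field paths. The Python sketch below does that for a trimmed copy of the mappings above, so you can check at a glance which fields are keyword and which are text. flatten_mappings is a hypothetical helper, not part of any Elasticsearch client.

```python
# Trimmed copy of the "mappings.properties" section of the template above.
properties = {
    "@timestamp": {"type": "date"},
    "trace": {
        "properties": {
            "id": {"type": "keyword"},
            "stack_trace": {
                "properties": {
                    "name": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                    },
                    "duration_ms": {"type": "long"},
                }
            },
        }
    },
    "service": {"properties": {"name": {"type": "keyword"}}},
    "url": {"properties": {"path": {"type": "keyword"}, "port": {"type": "integer"}}},
}


def flatten_mappings(props, prefix=""):
    """Return {dotted field path: type}, including multi-field sub-fields."""
    fields = {}
    for name, spec in props.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its children.
            fields.update(flatten_mappings(spec["properties"], path + "."))
        else:
            fields[path] = spec["type"]
            for sub_name, sub_spec in spec.get("fields", {}).items():
                fields[f"{path}.{sub_name}"] = sub_spec["type"]
    return fields


if __name__ == "__main__":
    for path, field_type in sorted(flatten_mappings(properties).items()):
        print(f"{path}: {field_type}")
```

Fields that show up as text without a keyword sub-field are the ones that cannot be used for exact-match filters or aggregations.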

Next, you define an ILM policy. This policy will be applied to the indices created by the APM Server.

PUT _ilm/policy/apm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 1
          },
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

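This policy rolls an index over after 7 days or as soon as any of its primary shards reaches 50 GB, whichever comes first. At 7 days the index enters the warm phase, where it is shrunk to a single shard and force-merged to one segment to save space. It moves to the cold phase at 30 days, where it is frozen, and is deleted at 90 days. For an index that has rolled over, each min_age is counted from the rollover, not from index creation. Note that the freeze action is deprecated as of Elasticsearch 7.14 and does nothing in 8.x, so on recent clusters the cold phase in this policy only lowers the recovery priority.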
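The timing rules can be sketched in a few lines of Python. This is a simplified model, not how ILM is implemented: the real service checks conditions on a polling interval and only advances a phase once its actions have finished. The names parse_age, phase_for, and should_roll_over are hypothetical, and the thresholds are copied from the policy above.

```python
from datetime import timedelta

# min_age per phase, copied from the policy above (order matters).
PHASE_MIN_AGE = {"hot": "0ms", "warm": "7d", "cold": "30d", "delete": "90d"}

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_age(value):
    """Convert a time value such as '7d' or '0ms' into a timedelta."""
    for suffix in ("ms", "s", "m", "h", "d"):  # test "ms" before "s" and "m"
        if value.endswith(suffix):
            return timedelta(**{_UNITS[suffix]: int(value[: -len(suffix)])})
    raise ValueError(f"unsupported time value: {value!r}")


def phase_for(age_since_rollover):
    """Return the last phase whose min_age has been reached, or None."""
    current = None
    for phase, min_age in PHASE_MIN_AGE.items():
        if age_since_rollover >= parse_age(min_age):
            current = phase
    return current


def should_roll_over(index_age, largest_primary_shard_gb):
    """Rollover fires when either condition is met (max_age or max_primary_shard_size)."""
    return index_age >= timedelta(days=7) or largest_primary_shard_gb >= 50


if __name__ == "__main__":
    for days in (3, 7, 45, 90):
        print(days, phase_for(timedelta(days=days)))
```

An index that rolled over 45 days ago sits in the cold phase under this model, and a two-day-old index still rolls over early if one primary shard has already reached 50 GB.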

Now you need to tell the APM Server which ILM policy to use. The custom index template needs no APM Server setting of its own: Elasticsearch applies it to every new index whose name matches its index_patterns. The ILM policy is different, because it has to be attached to the indices the APM Server creates. The settings below are the ones documented for APM Server 7.x with classic apm-* indices; from 8.0 onward APM data is written to data streams that are configured differently, so check the reference for your version. They go in the APM Server configuration file (apm-server.yml).

# apm-server.yml (APM Server 7.x, classic indices)
output.elasticsearch:
  hosts: ["localhost:9200"]
  # ... other elasticsearch settings

apm-server:
  ilm:
    enabled: true
    setup:
      enabled: true
      # Map each event type to the ILM policy created above
      mapping:
        - event_type: "transaction"
          policy_name: "apm_policy"
        - event_type: "span"
          policy_name: "apm_policy"
        - event_type: "error"
          policy_name: "apm_policy"
        - event_type: "metric"
          policy_name: "apm_policy"

After restarting the APM Server, new indices will be created with your custom mappings and managed by the ILM policy. Indices that already exist keep their current mappings and settings, because templates are applied only when an index is created. The apm-server.ilm.setup.mapping section is what ties each APM event type to your policy.

The one thing that often trips people up is the interplay between index templates and ILM. The index template defines the structure of an index, while ILM defines its lifecycle. The APM Server creates indices matching the apm-<version>-* pattern, and Elasticsearch applies the matching template at that moment. The ILM policy reaches an index through its index.lifecycle.name setting. If that setting is missing from a newly created index, the index still works, but ILM never manages it, and it grows until someone deletes it by hand. Running GET apm-*/_ilm/explain shows, for each index, whether it is managed and by which policy.
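One way to catch unmanaged indices early is to inspect the output of Elasticsearch's ILM explain API (GET <index>/_ilm/explain), which reports a managed flag and, for managed indices, the policy name. The Python sketch below works on an already-parsed response; the sample data and index names are made up for illustration, and unmanaged_indices and indices_with_other_policy are hypothetical helpers.

```python
# Made-up sample shaped like an _ilm/explain response, for illustration only.
sample_response = {
    "indices": {
        "apm-7.17.0-transaction-000001": {
            "index": "apm-7.17.0-transaction-000001",
            "managed": True,
            "policy": "apm_policy",
            "phase": "hot",
        },
        "apm-7.17.0-error-000001": {
            "index": "apm-7.17.0-error-000001",
            "managed": True,
            "policy": "some_other_policy",
            "phase": "hot",
        },
        "apm-7.17.0-span-000001": {
            "index": "apm-7.17.0-span-000001",
            "managed": False,
        },
    }
}


def unmanaged_indices(explain_response):
    """Return the sorted names of indices that ILM does not manage."""
    return sorted(
        name
        for name, info in explain_response["indices"].items()
        if not info.get("managed", False)
    )


def indices_with_other_policy(explain_response, expected_policy):
    """Return the sorted names of managed indices attached to a different policy."""
    return sorted(
        name
        for name, info in explain_response["indices"].items()
        if info.get("managed", False) and info.get("policy") != expected_policy
    )


if __name__ == "__main__":
    print("unmanaged:", unmanaged_indices(sample_response))
    print("wrong policy:", indices_with_other_policy(sample_response, "apm_policy"))
```

Anything this check reports is an index that will not follow the retention schedule you defined.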

Once you have ILM policies in place, the next logical step is optimizing your APM data for faster querying, perhaps by exploring data tiers or custom ingest pipelines.
