The biggest surprise about Elasticsearch’s hot-warm-cold storage tiers isn’t that they save money, but that they let you increase data retention for your APM traces while simultaneously lowering your storage costs.
Let’s see this in action. Imagine you’re running Elastic APM, collecting traces from your applications. By default, all this data lands on "hot" nodes – your fastest, most expensive storage. This is great for active searching and analysis, but it’s a money pit for older data.
Here’s a typical APM index pattern, say apm-*. While an index matching this pattern is young, it lives on hot nodes.
PUT _index_template/apm_hot_template
{
  "index_patterns": ["apm-*"],
  "priority": 100,
  "template": {
    "settings": {
      "index.routing.allocation.require.data_tier": "hot"
    }
  }
}
This template requires every index matching apm-* to be allocated only on nodes whose custom data_tier attribute is "hot" (set with node.attr.data_tier in elasticsearch.yml). Your active traces are fast to query, but disk usage on your most expensive nodes balloons.
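Now, let’s introduce the "warm" tier. This tier is for data that’s accessed less frequently but still needs to be searchable; in this setup, that means indices between roughly one week and one month past rollover. We’ll give the warm nodes the value "warm" for the same data_tier node attribute and configure a policy to move data there.
First, label your warm nodes (hot and cold nodes get the same attribute, set to "hot" and "cold"):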
# On your warm nodes, in elasticsearch.yml
node.attr.data_tier: "warm"
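After restarting the nodes, one way to confirm that the labels took effect is the cat nodeattrs API. A minimal check, assuming the attribute name data_tier used above:

```console
# List every custom attribute per node; look for the data_tier rows
GET _cat/nodeattrs?v&h=node,attr,value
```

Each warm node should show a data_tier row with the value warm. A node without that row has not picked up the setting.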
Then, create an index lifecycle management (ILM) policy. Its rollover action creates a new index when the current one reaches a given age or primary shard size, which keeps the actively written index on hot nodes manageable. Its later phases move older indices to other tiers.
PUT _ilm/policy/apm_ilm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "shrink": {
            "number_of_shards": 1
          },
          "allocate": {
            "number_of_replicas": 0,
            "require": {
              "data_tier": "warm"
            }
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "freeze": {},
          "allocate": {
            "require": {
              "data_tier": "cold"
            }
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
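With this policy, an index is written to on hot until it rolls over, which happens at 7 days old or when a primary shard reaches 50 GB, whichever comes first. The min_age of every later phase is counted from rollover, not from index creation, so an index moves to warm 7 days after it rolled over. There, we shrink it to a single primary shard (fewer shards means less overhead) and allocate it to nodes with data_tier: "warm". We also set number_of_replicas to 0 to save space. That leaves a single copy of the data, so only do this if you can tolerate losing warm indices when a node fails or can restore them from snapshots.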
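To see where a given index currently sits in this lifecycle, the ILM explain API reports its phase, action, and age. A quick check, assuming your indices match apm-*:

```console
# Show the current ILM phase, action, step, and age for each matching index
GET apm-*/_ilm/explain
```

In the response, phase and action show the current position in the policy, and age is the value ILM compares against min_age.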
Finally, apply this policy to your APM indices:
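PUT _index_template/apm_ilm_template
{
  "index_patterns": ["apm-*"],
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "apm_ilm_policy",
      "index.lifecycle.rollover_alias": "apm-write",
      "index.routing.allocation.require.data_tier": "hot"
    }
  }
}
This template tells Elasticsearch to manage all apm-* indices with apm_ilm_policy. When several composable index templates match a new index, only the one with the highest priority is applied, so this template (priority 200) replaces apm_hot_template (priority 100) rather than merging with it; that is why it repeats the hot allocation setting. The rollover_alias setting names the alias that the rollover action acts on: writes go to apm-write, which always points to the current, active index.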
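Neither the template nor the policy creates the apm-write alias, so rollover needs a first index to start from. A minimal bootstrap sketch; the index name apm-000001 is an assumption (any name that matches apm-* and ends in a hyphen and a number should work):

```console
# Create the first managed index and make it the write target of the alias
PUT apm-000001
{
  "aliases": {
    "apm-write": {
      "is_write_index": true
    }
  }
}
```

The numeric suffix lets rollover name the next index apm-000002, and is_write_index marks this index as the target for writes sent to the alias.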
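The cold tier is for data you rarely need to access but still want to keep for compliance or historical analysis. In this policy, indices move to cold 30 days after rollover, where the freeze action is applied. On Elasticsearch 7.x, a frozen index keeps almost nothing in memory and loads its data structures only when it is searched, which sharply reduces heap usage at the cost of slower queries. The freeze action is deprecated as of 7.14 and has no effect on 8.x, where the saving comes from the cheaper cold nodes themselves.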
The crucial part of the warm phase is the shrink action. With number_of_shards set to 1, the whole index is shrunk into a single primary shard, however many primaries it had on the hot tier. Every shard carries a fixed cost in heap and cluster state, so this matters most for indices that were sharded aggressively for write throughput. Shrinking does not make the data much smaller on disk; in this policy the disk saving comes mainly from dropping the replicas. It’s not just about moving data; it’s about reorganizing it for less frequent access.
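To check the result of the shrink, you can list shard counts and sizes per index. A small sketch; the leading wildcard is an assumption about naming, since ILM typically gives the shrunken index a new name with a shrink- prefix, which apm-* alone would not match:

```console
# One row per index: primary shard count, replica count, and size on disk
GET _cat/indices/*apm-*?v&h=index,pri,rep,store.size&s=index
```

Indices that have completed the warm phase should show pri as 1 and rep as 0.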
This tiered approach allows you to keep months of APM data online without breaking the bank (raise the delete phase’s min_age beyond 90d to keep it longer), because the vast majority of your historical data will reside on cheaper, slower storage with fewer shards and no replicas.
The next hurdle you’ll face is optimizing search performance across these different tiers, especially when querying data that spans hot, warm, and cold indices.