Elastic APM’s ability to scale its data storage hinges on its use of Elasticsearch data streams.
Here’s how it looks in practice. Suppose an APM service is collecting data, and a single index holding that data has a simplified mapping like this:
PUT /my-apm-data-000001
{
  "mappings": {
    "properties": {
      "agent": {
        "properties": {
          "name": { "type": "keyword" },
          "version": { "type": "keyword" }
        }
      },
      "service": {
        "properties": {
          "name": { "type": "keyword" },
          "environment": { "type": "keyword" }
        }
      },
      "@timestamp": { "type": "date" },
      "trace": {
        "properties": {
          "id": { "type": "keyword" }
        }
      },
      "transaction": {
        "properties": {
          "name": { "type": "keyword" },
          "type": { "type": "keyword" },
          "duration": { "type": "float" }
        }
      }
    }
  }
}
When the APM Server receives data, it doesn’t just dump it into a single, ever-growing index. Instead, it writes to a data stream, and an index template whose pattern matches the stream’s name defines the mappings and settings of the indices behind it. For example, if you have a data stream named my-apm-data, Elasticsearch automatically creates hidden backing indices with names like .ds-my-apm-data-2023.10.27-000001, .ds-my-apm-data-2023.11.03-000002, and so on. The date is the day the backing index was created, and the six-digit suffix is a generation number that increases with each rollover. When you query my-apm-data, Elasticsearch automatically searches all the backing indices that belong to that data stream. This abstracts away the complexity of managing individual indices.
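To make the naming concrete, here is a minimal Python sketch of how a backing index name is put together, assuming the `.ds-<data-stream>-<yyyy.MM.dd>-<generation>` convention used by recent Elasticsearch versions. The function name and the creation dates are hypothetical and only serve the illustration.

```python
from datetime import date


def backing_index_name(data_stream: str, created: date, generation: int) -> str:
    """Build a name of the form .ds-<data-stream>-<yyyy.MM.dd>-<generation>."""
    # The generation is zero-padded to six digits.
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


# Hypothetical creation dates, one week apart.
names = [
    backing_index_name("my-apm-data", date(2023, 10, 27), 1),
    backing_index_name("my-apm-data", date(2023, 11, 3), 2),
]
for name in names:
    print(name)
```

You never address these names when writing; they matter mostly when you inspect or repair a specific backing index.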
The core problem this solves is the performance degradation and management overhead associated with extremely large, monolithic indices. As an APM system ingests millions or billions of documents daily, a single index becomes unwieldy. Indexing and search slow down as shards grow, shard recovery and relocation take longer, and expiring old data means running expensive delete-by-query operations instead of simply dropping a whole index. Data streams provide a time-based partitioning strategy that keeps individual indices smaller and more manageable.
Internally, when you write data to a data stream, Elasticsearch appends it to the current write index, which is the most recently created backing index and the only one that accepts new documents. When certain conditions are met (like a size limit or a maximum age, typically defined in an Index Lifecycle Management policy), the current write index is "rolled over." This means a new backing index is created and becomes the write index, while the previous one stops receiving new documents but remains searchable. This continuous rollover process ensures that individual indices don’t grow indefinitely. The data stream, from the perspective of a user or the APM Server, remains a single logical entity.
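The write-index and rollover behavior described above can be sketched with a toy in-memory model. This is an illustration only, not how Elasticsearch is implemented: the `ToyDataStream` class and its `max_docs` rollover condition are hypothetical stand-ins for a real data stream and its ILM conditions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataStream:
    """In-memory stand-in for a data stream (illustration only)."""
    name: str
    max_docs: int  # hypothetical rollover condition: documents per backing index
    indices: list = field(default_factory=lambda: [[]])  # generation 1 starts empty

    @property
    def write_index(self) -> list:
        # Only the newest backing index accepts new documents.
        return self.indices[-1]

    def append(self, doc: dict) -> None:
        if len(self.write_index) >= self.max_docs:
            # Roll over: the old index stays searchable, a new one takes writes.
            self.indices.append([])
        self.write_index.append(doc)

    def search(self, predicate) -> list:
        # A search on the stream fans out over every backing index.
        return [doc for index in self.indices for doc in index if predicate(doc)]


stream = ToyDataStream(name="my-apm-data", max_docs=3)
for i in range(7):
    stream.append({"transaction": {"name": f"GET /item/{i}"}})

generations = len(stream.indices)
sizes = [len(index) for index in stream.indices]
hits = stream.search(lambda doc: True)
print(generations, sizes, len(hits))
```

The caller only ever uses the stream object; which backing list receives a document is decided inside `append`, mirroring how the write index is chosen for you.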
The primary lever you control here is the index template behind your data stream. It defines the mappings (the structure of your data), the index_patterns that tell Elasticsearch which data stream names the template applies to (e.g., apm-*), and, crucially, the rollover criteria. These criteria live in an Index Lifecycle Management (ILM) policy that the template references through the index.lifecycle.name setting. For APM, this often means rolling over indices based on age (e.g., daily or weekly) and/or size (e.g., 50GB per primary shard).
PUT _index_template/apm_template
{
  "index_patterns": ["apm-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "apm_ilm_policy"
    },
    "mappings": {
      "properties": {
        "agent": {
          "properties": {
            "name": { "type": "keyword" },
            "version": { "type": "keyword" }
          }
        },
        "service": {
          "properties": {
            "name": { "type": "keyword" },
            "environment": { "type": "keyword" }
          }
        },
        "@timestamp": { "type": "date" },
        "trace": {
          "properties": {
            "id": { "type": "keyword" }
          }
        },
        "transaction": {
          "properties": {
            "name": { "type": "keyword" },
            "type": { "type": "keyword" },
            "duration": { "type": "float" }
          }
        }
      }
    }
  }
}
PUT _ilm/policy/apm_ilm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
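As a rough sketch of how the two rollover conditions in this policy combine: the write index rolls over as soon as either max_age or max_primary_shard_size is reached. The function below is a hypothetical model of that check, not Elasticsearch code, and it assumes that "50gb" means 50 × 1024³ bytes. In a real cluster ILM evaluates the conditions periodically, so an index can overshoot a threshold somewhat before it actually rolls over.

```python
from datetime import timedelta

GB = 1024 ** 3  # assumption: Elasticsearch byte units are powers of 1024


def should_roll_over(
    index_age: timedelta,
    largest_primary_shard_bytes: int,
    max_age: timedelta = timedelta(days=7),
    max_primary_shard_size: int = 50 * GB,
) -> bool:
    """Return True when either rollover condition of the policy is met."""
    return (
        index_age >= max_age
        or largest_primary_shard_bytes >= max_primary_shard_size
    )


young_and_small = should_roll_over(timedelta(days=2), 10 * GB)
old_but_small = should_roll_over(timedelta(days=8), 1 * GB)
young_but_large = should_roll_over(timedelta(days=1), 60 * GB)
print(young_and_small, old_but_small, young_but_large)
```

On a busy service the size condition usually fires first, and on a quiet one the age condition does, which is why policies commonly set both.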
The truly elegant part of data streams is how they abstract the underlying index management. You interact with a single, logical stream name (for example apm-my-service, or any other name that matches the template’s apm-* pattern) for both writing and querying, and Elasticsearch seamlessly handles the creation, rollover, and eventual deletion of the physical indices that constitute that stream. This means the APM Server doesn’t need to know about .ds-apm-my-service-2023.10.27-000001 vs. .ds-apm-my-service-2023.11.03-000002; it just writes to apm-my-service, and Elasticsearch directs each document to the current write index.
When you’re troubleshooting performance or storage issues, the key is to look at the ILM policy referenced by your APM data stream’s index template. The max_age and max_primary_shard_size of the rollover action determine how frequently new backing indices are created, and the min_age of the delete phase, which is counted from the moment an index rolls over, determines how long data is retained.
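A quick back-of-the-envelope calculation shows what these settings imply. The sketch below assumes that the delete phase’s min_age is measured from rollover and that rollover is driven by age alone; the helper names are hypothetical, and size-triggered rollovers or ILM’s polling delay would shift the numbers.

```python
import math
from datetime import timedelta


def max_document_lifetime(rollover_max_age: timedelta, delete_min_age: timedelta) -> timedelta:
    """A document written just after a rollover waits out both periods."""
    return rollover_max_age + delete_min_age


def min_document_lifetime(delete_min_age: timedelta) -> timedelta:
    """A document written just before a rollover only waits out the delete min_age."""
    return delete_min_age


def max_backing_indices(rollover_max_age: timedelta, delete_min_age: timedelta) -> int:
    """Rolled-over indices still inside the retention window, plus the write index."""
    return math.ceil(delete_min_age / rollover_max_age) + 1


rollover = timedelta(days=7)   # max_age from the policy above
retention = timedelta(days=30)  # delete-phase min_age from the policy above

longest = max_document_lifetime(rollover, retention)
shortest = min_document_lifetime(retention)
index_count = max_backing_indices(rollover, retention)
print(longest.days, shortest.days, index_count)
```

With the policy shown earlier, a document is kept for somewhere between 30 and 37 days, so "30-day retention" is a lower bound rather than an exact figure.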
The next step in managing this data will involve optimizing the search performance across these time-series indices.