The Elasticsearch fielddata cache is a memory hog, and if you don’t manage it, it’ll bring your cluster to its knees with OutOfMemory errors.
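Here’s how it breaks down in practice. Imagine you’re running an e-commerce analytics dashboard. You’ve got millions of product documents, and you want to aggregate by product_category and brand. To aggregate quickly, Elasticsearch needs per-document values for those fields, and where it keeps them depends on the mapping. For keyword fields it reads the values from on-disk doc_values and holds only global ordinals in the fielddata cache; for text fields with fielddata enabled it builds the whole structure on the JVM heap. If you have a lot of unique values in product_category or brand, or if the fields themselves are large (e.g., long text fields), this fielddata can balloon.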
PUT /my_products
{
  "mappings": {
    "properties": {
      "product_category": {
        "type": "keyword"
      },
      "brand": {
        "type": "keyword"
      },
      "description": {
        "type": "text"
      }
    }
  }
}
POST /my_products/_doc
{
  "product_category": "Electronics",
  "brand": "Sony",
  "description": "A high-definition television with 4K resolution."
}
POST /my_products/_doc
{
  "product_category": "Electronics",
  "brand": "Samsung",
  "description": "A smart refrigerator with a large touchscreen."
}
POST /my_products/_doc
{
  "product_category": "Home Appliances",
  "brand": "LG",
  "description": "An energy-efficient washing machine."
}
Now, if you run an aggregation like this:
GET /my_products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "product_category"
      }
    },
    "brands": {
      "terms": {
        "field": "brand"
      }
    }
  }
}
Both fields are mapped as keyword, so Elasticsearch reads their values from on-disk doc_values and builds only the global ordinals for each shard on the heap; those ordinals are held in the fielddata cache. Run the same kind of aggregation against a text field with fielddata enabled and the field’s entire term data is loaded onto the heap for every shard that holds it. If you have many shards, many unique values, or large fields, this memory footprint can exceed the JVM heap allocated to Elasticsearch, leading to an OOM.
The problem fielddata solves is enabling aggregations, sorting, and scripting on text fields that have no .keyword sub-field to fall back on. Elasticsearch’s default indexing for text fields is optimized for full-text search, not for exact value matching or aggregation, and text fields do not support doc_values. To aggregate on such a field, Elasticsearch has to load its data into a different, in-memory structure: fielddata. Because of the cost, fielddata is disabled on text fields by default and has to be switched on in the mapping with fielddata: true.
Internally, fielddata is an on-heap structure that is built lazily, the first time a field is used in an aggregation, a sort, or a script. For each segment, Elasticsearch reads the field’s entire inverted index, inverts the term-to-document relationship, and keeps the resulting document-to-terms lookup in the JVM heap. This allows for very rapid lookups during aggregations and sorting. However, the "on-heap" part is the critical vulnerability.
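To make the shape of that structure concrete, here is a toy model in Python. It is a simplification for illustration only, not Lucene’s actual encoding; the function names and the sample index are made up. It turns a term-to-documents index into a document-to-term-ordinals lookup and then counts terms the way a terms aggregation would.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Turn {term: [doc_id, ...]} into (sorted_terms, {doc_id: [ordinal, ...]})."""
    terms = sorted(inverted_index)
    ordinal_of = {term: i for i, term in enumerate(terms)}
    doc_to_ordinals = defaultdict(list)
    for term in terms:
        for doc_id in inverted_index[term]:
            doc_to_ordinals[doc_id].append(ordinal_of[term])
    return terms, dict(doc_to_ordinals)


def terms_aggregation(terms, doc_to_ordinals, matching_doc_ids):
    """Count how often each term occurs among the matching documents."""
    counts = [0] * len(terms)
    for doc_id in matching_doc_ids:
        for ordinal in doc_to_ordinals.get(doc_id, []):
            counts[ordinal] += 1
    return {terms[i]: n for i, n in enumerate(counts) if n}


# Hypothetical inverted index for a brand field: term -> documents containing it
inverted = {"sony": [0], "samsung": [1], "lg": [2, 3]}
terms, doc_to_ordinals = uninvert(inverted)
print(terms_aggregation(terms, doc_to_ordinals, [0, 2, 3]))
```

Both the term list and the per-document lookup have to stay in memory for the aggregation to be fast, which is why the footprint grows with the number of documents and the number of unique terms.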
The primary lever you control is the indices.fielddata.cache.size setting. It caps how much of the JVM heap the fielddata cache may use, either as a percentage of the heap or as an absolute value such as 12GB. The default is unbounded, meaning the cache keeps growing as new fields are loaded, which is often the source of OOMs.
This is a static setting, so it cannot be changed through the cluster settings API. Set it in elasticsearch.yml on every data node and restart the node:
indices.fielddata.cache.size: 30%
This tells Elasticsearch that fielddata should not consume more than 30% of the JVM heap on any given node. Keep the value below the fielddata circuit breaker limit (indices.breaker.fielddata.limit, 40% of the heap by default); otherwise the breaker rejects requests before the cache gets a chance to evict anything. If the cache reaches its limit, Elasticsearch will start evicting older or less-used fielddata entries to make room for new ones. This can slow down aggregations that require data that was just evicted, but it prevents the OOM.
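The trade-off between a bounded cache and slower aggregations can be sketched with a small byte-bounded cache. This is a toy model that assumes least-recently-used eviction; the class name, the sizes, and the heap figure are invented for illustration and say nothing about how Elasticsearch implements its cache.

```python
from collections import OrderedDict


class FielddataCacheSketch:
    """Toy byte-bounded cache with least-recently-used eviction."""

    def __init__(self, heap_bytes, cache_size_percent):
        # Budget derived the same way a percentage setting would be
        self.limit_bytes = heap_bytes * cache_size_percent // 100
        self.entries = OrderedDict()  # (index, field) -> size in bytes
        self.used_bytes = 0
        self.evictions = 0

    def load(self, key, size_bytes):
        """Load an entry, evicting the oldest ones if the budget is exceeded."""
        if key in self.entries:
            self.entries.move_to_end(key)  # cache hit: mark as recently used
            return
        self.entries[key] = size_bytes
        self.used_bytes += size_bytes
        # Eviction happens after the load, and never removes the newest entry
        while self.used_bytes > self.limit_bytes and len(self.entries) > 1:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
            self.evictions += 1


# Arbitrary numbers: a 1,000-byte "heap" with a 30% budget (300 bytes)
cache = FielddataCacheSketch(heap_bytes=1_000, cache_size_percent=30)
cache.load(("my_products", "brand"), 200)
cache.load(("my_products", "description"), 150)  # pushes brand out
cache.load(("my_products", "brand"), 200)        # has to be rebuilt
print(cache.used_bytes, cache.evictions)
```

Two fields that do not fit in the budget together keep pushing each other out, so every alternating aggregation pays the rebuild cost again. Note also that the sketch evicts only after an entry is already loaded, so the budget can be exceeded briefly.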
Crucially, the fielddata cache is per-node, not per-shard. This means that if you have multiple nodes, each node manages its own fielddata cache, and the indices.fielddata.cache.size applies to the heap of that individual node.
Another crucial control is keeping fielddata disabled on text fields where you know you’ll never need it. It is off by default for text fields, but you can state it explicitly in your mapping:
PUT /my_products/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fielddata": false
    }
  }
}
Setting fielddata: false on a text field like description makes the intent explicit: Elasticsearch will not build fielddata for it, and an attempt to aggregate or sort on the field is rejected with an error instead of quietly consuming heap. This is a powerful proactive measure, particularly for undoing a fielddata: true that someone added earlier.
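When an aggregation on a text field is genuinely needed, a keyword sub-field is the usual alternative to enabling fielddata. The helper below is a hypothetical sketch that only builds the JSON body for such a mapping; the function name is made up, and the ignore_above value of 256 is an assumption borrowed from what dynamic mapping commonly produces, so adjust it to your data.

```python
import json


def text_with_keyword(ignore_above=256):
    """Mapping for a text field with a keyword sub-field for aggregations."""
    return {
        "type": "text",
        "fields": {
            # Values longer than ignore_above characters are not indexed
            # in the sub-field (assumed default, tune as needed)
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


# Body for a request such as PUT /my_products/_mapping
mapping_body = {"properties": {"description": text_with_keyword()}}
print(json.dumps(mapping_body, indent=2))
```

Aggregations would then target description.keyword, which is served from doc_values. Documents indexed before the sub-field was added do not have values in it until they are reindexed.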
If you’re experiencing OOMs, the first thing to check is the fielddata usage on your nodes. You can see this using the Nodes Stats API:
GET _nodes/stats/indices/fielddata
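Look for memory_size_in_bytes under indices.fielddata for each node, along with the evictions counter. If the size is consistently high relative to the node’s heap, or evictions keep climbing, you’re a prime candidate for OOMs.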
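A small script can turn that response into a per-node summary. The sketch below works on an already-parsed response; in the Nodes Stats output the figures appear as memory_size_in_bytes and evictions under indices.fielddata. The heap fraction is only computed when the response also carries the jvm section (for example, when jvm is requested alongside indices), which is an assumption about how you call the API. The sample data is invented.

```python
def fielddata_report(stats):
    """Return per-node fielddata usage, largest consumer first."""
    rows = []
    for node_id, node in stats["nodes"].items():
        fielddata = node["indices"]["fielddata"]
        used = fielddata["memory_size_in_bytes"]
        # The jvm section is only present if it was requested
        heap_max = node.get("jvm", {}).get("mem", {}).get("heap_max_in_bytes")
        rows.append({
            "node": node.get("name", node_id),
            "fielddata_bytes": used,
            "evictions": fielddata["evictions"],
            "heap_fraction": used / heap_max if heap_max else None,
        })
    return sorted(rows, key=lambda row: row["fielddata_bytes"], reverse=True)


# Made-up response, trimmed to the keys the function reads
sample = {
    "nodes": {
        "abc": {
            "name": "node-1",
            "indices": {"fielddata": {"memory_size_in_bytes": 400, "evictions": 3}},
            "jvm": {"mem": {"heap_max_in_bytes": 1000}},
        },
        "def": {
            "name": "node-2",
            "indices": {"fielddata": {"memory_size_in_bytes": 900, "evictions": 0}},
        },
    }
}

report = fielddata_report(sample)
for row in report:
    print(row)
```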
If you’ve set indices.fielddata.cache.size and are still having issues, it might be that your fields are simply too large or too numerous to fit within the allocated cache. In such cases, consider alternative approaches:
- Use doc_values instead of fielddata: For most use cases where you need aggregations or sorting, doc_values (enabled by default for keyword, numeric, date, and boolean types) are far more memory-efficient. They store data columnarly on disk rather than in heap memory. If your field can be mapped as keyword, switch to that.
- Reduce the cardinality of your fields: If you have a field with millions of unique values that you’re aggregating on, consider if you can pre-process or group these values into a smaller set of categories.
- Increase JVM heap size: While not always the best solution, if your workload legitimately requires more memory for fielddata, you might need to allocate more heap to your Elasticsearch nodes. Ensure you follow the JVM heap sizing recommendations (e.g., no more than 50% of system RAM, and not exceeding 30-32GB).
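As one possible way to reduce cardinality before indexing, the sketch below keeps the most frequent values of a field and folds everything else into a single bucket. The function name, the "other" label, and the idea of writing the result to a separate field are illustrative assumptions; the grouping that makes sense depends on your data.

```python
from collections import Counter


def build_top_n_mapper(values, n, other_label="other"):
    """Return a function that keeps the n most frequent values and
    maps every other value to other_label."""
    keep = {value for value, _ in Counter(values).most_common(n)}

    def mapper(value):
        return value if value in keep else other_label

    return mapper


# Made-up sample of raw brand values seen at ingest time
brands = ["Sony", "Sony", "Samsung", "LG", "Samsung", "Sony", "Acme"]
to_group = build_top_n_mapper(brands, n=2)
grouped = [to_group(brand) for brand in brands]
print(grouped)
```

The grouped value could be indexed in its own keyword field (a hypothetical brand_group, say) so that dashboards aggregate on a handful of terms while the raw value stays available for search.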
The common mistake is to assume fielddata is always available and forget about its memory implications. When you set indices.fielddata.cache.size, Elasticsearch applies that limit per node, and a node that hits the limit evicts entries. Eviction only happens after an entry has been loaded, though, so a single request that loads fielddata for a massive field can still exhaust the heap before anything is evicted. The fielddata circuit breaker (indices.breaker.fielddata.limit) is the guard for that case: it estimates the memory a load will need and rejects the request if loading would push fielddata past the limit. This is why aggressive use of doc_values and mapping keyword fields is preferred.
The next problem you’ll likely encounter after managing fielddata is understanding how the query cache and request cache interact with it, and how to tune those for optimal performance.