Elasticsearch can compute new fields at query time, letting you search on, aggregate on, and return "fields" that were never indexed. These are called runtime fields.
Let’s see this in action. Imagine you have an index of web server logs, and each document looks something like this:
{
  "timestamp": "2023-10-27T10:30:00Z",
  "client_ip": "192.168.1.10",
  "request": "/api/users",
  "status_code": 200,
  "response_time_ms": 150
}
You want to analyze the response_time_ms but group it by whether the request was "slow" or "fast." You could reindex your data, but that’s a pain. With runtime fields, you can do it on the fly.
Here’s a query that adds a response_time_category field:
GET /logs/_search
{
  "runtime_mappings": {
    "response_time_category": {
      "type": "keyword",
      "script": {
        "source": """
          // Skip documents with no response_time_ms; reading .value on them throws.
          if (doc['response_time_ms'].size() != 0) {
            if (doc['response_time_ms'].value > 200) {
              emit('slow');
            } else {
              emit('fast');
            }
          }
        """
      }
    }
  },
  "aggs": {
    "response_time_buckets": {
      "terms": {
        "field": "response_time_category"
      }
    }
  },
  "size": 0
}
When you run this, Elasticsearch executes the script for every document that matches the query before it performs the aggregation. Because this request has no query clause, that means every document in the logs index. The emit() function is how the script outputs the value of the runtime field: here, 'slow' for responses over 200 ms and 'fast' otherwise. The response shows a count of "slow" and "fast" requests, and your original data is untouched.
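If you send this request from application code rather than the console, the body is easy to build programmatically. The following is a minimal Python sketch, not an official client example: the helper name build_category_query and its threshold_ms argument are made up for illustration, and it passes the threshold through the script's params block instead of hard-coding 200.

```python
import json


def build_category_query(threshold_ms=200):
    """Build the search body for the slow/fast terms aggregation.

    Hypothetical helper: the name and the threshold_ms argument are
    illustrative and not part of any Elasticsearch client.
    """
    # Painless source; the threshold is read from params, not hard-coded.
    source = (
        "if (doc['response_time_ms'].size() != 0) { "
        "if (doc['response_time_ms'].value > params.threshold_ms) "
        "{ emit('slow'); } else { emit('fast'); } }"
    )
    return {
        "runtime_mappings": {
            "response_time_category": {
                "type": "keyword",
                "script": {
                    "source": source,
                    "params": {"threshold_ms": threshold_ms},
                },
            }
        },
        "aggs": {
            "response_time_buckets": {
                "terms": {"field": "response_time_category"}
            }
        },
        "size": 0,
    }


if __name__ == "__main__":
    # Print the JSON body you would send to GET /logs/_search.
    print(json.dumps(build_category_query(500), indent=2))
```

Keeping the threshold in params means you can change the cutoff per request without editing the script text.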
This is powerful because it decouples your data storage from your analysis needs. You don’t need to predict every possible derived field you might ever want. If you need to categorize requests by, say, the day of the week, or extract a user ID from a complex header string, you can just add a new runtime field to your query. It’s like having a mini-ETL process happen after the data is already indexed and during the search.
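As a sketch of the day-of-the-week case, here is one way the request body could look, built in Python. It assumes timestamp is mapped as a date and that every document has a value for it; the field name day_of_week, the aggregation name requests_per_day, and the helper name are illustrative choices, not something from the original index.

```python
import json

# Painless source for a keyword runtime field.
# Assumes `timestamp` is a date field present on every document.
DAY_OF_WEEK_SOURCE = (
    "emit(doc['timestamp'].value.dayOfWeekEnum"
    ".getDisplayName(TextStyle.FULL, Locale.ENGLISH))"
)


def day_of_week_body():
    """Search body that counts requests per weekday (hypothetical helper)."""
    return {
        "runtime_mappings": {
            "day_of_week": {
                "type": "keyword",
                "script": {"source": DAY_OF_WEEK_SOURCE},
            }
        },
        "aggs": {
            "requests_per_day": {"terms": {"field": "day_of_week"}}
        },
        "size": 0,
    }


if __name__ == "__main__":
    print(json.dumps(day_of_week_body(), indent=2))
```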
The core idea is that Elasticsearch applies schema-on-read to these runtime fields, as opposed to the schema-on-write you’re used to. When you define a runtime field, you specify its type (like keyword, long, double, ip, etc.) and a script. This script, written in Painless (Elasticsearch’s scripting language), dictates how to compute the field’s value for each document. The script reads the document’s existing fields via doc['field_name'].value, which throws if the document has no value for that field, so check doc['field_name'].size() first when a field may be missing.
The size: 0 in the example above is a useful optimization rather than a requirement. It tells Elasticsearch you’re only interested in the aggregations and not the actual search hits, so it skips fetching and returning documents when you’re just doing analysis.
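When you do want hits back, keep in mind that a runtime field is not part of _source; you ask for it with the fields option of the search request. The sketch below builds such a body in Python and reads the value back out of a hit. The helper names are hypothetical, and sample_hit is a hand-written stand-in in the shape of a search hit, not output from a real cluster.

```python
def build_hits_query(size=5):
    """Search body that returns hits plus the runtime field's value."""
    return {
        "runtime_mappings": {
            "response_time_category": {
                "type": "keyword",
                "script": {
                    "source": "emit(doc['response_time_ms'].value > 200 ? 'slow' : 'fast')"
                },
            }
        },
        # Runtime values come back under "fields", not "_source".
        "fields": ["response_time_category"],
        "_source": ["request", "response_time_ms"],
        "size": size,
    }


def category_of(hit):
    """Read the runtime value from one hit; values under "fields" are lists."""
    values = hit.get("fields", {}).get("response_time_category", [])
    return values[0] if values else None


# Hand-written hit for illustration only (not real cluster output).
sample_hit = {
    "_source": {"request": "/api/users", "response_time_ms": 150},
    "fields": {"response_time_category": ["fast"]},
}

if __name__ == "__main__":
    print(category_of(sample_hit))
```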
Here’s another example: pulling the last dot-separated segment out of a client_ip field, which is often mapped as an ip type. The field is named client_tld below, although an IP address has no top-level domain.
GET /logs/_search
{
  "runtime_mappings": {
    "client_tld": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['client_ip'].value.substring(doc['client_ip'].value.lastIndexOf('.') + 1))"
      }
    }
  },
  "aggs": {
    "tld_counts": {
      "terms": {
        "field": "client_tld"
      }
    }
  },
  "size": 0
}
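This query adds a client_tld field that holds the text after the last dot of the IP address (for 192.168.1.10, that is 10), then counts the occurrences of each resulting string. The result isn’t a TLD; the point is to demonstrate string manipulation on an ip field. It works because Painless exposes an ip field’s doc value as a string. An address with no dots, such as most IPv6 addresses, comes through unchanged, because lastIndexOf('.') returns -1 and substring(0) is the whole string.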
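To see what that one-line script does to different inputs without touching a cluster, here is a small Python mirror of the same string logic. It is only an analogy for the Painless expression: last_segment is a made-up name, and str.rfind stands in for lastIndexOf, which also returns -1 when there is no match.

```python
def last_segment(address: str) -> str:
    """Mirror of substring(lastIndexOf('.') + 1) from the Painless script."""
    # rfind returns -1 when there is no dot, so the slice starts at 0
    # and the whole string is returned unchanged.
    return address[address.rfind(".") + 1:]


if __name__ == "__main__":
    for address in ("192.168.1.10", "10.0.0.7", "2001:db8::1"):
        print(address, "->", last_segment(address))
```

The IPv6 case is worth noticing: in a mixed index, the terms aggregation would get one bucket per distinct IPv6 address alongside the IPv4 last-octet buckets.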
One thing that trips people up is that runtime fields are computed per document during the search. If you have a massive index and a complex script, query performance can degrade significantly, because Elasticsearch has to run the script for every single document it considers. Unlike indexed fields, whose values are computed once at index time and stored, runtime fields are recomputed on every request. This is why they are fantastic for exploration and for fields you don’t query very often, but less ideal for fields that you need to filter or aggregate on constantly across millions of documents.
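Since the script runs only for documents that match the query, one common way to keep the cost down is to narrow the search with a filter on an indexed field first. The following Python sketch wraps any search body with a time-range filter; the helper name with_time_filter and the default of the last hour are assumptions made for illustration.

```python
import copy


def with_time_filter(body, gte="now-1h", field="timestamp"):
    """Return a copy of a search body restricted to recent documents.

    Hypothetical helper: fewer matching documents means fewer runtime
    script executions.
    """
    narrowed = copy.deepcopy(body)
    bool_query = {"filter": [{"range": {field: {"gte": gte}}}]}
    existing = narrowed.get("query")
    if existing is not None:
        # Keep whatever query the caller already had.
        bool_query["must"] = [existing]
    narrowed["query"] = {"bool": bool_query}
    return narrowed


if __name__ == "__main__":
    base = {
        "aggs": {"response_time_buckets": {"terms": {"field": "response_time_category"}}},
        "size": 0,
    }
    print(with_time_filter(base))
```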
The next step after exploring with runtime fields is often to materialize them into your index, once they prove consistently useful but too costly to recompute on every search.
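One possible way to materialize the slow/fast field is to create a new index whose mapping computes it at index time, then reindex into it. The Python sketch below only builds the two request bodies. It rests on several assumptions: that your Elasticsearch version supports a script on a mapped keyword field (check the mapping documentation for your release), that logs-v2 is an acceptable name for the new index (it is made up here), and that response_time_ms is mapped as a long. An ingest pipeline that sets the field would be an alternative.

```python
import json

# Hypothetical name for the new index; the original is assumed to be `logs`.
TARGET_INDEX = "logs-v2"


def create_index_body():
    """Body for PUT /logs-v2: the category is computed once, at index time."""
    return {
        "mappings": {
            "properties": {
                "response_time_ms": {"type": "long"},
                "response_time_category": {
                    "type": "keyword",
                    # Same logic as the runtime field, now run while indexing.
                    "script": {
                        "source": "emit(doc['response_time_ms'].value > 200 ? 'slow' : 'fast')"
                    },
                },
            }
        }
    }


def reindex_body(source_index="logs", dest_index=TARGET_INDEX):
    """Body for POST /_reindex, copying documents into the new index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    print(json.dumps(create_index_body(), indent=2))
    print(json.dumps(reindex_body(), indent=2))
```

After the reindex, the same terms aggregation can run against the new index without a runtime_mappings block, because response_time_category is now an ordinary indexed field.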