Painless scripts let you inject custom logic into Elasticsearch at index time or query time. They aren't interpreted line by line. Elasticsearch compiles each script to JVM bytecode that runs inside a sandbox. Much of their practical power comes from manipulating document fields based on specific conditions.
Let’s see this in action. Imagine you have a product catalog and want to create a discounted_price field that’s 10% off the original_price if the product is marked as "on_sale".
Here’s how you’d add that during indexing using an ingest pipeline:
PUT _ingest/pipeline/add_discounted_price
{
  "description": "Adds a discounted_price field if the product is on sale",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // The processor-level "if" below already guarantees original_price exists,
          // so the script only needs to check the on_sale flag.
          if (ctx.containsKey('on_sale') && ctx.on_sale == true) {
            ctx.discounted_price = ctx.original_price * 0.9;
          }
        """,
        "if": "ctx.containsKey('original_price')"
      }
    }
  ]
}
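If you want to reason about the script outside a cluster, the following minimal Python sketch mimics the processor on a plain dict. It assumes that `ctx` behaves like a map of the document's fields, which is how it works in ingest pipelines, and it models the processor-level `if` as a guard that is checked before the script body runs. The function name `apply_discount_pipeline` is hypothetical, and the sketch approximates the Painless semantics rather than reproducing Elasticsearch code.

```python
from typing import Any, Dict


def apply_discount_pipeline(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical Python model of the add_discounted_price processor."""
    # Processor-level "if": skip the script entirely when original_price is absent.
    if "original_price" not in ctx:
        return ctx
    # Script body: only discount documents explicitly flagged as on sale.
    if ctx.get("on_sale") is True:
        ctx["discounted_price"] = ctx["original_price"] * 0.9
    return ctx


doc = apply_discount_pipeline({"name": "Super Widget", "original_price": 100.00, "on_sale": True})
print(doc["discounted_price"])  # 90.0
```

The multiplication uses floating-point arithmetic in both Python and Painless. For prices, you may want to round the result explicitly.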
Now, when you index a document with an original_price and on_sale set to true, the discounted_price will be automatically calculated and added.
POST my-index/_doc?pipeline=add_discounted_price
{
  "name": "Super Widget",
  "original_price": 100.00,
  "on_sale": true
}
This results in a document like:
{
  "name": "Super Widget",
  "original_price": 100.00,
  "on_sale": true,
  "discounted_price": 90.00
}
If on_sale were false or original_price were missing, discounted_price wouldn’t be added.
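You can check all of these cases without indexing anything by using the simulate pipeline API, `POST _ingest/pipeline/add_discounted_price/_simulate`. Its request body is a `docs` array, and each element wraps a test document in a `_source` field. The Python sketch below only builds that body. The helper name `build_simulate_body` is hypothetical. Sending the request is left out so the example runs without a cluster.

```python
import json
from typing import Any, Dict, List


def build_simulate_body(sources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap each test document in the {"_source": ...} shape the _simulate API expects."""
    return {"docs": [{"_source": src} for src in sources]}


body = build_simulate_body([
    {"name": "Super Widget", "original_price": 100.00, "on_sale": True},   # discount expected
    {"name": "Plain Widget", "original_price": 100.00, "on_sale": False},  # on_sale is false
    {"name": "Mystery Widget", "on_sale": True},                           # original_price missing
])
print(json.dumps(body, indent=2))
```

The response shows each document as it looks after the pipeline has run. This makes it easy to confirm that only the first document gains a discounted_price.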
The core problem Painless scripts solve is the need for dynamic, context-aware field manipulation that goes beyond simple mappings. You might need to:
- Derive new fields: Like discounted_price from original_price and a boolean flag.
- Normalize data: Convert units, standardize formats, or clean up inconsistent values.
- Enrich data: Add calculated fields based on combinations of existing ones.
- Implement complex routing: Dynamically determine shard routing based on document content.
- Custom scoring: Influence relevance scores during searches.
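To make the "Normalize data" case concrete, here is a Python sketch of the kind of logic such a script might contain. The field names `weight_g`, `weight_kg`, and `category` are hypothetical and do not come from the catalog example above. In Painless you would apply the same steps to `ctx`.

```python
from typing import Any, Dict


def normalize(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalization: grams -> kilograms, tidy category strings."""
    # Convert units: replace weight_g with weight_kg when present.
    if "weight_g" in ctx:
        ctx["weight_kg"] = ctx.pop("weight_g") / 1000
    # Standardize formats: trim whitespace and lowercase the category.
    if isinstance(ctx.get("category"), str):
        ctx["category"] = ctx["category"].strip().lower()
    return ctx


print(normalize({"weight_g": 1500, "category": "  Gadgets "}))
# {'category': 'gadgets', 'weight_kg': 1.5}
```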
Internally, Painless is a Java-like language designed for safety and performance inside the Elasticsearch JVM. It supports both static types and dynamic typing with def. It also enforces strict security rules. Scripts can only use classes and methods on an allowlist, so arbitrary Java APIs are off limits and operations stay limited to what document processing needs. Elasticsearch compiles each script to bytecode and caches the compiled result, so repeated executions don't pay the compilation cost again, and the code runs in a sandboxed environment. You get the flexibility of custom logic without compromising the stability or security of your cluster. In an ingest pipeline, the ctx object is your gateway to the document being processed. Through it you can read existing fields and write new ones.
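In an ingest pipeline, `ctx` holds the document's source fields at the top level. Metadata such as `_index`, `_id`, and `_routing` sits alongside them in the same map. The Python sketch below models `ctx` as a single dict and derives `_routing` from a hypothetical `tenant_id` field, which illustrates one way the routing use case could look. This is a simplification. It does not show how Elasticsearch represents the context internally.

```python
from typing import Any, Dict


def route_by_tenant(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical routing script: send all of a tenant's documents to the same shard."""
    tenant = ctx.get("tenant_id")
    if tenant is not None:
        # Metadata keys live next to source fields in the same map.
        ctx["_routing"] = str(tenant)
    return ctx


ctx = {"_index": "my-index", "_id": "1", "tenant_id": 42, "name": "Super Widget"}
route_by_tenant(ctx)
print(ctx["_routing"])  # 42
```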
The if condition at the processor level ("if": "ctx.containsKey('original_price')" in the example above) acts as a guard. When it evaluates to false, Elasticsearch skips the processor entirely, so the script never runs on documents that lack the required field. This avoids both wasted work and errors such as trying to multiply a missing value. It is distinct from if statements inside the source script, which only control the flow of logic once the script is already running.
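The difference between the two kinds of `if` can be modeled in a few lines of Python. In this sketch, every name is hypothetical. `run_processor` evaluates the processor-level condition first and calls the script only when that condition holds. A counter records how many times the script body actually ran.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]


def run_processor(ctx: Doc, condition: Callable[[Doc], bool], script: Callable[[Doc], None]) -> bool:
    """Run `script` only if `condition` holds; return whether it ran."""
    if not condition(ctx):
        return False  # processor skipped: the script is never invoked
    script(ctx)
    return True


calls = 0


def discount_script(ctx: Doc) -> None:
    global calls
    calls += 1
    # Inner if: controls logic flow inside the script itself.
    if ctx.get("on_sale") is True:
        ctx["discounted_price"] = ctx["original_price"] * 0.9


docs = [{"original_price": 50.0, "on_sale": True}, {"on_sale": True}, {"original_price": 20.0}]
for d in docs:
    run_processor(d, lambda c: "original_price" in c, discount_script)
print(calls)  # 2
```

The second document never reaches the script. The third document reaches it, but the inner check leaves it unchanged.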
When writing scripts, especially for complex calculations or conditional logic, you’ll often find yourself needing to handle missing fields gracefully. A common pattern is to use ctx.getOrDefault('field_name', defaultValue) to provide a fallback value if the field doesn’t exist, preventing NullPointerExceptions and ensuring your script continues to execute predictably. For instance, to calculate a weighted score where a missing weight field defaults to 1.0, you’d write ctx.score = ctx.base_score * ctx.getOrDefault('weight', 1.0);.
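Python's `dict.get(key, default)` behaves like Java's `Map.getOrDefault` on an ordinary map. The default applies only when the key is absent, not when the field is present with a null value. The sketch below mirrors the weighted-score line and adds a variant that also treats an explicit null as missing. Both function names are hypothetical. In Painless, the Elvis operator (`ctx.weight ?: 1.0`) is a compact way to get the null-safe behavior.

```python
from typing import Any, Dict


def weighted_score(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror of: ctx.score = ctx.base_score * ctx.getOrDefault('weight', 1.0);"""
    ctx["score"] = ctx["base_score"] * ctx.get("weight", 1.0)
    return ctx


def weighted_score_null_safe(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Also falls back to 1.0 when weight is present but null (None)."""
    weight = ctx.get("weight")
    ctx["score"] = ctx["base_score"] * (weight if weight is not None else 1.0)
    return ctx


print(weighted_score({"base_score": 2.0})["score"])                            # 2.0
print(weighted_score({"base_score": 2.0, "weight": 1.5})["score"])             # 3.0
print(weighted_score_null_safe({"base_score": 2.0, "weight": None})["score"])  # 2.0
```

Calling `weighted_score` with `"weight": None` raises a TypeError in Python, because the default is not applied. A null weight would likewise cause an error in the Painless version.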
Once you're comfortable creating fields with ingest pipelines, the next step is using Painless in search requests, for example in script fields, scripted aggregations, or custom scoring.