Tuning the Elasticsearch translog’s fsync settings is a balancing act between data durability and write performance, and getting it wrong can dramatically slow your indexing.
Here’s how it looks in action. Imagine you have a high-throughput indexing workload. Without proper tuning, you’ll see indexing latency spike, and your cluster might struggle to keep up, leading to rejected requests or data that is slow to arrive in the index. A typical tuning change looks like this:
PUT /my-index/_settings
{
  "index": {
    "translog": {
      "sync_interval": "5s",
      "durability": "async"
    }
  }
}
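The problem Elasticsearch solves here is ensuring that even if a node crashes, you don’t lose data that’s already been acknowledged by the cluster. When you index a document, Elasticsearch adds it to an in-memory indexing buffer and also appends the operation to the transaction log (translog) on disk. The translog acts much like a write-ahead log: Lucene only makes changes durable at a commit, which is too expensive to run on every operation, so any operations since the last commit are replayed from the translog during recovery. With the default durability, a write is acknowledged to the client only after its translog entry has been safely written to disk. Search visibility is a separate matter, governed by refresh rather than by the translog.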
The key players are index.translog.sync_interval and index.translog.durability.
- index.translog.sync_interval: This setting controls how often Elasticsearch forces the translog data from the operating system’s file buffer to the physical disk. It’s a time-based interval. The default is 5s.
- index.translog.durability: This dictates whether translog writes are synchronous or asynchronous.
  - request: Every translog write is flushed to disk before returning success to the client. This is the safest but slowest option.
  - async: Elasticsearch flushes the translog to disk periodically in the background, based on sync_interval. It returns success to the client before the data is guaranteed to be on disk. This is much faster but carries a small risk of data loss if the node crashes before the flush happens.
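Before changing either setting, it can help to see what an index is currently using. A minimal sketch, assuming an index named my-index (a placeholder) and that the settings API on your version applies the name filter to defaults as well; if it does not, dropping the filter and scanning the full output for the index.translog keys works too:

```console
# Show explicitly set values and defaults for the translog settings
GET /my-index/_settings/index.translog.*?include_defaults=true&flat_settings=true
```

Values you have set appear under "settings"; anything still at its default appears under "defaults".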
Internally, Elasticsearch uses the operating system’s file system cache. When you write to the translog, the data goes into this cache. An fsync is the system call that tells the OS to physically write the cached data to the disk. With request durability, it runs after every write request; with async durability, it runs on the schedule set by sync_interval.
The default settings (sync_interval: 5s, durability: request) offer strong durability but can be a bottleneck. For every write request (index, delete, update, or bulk), Elasticsearch waits for the OS to confirm the data is on disk before responding. If your disk is slow or your request rate is very high, this synchronous wait becomes a significant performance limiter.
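Because the fsync under request durability happens per request rather than per document, batching is one way to reduce its cost without giving up durability. A minimal sketch using the _bulk API; the index name and document fields are placeholders, and the right batch size depends on your documents and hardware:

```console
# One request, so the translog sync cost is shared by all operations in it
POST /my-index/_bulk
{ "index": {} }
{ "message": "first event" }
{ "index": {} }
{ "message": "second event" }
{ "index": {} }
{ "message": "third event" }
```

If batching alone brings latency to an acceptable level, you may not need async durability at all.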
To improve write performance, you can tune these settings. The most common and impactful change is to switch durability to async. This immediately removes the per-request disk flush wait. Then, you can adjust sync_interval. A shorter interval means more frequent flushes to disk, which narrows the window for data loss at some cost in throughput (values below 100ms are not allowed). A longer interval means less frequent flushes, improving performance but widening the potential window for data loss in a crash.
For many high-write scenarios, setting durability to async and sync_interval to something like 10s or 30s provides a good balance. If you’re indexing millions of documents and can tolerate losing up to a minute of data in the event of a crash, you might even push sync_interval to 60s.
PUT /my-index/_settings
{
  "index": {
    "translog": {
      "sync_interval": "30s",
      "durability": "async"
    }
  }
}
This configuration means Elasticsearch will buffer translog writes and only force them to disk every 30 seconds. The indexing requests return much faster because they don’t wait for the physical disk write.
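If the wider loss window is only acceptable during a heavy load, the change can be undone afterwards. A minimal sketch, assuming both settings can be updated on a live index in your Elasticsearch version (as the request above implies); setting a value to null restores its default:

```console
# Restore the default translog behavior once the bulk load is done
PUT /my-index/_settings
{
  "index.translog.sync_interval": null,
  "index.translog.durability": null
}
```

The flat key form used here is equivalent to the nested form shown earlier.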
The counterintuitive aspect of async durability is that while it’s inherently less durable at any given moment, it often leads to more stable and consistent indexing performance over time, which can be more valuable than per-request durability for many applications. You’re trading a bounded risk, losing up to one sync_interval of acknowledged writes if a node crashes, for a significant, practical improvement in throughput and latency.
If you’ve tuned sync_interval and durability and are still seeing slow writes, the next bottleneck is likely either your disk I/O subsystem or the CPU overhead of merging segments.
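A few stats requests can help narrow down which of these it is. This is a sketch rather than a full diagnostic procedure; the index name is a placeholder, and the write thread pool name assumes a reasonably recent Elasticsearch version:

```console
# Indexing, merge, and translog counters for the index
GET /my-index/_stats/indexing,merge,translog

# Per-node filesystem and thread pool stats
GET /_nodes/stats/fs,thread_pool

# Queue depth and rejections on the write thread pool
GET /_cat/thread_pool/write?v=true&h=node_name,name,active,queue,rejected
```

Rising merge times alongside idle disks point toward CPU-bound merging, while a growing write queue and rejections with busy disks point toward I/O.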