Configure Elasticsearch Disk Watermarks to Prevent Read-Only Mode (2026)

Elasticsearch can lock indices into read-only mode when a node’s disk space gets too low, often because the default disk watermarks are too conservative for your cluster’s growth.

The disk-based shard allocator has three thresholds: the low watermark, the high watermark, and the flood stage. When a node’s disk usage crosses the low watermark (85% by default), Elasticsearch stops allocating new shards to that node (primary shards of newly created indices are exempt). If usage keeps climbing and crosses the high watermark (90% by default), Elasticsearch starts relocating shards away from the node to free space. If usage still reaches the flood stage (95% by default), Elasticsearch applies a read-only block (index.blocks.read_only_allow_delete) to every index that has a shard on that node. Writes to those indices are rejected until space is freed, which can be a real pain to recover from.
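As a rough illustration of how a usage figure maps onto these thresholds, here is a minimal sketch. It assumes the stock defaults of 85%, 90%, and a 95% flood stage, and it treats reaching a threshold as crossing it, which is a simplification. The function name watermark_stage is made up for this example.

```shell
# Classify a node's disk usage (integer percent) against the disk thresholds.
# Arguments 2-4 override the assumed defaults: low 85, high 90, flood stage 95.
watermark_stage() {
  local used=$1 low=${2:-85} high=${3:-90} flood=${4:-95}
  if   [ "$used" -ge "$flood" ]; then echo "flood_stage"   # indices get the read-only block
  elif [ "$used" -ge "$high"  ]; then echo "high"          # shards are relocated away
  elif [ "$used" -ge "$low"   ]; then echo "low"           # no new shards allocated here
  else echo "ok"
  fi
}

watermark_stage 72   # prints: ok
watermark_stage 87   # prints: low
watermark_stage 96   # prints: flood_stage
```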

Here’s how to tune these settings.

Common Causes for Hitting Watermarks

Actual Disk Full: The most straightforward reason is that the disk is genuinely running out of space.
- Diagnosis: SSH into the affected node and run df -h. Check the output for the filesystem where Elasticsearch data is stored.
- Fix: Free up space by deleting old indices, snapshots, or other unnecessary files. If this is a recurring issue, you’ll need to increase the disk size or add more nodes.
- Why it works: This directly addresses the root cause of the disk being full.
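If you want the usage figure as a bare number, for a script or an alert, something like the sketch below may help. The helper name disk_used_pct and the DATA_PATH variable are hypothetical; point DATA_PATH at your own path.data (often /var/lib/elasticsearch on package installs, but check your configuration).

```shell
# Print the used-space percentage (digits only) for the filesystem holding a path.
# -P asks df for POSIX output, which keeps each filesystem on a single line.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Falls back to / so the example runs anywhere; set DATA_PATH to your data directory.
DATA_PATH="${DATA_PATH:-/}"
echo "used: $(disk_used_pct "$DATA_PATH")%"
```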
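Conservative Default Watermarks: Elasticsearch’s default watermarks are 85% for low, 90% for high, and 95% for the flood stage. For systems with rapid data ingestion or large shard sizes, these can be hit very quickly.
- Diagnosis: Check your current cluster settings using curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty". Look for cluster.routing.allocation.disk.watermark.low, cluster.routing.allocation.disk.watermark.high, and cluster.routing.allocation.disk.watermark.flood_stage. Without include_defaults=true the response lists only settings you have explicitly overridden, so an untouched cluster returns empty persistent and transient sections.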
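- Fix: Increase the watermarks. Keep the high watermark below the flood stage (95% by default), otherwise the write block kicks in at the same point as shard relocation. For example, to set the low watermark to 90%, the high watermark to 95%, and the flood stage to 97%:
```
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
'
```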
- Why it works: By raising the thresholds, you give your cluster more headroom before it starts aggressively relocating shards or preventing new allocations, allowing it to manage disk space more gracefully with your current data growth rate.
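When scripting this change, it can help to check the ordering of the thresholds before sending them. The sketch below builds the request body from integer percentages and refuses values that are not ordered low <= high <= flood stage. The function name watermark_body is hypothetical, and the commented curl line assumes an unsecured node on localhost:9200.

```shell
# Print a cluster-settings body for the three disk thresholds (integer percents).
# Returns non-zero if the values are not ordered low <= high <= flood <= 100.
watermark_body() {
  local low=$1 high=$2 flood=$3
  if [ "$low" -gt "$high" ] || [ "$high" -gt "$flood" ] || [ "$flood" -gt 100 ]; then
    echo "error: need low <= high <= flood_stage <= 100" >&2
    return 1
  fi
  printf '{"persistent":{'
  printf '"cluster.routing.allocation.disk.watermark.low":"%s%%",' "$low"
  printf '"cluster.routing.allocation.disk.watermark.high":"%s%%",' "$high"
  printf '"cluster.routing.allocation.disk.watermark.flood_stage":"%s%%"' "$flood"
  printf '}}\n'
}

watermark_body 90 95 97
# To apply it, pipe the body to curl (-d @- reads the body from stdin):
# watermark_body 90 95 97 | curl -X PUT "localhost:9200/_cluster/settings" \
#   -H 'Content-Type: application/json' -d @-
```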
Uneven Shard Distribution: A single node might accumulate too many shards, causing its disk to fill up faster than others, even if the overall cluster disk usage is low.
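- Diagnosis: Start with per-node disk usage: curl -X GET "localhost:9200/_cat/allocation?v". Then use the Cat Shards API to see what sits on a full node: curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node,store&s=node". The shard size column is named store, and s=node sorts the output by node.
- Fix: Ensure your indexing strategy distributes shards evenly. This might involve adjusting shard counts per index, confirming that rebalancing has not been disabled (cluster.routing.rebalance.enable), or capping how many shards of one index may sit on a single node with index.routing.allocation.total_shards_per_node. If a node is consistently overloaded, consider giving it a larger disk or moving shards manually with the _cluster/reroute API.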
- Why it works: Distributing shards more evenly prevents any single node from becoming a bottleneck and hitting its disk limits prematurely.
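To compare nodes at a glance, the shard sizes can be summed per node. This is a minimal sketch that reads "node bytes" pairs on stdin; the function name bytes_per_node and the node names in the sample data are made up. Against a live cluster the input would come from the Cat Shards API with bytes=b so that sizes are plain integers.

```shell
# Sum shard store sizes per node, largest first.
# Expects "node store_bytes" lines on stdin, e.g. from:
#   curl -s "localhost:9200/_cat/shards?h=node,store&bytes=b"
# Lines without exactly two fields (unassigned or relocating shards) are skipped.
bytes_per_node() {
  awk 'NF == 2 { total[$1] += $2 }
       END { for (n in total) printf "%s %.1f GiB\n", n, total[n] / 1073741824 }' |
    sort -k2,2 -rn
}

# Sample data standing in for a live cluster.
printf '%s\n' \
  'es-node-1 53687091200' \
  'es-node-2 10737418240' \
  'es-node-1 32212254720' | bytes_per_node
# prints:
# es-node-1 80.0 GiB
# es-node-2 10.0 GiB
```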
Large Shard Sizes: If your individual shards are very large (e.g., hundreds of GB, well beyond the 10 GB to 50 GB range that Elastic’s sizing guidance suggests), even a moderate number of shards can consume significant disk space on a node.
- Diagnosis: Again, use curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node,store". Examine the store column for individual shards.
- Fix: Split the data across more, smaller shards. For new indices you can raise the primary shard count. For large time-series data, this often means using finer-grained time-based indices (e.g., daily instead of monthly) or rolling over on size with ILM. Smaller shards are easier for Elasticsearch to manage and relocate. Avoid overshooting, since a very large number of tiny shards carries its own overhead.
- Why it works: Smaller shards are more portable and less impactful if a relocation is needed. It also prevents a single large shard from consuming a disproportionate amount of disk space on any given node.
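A quick way to spot oversized shards is to filter the Cat Shards output by size. The sketch below uses a 50 GiB default limit, which is an assumption you should adjust to your own sizing target. The function name oversized_shards and the index names in the sample are hypothetical.

```shell
# Print shards whose store size exceeds a limit in GiB (default 50).
# Expects "index shard prirep store_bytes" lines on stdin, e.g. from:
#   curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,store&bytes=b"
oversized_shards() {
  awk -v limit_gib="${1:-50}" '
    NF == 4 && $4 > limit_gib * 1073741824 {
      printf "%s shard %s (%s): %.1f GiB\n", $1, $2, $3, $4 / 1073741824
    }'
}

# Sample data standing in for a live cluster.
printf '%s\n' \
  'logs-2025.11 0 p 128849018880' \
  'logs-2025.12 0 p 21474836480' | oversized_shards 50
# prints: logs-2025.11 shard 0 (p): 120.0 GiB
```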
Stale Indices/Data Not Being Deleted: Over time, old indices that are no longer needed can accumulate, consuming disk space.
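- Diagnosis: Use the Cat Indices API: curl -X GET "localhost:9200/_cat/indices?v&h=index,creation.date.string,health,docs.count,store.size&s=creation.date". The s=creation.date parameter lists the oldest indices first. Look for old indices that are no longer accessed.
- Fix: Implement an index lifecycle management (ILM) policy to automatically delete or move old indices to cheaper storage. Manually delete unneeded indices: curl -X DELETE "localhost:9200/my-old-index-2022.01.*". The wildcard form only works when action.destructive_requires_name is false. It defaults to true from Elasticsearch 8.0, in which case you must name each index explicitly.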
- Why it works: Regularly cleaning up old data directly frees up disk space, preventing it from contributing to watermark issues.
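Before deleting anything by hand, it may be worth listing the candidates by age. This sketch filters "index creation_epoch_ms" pairs, which is what the Cat Indices API returns for h=index,creation.date. The function name stale_indices and the sample index names are made up, and the optional second argument exists only to make runs repeatable.

```shell
# List indices created more than N days ago.
# Expects "index creation_epoch_ms" lines on stdin, e.g. from:
#   curl -s "localhost:9200/_cat/indices?h=index,creation.date"
# The optional second argument overrides "now" (epoch seconds).
stale_indices() {
  local days=$1 now=${2:-$(date +%s)}
  awk -v cutoff_ms="$(( ($now - $days * 86400) * 1000 ))" \
    'NF == 2 && $2 < cutoff_ms { print $1 }'
}

# Sample data with a fixed "now" so the result does not depend on the clock.
now=1767225600
printf '%s\n' \
  'logs-2022.01.15 1642204800000' \
  'logs-2025.12.20 1766188800000' | stale_indices 90 "$now"
# prints: logs-2022.01.15
```

Review the list before passing any of it to a DELETE request, since deleted indices can only be recovered from a snapshot.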
Corrupted Shard Data: In rare cases, a shard’s data might become corrupted, reporting an incorrect size or causing allocation issues.
- Diagnosis: Check the Elasticsearch logs on the affected node for any shard-related errors or corruption warnings. The _cat/shards API might also show shards in an UNASSIGNED state or with unusual sizes.
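- Fix: If a healthy copy of the shard exists on another node, Elasticsearch can rebuild the damaged copy from it. Otherwise, restore the index from a snapshot. As a last resort, the elasticsearch-shard remove-corrupted-data tool can drop the unreadable parts of the shard, at the cost of losing the data they held.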
- Why it works: Removing or repairing corrupted data resolves the incorrect disk usage reporting and allows for proper shard management.
Disk Watermark Configuration on Individual Nodes: Watermarks are cluster-wide settings and cannot be tuned per node through the API, but they can be written into each node’s elasticsearch.yml. Values set through the cluster settings API (transient first, then persistent) take precedence over the file, so a stale value in any of these places can leave the cluster running with thresholds you did not intend.
- Diagnosis: Compare the sources. curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" shows transient, persistent, and default values, and curl -X GET "localhost:9200/_nodes/settings?pretty" shows what each node loaded from its configuration. Per-node disk usage is available from curl -X GET "localhost:9200/_nodes/stats/fs?pretty", but that output does not include watermark values.
- Fix: Remove unwanted API overrides by setting the key to null in a PUT to _cluster/settings, in whichever of transient or persistent holds it. For values in elasticsearch.yml, edit the file on each node and restart the node.
- Why it works: This ensures that leftover overrides don’t interfere with the intended cluster-wide disk management strategy.

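After you free space or raise the watermarks, Elasticsearch 7.4 and later removes the read-only block automatically once the node’s disk usage drops below the high watermark. On older versions, clear it yourself by setting index.blocks.read_only_allow_delete to null on the affected indices.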