Understand etcd Storage Limits and How to Stay Under Them (2026)

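etcd is designed to be a small, strongly consistent, distributed key-value store, and it is the datastore that powers Kubernetes. Its limits are far tighter than you’d expect for a database. The backend quota defaults to 2 GiB (set with --quota-backend-bytes, with 8 GiB as the suggested maximum), and a single request is capped at 1.5 MiB by default (--max-request-bytes).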

Imagine you’re building a Kubernetes cluster. etcd is the brain, storing the state of every pod, service, and configuration object. It needs to be highly reliable, so instead of overwriting data in place it keeps a history of changes (revisions). That history lets clients read consistent snapshots and watch for updates, but it is also what eats up storage.

Here’s how etcd’s storage works and what causes it to fill up:

The Problem: Revision History and Bloat

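Every write (create, update, or delete) increments a cluster-wide revision counter, and the previous version of the key is kept rather than overwritten. While this is great for consistency and for features like watch notifications, it means the database grows over time. If old revisions aren’t compacted and the freed space isn’t reclaimed, etcd eventually reaches its backend quota, raises a NOSPACE alarm, and rejects further writes, so the API server can no longer create or update objects.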
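To make the growth concrete, here is a toy Python model of etcd-style multi-version storage. It is a simplified sketch, not etcd’s real implementation: the ToyMVCC class and its methods are hypothetical, and it ignores etcd’s on-disk B+tree, Raft, and leases. It shows two behaviors the rest of this article relies on. Every write keeps the previous value around, and compaction keeps only the newest version of each live key.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyMVCC:
    """Toy model of etcd-style MVCC (hypothetical class, not etcd's implementation)."""

    revision: int = 0
    # (key, revision) -> value; None marks a deletion (tombstone).
    history: dict[tuple[str, int], Optional[bytes]] = field(default_factory=dict)

    def put(self, key: str, value: bytes) -> int:
        # Every write bumps the single, cluster-wide revision counter
        # and adds a new version instead of overwriting the old one.
        self.revision += 1
        self.history[(key, self.revision)] = value
        return self.revision

    def delete(self, key: str) -> int:
        # Deletes are writes too: they add a tombstone at a new revision.
        self.revision += 1
        self.history[(key, self.revision)] = None
        return self.revision

    def compact(self, rev: int) -> None:
        # Find the newest version of each key at or below `rev`.
        newest: dict[str, int] = {}
        for key, r in self.history:
            if r <= rev and r > newest.get(key, 0):
                newest[key] = r
        # Keep everything newer than `rev`, plus the newest older version
        # of each key unless that version is a tombstone.
        self.history = {
            (key, r): value
            for (key, r), value in self.history.items()
            if r > rev or (r == newest[key] and value is not None)
        }

    def stored_bytes(self) -> int:
        return sum(len(v) for v in self.history.values() if v is not None)


if __name__ == "__main__":
    db = ToyMVCC()
    for _ in range(100):
        db.put("/registry/configmaps/default/app", b"x" * 1000)
    print(db.revision, db.stored_bytes())  # 100 100000
    db.compact(db.revision)
    print(db.stored_bytes())  # 1000
```

Running the file prints 100 100000 and then 1000: a hundred updates to a 1,000-byte value hold 100 KB until compaction trims the history down to the single current copy.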

Common Causes and Solutions

Excessive Object Creation/Deletion: Creating and deleting a large number of Kubernetes objects rapidly (e.g., during a large deployment rollout, or because a misconfigured controller keeps recreating objects) generates a lot of revisions.
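- Diagnosis: Monitor the etcd_mvcc_db_total_size_in_bytes metric. To see the current revision, run etcdctl endpoint status --write-out=json and read the revision field in the response header; the --write-out=table view is handy for comparing DB SIZE across members. A rapidly increasing revision number coupled with a growing etcd_mvcc_db_total_size_in_bytes indicates churn.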
- Fix: Identify the source of the rapid object churn. This might involve debugging application controllers, reviewing deployment strategies, or adjusting resource quotas. For immediate relief, compact old revisions (etcdctl compact <revision_number>) and then defragment each member (etcdctl defrag).
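- Why it works: Compaction discards superseded revisions, and defragmentation then rewrites the database file so the pages those revisions occupied are returned to the filesystem. Defragmenting without compacting first reclaims little, because old revisions that haven’t been compacted still count as live data.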
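A rough way to quantify churn is to sample the revision twice and divide by the interval. The sketch below assumes etcdctl is on the PATH and configured through its ETCDCTL_* environment variables, and that etcdctl endpoint status --write-out=json returns a list of objects with Endpoint and Status.header.revision fields. The function names are hypothetical helpers. Because the revision is cluster-wide, every member should report roughly the same rate.

```python
from __future__ import annotations

import json
import subprocess
import time


def endpoint_revisions(status_json: str) -> dict[str, int]:
    """Map each endpoint to the revision in its response header (hypothetical helper).

    Assumes the shape printed by `etcdctl endpoint status --write-out=json`:
    [{"Endpoint": "...", "Status": {"header": {"revision": N, ...}, ...}}, ...]
    """
    return {
        entry["Endpoint"]: int(entry["Status"]["header"]["revision"])
        for entry in json.loads(status_json)
    }


def revisions_per_second(before: int, after: int, seconds: float) -> float:
    """Average number of writes per second between two revision samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


def sample_churn(interval_s: float = 60.0) -> dict[str, float]:
    """Sample every endpoint twice, `interval_s` apart, using the local etcdctl.

    Endpoints and TLS settings are assumed to come from ETCDCTL_* environment variables.
    """

    def snapshot() -> dict[str, int]:
        out = subprocess.run(
            ["etcdctl", "endpoint", "status", "--write-out=json"],
            capture_output=True, text=True, check=True,
        ).stdout
        return endpoint_revisions(out)

    first = snapshot()
    time.sleep(interval_s)
    second = snapshot()
    return {
        endpoint: revisions_per_second(first[endpoint], second[endpoint], interval_s)
        for endpoint in first
        if endpoint in second
    }
```

Calling sample_churn() on a control-plane node returns revisions per second per endpoint. There is no universal threshold, so compare the rate against a quiet baseline from your own cluster.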
Large Kubernetes Objects: Storing large amounts of data within Kubernetes objects themselves (e.g., very long ConfigMap or Secret values, or large annotations) can bloat etcd.
- Diagnosis: List keys with etcdctl get /registry --prefix --keys-only, then check the size of suspicious values with etcdctl get <key> --print-value-only | wc -c. Look for unusually large entries; etcd rejects requests above its request size limit (1.5 MiB by default), so objects approaching that size deserve attention.
- Fix: Refactor large ConfigMaps or Secrets into smaller, more manageable units or externalize them to dedicated storage solutions. Avoid storing large binary data in etcd-backed objects.
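- Why it works: etcd is optimized for small, frequently changing key-value pairs, not large blobs of data. Every update stores a complete new copy of the value, so a 1 MiB ConfigMap updated 100 times between compactions can hold around 100 MiB of history. Reducing the size of individual objects directly reduces the database footprint.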
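To find the biggest values without checking keys one by one, you can parse the JSON form of etcdctl get. This sketch assumes the output of etcdctl get <prefix> --prefix --write-out=json, where keys and values appear base64-encoded under kvs; largest_values is a hypothetical helper. Pulling every value in one request is heavy on a large cluster, so consider narrowing the prefix (for example /registry/configmaps) first. Sizes are the stored bytes, which may be protobuf-encoded or encrypted, so they won’t match the YAML size exactly.

```python
from __future__ import annotations

import base64
import json


def largest_values(get_json: str, top: int = 10) -> list[tuple[str, int]]:
    """Return (key, value size in bytes) for the `top` largest values (hypothetical helper).

    Assumes the output of `etcdctl get <prefix> --prefix --write-out=json`,
    where each entry under "kvs" carries base64-encoded "key" and "value".
    """
    sizes = []
    for kv in json.loads(get_json).get("kvs", []):
        key = base64.b64decode(kv["key"]).decode("utf-8", errors="replace")
        # An empty value may be omitted from the JSON entirely.
        size = len(base64.b64decode(kv.get("value", "")))
        sizes.append((key, size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

For example, feed it the stdout of subprocess.run(["etcdctl", "get", "/registry/configmaps", "--prefix", "--write-out=json"], capture_output=True, text=True, check=True).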
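Leaked Resources (Un-deleted Objects): If objects are created but never deleted, compaction can never remove them, because it always keeps the latest version of every live key. The live data set therefore grows without bound. This is common with custom resources or poorly managed application deployments.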
- Diagnosis: Use kubectl get <resource_type> --all-namespaces -o json | jq '.items | length' to count objects and compare against expected numbers. Look for specific resource types that seem to be growing indefinitely.
- Fix: Implement proper cleanup mechanisms for custom resources or application-specific objects. Regularly audit and delete orphaned or unnecessary resources.
- Why it works: Managing the lifecycle of every Kubernetes object prevents unbounded growth of the live data stored in etcd.
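Counting keys directly in etcd gives a view across every resource type at once. The sketch assumes the default /registry key prefix and the plain output of etcdctl get /registry --prefix --keys-only; count_by_resource and growth are hypothetical helpers. Built-in resources group by resource name (pods, configmaps, and so on), while custom resources usually sit under an API-group segment, so they group by group. Taking two snapshots a day apart and diffing them highlights the types that only ever grow.

```python
from __future__ import annotations

from collections import Counter


def count_by_resource(keys_only_output: str, prefix: str = "/registry/") -> Counter[str]:
    """Count keys under `prefix`, grouped by the first path segment after it.

    Hypothetical helper. Assumes the plain output of
    `etcdctl get /registry --prefix --keys-only`; blank lines and keys
    outside `prefix` are ignored.
    """
    counts: Counter[str] = Counter()
    for line in keys_only_output.splitlines():
        key = line.strip()
        if key.startswith(prefix):
            # "/registry/pods/default/web" -> "pods"
            counts[key[len(prefix):].split("/", 1)[0]] += 1
    return counts


def growth(before: Counter[str], after: Counter[str]) -> list[tuple[str, int]]:
    """Groups whose key count increased between two snapshots, biggest first."""
    # Counter subtraction keeps only positive differences.
    return (after - before).most_common()
```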
Ineffective Compaction and Defragmentation: etcd has a compaction process that removes old revisions and a defragmentation process that reclaims disk space. If these aren’t running or are misconfigured, etcd will grow.
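- Diagnosis: Check the etcd server logs for compaction and defragmentation messages. Then compare etcd_mvcc_db_total_size_in_bytes (space allocated by the database file) with etcd_mvcc_db_total_size_in_use_in_bytes (space actually holding data). A large gap means compaction has freed space that defragmentation has not yet reclaimed; a small gap with a large total points to too much live data or to compaction not running.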
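- Fix: Make sure compaction is actually running. etcd compacts automatically when started with --auto-compaction-mode (periodic or revision) and --auto-compaction-retention, and kube-apiserver also requests a compaction every 5 minutes by default (--etcd-compaction-interval). For a one-off compaction, run etcdctl compact <revision_number>. etcd does not defragment on its own, so schedule regular etcdctl defrag runs, one member at a time, since a member cannot serve reads or writes while it defragments. For managed Kubernetes services, this is often handled automatically.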
- Why it works: Compaction removes the logical entries for old revisions, and defragmentation physically reclaims the disk space they occupied, making room for new data.
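The gap between allocated and in-use space shows roughly how much a defragmentation would return. This sketch parses Prometheus text exposition scraped from a member’s /metrics endpoint and assumes the default 2 GiB backend quota; adjust quota_bytes if you set --quota-backend-bytes. The helper names are hypothetical, and the regex only handles simple sample lines like the ones these two gauges produce.

```python
from __future__ import annotations

import re

# Matches simple sample lines such as: name{labels} 1.5e+09
SAMPLE_LINE = re.compile(r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)")

TOTAL = "etcd_mvcc_db_total_size_in_bytes"
IN_USE = "etcd_mvcc_db_total_size_in_use_in_bytes"


def read_samples(metrics_text: str, names: set[str]) -> dict[str, float]:
    """Return the last sample value seen for each requested metric (hypothetical helper)."""
    values: dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_LINE.match(line)  # '#' comment lines never match
        if match and match["name"] in names:
            values[match["name"]] = float(match["value"])
    return values


def fragmentation_report(metrics_text: str, quota_bytes: int = 2 * 1024**3) -> dict[str, float]:
    """Estimate reclaimable space and quota usage for one member.

    quota_bytes assumes the default 2 GiB quota (--quota-backend-bytes unset).
    """
    samples = read_samples(metrics_text, {TOTAL, IN_USE})
    total, in_use = samples[TOTAL], samples[IN_USE]
    free = total - in_use  # freed by compaction but still allocated in the file
    return {
        "reclaimable_bytes": free,
        "reclaimable_fraction": free / total if total else 0.0,
        "quota_used_fraction": total / quota_bytes,
    }
```

A reclaimable fraction close to zero means defragmenting would gain little, and the live data itself has to shrink.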
etcd Version Issues: Older versions of etcd might have less efficient storage management or known bugs related to disk usage.
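- Diagnosis: etcdctl version reports the client (etcdctl) version; the server version of each member appears in the VERSION column of etcdctl endpoint status --write-out=table. Compare it against the etcd version recommended for your Kubernetes release.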
- Fix: Upgrade etcd to a more recent, stable version. Follow Kubernetes documentation for safe etcd upgrades.
- Why it works: Newer versions often include performance improvements and bug fixes that can address storage bloat issues.
Network Issues Leading to Retries: If etcd members can’t communicate reliably, requests slow down or time out, and clients such as kube-apiserver and controllers retry them. Those retries can add extra writes and revisions.
- Diagnosis: Monitor etcd network metrics (e.g., etcd_network_peer_round_trip_time_seconds) and check etcd server logs for network-related errors or repeated requests.
- Fix: Ensure network connectivity between etcd nodes is stable and low-latency. Resolve any firewall, routing, or DNS issues.
- Why it works: Stable, low-latency links keep requests from timing out, so clients have fewer reasons to retry and issue redundant writes.
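Histogram metrics such as etcd_network_peer_round_trip_time_seconds also expose _sum and _count series, and dividing one by the other gives a mean round-trip time per peer. The sketch below (mean_peer_rtt is a hypothetical helper) works on Prometheus text from a member’s /metrics endpoint. The counters are cumulative since the member started, so this is a lifetime average that can hide recent spikes; for current behavior, diff two scrapes or use histogram quantiles in your monitoring system.

```python
from __future__ import annotations

import re

SAMPLE_LINE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)"
)


def mean_peer_rtt(
    metrics_text: str, metric: str = "etcd_network_peer_round_trip_time_seconds"
) -> dict[str, float]:
    """Mean RTT in seconds per label set, as <metric>_sum / <metric>_count (hypothetical helper)."""
    sums: dict[str, float] = {}
    counts: dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_LINE.match(line)
        if not match:
            continue  # comments and blank lines
        labels = match["labels"] or ""
        if match["name"] == f"{metric}_sum":
            sums[labels] = float(match["value"])
        elif match["name"] == f"{metric}_count":
            counts[labels] = float(match["value"])
    # Skip peers with no observations yet to avoid dividing by zero.
    return {labels: sums[labels] / counts[labels] for labels in sums if counts.get(labels)}
```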

The Next Problem You’ll Hit

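Even after tuning, you may still meet two errors. etcdserver: mvcc: database space exceeded means the database has reached its backend quota (not that the disk is full) and the NOSPACE alarm is active; after compacting and defragmenting, clear it with etcdctl alarm disarm, or etcd will keep rejecting writes. etcdserver: request timed out means etcd is struggling to keep up, often because of slow disk I/O; etcd_disk_wal_fsync_duration_seconds is a good metric to check.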
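Recovering from the NOSPACE alarm follows a fixed order: compact, defragment every member, then disarm. The sketch below only builds the etcdctl command lines so the order is easy to review. nospace_recovery_plan is a hypothetical helper, TLS flags (or ETCDCTL_* variables) are left out for brevity, and it assumes you have already read the current revision, for example from etcdctl endpoint status --write-out=json. Compacting to the current revision discards all history, so watchers that fall behind will have to re-list.

```python
from __future__ import annotations


def nospace_recovery_plan(revision: int, endpoints: list[str]) -> list[list[str]]:
    """Build etcdctl argv lists for NOSPACE recovery, in run order (hypothetical helper).

    1. Compact the keyspace up to `revision` (cluster-wide, so one call is enough);
       --physical waits until old revisions are really removed.
    2. Defragment each member separately, one at a time.
    3. Disarm the NOSPACE alarm so writes are accepted again.
    TLS flags are omitted; add them or rely on ETCDCTL_* environment variables.
    """
    if revision <= 0:
        raise ValueError("revision must be positive")
    if not endpoints:
        raise ValueError("need at least one endpoint")
    first = f"--endpoints={endpoints[0]}"
    plan = [["etcdctl", first, "compact", str(revision), "--physical"]]
    plan += [["etcdctl", f"--endpoints={ep}", "defrag"] for ep in endpoints]
    plan.append(["etcdctl", first, "alarm", "disarm"])
    return plan
```

To execute the plan, pass each list to subprocess.run(cmd, check=True) in order; check=True stops the sequence if a step fails, so a failed defrag never leads straight into disarming the alarm.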