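etcd is a small, strongly consistent, distributed key-value store that holds all Kubernetes cluster state, and its storage limits are far tighter than you’d expect for a database: the backend quota defaults to 2 GiB, and the etcd documentation suggests not raising it above 8 GiB.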
Imagine you’re building a Kubernetes cluster. etcd is the brain, storing the state of every pod, service, and configuration. It needs to be super reliable, which means it keeps a history of changes (revisions) to ensure consistency. This history, however, is what eats up storage.
Here’s how etcd’s storage works and what causes it to fill up:
The Problem: Revision History and Bloat
etcd stores every change as a new revision. While this is great for consistency and features like watch notifications, it means the database grows over time. If old revisions aren’t cleaned up, etcd eventually reaches its backend quota (`--quota-backend-bytes`), raises a NOSPACE alarm, and rejects further writes, so the API server can no longer persist changes and cluster components start to fail.
Common Causes and Solutions
- Excessive Object Creation/Deletion: Rapidly creating and deleting large numbers of Kubernetes objects (e.g., during a deployment rollout or under a misconfigured controller) generates a lot of revisions.
  - Diagnosis: Monitor etcd’s `etcd_mvcc_db_total_size_in_bytes` metric. You can also run `etcdctl endpoint status --write-out=table` to see the database size of each member, and `etcdctl endpoint status --write-out=json` to read the current revision from the response header. A rapidly increasing revision number coupled with a growing `etcd_mvcc_db_total_size_in_bytes` indicates this.
  - Fix: Identify the source of the rapid object churn. This might involve debugging application controllers, reviewing deployment strategies, or adjusting resource quotas. For immediate relief, once old revisions have been compacted, you can trigger a defragmentation with `etcdctl defrag`.
  - Why it works: Compaction discards old revisions, and defragmentation then rewrites the database file so the freed space is returned to the filesystem. Defragmentation alone does not remove revisions that have not yet been compacted.
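A rough way to quantify churn is to sample the cluster revision twice and divide the difference by the interval. The sketch below assumes `etcdctl` (v3 API) is already configured through the usual `ETCDCTL_*` environment variables; the function names `status_revision`, `rev_rate`, and `sample_rev_rate` are hypothetical helpers, not part of etcd.

```shell
#!/bin/sh
# Hypothetical helpers; assumes etcdctl is configured via ETCDCTL_ENDPOINTS,
# ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY.

# Read `etcdctl endpoint status --write-out=json` on stdin and print the
# first "revision" value found in it.
status_revision() {
    grep -o '"revision":[0-9]*' | head -n 1 | grep -o '[0-9][0-9]*'
}

# rev_rate <first_rev> <second_rev> <seconds>
# Prints the integer number of new revisions per second between two samples.
rev_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# sample_rev_rate [seconds]
# Samples the live cluster twice, <seconds> apart (default 60).
sample_rev_rate() {
    interval=${1:-60}
    r1=$(etcdctl endpoint status --write-out=json | status_revision)
    sleep "$interval"
    r2=$(etcdctl endpoint status --write-out=json | status_revision)
    rev_rate "$r1" "$r2" "$interval"
}
```

Running `sample_rev_rate 60` during a suspected churn episode and again during a quiet period gives two numbers to compare; what counts as "too high" depends on the cluster, so treat the result as a relative signal rather than a threshold.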
- Large Kubernetes Objects: Storing large amounts of data within Kubernetes objects themselves (e.g., very large `ConfigMap` or `Secret` values) can bloat etcd. Container logs are not a factor here: `kubectl logs` output is read from the node and is never stored in etcd.
  - Diagnosis: List the keys with `etcdctl get /registry --prefix --keys-only` (`/registry` is the default kube-apiserver key prefix), then measure suspicious entries with `etcdctl get <key> --print-value-only | wc -c`. Look for unusually large entries.
  - Fix: Refactor large `ConfigMaps` or `Secrets` into smaller, more manageable units or externalize them to dedicated storage solutions. Avoid storing large binary data in etcd-backed objects.
  - Why it works: etcd is optimized for small, frequently changing key-value pairs, not large blobs of data. Reducing the size of individual objects directly reduces the database footprint.
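One way to find the largest entries is to walk the keys under a prefix and measure each value. This is a minimal sketch that assumes the default `/registry` prefix and a configured `etcdctl`; `largest_values` is a hypothetical helper name. It issues one `etcdctl get` per key, so it is slow on large clusters and best pointed at a narrow prefix such as `/registry/configmaps`.

```shell
#!/bin/sh
# largest_values [prefix] [n]
# Prints "<bytes> <key>" for the n largest values under the prefix.
# Sizes are approximate: etcdctl appends a trailing newline to each value,
# and Kubernetes stores most objects in a binary (protobuf) encoding.
largest_values() {
    prefix=${1:-/registry}
    n=${2:-10}
    # --keys-only prints each key followed by a blank line; drop the blanks.
    etcdctl get "$prefix" --prefix --keys-only | grep -v '^$' |
    while IFS= read -r key; do
        # Redirect stdin so the inner command cannot consume the key list.
        size=$(etcdctl get "$key" --print-value-only < /dev/null | wc -c | tr -d ' ')
        echo "$size $key"
    done | sort -rn | head -n "$n"
}
```

For example, `largest_values /registry/secrets 5` would list the five biggest Secret entries.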
- Leaked Resources (Un-deleted Objects): Objects that are created but never deleted stay in etcd indefinitely, because compaction only discards superseded revisions, never the latest revision of a live key. This is common with custom resources or poorly managed application deployments.
  - Diagnosis: Use `kubectl get <resource_type> --all-namespaces -o json | jq '.items | length'` to count objects and compare against expected numbers. Look for resource types whose counts keep growing.
  - Fix: Implement proper cleanup mechanisms for custom resources or application-specific objects. Regularly audit and delete orphaned or unnecessary resources.
  - Why it works: Managing the full lifecycle of every Kubernetes object keeps the number of live keys bounded, which prevents unbounded growth of the etcd keyspace.
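When it is not obvious which resource type is leaking, counting etcd keys per type can narrow it down. The sketch below assumes keys follow the default kube-apiserver layout `/registry/<resource-or-group>/...`; `count_by_resource` is a hypothetical helper name. For custom resources the third path segment is the API group rather than the resource name.

```shell
#!/bin/sh
# count_by_resource
# Reads etcd keys on stdin (one per line, blank lines ignored) and prints
# "<count> <resource-or-group>" sorted by count, largest first.
count_by_resource() {
    awk -F/ 'NF >= 3 && $2 == "registry" { count[$3]++ }
             END { for (r in count) print count[r], r }' | sort -rn
}

# Usage against a live cluster (requires a configured etcdctl):
#   etcdctl get /registry --prefix --keys-only | count_by_resource | head
```

Running it periodically and comparing the output over time shows which types keep growing.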
- Ineffective Compaction and Defragmentation: etcd has a compaction process that discards old revisions and a defragmentation process that reclaims disk space. If these aren’t running or are misconfigured, etcd will grow.
  - Diagnosis: Check the etcd server logs for compaction and defragmentation messages. You can also compare `etcd_mvcc_db_total_size_in_bytes` with `etcd_mvcc_db_total_size_in_use_in_bytes` (a large gap means space is waiting to be reclaimed by defragmentation) and watch metrics such as `etcd_debugging_mvcc_db_compaction_keys_total` and `etcd_disk_backend_defrag_duration_seconds` for activity.
  - Fix: Ensure compaction is running. In Kubernetes, kube-apiserver requests compaction itself at the interval set by `--etcd-compaction-interval` (5 minutes by default). On a standalone etcd, enable auto-compaction with the server flags `--auto-compaction-mode` and `--auto-compaction-retention`, or compact manually with `etcdctl compact <revision_number>`. Schedule regular `etcdctl defrag` operations. For managed Kubernetes services, this is often handled automatically.
  - Why it works: Compaction removes the logical entries for old revisions, and defragmentation physically reclaims the disk space they occupied, making room for new data.
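Run by hand, compaction and defragmentation go in that order: read the current revision, compact up to it, then defragment. The sketch below follows that sequence and assumes a configured `etcdctl`; `compact_and_defrag` is a hypothetical wrapper name. Compacting to the current revision discards all history, and defragmentation blocks a member while it runs, so this is best done in a quiet period.

```shell
#!/bin/sh
# compact_and_defrag
# Compacts the keyspace up to the current revision, then defragments
# every member listed in the cluster membership.
compact_and_defrag() {
    # The current revision is in the response header of endpoint status.
    rev=$(etcdctl endpoint status --write-out=json |
          grep -o '"revision":[0-9]*' | head -n 1 | grep -o '[0-9][0-9]*')
    [ -n "$rev" ] || { echo "could not read current revision" >&2; return 1; }

    # Discard every revision older than $rev.
    etcdctl compact "$rev" || return 1

    # Return the freed space to the filesystem on all members.
    etcdctl defrag --cluster
}
```

On a Kubernetes control plane the API server already compacts periodically, so in practice only the defragmentation step usually needs scheduling.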
- etcd Version Issues: Older versions of etcd might have less efficient storage management or known bugs related to disk usage.
  - Diagnosis: Check the server version in the VERSION column of `etcdctl endpoint status --write-out=table` (`etcdctl version` reports only the client’s version). Compare it against the etcd release recommended for your Kubernetes version.
  - Fix: Upgrade etcd to a more recent, stable version. Follow Kubernetes documentation for safe etcd upgrades.
  - Why it works: Newer versions often include performance improvements and bug fixes that can address storage bloat issues.
- Network Issues Leading to Retries: Unreliable communication between etcd members does not by itself duplicate data, since Raft commits each proposal once, but it does cause request timeouts and leader elections. Clients that time out and retry their writes can then generate extra revisions.
  - Diagnosis: Monitor etcd network metrics (e.g., `etcd_network_peer_round_trip_time_seconds`) and check etcd server logs for network-related errors or repeated requests.
  - Fix: Ensure network connectivity between etcd nodes is stable and low-latency. Resolve any firewall, routing, or DNS issues.
  - Why it works: Reliable communication between members reduces timeouts and client retries, which cuts unnecessary writes and revisions.
The Next Problem You’ll Hit
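Once growth is under control, the next failures you’re likely to see are `etcdserver: mvcc: database space exceeded` and `etcdserver: request timed out`. etcd returns the first when the database reaches its backend quota (`--quota-backend-bytes`), not when the underlying disk is full, and keeps returning it until you compact, defragment, and clear the alarm with `etcdctl alarm disarm`. The second appears when etcd is struggling to keep up because of slow or saturated disk I/O.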
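The `database space exceeded` error comes with a NOSPACE alarm that stays raised until it is explicitly disarmed, even after space has been freed. The sketch below checks for that alarm and estimates how much of the quota is in use; it assumes a configured `etcdctl`, and `quota_used_pct` and `disarm_if_nospace` are hypothetical helper names. Disarm only after space has actually been reclaimed, otherwise the alarm is simply raised again.

```shell
#!/bin/sh
# quota_used_pct <db_size_bytes> <quota_bytes>
# Prints the integer percentage of the backend quota currently used.
quota_used_pct() {
    echo $(( $1 * 100 / $2 ))
}

# disarm_if_nospace
# Clears the NOSPACE alarm if one is raised; otherwise reports that
# there is nothing to do.
disarm_if_nospace() {
    if etcdctl alarm list | grep -q 'alarm:NOSPACE'; then
        etcdctl alarm disarm
    else
        echo "no NOSPACE alarm raised"
    fi
}
```

For instance, `quota_used_pct 1610612736 2147483648` reports how full a 1.5 GiB database is against the default 2 GiB quota.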