etcd’s storage quota is surprisingly easy to hit, and hitting it can effectively freeze your entire Kubernetes cluster, because etcd is the single source of truth for all cluster state.

Let’s see etcd in action, specifically how it handles a large number of small objects. Imagine we’re creating a lot of dummy ConfigMap objects in Kubernetes. Each ConfigMap, no matter how small its data field, consumes a small but definite amount of space in etcd.

# Create a simple ConfigMap
kubectl create configmap my-config --from-literal=key1=value1

# Repeat this many times to simulate load
for i in {1..10000}; do
  kubectl create configmap "test-config-$i" --from-literal="key=$i"
done

As these objects are created, etcd stores them as key-value pairs. The keys are structured paths (e.g., /registry/configmaps/default/my-config) and the values are the serialized Kubernetes objects (protobuf by default for built-in types, JSON for custom resources). etcd uses the Raft consensus algorithm, so every write is logged and replicated, and its multi-version store keeps a new revision for every update until compaction removes the old ones. The log, the live data, and the retained history together are what consume storage.
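To see which resource types dominate the keyspace, one option is to list the keys under /registry and group them by the path segment that follows it. The sketch below is a minimal helper under that assumption; count_by_resource is a hypothetical name, and the etcdctl invocation shown in the comment needs the same endpoint and certificate flags as any other etcdctl call.

```shell
# Hypothetical helper: reads etcd keys on stdin (one per line) and prints
# "<count> <resource>" sorted by count, largest first.
# Assumes keys look like /registry/<resource>/<namespace>/<name>.
count_by_resource() {
  awk -F/ '$2 == "registry" && $3 != "" { n[$3]++ }
           END { for (r in n) print n[r], r }' |
    sort -k1,1nr -k2,2
}

# Usage against a live cluster (add --endpoints and TLS flags as needed):
#   ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only | count_by_resource
```

For some API groups the segment after /registry is the group name rather than the resource, so treat the output as a rough breakdown rather than an exact inventory.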

The problem arises because etcd, by default, doesn’t clean up old data aggressively enough for many high-churn Kubernetes workloads. Kubernetes garbage collection (GC) does delete objects, but etcd keeps historical revisions of every key until they are compacted, and the pages freed by compaction stay inside the database file until it is defragmented. Compaction is therefore crucial for managing storage, but it can be misconfigured or insufficient.

The primary lever you control is etcd’s --quota-backend-bytes setting. This is the maximum size, in bytes, that etcd will allow its backend database to grow to. When this limit is reached, etcd raises a NOSPACE alarm and rejects further writes (only reads and deletes are accepted), effectively halting any further changes to your Kubernetes cluster.

Here’s how to diagnose and fix it.

First, check the current etcd quota and usage. You’ll typically find the quota in your etcd configuration (often /etc/kubernetes/manifests/etcd.yaml for static pods, or a systemd unit file). Look for the --quota-backend-bytes flag. If it’s not set, etcd uses its default of 2 GB, which is often too low for production clusters.

To see current usage, you can use etcdctl. Ensure you have your etcd endpoints and certificates configured correctly.

# Example using etcdctl v3
ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.etcd.local:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status -w table

This reports the status of each etcd member, including the on-disk database size in the DB SIZE column, which is the number to compare against your quota. To reclaim space that has been freed but not yet returned to the filesystem, use the defrag command. It rewrites the backend database file; it does not touch the Raft log, and it does not report sizes itself.

# Run defrag to reclaim free space in the backend database
ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.etcd.local:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  defrag

On success, the output is a line like Finished defragmenting etcd member[https://etcd-0.etcd.local:2379]. To see the effect, rerun endpoint status and compare DB SIZE with your quota.
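For scripting, endpoint status can also emit JSON via -w json. The following is a minimal sketch that pulls the database size out of that JSON and expresses it as a percentage of a quota you pass in. It assumes the JSON contains dbSize and dbSizeInUse fields, as etcdctl v3 output generally does (check your version), and db_size_report is a hypothetical helper name.

```shell
# Hypothetical helper: reads `endpoint status -w json` output on stdin and
# prints the first member's DB size, in-use size, and percent of quota.
# Assumes the JSON carries "dbSize" and "dbSizeInUse" fields.
db_size_report() {
  quota_bytes=$1
  json=$(cat)
  size=$(printf '%s' "$json" | grep -Eo '"dbSize":[0-9]+' | head -n 1 | grep -Eo '[0-9]+')
  in_use=$(printf '%s' "$json" | grep -Eo '"dbSizeInUse":[0-9]+' | head -n 1 | grep -Eo '[0-9]+')
  echo "size=$size in_use=${in_use:-unknown} quota_pct=$(( size * 100 / quota_bytes ))"
}

# Usage (add --endpoints and TLS flags as needed), with an 8 GiB quota:
#   ETCDCTL_API=3 etcdctl endpoint status -w json | db_size_report 8589934592
```

Parsing with grep avoids a dependency on jq; if jq is available, it is the more robust choice.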

Common Cause 1: Insufficient Quota

The most frequent culprit is simply that --quota-backend-bytes is set too low. The etcd documentation suggests a maximum of 8 GB for normal environments, so treat 8 GB as a practical ceiling rather than a starting point.

  • Diagnosis: Check the etcd manifest/configuration for --quota-backend-bytes. If it’s unset (so the 2 GB default applies) or explicitly set to a similarly small value, this is likely your issue.
  • Fix: Increase the --quota-backend-bytes value. For example, to set it to 8GB:
    # In your etcd static pod manifest (e.g., /etc/kubernetes/manifests/etcd.yaml)
    spec:
      containers:
      - name: etcd
        image: registry.k8s.io/etcd:3.5.9-0
        command:
        - etcd
        - --advertise-client-urls=https://10.0.0.5:2379
        - --listen-client-urls=https://10.0.0.5:2379,http://127.0.0.1:2379
        - --initial-advertise-peer-urls=https://10.0.0.5:2380
        - --listen-peer-urls=https://10.0.0.5:2380
        - --name=etcd-0
        - --data-dir=/var/lib/etcd
        - --client-cert-auth=true
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --peer-client-cert-auth=true
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --peer-cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --peer-key-file=/etc/kubernetes/pki/etcd/server.key
        - --initial-cluster-state=existing
        - --enable-v2=false
        - --auto-compaction-retention=8 # hours
        - --quota-backend-bytes=8589934592 # 8GB in bytes
    
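    Apply this change to all etcd members. The etcd pods will restart. If the NOSPACE alarm has already fired, it stays raised until you clear it: etcdctl alarm list shows active alarms, and etcdctl alarm disarm clears them once the database is back under the quota.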
  • Why it works: This directly increases the allowed storage limit for etcd’s key-value store, preventing it from hitting the ceiling and refusing writes.
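The flag takes a raw byte count, which is easy to mistype. A small sketch of the arithmetic, assuming the quota is expressed in GiB (quota_flag is a hypothetical helper name, not part of etcd):

```shell
# Hypothetical helper: prints the etcd flag for a quota given in GiB.
quota_flag() {
  gib=$1
  echo "--quota-backend-bytes=$(( gib * 1024 * 1024 * 1024 ))"
}

quota_flag 8   # prints --quota-backend-bytes=8589934592
```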

Common Cause 2: Inadequate Compaction Retention

etcd uses a process called compaction to discard historical revisions of keys that are no longer needed. The --auto-compaction-retention flag tells etcd how much history to keep; in the default periodic mode, a bare number is interpreted as hours. If this is set too high, or if compaction isn’t running effectively, old revisions won’t be pruned.

  • Diagnosis: Check the etcd manifest for --auto-compaction-retention. If it’s unset or set to a very high value (e.g., 24 or more), and the database keeps growing even though the number of objects in the cluster is stable, old revisions are probably piling up.
  • Fix: Set --auto-compaction-retention to a reasonable value, like 1 or 8 hours. For example:
    # In etcd manifest
    - --auto-compaction-retention=8 # hours
    
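    After changing this, you might need to trigger compaction manually if the database is already large. Compact up to the current revision, which endpoint status reports, rather than an arbitrary number:
    # Read the current revision, compact up to it, then reclaim the freed space
    rev=$(ETCDCTL_API=3 etcdctl --endpoints=... endpoint status -w json | grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1)
    ETCDCTL_API=3 etcdctl --endpoints=... compact "$rev"
    ETCDCTL_API=3 etcdctl --endpoints=... defrag
    
    The compact command removes revisions older than the specified revision number. The defrag command then returns the freed space to the filesystem.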
  • Why it works: This ensures that etcd actively discards old, unneeded data revisions, reducing the overall storage footprint.

Common Cause 3: High Churn Rate of Objects

Some Kubernetes workloads, especially those involving frequent creation/deletion of small objects (like many Pods, Events, or ConfigMaps), can generate a lot of write traffic and quickly fill up etcd, even with a reasonable quota.

  • Diagnosis: Monitor etcd’s write operations and the rate of object creation/deletion in your cluster. Look for patterns where storage usage spikes after certain application deployments or events. In Prometheus, etcd_mvcc_db_total_size_in_bytes, etcd_mvcc_db_total_size_in_use_in_bytes, and etcd_server_quota_backend_bytes show how close the database is to its limit. Metrics like etcd_server_leader_changes_seen_total and etcd_server_proposals_failed_total point to general instability rather than storage pressure, but are worth watching alongside them.
  • Fix:
    1. Increase Quota: As described above, this is the first line of defense.
    2. Optimize Applications: Reduce unnecessary object creation/deletion. For example, can ConfigMaps be updated instead of recreated? Are there too many short-lived Pods?
    3. Use Persistent Volumes for Data: Avoid storing large amounts of data in etcd-backed Kubernetes objects like ConfigMap or Secret if they are frequently updated or large.
    4. Kubernetes GC Tuning: While not directly an etcd setting, ensure Kubernetes garbage collection is working correctly. For object types that are garbage collected, their deletion should eventually propagate to etcd.
  • Why it works: This addresses the root cause by either making etcd larger or reducing the demand on its storage.
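If Prometheus is not set up yet, the same signal can be read straight from etcd’s metrics endpoint. The sketch below computes database size as a percentage of the quota from Prometheus text-format output. The function name is hypothetical, and the URL in the usage comment is an assumption (a kubeadm-style plain-HTTP metrics listener); your cluster may expose metrics elsewhere or require client certificates.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints DB size as a percentage of the configured backend quota.
quota_usage_from_metrics() {
  awk '
    $1 == "etcd_mvcc_db_total_size_in_bytes" { size = $2 }
    $1 == "etcd_server_quota_backend_bytes"  { quota = $2 }
    END {
      if (quota > 0) printf "%.1f\n", size * 100 / quota
      else { print "quota metric not found" > "/dev/stderr"; exit 1 }
    }'
}

# Usage (the URL is an assumption; adjust to your metrics listener):
#   curl -s http://127.0.0.1:2381/metrics | quota_usage_from_metrics
```

Sampling this value before and after a deployment is a quick way to tell whether a particular workload is responsible for the growth.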

Common Cause 4: Fragmentation

Over time, even with compaction, etcd’s backend database can become fragmented: compaction frees pages inside the database file but does not shrink the file, so the physical space it occupies ends up larger than the data still in use.

  • Diagnosis: Compare dbSize with dbSizeInUse in the output of endpoint status -w json. A large gap between the two means the file holds a lot of free space that defragmentation can reclaim.
  • Fix: Periodically run etcdctl defrag. This command rewrites the backend database file, dropping the unused space.
    ETCDCTL_API=3 etcdctl --endpoints=... defrag
    
    Defragmentation blocks the member while it runs and can be resource-intensive, so defragment one member at a time, ideally during a maintenance window or when etcd load is low.
  • Why it works: Defragmentation physically reorganizes the data on disk, reclaiming unused blocks and consolidating the storage, making it more efficient.
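To decide whether a defrag is worth the disruption, it helps to put a number on the reclaimable space. A minimal sketch, assuming you already have the total and in-use database sizes in bytes (for example from endpoint status -w json); fragmentation_pct is a hypothetical helper name, and any threshold you apply to its result is a judgment call rather than an etcd recommendation.

```shell
# Hypothetical helper: given the DB file size and the in-use size in bytes,
# prints the percentage of the file that defragmentation could reclaim.
fragmentation_pct() {
  db_size=$1
  in_use=$2
  if [ "$db_size" -le 0 ]; then
    echo "db size must be positive" >&2
    return 1
  fi
  echo $(( (db_size - in_use) * 100 / db_size ))
}

fragmentation_pct 2147483648 1073741824   # prints 50
```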

Common Cause 5: Excessive etcd WAL Directory Size

The Write-Ahead Log (WAL) directory stores the Raft log entries. It lives outside the backend database, so --auto-compaction-retention does not shrink it, and a bloated WAL fills the disk rather than counting toward --quota-backend-bytes. etcd purges old WAL files automatically once a snapshot covers their entries.

  • Diagnosis: Check the size of the etcd data directory, specifically the member/wal subdirectory. If this directory is excessively large (many gigabytes), it could be an indicator of an issue.
  • Fix: This is usually handled automatically by etcd’s internal processes once snapshots are working correctly. If it persists, confirm that etcd is able to create snapshots and review --snapshot-count and --max-wals, which control how often snapshots are taken and how many WAL files are retained. Restarting etcd can sometimes help clear out old WAL segments if they are no longer needed.
  • Why it works: Ensures that old, no longer necessary Raft log entries are physically removed from disk, freeing up space.
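A quick way to check this is to measure the WAL and snapshot directories separately. The sketch below assumes the usual layout of member/wal and member/snap under the data directory, with /var/lib/etcd as the default; etcd_dir_sizes is a hypothetical helper name.

```shell
# Hypothetical helper: prints the size in KiB of the wal and snap
# directories under an etcd data dir (default: /var/lib/etcd).
etcd_dir_sizes() {
  data_dir=${1:-/var/lib/etcd}
  for sub in wal snap; do
    path="$data_dir/member/$sub"
    if [ -d "$path" ]; then
      kb=$(du -sk "$path" | awk '{ print $1 }')
      echo "$sub ${kb}K"
    else
      echo "$sub missing"
    fi
  done
}

# Usage on an etcd host (typically needs root to read the data dir):
#   etcd_dir_sizes /var/lib/etcd
```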

Common Cause 6: Large Objects in etcd

While less common for typical Kubernetes objects like ConfigMaps or Secrets, custom resources or certain controller-generated objects can sometimes be very large. etcd has a max-request-bytes limit, but the total storage can still be exhausted by many large objects.

  • Diagnosis: If you suspect specific objects are large, you can fetch them and inspect their size.
    kubectl get configmap my-large-config -o yaml | wc -c
    
    (Replace configmap and my-large-config as needed for other object types).
  • Fix: Identify and optimize or remove these large objects. This might involve redesigning custom resources or the controllers that manage them.
  • Why it works: Reduces the individual contribution of large objects to the total storage consumption.
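Checking objects one at a time gets tedious, so the same measurement can be looped over every object of a kind. This is a rough sketch: object_sizes is a hypothetical helper, it measures the JSON that kubectl returns rather than the bytes etcd actually stores, and it issues one API call per object, so it is best pointed at a single namespace.

```shell
# Hypothetical helper: prints "<bytes> <kind>/<name>" for every object of a
# kind in a namespace, largest first. Sizes are of the JSON returned by
# kubectl, which only approximates what etcd stores.
object_sizes() {
  kind=$1
  namespace=$2
  kubectl get "$kind" -n "$namespace" -o name | while read -r obj; do
    bytes=$(kubectl get "$obj" -n "$namespace" -o json | wc -c | tr -d ' ')
    echo "$bytes $obj"
  done | sort -nr
}

# Usage:
#   object_sizes configmap default | head -n 10
```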

Once you’ve adjusted your quota and compaction settings, monitor etcd’s storage usage closely. If you leave these issues unaddressed, the next thing you’ll likely see is a cluster-wide inability to schedule new pods or update existing resources, accompanied by etcdserver: mvcc: database space exceeded or similar errors in the etcd and API server logs.
