containerd’s garbage collection is often misunderstood as a simple "clean up my disk" button, but it’s actually a sophisticated system designed to balance performance, data integrity, and resource utilization.

Let’s see it in action. Imagine a busy Kubernetes cluster. Pods are constantly being deployed and scaled down. Each deployment pulls down container images, and each scaled-down pod leaves behind its associated image layers and container artifacts. Without effective garbage collection, this disk space would quickly fill up.

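Here’s a simplified view of how containerd manages this. containerd’s garbage collector is reference-based: it runs a mark-and-sweep over its metadata database. Images, containers, and leases (which protect in-progress work such as an image pull) act as roots. Everything a root can reach is kept: an image’s manifest, config, and layer blobs, the snapshots unpacked from those layers, and a container’s writable snapshot. Any content blob or snapshot that no root reaches is deleted. So an exited container does not release its writable layer; that layer becomes garbage only when the container itself is deleted. Likewise, an image’s layers become garbage only after the image is removed and no other image or container still references them.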
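The reference walk described above is a mark-and-sweep. Here is a minimal Python sketch of the idea, assuming a toy graph in which every resource simply lists what it references. It is not containerd’s implementation, which walks its bolt metadata database and follows labels such as containerd.io/gc.ref.content; the resource names and the mark_and_sweep helper are hypothetical.

```python
from collections import deque

def mark_and_sweep(refs, roots):
    """Return the resources that no root can reach, i.e. the garbage.

    refs maps each resource to the resources it references; roots are the
    resources that are always kept (images and containers in this toy model).
    """
    live = set()
    queue = deque(roots)
    while queue:  # mark phase: walk every reference reachable from a root
        node = queue.popleft()
        if node in live:
            continue
        live.add(node)
        queue.extend(refs.get(node, []))
    return set(refs) - live  # sweep phase: everything left unmarked

# Hypothetical toy graph: two image versions share a base layer, and a
# stopped container still holds its writable snapshot.
refs = {
    "image:app:v1": ["manifest:v1"],
    "image:app:v2": ["manifest:v2"],
    "manifest:v1": ["layer:base", "layer:v1"],
    "manifest:v2": ["layer:base", "layer:v2"],
    "layer:base": [], "layer:v1": [], "layer:v2": [],
    "container:web": ["snapshot:web-rw"],
    "snapshot:web-rw": [],
}

del refs["image:app:v1"]  # simulate removing the old image record
roots = [name for name in refs if name.startswith(("image:", "container:"))]
print(sorted(mark_and_sweep(refs, roots)))
```

Removing app:v1 turns only its manifest and its unique layer into garbage. layer:base survives because app:v2 still references it, and the stopped container’s snapshot survives because the container record still exists.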

The primary configuration for containerd’s garbage collection lives in /etc/containerd/config.toml, in the section for the GC scheduler plugin. With the version 2 configuration format, the section and its default values look like this (containerd config default prints the full default configuration):

[plugins."io.containerd.gc.v1.scheduler"]
  pause_threshold = 0.02
  deletion_threshold = 0
  mutation_threshold = 100
  schedule_delay = "0s"
  startup_delay = "100ms"

Let’s break down the key parameters:

  • pause_threshold: The maximum fraction of time containerd should spend running garbage collection. The default, 0.02, is 2% of wall-clock time; the longer each collection takes, the further apart containerd spaces them.
  • deletion_threshold: The number of deletions that causes a collection to be scheduled. The default, 0, disables this trigger.
  • mutation_threshold: The number of metadata database updates that causes a collection to be scheduled. The default is 100.
  • schedule_delay: How long containerd waits after a triggering event before running the collection. The default, "0s", means no delay.
  • startup_delay: How long containerd waits after it starts before running its first collection. The default is "100ms".

Note what is missing from this list: there is no retention period and no fixed collection interval. containerd does not hold unreferenced data for a set time. What stays on disk is decided by references, and deciding when an image itself is no longer needed is left to the client that created it; in Kubernetes, that is the kubelet.
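To make the two count-based thresholds concrete, here is a simplified Python model of the trigger decision as the parameter descriptions state it. The triggers_collection helper is hypothetical, and the model deliberately ignores schedule_delay, the pause_threshold spacing, and explicit collection requests, all of which the real scheduler also handles. Counts are events since the last collection; because every deletion is also a database update, the examples count each deletion as a mutation too.

```python
def triggers_collection(mutations, deletions,
                        mutation_threshold=100, deletion_threshold=0):
    """Simplified model: do the events since the last GC reach a threshold?

    A threshold of 0 disables that trigger, matching the documented
    meaning of deletion_threshold = 0. Not containerd's scheduler code.
    """
    by_deletions = deletion_threshold > 0 and deletions >= deletion_threshold
    by_mutations = mutation_threshold > 0 and mutations >= mutation_threshold
    return by_deletions or by_mutations

# Defaults: 30 deletions (also 30 mutations) versus 100 plain updates.
print(triggers_collection(mutations=30, deletions=30))
print(triggers_collection(mutations=100, deletions=0))
# Tuned: the same 30 deletions with deletion_threshold = 10.
print(triggers_collection(mutations=30, deletions=30, deletion_threshold=10))
```

With the defaults, 30 deletions do not reach either threshold while 100 updates do; setting deletion_threshold = 10 makes the same 30 deletions trigger a collection.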

Consider a scenario where you have a rapid deployment/rollback cycle. A new version of an application is deployed, pulling down new image layers. Shortly after, the deployment is rolled back, and no running container uses the new version’s layers anymore. containerd does not delete them at this point: as long as the image record for the new version exists, its layers are referenced and stay on disk.

In a Kubernetes cluster, removing unused images is the kubelet’s job. Its image garbage collection starts when usage of the image filesystem reaches imageGCHighThresholdPercent (85% by default) and removes the least recently used images that are not in use, skipping any image younger than imageMinimumGCAge (2 minutes by default), until usage falls below imageGCLowThresholdPercent (80% by default). Once an image is removed, containerd’s collector reclaims the blobs and snapshots that no remaining image or container references; layers shared with other images stay.
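The kubelet behavior described above can be sketched as follows. This is a simplified Python model, not kubelet source: the Image type and the images_to_free helper are hypothetical, sizes are whole GiB for readability, and the defaults mirror imageGCHighThresholdPercent, imageGCLowThresholdPercent, and imageMinimumGCAge.

```python
from dataclasses import dataclass

@dataclass
class Image:
    name: str
    size: int           # GiB, to keep the arithmetic readable
    last_used: float    # timestamp (seconds) when a container last used it
    first_seen: float   # timestamp (seconds) when the kubelet first saw it
    in_use: bool        # True if some container still uses the image

def images_to_free(images, capacity, used, now,
                   high_pct=85, low_pct=80, min_age_s=120):
    """Simplified model of kubelet image GC; returns image names to remove."""
    if used * 100 < high_pct * capacity:
        return []  # below the high threshold: nothing to do
    to_free = used - capacity * low_pct / 100  # bring usage down to the low threshold
    victims = []
    unused = sorted((i for i in images if not i.in_use), key=lambda i: i.last_used)
    for img in unused:  # least recently used first
        if to_free <= 0:
            break
        if now - img.first_seen < min_age_s:
            continue  # younger than imageMinimumGCAge: keep it
        victims.append(img.name)
        to_free -= img.size
    return victims

now = 10_000.0
images = [
    Image("app:v1", 5, last_used=1_000, first_seen=500, in_use=False),
    Image("app:v2", 5, last_used=9_990, first_seen=9_900, in_use=True),
    Image("app:v3", 4, last_used=9_950, first_seen=9_950, in_use=False),
    Image("tools", 3, last_used=2_000, first_seen=600, in_use=False),
]
print(images_to_free(images, capacity=100, used=90, now=now))
```

At 90% usage the model needs to free 10 GiB: it removes app:v1 and tools, skips app:v2 because it is in use, and skips app:v3 because it is younger than two minutes. It also counts each image’s full size as freed, which overstates the savings when layers are shared with images that remain.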

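Here’s how you might tune it. If you’re constantly running out of disk space because of transient images from CI/CD pipelines or frequent application updates, the biggest lever is on the Kubernetes side: lowering imageGCHighThresholdPercent and imageGCLowThresholdPercent in the kubelet configuration makes the kubelet remove unused images earlier. On the containerd side, you can make the collector react to removals sooner.

Example Tuning:

Let’s say you want deletions to schedule a collection directly, after every 10 of them, instead of relying on the mutation count alone, and you are willing to let garbage collection use up to 5% of wall-clock time instead of 2%.

You would modify your config.toml to:

[plugins."io.containerd.gc.v1.scheduler"]
  pause_threshold = 0.05 # Allow up to 5% of wall-clock time for GC
  deletion_threshold = 10 # Schedule a collection after 10 deletions
  mutation_threshold = 100
  schedule_delay = "0s"
  startup_delay = "100ms"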

After making these changes, you need to restart the containerd service:

sudo systemctl restart containerd

The impact of these scheduler settings is on timing, not on what gets kept. A nonzero deletion_threshold or a lower mutation_threshold makes a collection run sooner after images or containers are removed, so space is freed more promptly. A higher pause_threshold lets collections run closer together, at the cost of more time spent collecting. No setting makes containerd delete something that is still referenced.
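The pause_threshold trade-off is simple arithmetic. If each collection takes avg seconds and collections are separated by gap seconds, the fraction of time spent collecting is avg / (avg + gap); keeping that at or below pause_threshold requires gap ≥ avg / pause_threshold − avg. containerd’s scheduler derives its spacing from the average duration of previous collections, but treat the sketch below as a model of the constraint rather than its source code; min_gap_between_collections is a hypothetical helper.

```python
def min_gap_between_collections(avg_gc_seconds, pause_threshold):
    """Smallest idle gap that keeps GC time at or below pause_threshold.

    Model: the fraction of time spent in GC is avg / (avg + gap), so
    avg / (avg + gap) <= threshold  implies  gap >= avg / threshold - avg.
    """
    if not 0 < pause_threshold < 1:
        raise ValueError("pause_threshold must be between 0 and 1")
    return avg_gc_seconds / pause_threshold - avg_gc_seconds

# Compare the default threshold with the tuned one for a 10 ms collection.
for threshold in (0.02, 0.05):
    gap = min_gap_between_collections(0.010, threshold)
    print(f"pause_threshold={threshold}: wait at least {gap * 1000:.0f} ms")
```

With a 10 ms collection, the default 0.02 requires roughly 490 ms between passes, while 0.05 allows one about every 190 ms.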

However, there’s a crucial detail often overlooked: because containerd’s garbage collection is reference-based, tuning it cannot cause image layers shared by other images to be deleted. A layer stays on disk as long as any image, container, or lease references it. The flip side is that forgotten references keep data alive: a stopped container keeps its writable snapshot until the container itself is deleted, and removing an image frees only the layers that no other image uses.

The true power of containerd’s garbage collection lies in this distinction: it does not ask whether something is running, only whether something references it. It doesn’t just delete everything; it identifies data that is truly orphaned.

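When you’re troubleshooting disk space issues, don’t just look at the total disk usage. Use sudo ctr -n k8s.io snapshots usage to see the size and inode count of each snapshot, sudo ctr -n k8s.io images ls to see which images are still present (and therefore still holding their layers), and sudo crictl imagefsinfo to see image filesystem usage as reported through the CRI. The k8s.io namespace is the one the Kubernetes CRI plugin uses.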

The next challenge you’ll likely face is managing the disk space consumed by container logs, which are often configured to grow indefinitely without proper rotation.
