containerd’s OverlayFS snapshotter can be a performance bottleneck if not configured correctly, leading to slow image pulls and container startup times.
Here’s how to get it singing.
First, let’s see it in action. Imagine you’ve got a busy Kubernetes cluster and you’re spinning up and down many pods. Each pod needs its container images. If those images aren’t cached locally, containerd has to pull them. If the snapshotter is slow, this pull takes ages, and your pods are stuck in ContainerCreating.
Let’s simulate pulling an image and see how the snapshotter is involved.
# On a node where containerd is running
ctr image pull docker.io/library/nginx:latest
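When ctr image pull runs, containerd doesn’t just download bytes. It downloads each layer and hands it to the snapshotter, which unpacks it into a read-only snapshot stacked on its parent. When a container starts, the snapshotter adds a writable layer on top of those immutable layers. This is where OverlayFS comes in: it’s a union filesystem that presents the whole stack as a single directory tree.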
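To make the layer stacking concrete, here is a small sketch that counts the read-only layers behind one overlay mount by parsing its lowerdir option. The function name count_lower_layers and the sample mount line are made up for illustration; real snapshot paths on your node will look different.

```shell
#!/bin/sh
# Count the read-only layers (lowerdir entries) in one overlay mount line.
# Input: a single line in /proc/mounts format.
count_lower_layers() {
    # Extract the lowerdir=... option, then count its colon-separated paths.
    printf '%s\n' "$1" |
        sed -n 's/.*lowerdir=\([^, ]*\).*/\1/p' |
        awk -F: '{ print NF }'
}

# Hypothetical mount line; snapshot IDs and paths are placeholders.
sample='overlay /run/containerd/rootfs overlay rw,relatime,lowerdir=/snap/3/fs:/snap/2/fs:/snap/1/fs,upperdir=/snap/4/fs,workdir=/snap/4/work 0 0'
count_lower_layers "$sample"   # prints 3
```

On a real node you could feed it live data, for example by looping over the lines that grep ' overlay ' /proc/mounts returns. The upperdir in each line is the writable layer; everything in lowerdir is an immutable image layer.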
containerd’s default snapshotter configuration is conservative. It prioritizes stability over raw speed, most visibly in the mount options it passes to OverlayFS.
The primary levers you have are in containerd’s configuration file, typically located at /etc/containerd/config.toml. With the version 2 config format used by containerd 1.x, we’re looking at the [plugins."io.containerd.grpc.v1.cri".containerd] section, and specifically the snapshotter field, which should be set to overlayfs. Then, within the [plugins."io.containerd.snapshotter.v1.overlayfs"] section, you can tweak the mount options through mount_options. Only recent releases expose that field, so check the output of containerd config default on your version first.
Generic options such as nodev and nosuid are security hardening and have little effect on performance. The options that matter for speed are OverlayFS’s own, and the kernel rejects any OverlayFS option it does not recognize, so stick to what the kernel’s OverlayFS documentation lists for your kernel version.
Consider this snippet from your config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs" # Ensure this is set
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.snapshotter.v1.overlayfs"]
# This is where the magic happens
mount_options = ["volatile"] # Example only: needs kernel 5.10+, not crash-safe
The volatile option tends to have the most visible effect. By default, OverlayFS honors sync requests against the upper layer; volatile tells it to skip them, trading crash safety for lower latency. After an unclean shutdown, anything written through a volatile mount should be treated as lost, so try it only on nodes where images and containers can be recreated from scratch. If you already set nodev or nosuid, keep them: nodev in particular is generally recommended to prevent device files from being accessible in containers.
Another critical factor is the underlying filesystem. OverlayFS performs best on ext4 or xfs (xfs must be formatted with ftype=1). NFS is not suitable as the writable upper layer, and Btrfs without proper tuning can be noticeably slower. Ensure your container runtime data directory (/var/lib/containerd by default) resides on a well-performing, local filesystem.
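A quick way to check what backs the data directory is sketched below. The helper names overlay_backing_verdict and check_containerd_root are hypothetical, and the three-way verdict is a simplification of the advice above, not an official compatibility list.

```shell
#!/bin/sh
# Classify a filesystem type as a backing store for OverlayFS snapshots.
overlay_backing_verdict() {
    case "$1" in
        ext4|xfs) echo "ok" ;;            # the usual well-behaved choices
        nfs|nfs4) echo "unsuitable" ;;    # network filesystem, avoid for snapshots
        *)        echo "check" ;;         # anything else: benchmark before trusting it
    esac
}

# Report the filesystem backing a path (defaults to containerd's data root).
check_containerd_root() {
    dir="${1:-/var/lib/containerd}"
    fstype=$(findmnt -n -o FSTYPE -T "$dir") || return 1
    echo "$dir is on $fstype: $(overlay_backing_verdict "$fstype")"
}
```

Run check_containerd_root on the node to see the verdict. If it reports xfs, xfs_info on the mount point should show ftype=1 in its naming line.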
The max_concurrent_downloads setting, found in [plugins."io.containerd.grpc.v1.cri"], can also be adjusted. It caps how many layers of a single image are downloaded in parallel for pulls that go through the CRI plugin (that is, pulls requested by the kubelet, not ctr image pull). If you have a high-bandwidth connection and a fast disk, increasing this from the default of 3 can speed up pulls of images with many layers.
[plugins."io.containerd.grpc.v1.cri"]
max_concurrent_downloads = 10 # Increase from default 3
Don’t forget to restart containerd after making changes to config.toml:
sudo systemctl restart containerd
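Before measuring anything, it is worth confirming that the running daemon actually picked up the change. The sketch below pulls the first snapshotter value out of TOML text. The helper name snapshotter_from_toml is made up, and it assumes the CRI-level snapshotter key appears before any per-runtime one in the dump, which is worth verifying on your version.

```shell
#!/bin/sh
# Print the value of the first `snapshotter = "..."` key in TOML read from stdin.
# Accepts double- or single-quoted values.
snapshotter_from_toml() {
    sed -n "s/^[[:space:]]*snapshotter[[:space:]]*=[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" |
        head -n 1
}

# Typical use on a node:
#   containerd config dump | snapshotter_from_toml
```

If the printed value is not overlayfs, the edit landed in the wrong section or the wrong file. ctr plugins ls is a useful second check: the overlayfs snapshotter row should report a status of ok.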
After restarting, you should observe faster image pulls and quicker container startup times, especially under load. The difference is most noticeable when dealing with images that have many layers or when pulling many images concurrently.
A common pitfall is overlooking the interaction between OverlayFS options and the kernel version. Newer kernels often have better OverlayFS implementations. If you’re on a very old kernel, you might not see the same performance benefits from these tuning options.
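A simple guard can keep a tuning option off nodes whose kernel is too old for it. The sketch below compares a kernel release string against a minimum major.minor version. The helper name kernel_at_least is hypothetical, and the 5.10 threshold in the usage note is only an example (it is the release in which OverlayFS gained its volatile mount option).

```shell
#!/bin/sh
# Succeed if a kernel release string (as printed by `uname -r`)
# is at least MAJOR.MINOR.
kernel_at_least() {
    release="$1"; want_major="$2"; want_minor="$3"
    major=${release%%.*}          # "5.15.0-91-generic" -> "5"
    rest=${release#*.}            # -> "15.0-91-generic"
    minor=${rest%%[!0-9]*}        # -> "15"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, kernel_at_least "$(uname -r)" 5 10 && echo "kernel is new enough" prints the message only on 5.10 or later.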
The next hurdle you’ll likely encounter is optimizing the garbage collection of unused images and snapshots, which can reclaim disk space and further improve performance.