containerd’s performance as a container runtime is often evaluated with standard benchmarking tools, but different workloads can expose very different performance characteristics, so a single benchmark number rarely tells the whole story.
Let’s see containerd in action. Imagine we have a simple Nginx container we want to benchmark.
# Pull the latest Nginx image
ctr images pull docker.io/library/nginx:latest
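# Run a single Nginx container on the host network so port 80 is reachable from the host
# (ctr does not set up container networking by default)
ctr run --rm --net-host docker.io/library/nginx:latest nginx-test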
# In another terminal, simulate some load
hey -n 10000 -c 100 http://localhost:80/
This basic setup allows us to observe raw throughput. But to truly understand containerd’s performance, we need to dive deeper into its architecture and how it interacts with the underlying operating system and storage.
containerd acts as a daemon that manages the container lifecycle. It doesn’t directly run containers itself; instead, it delegates this task to a container runtime, typically runc (a low-level OCI runtime). When you run a container, containerd:
- Receives the request: Via its API (e.g., gRPC).
- Retrieves the image: Pulls layers from a registry if not already present.
- Creates the container: Prepares the OCI runtime configuration, including namespaces, cgroups, and mounts.
- Starts the container: Invokes the OCI runtime (e.g., runc create followed by runc start, issued through containerd’s runc shim) to create and start the container process.
- Manages its lifecycle: Handles stop, kill, and delete requests.
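The steps above can be walked through by hand with ctr, one subcommand per step. This is a minimal sketch, not how containerd is normally driven in production (that goes through its gRPC API); the image and the container name nginx-lifecycle-demo are illustrative assumptions, and the one-second wait is a crude stand-in for polling the task state.

```shell
#!/bin/sh
# Sketch of the lifecycle steps using ctr (needs root and a running containerd).
# IMAGE and NAME are illustrative values, not taken from the article.
IMAGE="docker.io/library/nginx:latest"
NAME="nginx-lifecycle-demo"

lifecycle_demo() {
    ctr images pull "$IMAGE"                 # retrieve the image layers
    ctr containers create "$IMAGE" "$NAME"   # record the container and prepare its rootfs snapshot
    ctr tasks start --detach "$NAME"         # have the runtime create and start the process
    ctr tasks ls                             # the task should now be listed
    ctr tasks kill --signal SIGTERM "$NAME"  # ask the process to stop
    sleep 1                                  # crude wait for exit; a real script would poll
    ctr tasks delete "$NAME"                 # remove the stopped task
    ctr containers delete "$NAME"            # remove container metadata and its snapshot
}
```

Calling lifecycle_demo runs the whole sequence. Timing each line separately is one way to see where startup cost is spent, since the split between "create" and "start" is hidden when using ctr run.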
The performance you observe is a combination of containerd’s overhead, the OCI runtime’s overhead, and the actual application performance within the container.
The levers you control are primarily containerd’s configuration and the parameters you pass when creating containers. For instance, the choice of snapshotter, which is containerd’s equivalent of a storage driver (overlayfs, native, btrfs, devmapper, and so on), has a significant impact on I/O performance.
# Example: containerd config.toml snippet
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
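Setting SystemdCgroup = true tells runc to manage cgroups through systemd rather than by writing to the cgroup filesystem directly (the cgroupfs driver, which is the default). On hosts where systemd is the init system, this keeps a single manager in charge of the cgroup hierarchy, and it can show a different performance profile from cgroupfs. Another critical area is networking. containerd can integrate with various network plugins (CNI), and the efficiency of these plugins directly affects network throughput and latency.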
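For the cgroup driver setting, it helps to know what the host is actually running before comparing numbers. The sketch below assumes a Linux host with GNU stat and the containerd binary on the PATH; the function names are hypothetical helpers, not part of containerd.

```shell
#!/bin/sh
# cgroup_version FSTYPE: map the filesystem type mounted at /sys/fs/cgroup
# to a cgroup version.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy per-controller hierarchy
        *)         echo "unknown" ;;
    esac
}

# report_cgroup_setup: print the host's cgroup version and the SystemdCgroup
# value from containerd's effective configuration.
report_cgroup_setup() {
    echo "cgroup version: $(cgroup_version "$(stat -fc %T /sys/fs/cgroup)")"
    containerd config dump | grep SystemdCgroup
}
```

Running report_cgroup_setup before and after editing config.toml (and restarting containerd) confirms that the benchmark is really comparing the two drivers.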
When benchmarking, it’s crucial to isolate variables. Are you measuring startup time, steady-state throughput, or resource utilization under load? Each requires a different approach and tool. For startup time, wrapping the command in time (time ctr run ...) is useful. For steady-state throughput, hey, wrk, or ab are common. For resource utilization, ctr tasks metrics, docker stats (when Docker is running on top of containerd), or direct cgroup monitoring (/sys/fs/cgroup/...) provide insight.
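A single time ctr run measurement is noisy, so startup is better measured over several runs. The following is a rough sketch under some assumptions: the image is already pulled, it contains /bin/true (the Debian-based nginx image does), and date supports %N (GNU date). The function names and the startup-bench-N container names are made up for illustration.

```shell
#!/bin/sh
IMAGE="docker.io/library/nginx:latest"   # assumed to be pulled already

# summarize: read one duration in seconds per line on stdin and print
# the count, mean, minimum, and maximum.
summarize() {
    awk 'NR == 1 { min = $1; max = $1 }
         { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR > 0) printf "n=%d mean=%.3f min=%.3f max=%.3f\n", NR, sum / NR, min, max }'
}

# time_startups RUNS: start RUNS short-lived containers one after another
# and print the wall-clock duration of each, one per line.
time_startups() {
    runs="$1"
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s.%N)
        ctr run --rm "$IMAGE" "startup-bench-$i" /bin/true >/dev/null 2>&1
        end=$(date +%s.%N)
        echo "$start $end" | awk '{ printf "%.3f\n", $2 - $1 }'
        i=$((i + 1))
    done
}
```

A typical invocation would be time_startups 20 | summarize. Each measurement covers snapshot creation, process start, exit, and cleanup, so it reflects the full create/delete cycle and not only the time to the first instruction.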
Consider the impact of the snapshotter on I/O-bound workloads. OverlayFS is widely used because image layers are shared rather than copied, but it is not free: the first write to a file that lives in a lower layer triggers a copy-up of the whole file, and lookups may have to walk through many layers. For some workloads a simpler snapshotter can therefore come out ahead. The specific kernel version and the filesystem on the host also play a non-trivial role.
The most overlooked aspect of containerd performance tuning is often its snapshotting mechanism and how that interacts with the chosen snapshotter. containerd shares image layers between containers, but every container still gets its own writable snapshot, and creating and deleting those snapshots costs I/O. If your workload involves frequent container churn (creation/deletion), the performance of the overlayfs or btrfs snapshotter becomes paramount. In particular, extended attribute (xattr) handling in the underlying filesystem can become a bottleneck. For example, overlayfs stores part of its metadata in xattrs in the trusted.overlay.* namespace (to mark opaque directories and directory redirects, among other things), so on filesystems that are slow to process xattrs this can lead to noticeable delays during container startup and shutdown.
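One way to put a number on churn cost is to time repeated create/delete cycles per snapshotter. This is a sketch under assumptions: the image has been pulled and unpacked for each snapshotter being tested (for example with ctr images pull --snapshotter native ...), date supports %N, and the churn function and its container names are hypothetical.

```shell
#!/bin/sh
IMAGE="docker.io/library/nginx:latest"   # assumed to be unpacked for each snapshotter tested

# churn SNAPSHOTTER COUNT: create and delete COUNT containers using the given
# snapshotter, then print the total elapsed wall-clock time.
churn() {
    snap="$1"
    count="$2"
    t0=$(date +%s.%N)
    i=0
    while [ "$i" -lt "$count" ]; do
        # create prepares a writable snapshot; delete removes it again
        ctr containers create --snapshotter "$snap" "$IMAGE" "churn-$snap-$i" >/dev/null
        ctr containers delete "churn-$snap-$i" >/dev/null
        i=$((i + 1))
    done
    t1=$(date +%s.%N)
    echo "$snap $count $t0 $t1" | awk '{ printf "%s: %d cycles in %.3fs\n", $1, $2, $4 - $3 }'
}
```

Running churn overlayfs 100 and churn native 100 on the same host gives a first comparison. No task is started here, so the result isolates snapshot and metadata work from process startup.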
The next concept you’ll likely explore is how to integrate custom CNI plugins for advanced networking scenarios.