containerd doesn’t enforce CPU and memory limits itself; it passes the limits you configure to the OCI runtime, which writes them into Linux cgroups, and the kernel does the enforcing.
Let’s see containerd in action with some limits. Imagine you have a simple nginx container. Normally, it would happily consume whatever CPU and memory resources it could grab. But we want to rein it in.
First, we need to configure the container’s runtime. This is typically done via a config.toml file for containerd, often found at /etc/containerd/config.toml.
Here’s a snippet of what that might look like, focusing on the plugins."io.containerd.grpc.v1.cri" section, which is what Kubernetes uses:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
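The SystemdCgroup = true setting selects the systemd cgroup driver, so runc asks systemd to create and manage each container’s cgroup instead of writing to the cgroup filesystem directly. Limits are enforced with either driver, but on a systemd host this setting should match the kubelet’s cgroup driver so that only one manager is handling the cgroup tree.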
Now, when you create a container (say, via ctr or Kubernetes), you specify these limits. For ctr, it looks something like this:
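ctr run \
  --cpus 0.5 \
  --memory-limit 268435456 \
  docker.io/library/nginx:latest \
  nginx-limited
Here, we’re telling ctr to limit the nginx container to 0.5 CPU cores (--cpus) and 256 MiB of memory (--memory-limit, which takes a value in bytes). The image has to be present locally first, because ctr run does not pull it for you; ctr image pull docker.io/library/nginx:latest takes care of that. Flag names can differ between ctr releases, so check ctr run --help on your version.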
What’s actually happening under the hood? When containerd creates a container, it interacts with the underlying OCI runtime (like runc). runc, in turn, uses Linux kernel features called control groups (cgroups) to enforce these limits.
For CPU, the key cgroup v1 parameters are cpu.shares and the pair cpu.cfs_quota_us/cpu.cfs_period_us. A hard limit of 0.5 cores is expressed through the quota: cpu.cfs_period_us is set to 100000 and cpu.cfs_quota_us to 50000, meaning the container can use at most 50ms of CPU time in every 100ms period. cpu.shares works differently: it is a relative weight (1024 by default) that only decides how CPU time is divided when cores are contended, so it never caps usage on its own. On cgroup v2 hosts, the equivalent files are cpu.max and cpu.weight.
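The arithmetic behind those values can be sketched as follows. This is an illustration, not containerd’s or runc’s code: the function names are hypothetical, the constants assume the common 100ms CFS period and 1024 shares per core, and CPU amounts are given in millicores (500 means 0.5 cores).

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms
MIN_QUOTA_US = 1_000     # floor used by this sketch for very small limits
SHARES_PER_CPU = 1024    # cgroup v1 cpu.shares weight for one full core
MIN_SHARES = 2           # lowest cpu.shares value the kernel accepts

def cpu_limit_to_quota(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores to a cpu.cfs_quota_us value."""
    if milli_cpu <= 0:
        return -1  # -1 means "no quota" in cgroup v1
    quota = milli_cpu * period_us // 1000
    return max(quota, MIN_QUOTA_US)

def cpu_amount_to_shares(milli_cpu: int) -> int:
    """Convert a CPU amount in millicores to a relative cpu.shares weight."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(milli_cpu * SHARES_PER_CPU // 1000, MIN_SHARES)

print(cpu_limit_to_quota(500))    # 50000 us of CPU time per 100000 us period
print(cpu_amount_to_shares(500))  # 512, i.e. half the weight of a full core
```

Keeping the quota and the weight in separate functions mirrors the split described above: one is a hard cap, the other only matters under contention.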
For memory, it’s primarily memory.limit_in_bytes (memory.max on cgroup v2). So, the 256 MiB limit from the example above becomes 268435456 bytes. If the container reaches this limit and the kernel cannot reclaim enough memory, the Out-Of-Memory (OOM) killer steps in and terminates a process inside the container’s cgroup.
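The unit conversion is easy to get wrong, because decimal (M) and binary (Mi) suffixes differ by almost 5% at this scale. Here is a small sketch of the conversion; memory_to_bytes is a hypothetical helper that handles only whole numbers and a subset of the suffixes Kubernetes accepts.

```python
import re

# Multipliers for a subset of quantity suffixes: decimal and binary.
_UNITS = {
    "": 1,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
}

def memory_to_bytes(quantity: str) -> int:
    """Convert a string such as '256Mi' to a byte count."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if match is None:
        raise ValueError(f"unrecognized quantity: {quantity!r}")
    number, suffix = match.groups()
    if suffix not in _UNITS:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return int(number) * _UNITS[suffix]

print(memory_to_bytes("256Mi"))  # 268435456
print(memory_to_bytes("256M"))   # 256000000
```

The second call shows why the suffix matters: 256M is about 12 MB less than 256Mi.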
The SystemdCgroup = true setting means the runtime asks systemd to manage these cgroup settings, typically by creating a transient scope unit for each container rather than a unit file on disk. This keeps systemd’s view of the cgroup tree consistent with what the container runtime creates, which matters in environments where systemd is the init system.
The surprising thing is how granularly you can control this. In Kubernetes, you don’t just set a hard limit; you set requests and limits separately. Requests are what the scheduler reserves for the container on a node, while limits are the hard ceiling. Kubernetes uses requests to place pods, then translates both into cgroup settings: CPU requests become a cpu.shares weight, and limits become the CFS quota and the memory limit that containerd passes down to the runtime.
If you’re using Kubernetes, these limits are specified in the Pod’s YAML:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-limited-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
    resources:
      limits:
        cpu: "500m"
        memory: "128Mi"
      requests:
        cpu: "250m"
        memory: "64Mi"
Here, 500m is 0.5 CPU cores, and 128Mi is 128 mebibytes. Kubernetes passes these values to containerd through the CRI (Container Runtime Interface), and containerd hands them to the OCI runtime, which configures the underlying cgroups.
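One way to confirm what was actually applied is to read the cgroup files back from inside the container. The sketch below assumes a cgroup v2 host, where the container usually sees its own cgroup at /sys/fs/cgroup and the limits live in cpu.max and memory.max; the function names are hypothetical, and on cgroup v1 the file names differ.

```python
from pathlib import Path
from typing import Optional, Tuple

def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cpu.max ('<quota> <period>' or 'max <period>') into cores."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU limit set
    return int(quota) / int(period)

def parse_memory_max(text: str) -> Optional[int]:
    """Parse memory.max (a byte count or 'max') into bytes."""
    value = text.strip()
    return None if value == "max" else int(value)

def read_limits(
    cgroup_dir: str = "/sys/fs/cgroup",
) -> Tuple[Optional[float], Optional[int]]:
    """Read both limits from a cgroup v2 directory (assumed path)."""
    root = Path(cgroup_dir)
    cpu = parse_cpu_max((root / "cpu.max").read_text())
    memory = parse_memory_max((root / "memory.max").read_text())
    return cpu, memory

# Values a container with the Pod spec above would be expected to report:
print(parse_cpu_max("50000 100000\n"))  # 0.5
print(parse_memory_max("134217728\n"))  # 134217728
```

Calling read_limits() inside the running container should return the same pair if the limits were applied as specified.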
The next thing you’ll run into is understanding how these limits interact with scheduling and how they affect application performance when the limits are too low.