containerd spins up a minimal "pause" container, also known as a "sandbox" container, for every pod. This isn’t just a quirk; it’s fundamental to how Kubernetes manages pod networking and resource isolation.
Let’s see it in action. Imagine you have a simple pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
When you apply this with kubectl apply -f pod.yaml and then list the containers on the node where the pod was scheduled (assuming containerd is your runtime), you’ll see something like the output below. Kubernetes-managed containers live in containerd’s k8s.io namespace, so pass -n k8s.io to ctr; without it, ctr looks in the default namespace and the pod’s containers do not show up.
$ ctr -n k8s.io containers ls
CONTAINER IMAGE RUNTIME
a1b2c3d4e5f67890abcdef1234567890abcdef1234567890 docker.io/library/nginx:latest@sha256:c297a1093b16e7117771228f4b7939d02e07700e9053070484488a3521316c3b io.containerd.runc.v2
0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef k8s.gcr.io/pause:3.9@sha256:9327431849a9656b708d91d0869d090103f9466056735a42d653178310a74f57 io.containerd.runc.v2
Notice the two containers: one running the nginx image (your nginx-container) and one running the pause image. The pause container is always there, even though the pod definition only specifies application containers.
The core problem the pause container solves is sharing a network namespace. Kubernetes wants all containers in a pod to share one network namespace, which means they share the same IP address and port space. Containers in the pod can therefore reach each other via localhost, and because every pod has its own IP address, a port used in one pod never conflicts with the same port in another pod. The flip side is that two containers in the same pod cannot bind the same port.
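To make the idea of a shared IP address and port space concrete, here is a minimal sketch in Python. It does not create containers or namespaces; it only assumes that three sockets opened in one process stand in for three containers in the same network namespace, which is enough to show both the localhost reachability and the single port table they share. The function name is made up for this illustration.

```python
import socket


def demo_shared_port_space():
    """Show localhost reachability and a port clash inside one network namespace."""
    # "Container A": listen on a loopback port chosen by the OS.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    # "Container B": reach A over localhost, as a sidecar would.
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    conn, _ = server.accept()
    client.sendall(b"ping")
    received = conn.recv(4)

    # "Container C": try to claim the port A already owns.
    rival = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        rival.bind(("127.0.0.1", port))
        conflict = False
    except OSError:
        conflict = True  # the port table is shared, so the bind is refused
    finally:
        rival.close()

    for sock in (conn, client, server):
        sock.close()
    return received, conflict


received, conflict = demo_shared_port_space()
print("payload over localhost:", received)
print("second bind on same port refused:", conflict)
```

The same reasoning applies to real pods: a sidecar can talk to the main container on localhost, but the two must be configured with different listening ports.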
How does it achieve this? The pause container is created first, before any application container, and the pod’s network namespace is set up along with it. When subsequent containers are launched for the same pod, they are configured to join that existing network namespace rather than getting one of their own. The pause process itself does almost nothing: it blocks in the pause() system call (hence the name) until it receives a termination signal. Its purpose is to keep the pod’s namespaces anchored to one long-lived process that outlives any individual application container restart.
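The behaviour of the pause process can be sketched in a few lines of Python. This is an illustrative approximation, not the actual pause binary (which is a small C program): it assumes the process only needs to sleep until it is signalled, exit on SIGTERM or SIGINT, and collect exited children so that no zombies accumulate when it ends up as PID 1 of a shared PID namespace. The function names are invented for the sketch.

```python
import os
import signal


def reap_children():
    """Collect every exited child without blocking; return the reaped PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def _exit_cleanly(signum, frame):
    """Signal handler: leave the sleep loop with a zero exit status."""
    raise SystemExit(0)


def pause_forever():
    """Sleep until a termination signal arrives, reaping children meanwhile."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
    signal.signal(signal.SIGTERM, _exit_cleanly)
    signal.signal(signal.SIGINT, _exit_cleanly)
    while True:
        signal.pause()  # returns only after a signal handler has run
```

Calling pause_forever() from a script’s entry point gives a process that consumes essentially no CPU and stays alive until something sends it SIGTERM or SIGINT, which is the only property the sandbox needs from it.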
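The pod-level fields that touch this machinery are spec.shareProcessNamespace and spec.hostNetwork, though neither removes the pause container. With hostNetwork: true the pod uses the node’s network namespace instead of a private one, and with shareProcessNamespace: true the application containers also share a PID namespace in which the pause process runs as PID 1. The pause image itself is configurable. With containerd, it is set by the sandbox_image option of the CRI plugin in containerd’s own configuration. The kubelet’s --pod-infra-container-image flag is deprecated and, with a CRI runtime such as containerd, does not choose the image that gets run; it only tells the kubelet’s image garbage collector not to prune that image. For example, if you wanted to use a specific version of the pause image:
# Example containerd configuration file (e.g., /etc/containerd/config.toml; containerd 1.x, config version 2)
version = 2
[plugins."io.containerd.grpc.v1.cri"]
  # ... other configurations
  sandbox_image = "k8s.gcr.io/pause:3.7"
After containerd is restarted, it uses the specified image whenever the kubelet asks it to create the sandbox for a new pod.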
The pause container is also central to how network plugins (CNI) work. The CNI plugin configures the network interfaces inside the pod’s network namespace, the same namespace the pause container lives in. So, when a CNI plugin like Calico or Flannel sets up networking, it attaches interfaces to the namespace that the pause container anchors, and every application container that joins later sees those interfaces.
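The hand-off to a CNI plugin can be sketched as follows. Under the CNI specification, the runtime runs the plugin binary with a handful of CNI_* environment variables and passes the network configuration as JSON on standard input. The Python below only builds that request and does not execute any plugin. The network name, the namespace path, and the helper function are hypothetical values chosen for illustration; "bridge" is one of the reference plugins, and /opt/cni/bin is a common but not universal plugin directory.

```python
import json


def build_cni_add_request(sandbox_id, netns_path, ifname="eth0",
                          plugin_dir="/opt/cni/bin"):
    """Return (environment, stdin JSON) for a CNI ADD call on a pod sandbox."""
    env = {
        "CNI_COMMAND": "ADD",           # ask the plugin to attach the sandbox
        "CNI_CONTAINERID": sandbox_id,  # ID of the pause (sandbox) container
        "CNI_NETNS": netns_path,        # the pod's network namespace
        "CNI_IFNAME": ifname,           # interface to create inside the pod
        "CNI_PATH": plugin_dir,         # where plugin binaries are looked up
    }
    network_config = {
        "cniVersion": "1.0.0",
        "name": "example-pod-network",  # hypothetical network name
        "type": "bridge",               # plugin binary to invoke
    }
    return env, json.dumps(network_config)


env, stdin_payload = build_cni_add_request(
    sandbox_id="0123456789abcdef",            # shortened, illustrative ID
    netns_path="/var/run/netns/cni-example",  # hypothetical path
)
for key, value in env.items():
    print(f"{key}={value}")
print(stdin_payload)
```

Note that nothing in the request names an application container: the plugin is pointed at the sandbox and its namespace, which is why application containers can come and go without the pod’s networking being reconfigured.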
Many people assume the pause container is just a minimal application container that happens to be empty. In reality, its lifecycle is tightly coupled to the pod’s network namespace. It is the first container started for a pod and the last one stopped: when a pod is torn down, the application containers are stopped first and the pause container afterwards, so the network namespace persists for as long as any part of the pod is still running. The kubelet also treats the pause container as the pod’s sandbox: if it dies unexpectedly, the kubelet considers the sandbox gone, stops the remaining containers, and recreates the sandbox and the containers from scratch.
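The ordering described above can be captured in a toy model. This is not kubelet code; it is a small Python sketch that assumes a pod is just a sandbox plus a list of application containers, and that a missing sandbox causes everything to be stopped and started again. The class and method names are invented for the example.

```python
class PodModel:
    """Toy model of one pod: a sandbox plus its application containers."""

    def __init__(self, container_names):
        self.container_names = list(container_names)
        self.sandbox_running = False
        self.running = []  # application containers currently running
        self.events = []   # ordered log of what the model did

    def start(self):
        """Start the sandbox first, then every application container."""
        self.sandbox_running = True
        self.events.append("start sandbox")
        for name in self.container_names:
            self.running.append(name)
            self.events.append(f"start {name}")

    def stop(self):
        """Stop application containers first and the sandbox last."""
        for name in self.running:
            self.events.append(f"stop {name}")
        self.running.clear()
        if self.sandbox_running:
            self.sandbox_running = False
            self.events.append("stop sandbox")

    def sync(self):
        """If the sandbox has disappeared, rebuild the whole pod."""
        if not self.sandbox_running:
            self.stop()
            self.start()


pod = PodModel(["nginx-container"])
pod.start()
pod.sandbox_running = False  # simulate the pause process dying
pod.sync()
pod.stop()
for event in pod.events:
    print(event)
```

The model makes the asymmetry visible: an application container can be restarted without touching the sandbox, but losing the sandbox forces every container in the pod to be restarted.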
The next concept you’ll likely encounter is how different network plugins interact with this pause container and the pod’s network namespace to provide pod-to-pod communication.