containerd’s defaults are more permissive than many operators expect. Depending on how a container is launched, it may run without a Seccomp filter or AppArmor profile, and the daemon itself normally runs as root, so a compromised container can become a compromised host.

Let’s see what containerd is actually doing on a running host.

# On your host machine, find the running containerd processes
ps aux | grep '[c]ontainerd'

# Then examine a container's seccomp state and capabilities from the host.
# First, find the PID of the containerd daemon itself
CONTAINERD_PID=$(pgrep -xo containerd)

# Now, find a specific container's PID (e.g., a simple busybox container; adjust the pattern to your workload)
CONTAINER_PID=$(ps -ef | grep '[b]usybox' | awk '{print $2}' | head -n 1)

# Check whether a seccomp filter is attached (Seccomp: 2 means filter mode) and which capabilities the process holds
grep -E '^(Seccomp|NoNewPrivs|CapEff|CapBnd):' "/proc/$CONTAINER_PID/status"

# Attach strace from the host to watch which security-relevant syscalls the container actually attempts
sudo strace -f -p "$CONTAINER_PID" -e trace=open,openat,execve,socket,connect,bind,listen,accept,mount,umount2,chmod,chown,setuid,setgid,ptrace,kill,reboot,syslog,pivot_root,chroot,uselib,personality,setns,unshare,clone,fork,vfork,execveat,capset,capget,seccomp

# If Seccomp shows 0, none of these calls is filtered before it reaches the kernel.
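
The kernel’s own record in /proc is a quicker check than tracing. As a minimal sketch (the function names are illustrative and not part of any containerd tooling), the helper below reads the Seccomp field from /proc/<pid>/status and translates the kernel-defined values 0, 1, and 2 into labels.

```shell
#!/bin/sh
# Translate the numeric Seccomp field from /proc/<pid>/status into a label.
# The kernel defines: 0 = disabled, 1 = strict mode, 2 = filter mode.
describe_seccomp_mode() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# Print the seccomp mode label for a PID (hypothetical helper name).
# The pattern matches only the "Seccomp:" line, not "Seccomp_filters:".
seccomp_mode_of() {
  mode=$(awk '/^Seccomp:/ {print $2}' "/proc/$1/status")
  describe_seccomp_mode "$mode"
}
```

Calling seccomp_mode_of "$CONTAINER_PID" should print filter for a container that has a profile attached and disabled for one that runs unconfined.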

The primary goal of hardening containerd is to drastically reduce the attack surface by limiting what a container process can do on the host. This article covers three mechanisms: Seccomp, AppArmor, and rootless mode.

Seccomp: The Syscall Firewall

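Seccomp (Secure Computing Mode) is a Linux kernel feature that allows a process to restrict the set of system calls it can make. containerd ships a built-in default Seccomp profile, but whether a container actually gets it depends on how the container is launched: through the CRI plugin, a pod that requests no profile runs unconfined unless the kubelet or containerd is configured otherwise, and ctr run applies no filter unless you pass --seccomp. For production, make sure every container runs under at least the default profile, and tighten it further where the workload allows.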

The Problem: A compromised container might try to escape its sandbox by making malicious system calls, like mount to access host filesystems, pivot_root to change the root directory, or unshare to create new namespaces and gain more privileges.

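Diagnosis: Inspect the containerd configuration, typically /etc/containerd/config.toml (containerd config dump prints the effective settings). For the CRI plugin, the relevant key is unset_seccomp_profile under plugins."io.containerd.grpc.v1.cri", which names the profile applied when a pod requests none; if it is empty, such pods run unconfined. The runc options table (plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options) holds runtime settings such as SystemdCgroup, not the Seccomp profile. Section names differ between containerd releases, so compare against the output of containerd config default for your version.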
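
Searching the config is more reliable than reading it by eye. A small sketch, assuming only that the relevant keys contain the words seccomp or apparmor (the function name is made up for illustration):

```shell
#!/bin/sh
# List every non-comment line of a containerd config file that mentions
# seccomp or AppArmor, prefixed with its line number.
# Usage: show_security_settings /etc/containerd/config.toml
show_security_settings() {
  grep -niE 'seccomp|apparmor' "$1" | grep -vE '^[0-9]+:[[:space:]]*#'
}
```

No output means nothing is set explicitly in that file, so the built-in defaults apply.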

Common Causes & Fixes:

  1. Default Profile Too Permissive: The built-in default profile allows many syscalls that are not necessary for most containerized applications.

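    • Diagnosis: Examine the default profile. It is defined in the containerd source tree (contrib/seccomp), and you can confirm whether a running container has any filter at all by checking the Seccomp field in /proc/<pid>/status (2 means a filter is attached).
    • Fix: Start from a maintained baseline and tighten it. A common starting point is the profile Docker uses, default.json from moby/moby:
      sudo wget https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json -O /etc/containerd/seccomp_profile.json
      
      Be aware that this file is written for Docker’s own loader and uses Docker-specific fields (archMap, includes, excludes) to make some rules conditional on capabilities or kernel version. A loader that reads it as a plain OCI seccomp document may ignore those conditions, so review the conditional entries before deploying it.
      
      Then, update your containerd configuration. The runc options table has no Seccomp setting; for the CRI plugin, the profile applied when a pod requests none is set one level up:
      # /etc/containerd/config.toml
      [plugins."io.containerd.grpc.v1.cri"]
        # Use "runtime/default" for the built-in profile, or localhost/<absolute path> for your own file.
        # Verify the accepted values against the documentation for your containerd version.
        unset_seccomp_profile = "localhost//etc/containerd/seccomp_profile.json"
      
      Restart containerd: sudo systemctl restart containerd.
    • Why it Works: The profile denies every syscall by default and allows only a curated list needed for general container operation (like read, write, openat, close, futex, etc.), which greatly reduces the kernel attack surface.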
  2. Application-Specific Syscall Needs: Even strict profiles might block syscalls required by your specific application (e.g., network-intensive apps needing socket or bind, or apps using ptrace for debugging).

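    • Diagnosis: Run your application with the strict profile and watch for failures. With the usual deny action (SCMP_ACT_ERRNO), a blocked call simply returns an error such as EPERM and is normally not logged, so the symptom is often an application-level "Operation not permitted". To identify the syscall, temporarily set the profile’s defaultAction to SCMP_ACT_LOG and check dmesg or the audit log for seccomp records, or attach strace to the container process from the host.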
    • Fix: Manually edit your seccomp_profile.json and add an entry for the specific syscalls your application needs to the "syscalls" array. For instance, to allow socket and bind:
      {
          "names": ["socket", "bind"],
          "action": "SCMP_ACT_ALLOW"
      }
      
      JSON has no comment syntax, so keep notes out of the file, and separate array entries with commas without leaving one after the last entry. Important: Be very judicious here. Only add what’s absolutely necessary. Running containers keep the filter they started with, so recreate them after modifying the profile.
    • Why it Works: You’re creating a custom, fine-grained allowlist for your application’s unique requirements, balancing security with functionality.
  3. Incorrect Profile Path: containerd can’t find or load the specified Seccomp profile.

    • Diagnosis: Check containerd logs (sudo journalctl -u containerd -f) for errors related to Seccomp loading or file access. Ensure the path specified in config.toml is correct and that the file is readable by containerd, which normally runs as root.
    • Fix: Correct the profile path in /etc/containerd/config.toml and ensure ownership and permissions are set:
      sudo chown root:root /etc/containerd/seccomp_profile.json
      sudo chmod 644 /etc/containerd/seccomp_profile.json
      
      Then restart containerd.
    • Why it Works: containerd can now correctly locate and parse the Seccomp policy, applying the intended restrictions.
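
When a profile blocks something unexpectedly, the kernel audit records are usually the most direct evidence, provided the filter action is one that gets logged (for example SCMP_ACT_LOG or a kill action). The sketch below assumes the records carry type=SECCOMP (or the numeric type=1326 form seen in dmesg) and a syscall=<number> field; the function name is illustrative.

```shell
#!/bin/sh
# Read kernel or audit log lines on stdin and print the distinct syscall
# numbers found in seccomp records, one per line, sorted numerically.
seccomp_logged_syscalls() {
  grep -E 'type=(SECCOMP|1326)' \
    | grep -oE 'syscall=[0-9]+' \
    | cut -d= -f2 \
    | sort -nu
}
```

Feed it the output of sudo dmesg or the contents of /var/log/audit/audit.log. The numbers are architecture-specific; if the audit tools are installed, ausyscall <number> can translate them to names.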

AppArmor: The Process Confinement System

AppArmor is a Linux Security Module (LSM) that confines programs to a predetermined set of resources. It works through profiles that dictate what files a confined process can access, what network operations it can perform, and more. containerd can attach an AppArmor profile to each container to further restrict the runtime environment.

The Problem: Even with Seccomp limiting syscalls, a container might still have broad access to the host filesystem, network interfaces, or sensitive host processes if its user context or the container runtime itself is compromised.

Diagnosis: Check if AppArmor is enabled on your host (sudo aa-status) and look for container profiles among the loaded ones, such as cri-containerd.apparmor.d (the default used by containerd’s CRI plugin), nerdctl-default, or docker-default. In the containerd configuration (/etc/containerd/config.toml), the relevant setting is disable_apparmor under plugins."io.containerd.grpc.v1.cri"; AppArmor is applied unless it is set to true.
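
You can also ask the kernel which profile confines a specific process. As a sketch, assuming the usual label format in /proc/<pid>/attr/current of either "name (mode)" or "unconfined" (the function names are illustrative):

```shell
#!/bin/sh
# Split an AppArmor label such as "docker-default (enforce)" into
# "profile=<name> mode=<mode>". Labels without a "(mode)" suffix,
# such as "unconfined", are reported with mode=none.
parse_apparmor_label() {
  case "$1" in
    *" ("*")")
      profile=${1%" ("*}
      mode=${1##*" ("}
      mode=${mode%")"}
      echo "profile=$profile mode=$mode"
      ;;
    *)
      echo "profile=$1 mode=none"
      ;;
  esac
}

# Print the confinement of a PID by reading its current AppArmor label.
apparmor_confinement_of() {
  parse_apparmor_label "$(cat "/proc/$1/attr/current")"
}
```

Running sudo sh -c with apparmor_confinement_of and a container PID shows at a glance whether the container is in enforce mode, complain mode, or unconfined.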

Common Causes & Fixes:

  1. AppArmor Not Enabled or Profiles Missing: AppArmor is disabled on the host, or the necessary containerd/runc profiles aren’t installed or loaded.

    • Diagnosis: Run sudo aa-status. If AppArmor is not running, you’ll need to enable it (OS-dependent, often involves kernel boot parameters like apparmor=1 security=apparmor). Then check the list of loaded profiles for the one your containers should use, such as cri-containerd.apparmor.d or docker-default. These defaults are generated and loaded by the runtime itself, so they normally do not appear as files in /etc/apparmor.d/; only custom profiles live there.
    • Fix: Ensure AppArmor is enabled. Install the AppArmor utilities (sudo apt install apparmor apparmor-utils on Debian or Ubuntu, sudo zypper install apparmor-utils on openSUSE; RHEL-family distributions ship SELinux instead and generally do not package AppArmor). If you maintain a custom profile, load it by file path:
      # my-container-profile is a placeholder for your own profile file
      sudo apparmor_parser -r -W /etc/apparmor.d/my-container-profile
      
      Ensure your containerd config does not disable AppArmor. In config.toml:
      [plugins."io.containerd.grpc.v1.cri"]
        # false is the default; true turns AppArmor off for CRI containers
        disable_apparmor = false
      
      Restart containerd: sudo systemctl restart containerd.
    • Why it Works: AppArmor is now active on the host, and containerd attaches a loaded profile to each container it starts, confining it.
  2. Application Violating AppArmor Profile: Your application inside the container is trying to perform an action disallowed by the AppArmor profile (e.g., writing to a protected file, accessing a network port it shouldn’t).

    • Diagnosis: Check dmesg, syslog, or the audit log for AppArmor denial messages. These contain apparmor="DENIED" followed by the operation, the profile name, and the resource that was refused.
    • Fix: This is the most complex case. You have three main options:
      • Adjust the AppArmor Profile: If the action is legitimate, you can use aa-logprof to learn from the denials and update the AppArmor profile in /etc/apparmor.d/. This requires understanding AppArmor syntax.
      • Use a Different Profile: If the container runs under a generic default profile (like docker-default or cri-containerd.apparmor.d), it might be too restrictive for the workload. Rather than loosening the shared default, create a custom AppArmor profile for that container image and assign it to the workload.
      • Disable AppArmor for Specific Containers (Last Resort): With docker run, pass --security-opt apparmor=unconfined. Kubernetes exposes an equivalent per-container unconfined setting, and with ctr a container gets no AppArmor profile unless you request one with --apparmor-default-profile or --apparmor-profile.
    • Why it Works: Either the AppArmor rules are updated to permit the necessary actions, or the restriction is selectively lifted for workloads that cannot be confined.
  3. Container Runtime Not Using AppArmor: containerd is configured to use AppArmor, but the underlying runtime (like runc) isn’t correctly configured or is bypassed.

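    • Diagnosis: Read the confinement of a container process directly: cat /proc/<pid>/attr/current prints the profile name and mode, or unconfined. If it is unconfined, check that disable_apparmor is not set to true in containerd’s config and that the container is not privileged, since privileged containers typically run without AppArmor confinement.
    • Fix: containerd does not pass AppArmor flags to runc. It writes the profile name into the OCI runtime spec (process.apparmorProfile), and runc asks the kernel to apply that profile when the container process starts, so the profile must already be loaded in the kernel. If the profile is loaded and the container is still unconfined, verify the runc installation and that the kernel’s LSM (Linux Security Module) framework has AppArmor active.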
    • Why it Works: containerd correctly instructs runc (or its chosen runtime) to apply the AppArmor confinement when launching the container process.
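
To see which profiles and operations account for the denials, it can help to aggregate the log instead of reading it line by line. This is a sketch that assumes each denial line contains apparmor="DENIED" with operation="..." appearing before profile="...", and that profile names contain no spaces; the function name is illustrative.

```shell
#!/bin/sh
# Read log lines on stdin and count AppArmor denials per profile and
# operation. Output format: "<count> <profile> <operation>".
summarize_apparmor_denials() {
  grep 'apparmor="DENIED"' \
    | sed -nE 's/.*operation="([^"]*)".*profile="([^"]*)".*/\2 \1/p' \
    | sort | uniq -c | awk '{print $1, $2, $3}'
}
```

Piping sudo dmesg into summarize_apparmor_denials gives a short list to work through with aa-logprof or a custom profile.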

Rootless Mode: Isolating from the Host User

Rootless mode allows containerd and containers to run as a non-root user on the host. This is a significant security enhancement because it means a compromise within the container (or even containerd itself) doesn’t automatically grant root access to the host.

The Problem: By default, containerd runs as a system daemon, requiring root privileges. If the containerd process or a container it manages is compromised, the attacker gains root on the host.

Diagnosis: Check if containerd is running as root.

ps aux | grep containerd

If the USER column shows root, it’s running in rootful mode.
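
The same check can be scripted so it does not match shims or the grep process itself. A minimal sketch (the function name is illustrative) that filters ps output for processes named exactly containerd:

```shell
#!/bin/sh
# Read "user command" pairs on stdin, as printed by `ps -eo user= -o comm=`,
# and print the distinct users owning a process named exactly "containerd".
containerd_owners() {
  awk '$2 == "containerd" {print $1}' | sort -u
}
```

Run it as ps -eo user= -o comm= | containerd_owners. An output of root alone indicates a rootful daemon; a regular user name indicates a rootless instance.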

Common Causes & Fixes:

  1. Rootless containerd Daemon Not Running: You haven’t configured or started the rootless containerd service.

    • Diagnosis: Check if containerd is running as root. If so, the rootless daemon is not active.
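    • Fix: Follow the official containerd documentation for setting up rootless mode. This typically involves:
      • Installing containerd, RootlessKit, and a user-mode network stack such as slirp4netns so that a regular user can run them.
      • Configuring ~/.config/containerd/config.toml for rootless operation by pointing root, state, and the [grpc] address at directories the user owns (for example under ~/.local/share and $XDG_RUNTIME_DIR). There is no single "rootless" switch in the config.
      • Setting up appropriate user namespaces (subuid/subgid mappings).
      • Starting containerd under RootlessKit from a systemd --user unit. The containerd-rootless-setuptool.sh script shipped with nerdctl automates these steps.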
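      # Example systemd user service file (e.g., ~/.config/systemd/user/containerd.service)
      [Unit]
      Description=containerd rootless
      After=network.target
      
      [Service]
      # systemd does not expand ~; %h is the user's home directory.
      # A real rootless deployment wraps this command in RootlessKit
      # (for example via the containerd-rootless.sh launcher from nerdctl).
      ExecStart=/usr/local/bin/containerd --config %h/.config/containerd/config.toml
      Restart=always
      
      [Install]
      WantedBy=default.target
      
      Then reload, enable, and start:
      systemctl --user daemon-reload
      systemctl --user enable containerd.service
      systemctl --user start containerd.service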
      
    • Why it Works: containerd now operates entirely within the unprivileged user’s context, significantly limiting its ability to affect the host system.
  2. Inadequate User Namespace Configuration: User namespace remapping (essential for rootless containers) is not correctly set up.

    • Diagnosis: Rootless containers will fail to start, often with errors related to user ID mapping or permissions. Check containerd logs for errors like "failed to set up user namespace" or permission denied.
    • Fix: Ensure the user running containerd has an entry in both /etc/subuid and /etc/subgid covering at least 65536 IDs, and that the range does not overlap another user’s. For example, to allocate the 65536 IDs from 100000 through 165535:
      # Run as the user who will run containerd ($USER is expanded before sudo takes effect)
      sudo usermod --add-subuids 100000-165535 "$USER"
      sudo usermod --add-subgids 100000-165535 "$USER"
      
      Then restart the rootless containerd service.
    • Why it Works: User namespaces allow the container to believe it’s running as root (UID 0) within its own isolated environment, while on the host, these UIDs are mapped to unprivileged user IDs, preventing true root escalation.
  3. Networking Issues in Rootless Mode: Rootless networking often relies on user-mode networking stacks like slirp4netns, which can have performance limitations or compatibility issues compared to rootful networking.

    • Diagnosis: Containers can’t reach external services, or external services can’t reach containers. Network performance is poor.
    • Fix: Ensure slirp4netns (or your chosen user-mode network stack) is installed and correctly configured. If throughput is the problem, RootlessKit can use other network drivers; pasta and bypass4netns are worth evaluating if your versions support them.
    • Why it Works: The user-mode network stack correctly bridges the container’s network namespace to the host’s network interface without requiring root privileges.
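
For the subordinate ID ranges discussed under user namespace configuration, a quick way to confirm that a user has enough IDs is to sum the entries directly. A minimal sketch, assuming the standard name:start:count format of /etc/subuid and /etc/subgid (the function name is illustrative):

```shell
#!/bin/sh
# Sum the number of subordinate IDs allocated to a user in a file that
# uses the /etc/subuid format (name:start:count). Prints 0 if the user
# has no entry.
# Usage: subid_count alice /etc/subuid
subid_count() {
  awk -F: -v user="$1" '$1 == user {total += $3} END {print total + 0}' "$2"
}
```

Running subid_count "$USER" /etc/subuid and subid_count "$USER" /etc/subgid should each report at least 65536 for the range used above.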

By implementing strict Seccomp profiles, leveraging AppArmor, and running containerd in rootless mode, you create a significantly more secure environment for your containerized workloads, minimizing the impact of potential vulnerabilities.

The next hurdle you’ll likely face is managing these security policies effectively across a fleet of containers and ensuring your CI/CD pipelines integrate with these hardening measures without becoming brittle.
