containerd’s Live Restore behavior keeps your containers running while the containerd daemon itself restarts, so a daemon upgrade or crash does not have to take your applications down.

We’ll start a simple nginx container under containerd, then restart the daemon to see Live Restore in action.

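# Pull the image first; ctr run does not pull it automatically
ctr images pull docker.io/library/nginx:latest
# Start nginx detached, on the host network so port 80 is reachable from the host
ctr run -d --net-host docker.io/library/nginx:latest nginx-test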

# Verify it's running
ctr tasks ls
# TASK         PID      STATUS
# nginx-test   ...      RUNNING

# Check nginx is responding (in another terminal)
curl localhost:80
# <!DOCTYPE html>...

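Now, let’s enable Live Restore. This is done in the containerd configuration file, usually located at /etc/containerd/config.toml. Find the [task] section and set reconcile to true under [task.options]. Section and option names differ between containerd versions, so compare against the output of containerd config default for your installation before editing.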

# /etc/containerd/config.toml

[plugins."io.containerd.grpc.v1.cri"]
  # ... other CRI configurations ...

[task]
  # ... other task configurations ...
  [task.options]
    # Set reconcile to true to enable Live Restore
    reconcile = true
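
Before restarting, it can be worth confirming that the setting appears in the configuration containerd actually loads. The sketch below is a small helper for that, not part of containerd: `key_is_true` is a hypothetical name, and it only checks that some line sets the key to `true`, without looking at which table the line belongs to.

```shell
# key_is_true KEY: read TOML text on stdin and succeed if any
# non-comment line sets KEY = true. The enclosing table is ignored.
key_is_true() {
  grep -Eq "^[[:space:]]*$1[[:space:]]*=[[:space:]]*true([[:space:]]|#|\$)"
}
```

One possible use is `containerd config dump | key_is_true reconcile && echo "reconcile is set"`, where `containerd config dump` prints the merged configuration. If the key is present in the file but missing from the dump, that may indicate the installed version does not recognize it.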

After modifying the config file, we need to restart the containerd service for the changes to take effect.

sudo systemctl restart containerd
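
To see whether requests fail while the daemon restarts, a probe loop can be left running in a second terminal before issuing the restart. This is a minimal sketch: `count_failures` and the `PROBE_INTERVAL` variable are names made up for this example.

```shell
# count_failures N CMD...: run CMD N times, pausing PROBE_INTERVAL
# seconds (default 1) after each run, and print how many runs failed.
count_failures() {
  n=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "${PROBE_INTERVAL:-1}"
  done
  echo "$failures"
}
```

For example, `count_failures 30 curl -fsS localhost:80` probes nginx for about 30 seconds. A result of 0 means every request during that window succeeded.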

Let’s check if our nginx-test container is still running.

ctr tasks ls
# TASK         PID      STATUS
# nginx-test   ...      RUNNING

And, crucially, check if nginx is still responding.

curl localhost:80
# <!DOCTYPE html>...

As you can see, the container remained running through the containerd restart.
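
A RUNNING status alone does not distinguish a container that kept running from one that was restarted quickly. Comparing the task's PID before and after the daemon restart does. The helpers below are a sketch with hypothetical names, and they assume `ctr tasks ls` prints the task name in the first column and its PID in the second.

```shell
# task_pid NAME: read `ctr tasks ls` output on stdin and print the
# second column (assumed to be the PID) of the row whose first column is NAME.
task_pid() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# same_pid BEFORE AFTER: succeed only if both values are non-empty and equal.
same_pid() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Capture `before=$(ctr tasks ls | task_pid nginx-test)` ahead of the restart and `after=$(ctr tasks ls | task_pid nginx-test)` once containerd is back, then run `same_pid "$before" "$after" && echo "same process"`.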

The mechanism behind Live Restore is reconciliation at startup. A container’s processes are not children of the containerd daemon. Each container is supervised by a separate shim process (containerd-shim-runc-v2 with the default runc runtime), so the container keeps running while the daemon is down. When containerd starts again, it reads the state it recorded for each container, reconnects to the shims that are still alive, and resumes managing those containers. Nothing is stopped or restarted; containerd simply re-attaches to the existing, running processes.
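
The re-adoption idea can be illustrated with a toy model. This is not containerd's code or its on-disk format: the `reconcile` function and the "NAME PID" record layout are invented for the example, and liveness is checked with `kill -0`, which only succeeds for processes the current user is allowed to signal.

```shell
# reconcile: read "NAME PID" records on stdin, standing in for state a
# daemon saved before it stopped. Print "adopt NAME" if the process can
# still be signalled, otherwise "stopped NAME".
reconcile() {
  while read -r name pid; do
    if kill -0 "$pid" 2>/dev/null; then
      echo "adopt $name"
    else
      echo "stopped $name"
    fi
  done
}
```

Feeding it a record for a live process and one for a process that has exited prints one `adopt` line and one `stopped` line, which mirrors the decision a restarting daemon has to make for each container it knew about.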

The key configuration option is reconcile = true in the [task.options] section of config.toml. It tells containerd, on startup, to find the containers that were running before it stopped, confirm they are still running, and resume managing them. Without it, containerd would start but would only manage containers launched after it came back online.

This matters for high availability because daemon restarts, whether caused by upgrades, crashes, or manual maintenance, no longer imply container downtime.

One aspect that often trips people up is the assumption that Live Restore also preserves containers across a host reboot. It does not. Live Restore addresses restarts of the containerd daemon only. When the whole host reboots, every process is terminated, including the container processes, so there is nothing left for containerd to re-attach to. To bring containers back after a reboot, something above containerd has to start them again, typically a systemd unit or, in Kubernetes, the kubelet honoring restartPolicy: Always.
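
On Linux, the two cases can be told apart by the kernel's boot ID, which changes on every boot and stays constant across daemon restarts. The sketch below uses hypothetical helper names; only the path `/proc/sys/kernel/random/boot_id` is a real kernel interface.

```shell
# current_boot_id: print the kernel's boot ID (Linux only).
current_boot_id() {
  cat /proc/sys/kernel/random/boot_id
}

# restart_kind SAVED_ID CURRENT_ID: classify what happened since
# SAVED_ID was recorded.
restart_kind() {
  if [ "$1" = "$2" ]; then
    echo "same boot: running containers can be re-adopted"
  else
    echo "host rebooted: containers must be started again"
  fi
}
```

Saving `current_boot_id` output to a file while the containers are running, then calling `restart_kind "$(cat saved_id)" "$(current_boot_id)"` later, shows which situation applies.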

The next thing you’ll likely encounter is managing container restarts after an unhandled crash of the containerd daemon, where the reconciliation might not have a chance to fully complete.
