By default, containerd does not serve metrics in a Prometheus-friendly format. You either enable a metrics endpoint in its configuration or rely on the cri-metrics plugin or a separate collector that can query the CRI (Container Runtime Interface) metrics endpoint.
Let’s see containerd in action, collecting metrics. Imagine you have a simple Go application running in a container.
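    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // Respond to requests for /hello with a fixed greeting.
        http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprint(w, "Hello from container!")
        })

        fmt.Println("Server started on :8080")
        // ListenAndServe only returns on failure, so report the error and exit.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }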
We’ll build this into a Docker image and run it. Then, we’ll configure containerd to expose metrics.
The core problem containerd metrics solve is visibility: they show the resource consumption and operational status of your containers from the runtime’s perspective. This goes beyond what the kubelet reports and shows how containerd is managing each container, along with its CPU, memory, I/O, and network usage. That detail is what you need to find performance bottlenecks and resource contention that are not obvious at the pod or node level.
Here’s how containerd exposes these metrics: it implements the CRI, which includes a metrics endpoint. By default, this endpoint may not be enabled, or may not be reachable in a form Prometheus can scrape. Something has to poll it: the cri-metrics plugin (often integrated into containerd installations), a custom CRI metrics exporter, or a general-purpose collector such as cAdvisor, although cAdvisor is less commonly used for direct containerd CRI metrics. The endpoint provides raw performance counters and statistics for each container managed by containerd.
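To make "Prometheus-friendly" concrete, here is a minimal sketch of the last step a collector performs: turning one raw counter reading into a line of the Prometheus text exposition format. The containerSample type, the formatSample helper, the container_id label, and the sample values are all hypothetical and chosen for illustration; they are not containerd or CRI types.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// containerSample is a hypothetical, simplified stand-in for one reading
// obtained from the runtime. It is not a containerd or CRI type.
type containerSample struct {
	ContainerID string
	CPUSeconds  float64 // cumulative CPU time consumed so far
}

// formatSample renders one sample as a line of the Prometheus text
// exposition format: name{label="value",...} value
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed label order keeps the output deterministic

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q quotes the value; adequate for plain ASCII label values.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	s := containerSample{ContainerID: "abc123", CPUSeconds: 12.5}
	fmt.Println(formatSample(
		"container_cpu_usage_seconds_total",
		map[string]string{"container_id": s.ContainerID},
		s.CPUSeconds,
	))
}
```

A real exporter would normally use a Prometheus client library rather than formatting lines by hand; the point here is only the shape of the output that Prometheus expects to scrape.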
To get Prometheus scraping these, you typically need to:
- Enable the CRI Metrics Endpoint: This is usually done in the containerd configuration file, typically located at /etc/containerd/config.toml. Make sure metrics are enabled in the CRI plugin section:

      [plugins."io.containerd.grpc.v1.cri"]
        enable_metrics = true
        # Optional: specify a listen address if you don't want the default
        # metrics_listen_address = "127.0.0.1:9090"

  Key names differ between containerd releases, so check them against the documentation for your installed version; containerd’s own documentation enables its built-in endpoint with an address key in a top-level [metrics] table. Applying the change requires restarting containerd: sudo systemctl restart containerd.

- Expose Metrics for Scraping: containerd doesn’t run a Prometheus exporter by default. You’ll often use a sidecar or a separate service that queries the CRI metrics endpoint and exposes the results in Prometheus format. A common pattern is a deployment that runs cAdvisor or a similar agent that can introspect the container runtime. For direct CRI metrics, however, a dedicated CRI metrics exporter is more precise.

  Let’s assume containerd’s CRI metrics are exposed on localhost:9090 (or wherever you configured them). Your Prometheus configuration would then include a scrape job:

      scrape_configs:
        - job_name: 'containerd_cri'
          static_configs:
            - targets: ['localhost:9090'] # Or the IP/port where metrics are exposed
              labels:
                instance: 'my-containerd-node'

  This tells Prometheus to periodically fetch metrics from the specified address.

- Verify Metrics: Once configured, you can check the /metrics endpoint (e.g., curl http://localhost:9090/metrics) to see the raw output. You should see metrics like container_cpu_usage_seconds_total, container_memory_working_set_bytes, etc., for each container. If you scrape containerd’s built-in endpoint instead, the path is /v1/metrics, and the scrape job needs a matching metrics_path.
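The verification step can also be scripted. The sketch below is a hypothetical helper, not part of containerd or Prometheus: it fetches the raw text from a URL given on the command line and prints only the sample lines whose names start with container_. With no argument it runs against a small made-up sample, so the filtering can be seen without a live endpoint. The names filterMetricLines, fetch, and sampleBody are assumptions made for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// sampleBody is made-up exposition text used when no URL is given.
const sampleBody = `# HELP container_cpu_usage_seconds_total Cumulative CPU time consumed.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container_id="abc123"} 12.5
container_memory_working_set_bytes{container_id="abc123"} 1.048576e+07
go_goroutines 42
`

// filterMetricLines returns the sample lines that start with prefix,
// skipping blank lines and "# HELP" / "# TYPE" comment lines.
func filterMetricLines(body, prefix string) []string {
	var out []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, prefix) {
			out = append(out, line)
		}
	}
	return out
}

// fetch downloads the raw metrics text from url.
func fetch(url string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	return string(data), err
}

func main() {
	body := sampleBody
	if len(os.Args) > 1 {
		// e.g. http://localhost:9090/metrics, or wherever metrics are exposed
		var err error
		if body, err = fetch(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, "scrape failed:", err)
			os.Exit(1)
		}
	}
	for _, line := range filterMetricLines(body, "container_") {
		fmt.Println(line)
	}
}
```

Run it with the endpoint as its only argument, for example go run main.go http://localhost:9090/metrics. An empty result against a live endpoint suggests the exporter is up but is not reporting per-container metrics under that prefix.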
One detail that often surprises people is that many raw CRI metrics are cumulative counters. container_cpu_usage_seconds_total, for example, only ever increases for as long as the container keeps running. To get a rate (such as CPU seconds per second), apply rate() or irate() in your Prometheus queries. Forgetting this is a common pitfall: the query shows an ever-increasing value instead of the actual usage rate.
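To see why a cumulative counter needs this treatment, here is a simplified sketch of the arithmetic behind a rate calculation: the total increase across a window of samples divided by the elapsed time, with a drop in value treated as a counter reset. It is an illustration, not Prometheus’s implementation; it omits the extrapolation to window boundaries that rate() performs, and the sample type and values are made up.

```go
package main

import "fmt"

// sample is one scrape of a cumulative counter.
type sample struct {
	UnixSeconds float64 // scrape timestamp
	Value       float64 // counter value at that time
}

// counterRate returns the average per-second increase across samples.
// A decrease between two samples is treated as a counter reset, so the
// value after the reset counts as new increase.
func counterRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i].Value - samples[i-1].Value
		if delta < 0 {
			// Reset: the counter restarted from zero.
			delta = samples[i].Value
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].UnixSeconds - samples[0].UnixSeconds
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Five scrapes, 15 seconds apart; the counter grows by 3 each time.
	samples := []sample{{0, 100}, {15, 103}, {30, 106}, {45, 109}, {60, 112}}
	fmt.Printf("%.2f CPU-seconds per second\n", counterRate(samples))
}
```

The raw values run from 100 to 112 and would plot as a rising line, while the computed rate stays at 0.20 CPU-seconds per second, roughly 20% of one core.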
The next concept you’ll likely encounter is how to aggregate these per-container metrics to node-level or pod-level views, often involving kube_pod_container_info from kube-state-metrics to join container IDs with Kubernetes pod information.