BuildKit, Docker’s next-generation builder, can now export its telemetry via OpenTelemetry, giving you a far more detailed view of what’s actually happening during your builds.
Let’s watch a build happen and see the metrics flow. Imagine you have a simple Dockerfile and you’re building it with BuildKit enabled.
# Dockerfile
FROM alpine:latest
RUN echo "Hello, BuildKit!" > /hello.txt
BuildKit is the default builder in Docker Engine 23.0 and later. On older versions, you’d start your Docker daemon with BuildKit enabled via the features flag in daemon.json:
{
  "features": {
    "buildkit": true
  }
}
Then, you’d build your image:
DOCKER_BUILDKIT=1 docker build -t my-alpine-app .
Now, to export metrics, we need to tell BuildKit where to send them. BuildKit picks this up from environment variables set on the process that runs it: the Docker daemon (dockerd) when BuildKit is embedded in Docker, or buildkitd when you’re running BuildKit directly. daemon.json only switches BuildKit on; the exporter endpoint comes from the environment.
Let’s assume you have an OpenTelemetry Collector running. BuildKit is built on the OpenTelemetry Go SDK, so the usual way to point it at the collector’s OTLP endpoint is the standard OTEL_EXPORTER_OTLP_ENDPOINT variable (for example http://localhost:4317), optionally with OTEL_EXPORTER_OTLP_PROTOCOL to choose between gRPC and HTTP. Which variables are honored depends on your BuildKit version, so check its documentation before relying on any one of them.
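If you launch BuildKit from a script, the environment can be prepared in one place. The sketch below is a minimal illustration, not BuildKit’s own tooling: `otel_env` is a hypothetical helper, and it assumes your BuildKit version reads the standard OpenTelemetry SDK variables `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_PROTOCOL`.

```python
import os
from urllib.parse import urlsplit

ALLOWED_PROTOCOLS = ("grpc", "http/protobuf")


def otel_env(endpoint, protocol="grpc", base=None):
    """Return a copy of the environment with OTLP exporter settings added.

    Hypothetical helper. The variable names are the standard OpenTelemetry
    SDK ones; whether a given BuildKit version honors them is an assumption.
    """
    parts = urlsplit(endpoint)
    # Require an explicit scheme so "localhost:4317" is rejected early.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"endpoint must look like http://host:port, got {endpoint!r}")
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"protocol must be one of {ALLOWED_PROTOCOLS}, got {protocol!r}")

    env = dict(os.environ if base is None else base)
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    env["OTEL_EXPORTER_OTLP_PROTOCOL"] = protocol
    return env


# Usage sketch (assumes buildkitd is on PATH and you may run it):
#   import subprocess
#   subprocess.run(["buildkitd"], env=otel_env("http://localhost:4317"))
```

Validating the endpoint up front is a design choice: a missing scheme is an easy mistake to make, and catching it here beats debugging a silent export failure later.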
Here’s a snippet of what your OpenTelemetry Collector configuration might look like, specifically for receiving OTLP metrics from BuildKit:
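# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        # Bind to all interfaces so the published container port is reachable
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
exporters:
  # "debug" replaces the deprecated "logging" exporter in recent collector releases
  debug:
    verbosity: detailed
  # Add your preferred exporter here (e.g., Prometheus or OTLP to another backend)
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]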
And you would start your collector with this configuration:
docker run -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
  otel/opentelemetry-collector:latest \
  --config /etc/otel-collector-config.yaml
Once the collector is running and BuildKit is configured to send metrics to it on port 4317 (the default OTLP gRPC port; from the same host, that is localhost:4317), you’ll start seeing BuildKit metrics in the collector’s console output (if your pipeline ends in a console exporter like the one configured above) or being forwarded to your chosen backend.
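Before kicking off a build, it can save some head-scratching to confirm the collector is actually accepting connections. Here’s a minimal sketch using only the Python standard library. `otlp_port_open` is a hypothetical helper, and the defaults assume the collector’s gRPC port is published on localhost:4317. It only checks that a TCP connection succeeds, not that the listener speaks OTLP.

```python
import socket


def otlp_port_open(host="localhost", port=4317, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("collector reachable:", otlp_port_open())
```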
The core problem BuildKit’s OpenTelemetry integration solves is observability during the build process. Before, you had docker build logs, which are linear and often opaque. Now, you get structured, time-series data about the build, such as the duration of each build step, the amount of data transferred, cache hits and misses, the performance of specific BuildKit solver operations, and details about the underlying network requests.
Internally, BuildKit is instrumented with the OpenTelemetry Go SDK. When something measurable happens (e.g., a build step starts or finishes, or a cache lookup misses), the instrumentation records it against an instrument such as a counter, gauge, or histogram. The SDK aggregates those recordings into metric data points, and the OTLP exporter periodically sends them to the configured endpoint.
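To make the histogram part concrete, here’s a rough sketch of how a single duration measurement ends up in a bucket. This is an illustration of explicit-bucket histograms in general, not BuildKit’s or the SDK’s actual code. `DurationHistogram` and the bucket boundaries are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical bucket upper bounds in seconds; real boundaries depend on
# how the SDK and instrument are configured.
DEFAULT_BOUNDS = (0.1, 0.5, 1.0, 5.0, 30.0)


@dataclass
class DurationHistogram:
    bounds: tuple = DEFAULT_BOUNDS
    counts: list = field(default_factory=list)
    total: float = 0.0
    count: int = 0

    def __post_init__(self):
        # One bucket per upper bound, plus a final overflow bucket.
        if not self.counts:
            self.counts = [0] * (len(self.bounds) + 1)

    def record(self, seconds):
        # bisect_left places a value equal to a bound in that bound's
        # bucket, so each bucket's upper bound is inclusive.
        self.counts[bisect_left(self.bounds, seconds)] += 1
        self.total += seconds
        self.count += 1


if __name__ == "__main__":
    hist = DurationHistogram()
    for step_seconds in (0.05, 0.3, 2.0, 45.0):
        hist.record(step_seconds)
    print(hist.counts, hist.count, hist.total)
```

Only the bucket counts, the running sum, and the total count leave the process, which is why histograms stay cheap to export even when a build records thousands of measurements.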
You control what you see by configuring what BuildKit emits and how your OpenTelemetry collector processes and exports it. You can filter, aggregate, and sample metrics at the collector level. For instance, you might only want to export metrics for specific build stages or for builds originating from certain repositories.
The most surprising thing about BuildKit’s OpenTelemetry metrics is the granularity of performance data available for solver operations. BuildKit’s solver is the brain that figures out the optimal build plan. You can now see histograms of how long different solver strategies take, revealing bottlenecks you’d never have guessed. For example, you might see that a specific RUN command consistently triggers a slow solver phase because it involves complex dependency resolution, even if the RUN command itself executes quickly.
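Since a histogram only carries bucket counts, spotting a slow phase usually means estimating a percentile from those counts. The sketch below does that with linear interpolation inside the matching bucket, the same basic idea as Prometheus’s histogram_quantile. `estimate_quantile` is a hypothetical helper, the bucket layout is an assumption, and durations are assumed to be non-negative so the first bucket starts at 0.

```python
def estimate_quantile(q, bounds, counts):
    """Estimate the q-quantile (0..1) from explicit-bucket histogram data.

    bounds: ascending bucket upper bounds, e.g. (0.1, 0.5, 1.0)
    counts: per-bucket counts, len(bounds) + 1 (last one is overflow)
    Returns None for an empty histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(counts) != len(bounds) + 1:
        raise ValueError("counts must have one more entry than bounds")

    total = sum(counts)
    if total == 0:
        return None

    rank = q * total
    cumulative = 0
    for i, bucket_count in enumerate(counts):
        if bucket_count > 0 and cumulative + bucket_count >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper edge; report its lower edge.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            # Assume values are spread evenly within the bucket.
            return lower + (upper - lower) * (rank - cumulative) / bucket_count
        cumulative += bucket_count
    return bounds[-1]
```

The estimate is only as good as the bucket boundaries: if most solver durations land in one wide bucket, the interpolated percentile can be far from the true value, and narrower buckets are the fix.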
The next challenge you’ll encounter is correlating these build metrics with the performance of your deployed applications, creating an end-to-end view of your CI/CD pipeline’s impact.