BuildKit’s `COPY --from` instruction doesn’t rebuild or re-download anything: it works from data already present in the BuildKit cache, and in some configurations BuildKit can hard-link that data instead of duplicating it.
BuildKit is a modern, extensible builder for Docker images, designed to be faster and more efficient than the legacy builder. The `COPY --from` instruction copies artifacts from another build stage into the current one. That lets you keep compilers and build tools out of the final image and ship only what the application needs at runtime, which makes for smaller, more secure images.
Let’s see it in action. Imagine you have a multi-stage build where you first compile a Go application and then copy the binary into a minimal Alpine Linux image.
```dockerfile
# Stage 1: Build the application
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY main.go .
# main.go is built directly because no go.mod is copied into this stage
RUN go build -o myapp main.go

# Stage 2: Create the final image
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["/app/myapp"]
```
In this example, `COPY --from=builder /app/myapp .` doesn’t re-run the `builder` stage or export it to an intermediate image. Instead, BuildKit takes the cached result of the `builder` stage, reads `/app/myapp` from it, and adds the file to a new layer in the current stage. Whether the bytes end up duplicated on disk depends on the snapshotter and on options such as `--link`; where BuildKit can hard-link the data, it avoids a second copy, which helps most with large files.
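One option that affects how the copied data is stored is the `--link` flag. The sketch below is a variant of the example above. It assumes a Dockerfile frontend of version 1.4 or later (requested here through the `# syntax` line) and a `main.go` that needs no `go.mod`; treat it as an illustration, not as a recommendation for every build.

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY main.go .
RUN go build -o myapp main.go

FROM alpine:latest
# --link places the copied file in its own layer, independent of the
# layers beneath it. The destination is written out in full so the
# result does not hinge on a WORKDIR setting.
COPY --link --from=builder /app/myapp /app/myapp
CMD ["/app/myapp"]
```

Because the linked layer does not depend on the layers below it, BuildKit can reuse it when the base image changes. One caveat from the Dockerfile reference: with `--link`, the copy cannot read the previous state of the image, so it will not follow a symlink that already exists at the destination path.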
The mental model here is that BuildKit maintains a directed acyclic graph (DAG) of build steps and their results. Each step is a node in this graph, and its cached output is what later steps consume. When you use `COPY --from`, you add an edge from a step in the source stage to your current step. BuildKit resolves that dependency from its cache; when it can use a hard link, the new directory entry points at the same data on disk rather than at a duplicate.
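One visible consequence of the graph model is that BuildKit only builds the stages the target actually depends on. A minimal sketch, using hypothetical stage names (`assets`, `docs`) and file paths chosen purely to show the shape of the graph:

```dockerfile
# Node: assets
FROM alpine:latest AS assets
RUN echo "asset" > /asset.txt

# Node: docs (no edge leads from the final stage to this one)
FROM alpine:latest AS docs
RUN echo "docs" > /docs.txt

# Final stage: the COPY --from below adds one edge, to "assets".
# Nothing references "docs", so BuildKit can skip that stage entirely
# when this final stage is the build target.
FROM alpine:latest
COPY --from=assets /asset.txt /asset.txt
```

Stages with no dependency on each other, like `assets` and `docs` here, are also candidates for being built in parallel when both are needed.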
The key levers you control are:
- `--from` argument: This specifies the name of the source build stage (e.g., `builder` in the example) or its index (e.g., `COPY --from=0`).
- Source path: The path to the file or directory within the source build stage.
- Destination path: The path where the file or directory will be placed in the current build stage.
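To make the index form concrete, here is a sketch of the earlier build with the first stage left unnamed. It assumes the same `main.go` as before, with no `go.mod`.

```dockerfile
# Stage 0: unnamed, so it can only be referenced by its index
FROM golang:1.21-alpine
WORKDIR /app
COPY main.go .
RUN go build -o myapp main.go

# Stage 1: final image
FROM alpine:latest
WORKDIR /app
# --from=0 refers to the first FROM in this Dockerfile.
# Source path: /app/myapp in stage 0. Destination: "." resolves to /app.
COPY --from=0 /app/myapp .
CMD ["/app/myapp"]
```

Indexes shift when stages are added or reordered, so named stages tend to be easier to maintain in longer Dockerfiles.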
A real strength of `COPY --from` is that it can reference any earlier stage, not just the immediately preceding one. A complex build can have several intermediate stages, and you can copy artifacts from any of them into your final image.
Consider a scenario where you need to copy a compiled binary from the `builder` stage and a configuration file from a separate stage named `configs`.
```dockerfile
# Stage 1: Build the application
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY main.go .
# main.go is built directly because no go.mod is copied into this stage
RUN go build -o myapp main.go

# Stage 2: Prepare config files
FROM alpine:latest AS configs
RUN mkdir /etc/myapp
RUN echo "some_config: value" > /etc/myapp/config.yaml

# Stage 3: Final image
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/myapp .
COPY --from=configs /etc/myapp/config.yaml /etc/myapp/config.yaml
CMD ["/app/myapp"]
```
BuildKit’s cache management is quite sophisticated. It doesn’t treat a stage as one opaque blob: alongside the filesystem snapshot for each step, it tracks content checksums for the paths a step reads. When it encounters a `COPY --from` instruction, BuildKit resolves the source path inside the specified stage’s cached result and keys the copy on the content found there, so the step can stay cached when the source stage was rebuilt but the copied files came out identical. If the source path is a directory, its contents are copied recursively. Where the snapshotter can hard-link the data, even large directories can appear to copy almost instantly.
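Copying a whole directory looks the same as copying a single file. A minimal sketch, reusing the `configs` stage name from the example above; the `conf.d/extra.yaml` file is hypothetical and only there to show that nested content comes along.

```dockerfile
FROM alpine:latest AS configs
RUN mkdir -p /etc/myapp/conf.d
RUN echo "some_config: value" > /etc/myapp/config.yaml
RUN echo "extra: value" > /etc/myapp/conf.d/extra.yaml

FROM alpine:latest
# Copies everything under /etc/myapp, including conf.d/, into /etc/myapp/.
# COPY transfers a directory's contents, not the directory entry itself,
# so the destination names the directory the contents should land in.
COPY --from=configs /etc/myapp/ /etc/myapp/
```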
Most people don’t realize that the source path in `COPY --from` doesn’t have to be a file that a `RUN` command in the source stage created explicitly. It can be any file or directory that exists in that stage’s filesystem at the end of its execution. This includes files inherited from the base image, files copied into the source stage via earlier `COPY` instructions, and files created by package managers during `RUN` commands. BuildKit resolves the path against the stage’s final filesystem, so where a file came from makes no difference.
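The sketch below pulls one file of each kind into a `scratch` image. It assumes `main.go` uses only the standard library, and the `certs` stage name and the `/src` destination are illustrative choices, not part of the original example.

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY main.go .
# CGO_ENABLED=0 produces a statically linked binary that can run on scratch.
RUN CGO_ENABLED=0 go build -o myapp main.go

FROM alpine:latest AS certs
# The certificate bundle is installed by the package manager, not written by hand.
RUN apk add --no-cache ca-certificates

FROM scratch
# A file produced by a RUN command in the source stage
COPY --from=builder /app/myapp /myapp
# A file that entered the source stage through an earlier COPY
COPY --from=builder /app/main.go /src/main.go
# A file provided by a package in the source stage
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
CMD ["/myapp"]
```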
The next concept you’ll likely encounter is BuildKit’s support for remote caching, which allows you to share build cache layers across different machines and CI/CD pipelines.