Docker layer caching is the unsung hero of fast CI builds, and many Dockerfiles don't get the full benefit of it. The surprising truth is that simply running docker build doesn't automatically give you the best caching; you have to architect your Dockerfile to exploit it.
Let’s see it in action. Imagine a typical Node.js build.
FROM node:18-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --prefer-offline
COPY . .
RUN npm run build
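When this builds, Docker processes the instructions in order. It reuses a cached layer when it already has one built from the same parent layer with the same instruction (and, for COPY and ADD, the same file contents). As soon as one instruction misses the cache, every instruction after it is rebuilt.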
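To make that rule concrete, here is a toy model of the cache in Python. It is a simplified sketch, not how Docker is implemented. Each layer gets a key built from three things: its parent's key, the instruction, and (for COPY) a hash of each copied file's contents. A layer counts as a hit only if an earlier build produced the same key. FROM and WORKDIR are folded into a fixed starting key, and the helper names, file names, and contents are hypothetical.

```python
import hashlib

def digest(*parts: str) -> str:
    """Hash a sequence of strings into one hex key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()

def selects(src: str, path: str) -> bool:
    """True if COPY source `src` picks up `path` from the build context."""
    return src == "." or path == src or (src.endswith("/") and path.startswith(src))

def build(steps, context, cache):
    """Hypothetical, heavily simplified builder: return hit/miss per step."""
    parent, hits = "node:18-alpine", []  # FROM/WORKDIR folded into the start key
    for kind, arg in steps:
        if kind == "COPY":
            # COPY: key covers the parent plus a hash of each selected file's contents.
            files = sorted(f"{p}:{digest(body)}" for p, body in context.items()
                           if any(selects(s, p) for s in arg))
            parent = digest(parent, "COPY", *files)
        else:
            # RUN: key covers only the parent and the command string.
            parent = digest(parent, "RUN", arg)
        hits.append(parent in cache)
        cache.add(parent)
    return hits

STEPS = [
    ("COPY", ["package.json", "package-lock.json"]),
    ("RUN", "npm ci --prefer-offline"),
    ("COPY", ["."]),
    ("RUN", "npm run build"),
]

context = {
    "package.json": '{"name": "demo"}',
    "package-lock.json": '{"lockfileVersion": 3}',
    "src/index.js": "console.log(1);",
}
cache = set()
first = build(STEPS, context, cache)         # cold cache
context["src/index.js"] = "console.log(2);"  # edit one source file
second = build(STEPS, context, cache)
print(first)   # [False, False, False, False]
print(second)  # [True, True, False, False]
```

The second build reuses the two dependency layers and rebuilds only from COPY . . onward, because that is the first key that changed.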
The COPY package.json package-lock.json ./ instruction creates its own layer. If neither file has changed, that layer is a cache hit. RUN npm ci --prefer-offline then sits on an unchanged parent with an unchanged command string, so its layer is reused too and no dependencies are reinstalled. This is good!
Then comes COPY . ., which copies everything in your build context (minus anything excluded by .dockerignore). If any of those files changes, this layer misses the cache, even when the change is a README or a test the build never reads. Docker then re-runs every instruction after it, including npm run build. Rebuilding after a real source change is unavoidable; rebuilding because an unrelated file changed is where the performance hit happens.
The order above already protects npm ci, because it sits before the COPY . . instruction. The mistake to avoid is the reverse: if COPY . . came first, every one-line code change would invalidate the layer npm ci builds on and force a full reinstall. We want npm ci's result cached and re-run only when the dependency manifests change.
The solution is to break up the build into more, smaller layers, and order them such that the layers that change most frequently come last.
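One quick way to see why ordering matters is to list which instructions re-run after a source edit. The sketch below uses a hypothetical helper built on a simplified rule: the first COPY that picks up a changed file misses the cache, and everything after it re-runs. It compares copying everything before npm ci with copying the package files first. FROM and WORKDIR are omitted because they never miss here.

```python
def selects(src: str, path: str) -> bool:
    """True if COPY source `src` picks up `path` from the build context."""
    return src == "." or path == src or (src.endswith("/") and path.startswith(src))

def rebuilt(steps, changed):
    """Hypothetical helper (toy model, not a Docker API).

    Returns the instructions that re-run after the files in `changed` are
    edited: the first COPY that picks up a changed file misses the cache,
    and every instruction after it is rebuilt as well.
    """
    for i, (_, sources) in enumerate(steps):
        if sources and any(selects(s, p) for s in sources for p in changed):
            return [text for text, _ in steps[i:]]
    return []

# Each step: (instruction text, COPY sources or None for RUN).
COPY_ALL_FIRST = [
    ("COPY . .", ["."]),
    ("RUN npm ci --prefer-offline", None),
    ("RUN npm run build", None),
]
DEPS_FIRST = [
    ("COPY package.json package-lock.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci --prefer-offline", None),
    ("COPY . .", ["."]),
    ("RUN npm run build", None),
]

edit = {"src/index.js"}
print(rebuilt(COPY_ALL_FIRST, edit))
# ['COPY . .', 'RUN npm ci --prefer-offline', 'RUN npm run build']
print(rebuilt(DEPS_FIRST, edit))
# ['COPY . .', 'RUN npm run build']
```

With the dependency manifests copied first, a source edit no longer drags npm ci along with it.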
Here’s a much better way to structure that Dockerfile:
FROM node:18-alpine
WORKDIR /app
# Cache dependencies first
COPY package.json package-lock.json ./
RUN npm ci --prefer-offline
# Then copy application code
COPY . .
RUN npm run build
Wait, that’s the same as before! The trick isn’t just the order of COPY and RUN, but how you structure the COPY commands.
Let’s refine it:
FROM node:18-alpine
WORKDIR /app
# Copy only package files first
COPY package.json package-lock.json ./
# Install dependencies. This layer is cached as long as package*.json don't change.
RUN npm ci --prefer-offline
# Now, copy the rest of the application code.
# This layer (and every later one) is rebuilt if *any* file in the build context changes.
COPY . .
# Build the application. This layer depends on the previous COPY, so it's rebuilt if app code changes.
RUN npm run build
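The comments make the caching behavior explicit, but the instructions are identical, so this still isn't optimal. Re-running npm ci when package.json or package-lock.json changes is correct and unavoidable. The waste is elsewhere: COPY . . picks up every file in the build context, so editing a README, a test, or a CI config file still invalidates the build step.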
The real magic comes from being granular.
FROM node:18-alpine
WORKDIR /app
# Step 1: Copy package.json and package-lock.json
COPY package.json package-lock.json ./
# Step 2: Install dependencies. This layer is cached if package*.json are unchanged.
RUN npm ci --prefer-offline
# Step 3: Copy source code (e.g., src/ folder)
COPY src/ ./src/
# Step 4: Copy other assets (e.g., public/ folder)
COPY public/ ./public/
# Step 5: Build the application. Re-runs if package*.json, src/, or public/ change.
# Copy any other files the build reads (e.g. tsconfig.json or a bundler config) before this step.
RUN npm run build
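Now, if you only change a file in src/, the COPY src/ ./src/ layer misses the cache, and so does everything after it: COPY public/ ./public/ and RUN npm run build. The npm ci layer stays cached, just as it did before. The real win is what no longer invalidates anything. A change to README.md, a test file, or any other path outside src/ and public/ leaves every layer cached and skips the build entirely. In a CI pipeline where some commits touch only docs or tests, that is a big saving. By the same most-stable-first rule, if public/ changes less often than src/, copying public/ first keeps its layer cached on ordinary source edits.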
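The same simplified rule can compare one broad COPY . . with the granular version. The rule is that the first COPY picking up a changed file misses, and everything after it re-runs. The `rebuilt` helper below is hypothetical, not a Docker API, and the file names are made up.

```python
def selects(src: str, path: str) -> bool:
    """True if COPY source `src` picks up `path` from the build context."""
    return src == "." or path == src or (src.endswith("/") and path.startswith(src))

def rebuilt(steps, changed):
    """Hypothetical helper (toy model): instructions that re-run after an edit."""
    for i, (_, sources) in enumerate(steps):
        if sources and any(selects(s, p) for s in sources for p in changed):
            return [text for text, _ in steps[i:]]
    return []

DEPS = [
    ("COPY package.json package-lock.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci --prefer-offline", None),
]
BROAD = DEPS + [("COPY . .", ["."]), ("RUN npm run build", None)]
GRANULAR = DEPS + [
    ("COPY src/ ./src/", ["src/"]),
    ("COPY public/ ./public/", ["public/"]),
    ("RUN npm run build", None),
]

print(rebuilt(BROAD, {"README.md"}))     # ['COPY . .', 'RUN npm run build']
print(rebuilt(GRANULAR, {"README.md"}))  # []
print(rebuilt(GRANULAR, {"src/App.js"}))
# ['COPY src/ ./src/', 'COPY public/ ./public/', 'RUN npm run build']
```

COPY public/ ./public/ re-runs after a src/ edit simply because it comes after the first miss. The model also assumes the build needs nothing beyond src/ and public/. If it reads other files, those have to be copied too, or the real build will fail.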
Let’s consider a Python example with dependencies in requirements.txt:
FROM python:3.10-slim
WORKDIR /app
# Copy requirements first
COPY requirements.txt ./
# Install dependencies. Cached if requirements.txt doesn't change.
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application
COPY . .
# Default command for containers started from this image (runs at container start, not during the build)
CMD ["python", "app.py"]
This structure ensures that pip install re-runs only when requirements.txt changes. If you only modify your application code, Docker reuses the cached dependency layer and rebuilds only from COPY . . onward.
The key is to identify the instructions that are most stable (i.e., change least often) and place them earliest in the Dockerfile. For most applications, this means a three-step order. First, copy only the dependency manifests (package.json and package-lock.json, requirements.txt, Gemfile and Gemfile.lock, go.mod and go.sum). Next, run the install command (npm ci, pip install, bundle install, go mod download). Only then copy your application code.
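The most-stable-first rule can also be put in numbers. Below is a worked sketch with made-up per-build change probabilities and step durations, assuming changes to different files are independent. A step re-runs whenever it or any step above it misses the cache. Its expected cost is therefore its duration times the probability that something at or above it changed. `expected_rebuild_cost` is a hypothetical helper, and the numbers illustrate the idea rather than measure anything.

```python
def expected_rebuild_cost(steps):
    """Expected seconds spent re-running steps in one build (toy model).

    steps: (name, p_change, cost_seconds) tuples in Dockerfile order, where
    p_change is the assumed probability that this step's own inputs changed
    since the last build (0.0 for RUN steps). Changes are assumed independent.
    """
    total = 0.0
    p_all_hit = 1.0  # probability that every step so far was a cache hit
    for _name, p_change, cost in steps:
        p_all_hit *= 1.0 - p_change
        # This step re-runs if it, or any step above it, missed the cache.
        total += cost * (1.0 - p_all_hit)
    return total

# Hypothetical: package files change in 5% of builds, other files in 90%.
P_PKG, P_SRC = 0.05, 0.90
P_ANY = 1 - (1 - P_PKG) * (1 - P_SRC)  # COPY . . sees both kinds of change

copy_all_first = [
    ("COPY . .", P_ANY, 2),
    ("RUN npm ci", 0.0, 60),
    ("RUN npm run build", 0.0, 30),
]
deps_first = [
    ("COPY package*.json ./", P_PKG, 1),
    ("RUN npm ci", 0.0, 60),
    ("COPY . .", P_SRC, 2),
    ("RUN npm run build", 0.0, 30),
]
print(round(expected_rebuild_cost(copy_all_first), 2))  # 83.26
print(round(expected_rebuild_cost(deps_first), 2))      # 32.01
```

Under these assumed inputs, moving the install above COPY . . cuts the expected rebuild time from about 83 to about 32 seconds per build. Most of the gain comes from npm ci re-running in 5% of builds instead of about 90%.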
Consider npm ci again. Say you add a new devDependency. package.json changes, and once you run npm install locally, so does package-lock.json. (npm ci exits with an error if the two are out of sync, so the lockfile has to be updated too.) That changes the checksum of the COPY package.json package-lock.json ./ layer. The RUN npm ci layer on top of it is therefore rebuilt and reinstalls every dependency, even though only one package was added. Docker has no idea what npm ci actually installs. For a RUN instruction it compares only the command string and the parent layer, and here the parent is what changed.
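A small hash-chaining sketch shows why the npm ci layer is rebuilt even though its command string never changes. It is a toy model, not Docker's real key format. `copy_key` and `run_key` are hypothetical helpers, and the manifest contents are made up.

```python
import hashlib

def key(*parts: str) -> str:
    """Hash a sequence of strings into one hex key (toy model)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator between parts
    return h.hexdigest()

def copy_key(parent: str, files: dict[str, str]) -> str:
    """Hypothetical COPY key: parent key plus a content hash of each copied file."""
    return key(parent, "COPY", *(f"{name}:{key(body)}" for name, body in sorted(files.items())))

def run_key(parent: str, command: str) -> str:
    """Hypothetical RUN key: parent key plus the command string, nothing else."""
    return key(parent, "RUN", command)

BASE = "node:18-alpine"
CMD = "npm ci --prefer-offline"
before = {"package.json": '{"devDependencies": {}}', "package-lock.json": "lock v1"}
after = {"package.json": '{"devDependencies": {"eslint": "^8"}}', "package-lock.json": "lock v2"}

run_before = run_key(copy_key(BASE, before), CMD)
run_after = run_key(copy_key(BASE, after), CMD)
print(run_before == run_after)  # False: same command string, different parent layer
```

The RUN key never looks at the manifests directly. It changes only because the COPY layer beneath it changed.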
The real power comes when you understand that each RUN, COPY, and ADD instruction creates a new layer, and Docker builds the image by executing these instructions sequentially. A cached layer is reused only if every layer before it was also reused and the instruction itself is unchanged. For COPY and ADD, the checksums of the copied files' contents must match (modification times are ignored). For RUN, the command string must match.
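Docker's documentation says the COPY/ADD checksums don't consider files' last-modified and last-accessed times. That matters in CI: a fresh checkout gives every file a new timestamp, yet unchanged files can still produce cache hits. The sketch below models this with a hypothetical `ContextFile` type. It hashes only each path and its bytes, which simplifies what Docker actually checksums.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ContextFile:
    """Hypothetical stand-in for a file in the build context."""
    content: bytes
    mtime: float  # modification time; deliberately ignored by the checksum

def copy_checksum(files: dict[str, ContextFile]) -> str:
    """Toy COPY checksum: path and content only, never timestamps."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(files[path].content).digest())
    return h.hexdigest()

ctx = {"src/index.js": ContextFile(b"console.log('hi')\n", mtime=1_700_000_000.0)}
original = copy_checksum(ctx)

ctx["src/index.js"].mtime += 60  # like `touch`: new timestamp, same bytes
touched = copy_checksum(ctx)

ctx["src/index.js"].content = b"console.log('bye')\n"  # a real edit
edited = copy_checksum(ctx)

print(original == touched)  # True: timestamp-only change keeps the cache
print(original == edited)   # False: content change invalidates the COPY layer
```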
This means if you have a complex build process, you might want to break it down even further. For instance, in a frontend build, you might have separate COPY commands for your src directory, your public directory, and maybe even a config directory, each followed by a build step or a step that relies on those specific files. This way, changing a file in public/ only invalidates the COPY public/ layer and the layers after it. Everything above it, including dependency installation, stays cached.
The next hurdle is managing multi-stage builds effectively for even finer-grained caching.