A Dockerfile’s primary job is to build an image, but the real trick is building an image that’s small enough to ship and fast enough to iterate on.
Let’s see how docker build actually works when it’s creating an image. Imagine we have a simple Dockerfile:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends some-package
COPY app.py /app/app.py
CMD ["python", "/app/app.py"]
When you run docker build ., Docker doesn’t execute these commands in a vacuum. Each RUN, COPY, and ADD instruction creates a new filesystem layer on top of the previous one, while instructions such as CMD and EXPOSE only record metadata. Docker caches the result of each step. On a rebuild, Docker reuses a cached layer instead of re-executing the command if the instruction is unchanged, the files it copies (for COPY and ADD) are unchanged, and every earlier step was also a cache hit. This is what makes repeated builds fast.
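To make the cache behaviour concrete, here is the same Dockerfile annotated with what a rebuild would typically do after editing only app.py. The annotations are a sketch of the usual behaviour, assuming a previous successful build, no --no-cache flag, and a layer cache that has not been pruned.

```dockerfile
# Rebuild after editing only app.py.

FROM ubuntu:22.04

# Expected: cache hit. The instruction text and the parent layer are unchanged.
RUN apt-get update && apt-get install -y --no-install-recommends some-package

# Expected: cache miss. COPY is keyed on the contents of the copied files,
# and app.py has changed.
COPY app.py /app/app.py

# Every instruction after the first cache miss is processed again.
# CMD only records metadata, so this step is cheap.
CMD ["python", "/app/app.py"]
```

The practical consequence is that instructions which change rarely belong near the top of the file, and instructions which change often belong near the bottom.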
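Now, consider the RUN apt-get update && apt-get install -y --no-install-recommends some-package line. It creates a single layer containing both the update and the install. If you later add another RUN instruction that installs a second package, that instruction creates a new layer of its own. Docker reuses the cached first layer, so the new install runs against package lists that may be stale. If the new RUN runs its own apt-get update instead, the lists are downloaded again and any changed files are stored a second time. Either way, the build becomes less reliable or the image grows larger than it needs to be.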
The most surprising truth about optimizing Dockerfiles is that combining RUN commands is often more about reducing image size than speed. Each RUN instruction, even one that only cleans up, creates a new layer. If you run apt-get update in one layer and apt-get install in another, the package lists downloaded by apt-get update remain stored in the first layer, even if a later instruction deletes them. Later layers can hide files from earlier layers, but they cannot reclaim the space those files occupy.
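The following sketch shows the anti-pattern described above, using the placeholder package name from the earlier example. It is meant as an illustration of what not to do, not as a recommended layout.

```dockerfile
FROM ubuntu:22.04

# Anti-pattern: three instructions, three layers.

# This layer stores the downloaded package lists.
RUN apt-get update

# This layer stores the installed package.
RUN apt-get install -y --no-install-recommends some-package

# This layer only marks the lists as deleted. They are hidden in the final
# filesystem, but the first RUN layer still contains them, so the image
# does not get smaller.
RUN rm -rf /var/lib/apt/lists/*
```

Running docker history on an image built this way should show the first RUN layer keeping its size, with the final rm layer adding almost nothing and removing nothing.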
Let’s build a more optimized example. This Dockerfile aims for smaller size and faster builds:
# Use a slim base image
FROM python:3.9-slim-buster
# Set working directory
WORKDIR /app
# Install dependencies in one go, cleaning up afterwards
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    some-package \
    another-package \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies first so this layer stays cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port and set command
EXPOSE 8000
CMD ["python", "app.py"]
Here’s how to break down the optimization:
- Choose a Slim Base Image: Use FROM python:3.9-slim-buster instead of python:3.9. Slim variants strip out non-essential packages and documentation, significantly reducing the initial layer size.
- Combine RUN Commands for Package Management:
  - Diagnosis: Run docker history <image_id> to see the layers and their sizes. If you see multiple RUN apt-get update commands, or separate apt-get install and apt-get clean commands, you’re likely wasting space.
  - Fix: Combine apt-get update, apt-get install, and cleanup into a single RUN instruction using &&.

        RUN apt-get update && \
            apt-get install -y --no-install-recommends \
            some-package \
            && apt-get clean \
            && rm -rf /var/lib/apt/lists/*

  - Why it works: This ensures that the package lists downloaded by apt-get update are removed within the same layer as the installation. If they were removed in a separate layer, they would still occupy space in the earlier layer. --no-install-recommends prevents optional recommended packages from being pulled in. apt-get clean removes downloaded package files (.deb), and rm -rf /var/lib/apt/lists/* purges the package lists cached by apt.
- Leverage pip --no-cache-dir:
  - Diagnosis: Check the image size after installing Python dependencies. If it’s larger than expected and you’re not explicitly cleaning the pip cache, that’s a clue.
  - Fix: Use pip install --no-cache-dir -r requirements.txt.
  - Why it works: By default, pip caches downloaded and locally built wheels. --no-cache-dir disables that cache, so those files never become part of the image layer.
- Order COPY and RUN for Caching:
  - Diagnosis: Rebuild your image after making a small code change. If the RUN pip install step is re-executed every time, your build is slower than it needs to be.
  - Fix: Copy requirements.txt and run pip install before copying the rest of your application code.

        COPY requirements.txt .
        RUN pip install --no-cache-dir -r requirements.txt
        COPY . .

  - Why it works: Docker caches layers. If requirements.txt hasn’t changed, Docker reuses the cached pip install layer. Only when requirements.txt changes is the pip install step re-executed. Copying the entire application code first would invalidate the pip install layer even for minor code tweaks.
- Minimize Layers by Chaining Commands:
  - Diagnosis: Again, docker history is your friend. Look for many small RUN commands.
  - Fix: Chain related commands with && into a single RUN instruction.

        RUN echo "Setting up environment" && \
            mkdir -p /data && \
            chown appuser:appuser /data

  - Why it works: Each RUN command creates a new layer. Combining them reduces the total number of layers, which can slightly improve build performance and reduce the overhead of managing many layers, though the primary benefit is often size reduction by ensuring cleanup happens in the same layer.
- Use .dockerignore:
  - Diagnosis: Your build context (the files sent to the Docker daemon) is huge, or your COPY . . step is unexpectedly slow.
  - Fix: Create a .dockerignore file at the root of the build context (usually the same directory as your Dockerfile) and list files and directories to exclude (e.g., .git, node_modules, *.pyc, *.log).

        .git
        .gitignore
        *.pyc
        __pycache__
        logs/
        Dockerfile
        .dockerignore

  - Why it works: The .dockerignore file keeps unnecessary files out of the build context that is sent to the Docker daemon. This speeds up the context transfer and the COPY . . step, and prevents sensitive or build-specific files from ending up in your image.
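The chained example in the list above runs chown for an appuser account without showing where that account comes from. A minimal sketch, assuming a Debian- or Ubuntu-based image where useradd is available, and treating the user name and the /data directory as illustrative placeholders:

```dockerfile
FROM ubuntu:22.04

# Create the account, the directory, and the ownership change in one layer.
# "appuser" and "/data" are placeholder names, not requirements.
RUN useradd --user-group --create-home --shell /usr/sbin/nologin appuser && \
    mkdir -p /data && \
    chown appuser:appuser /data

# Run later instructions and the container itself as the unprivileged user.
USER appuser
```

Keeping the three steps in one RUN instruction means the ownership change does not produce a separate layer on top of the one that created the directory.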
The most common pitfall when optimizing is forgetting that each RUN instruction is committed as its own layer, so intermediate files or downloads deleted by a later instruction still take up space in the layer that created them. You must explicitly clean up within the same RUN instruction that created the artifacts.
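The same rule applies to build tools that are only needed temporarily. The sketch below assumes that some entries in requirements.txt need a C compiler to install, and uses the python:3.9-slim tag as an assumed base image. The compiler is installed, used, and removed inside a single RUN instruction.

```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .

# Install the toolchain, build the dependencies, then remove the toolchain.
# Because everything happens in one instruction, the compiler is never
# committed to a layer of its own.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    pip install --no-cache-dir -r requirements.txt && \
    apt-get purge -y --auto-remove build-essential && \
    rm -rf /var/lib/apt/lists/*

COPY . .
CMD ["python", "app.py"]
```

Splitting the purge into its own RUN instruction would leave the full toolchain stored in the earlier layer, which is exactly the pitfall described above.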
The next thing you’ll run into is optimizing multi-stage builds for even smaller production images.