A Dockerfile’s primary job is to build an image, but the real trick is building an image that’s small enough to ship and fast enough to iterate on.

Let’s see how docker build actually works when it’s creating an image. Imagine we have a simple Dockerfile:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends some-package
COPY app.py /app/app.py
CMD ["python", "/app/app.py"]

When you run docker build ., Docker doesn’t just execute these commands in a vacuum. Each instruction that changes the filesystem (RUN, COPY, ADD) creates a new layer on top of the previous one, and Docker caches these layers. On a rebuild, Docker reuses a cached layer when the instruction is unchanged and, for COPY and ADD, when the files being copied have the same checksums. Once one instruction misses the cache, every instruction after it is re-executed. Layer caching is the secret sauce for faster builds.
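As a rough illustration, here is the same Dockerfile annotated with what a rebuild would likely do if only app.py had been edited since the previous build. The comments describe the expected cache behavior under that assumption; they are not output captured from a real build.

```dockerfile
FROM ubuntu:22.04

# Instruction text is unchanged, so this layer is reused from the cache.
RUN apt-get update && apt-get install -y --no-install-recommends some-package

# The content of app.py changed, so its checksum no longer matches:
# cache miss. This instruction and everything after it is re-executed.
COPY app.py /app/app.py

# Re-evaluated because the step above missed the cache.
CMD ["python", "/app/app.py"]
```

The expensive apt-get step stays cached because it comes before the instruction that changed.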

Now, consider the RUN apt-get update && apt-get install -y --no-install-recommends some-package line. This creates one layer for the update and install. If you later add another RUN command that installs another package, Docker reuses the cached first layer and creates a new layer for the added instruction. That new RUN needs its own apt-get update, because the package lists in the cached layer may be stale, so the lists can end up downloaded again and stored in both layers. This is inefficient and bloats your image.

The most surprising truth about optimizing Dockerfiles is that combining RUN commands is often more about reducing image size than speed. Each RUN command, even if it just cleans up, creates a new layer. If you run apt-get update in one layer and then apt-get install in another, the package lists downloaded by apt-get update still exist in the first layer, even if a later instruction deletes them. Subsequent layers can see files from previous layers, but they can’t remove them: deleting a file in a later layer only hides it, and the earlier layer keeps its full size.
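A minimal sketch of that effect, using a throwaway 50 MB file rather than real package data. The path /tmp/scratch.bin is a placeholder chosen for illustration; comparing the two variants with docker history should show the difference in layer sizes.

```dockerfile
FROM ubuntu:22.04

# Variant A: two layers.
# The first layer stores the 50 MB file.
RUN dd if=/dev/zero of=/tmp/scratch.bin bs=1M count=50
# The second layer only records the deletion; the first layer
# still carries the 50 MB, so the image does not shrink.
RUN rm /tmp/scratch.bin

# Variant B: one layer.
# The file is created and removed before the layer is committed,
# so it never becomes part of the image.
RUN dd if=/dev/zero of=/tmp/scratch.bin bs=1M count=50 && \
    rm /tmp/scratch.bin
```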

Let’s build a more optimized example. This Dockerfile aims for smaller size and faster builds:

# Use a slim base image
FROM python:3.9-slim-buster

# Set working directory
WORKDIR /app

# Install dependencies in one go, cleaning up afterwards
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        some-package \
        another-package \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies first so this layer stays cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port and set command
EXPOSE 8000
CMD ["python", "app.py"]

Here’s how to break down the optimization:

  1. Choose a Slim Base Image: FROM python:3.9-slim-buster instead of python:3.9. Slim variants strip out non-essential packages and documentation, significantly reducing the initial layer size.

  2. Combine RUN Commands for Package Management:

    • Diagnosis: Run docker history <image_id> to see the layers and their sizes. If you see multiple RUN apt-get update commands or separate apt-get install and apt-get clean commands, you’re likely wasting space.
    • Fix: Combine apt-get update, apt-get install, and cleanup into a single RUN instruction using &&.
      RUN apt-get update && \
          apt-get install -y --no-install-recommends \
              some-package \
          && apt-get clean \
          && rm -rf /var/lib/apt/lists/*
      
    • Why it works: This ensures that the downloaded package lists from apt-get update are removed within the same layer as the installation. If they were in a separate layer, they would still occupy disk space in the image’s history. --no-install-recommends prevents unnecessary dependencies from being pulled in. apt-get clean removes downloaded package files (.deb) and rm -rf /var/lib/apt/lists/* purges the package lists cached by apt.
  3. Leverage pip --no-cache-dir:

    • Diagnosis: Check the image size after installing Python dependencies. If it’s larger than expected and you’re not explicitly cleaning pip cache, that’s a clue.
    • Fix: Use pip install --no-cache-dir -r requirements.txt.
    • Why it works: By default, pip caches downloaded and locally built wheels (under ~/.cache/pip on Linux). --no-cache-dir disables that cache, so those files are never written into the image layer.
  4. Order COPY and RUN for Caching:

    • Diagnosis: Rebuild your image after making a small code change. If the entire RUN pip install layer has to be re-executed, your build is slow.
    • Fix: Copy requirements.txt and run pip install before copying the rest of your application code.
      COPY requirements.txt .
      RUN pip install --no-cache-dir -r requirements.txt
      
      COPY . .
      
    • Why it works: Docker caches layers. If requirements.txt hasn’t changed, Docker will use the cached layer from pip install. Only if requirements.txt does change will the pip install layer be re-executed. Copying the entire application code first would invalidate the pip install layer even for minor code tweaks.
  5. Minimize Layers by Chaining Commands:

    • Diagnosis: Again, docker history is your friend. Look for many small RUN commands.
    • Fix: Chain related commands with && into a single RUN instruction.
      RUN groupadd --system appuser && \
          useradd --system --gid appuser appuser && \
          mkdir -p /data && \
          chown appuser:appuser /data
      
    • Why it works: Each RUN command creates a new layer. Combining them reduces the total number of layers, which can slightly improve build performance and reduce the overhead of managing many layers, though the primary benefit is often size reduction by ensuring cleanup happens in the same layer.
  6. Use .dockerignore:

    • Diagnosis: Your build context (the files sent to the Docker daemon) is huge, or your COPY . . step is unexpectedly slow.
    • Fix: Create a .dockerignore file in the root of your build context (usually the same directory as your Dockerfile) and list files/directories to exclude (e.g., .git, node_modules, *.pyc, *.log).
      .git
      .gitignore
      *.pyc
      __pycache__
      logs/
      Dockerfile
      .dockerignore
      
    • Why it works: The .dockerignore file keeps unnecessary files out of the build context that is sent to the Docker daemon. This speeds up the context transfer, stops COPY . . from picking up unrelated changes that would invalidate the cache, and prevents sensitive or build-specific files from ending up in your image.
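To make step 4 concrete, here is a sketch of the ordering it warns against, reusing the file names from the optimized Dockerfile. It is shown only as a counter-example, assuming the same project layout.

```dockerfile
FROM python:3.9-slim-buster
WORKDIR /app

# Anti-pattern: the whole source tree is copied before dependencies
# are installed, so any edit to any file invalidates this layer...
COPY . .

# ...and forces a full reinstall here, even when requirements.txt
# itself has not changed.
RUN pip install --no-cache-dir -r requirements.txt

CMD ["python", "app.py"]
```

Moving COPY requirements.txt . and the pip install above COPY . . restores the caching described in step 4.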

The most common pitfall when optimizing is forgetting that RUN commands execute in separate layers, and intermediate files or downloads are not automatically cleaned up by subsequent commands. You must explicitly clean up within the same RUN instruction that created the artifacts.

The next thing you’ll run into is optimizing multi-stage builds for even smaller production images.
