A heredoc is one of the cleanest ways to run a multi-line script in a Dockerfile so that shell state carries over from one line to the next.
Let’s see this in action. Imagine you need to install a few packages and then run a command that depends on those packages being present.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
curl \
jq \
&& rm -rf /var/lib/apt/lists/*
RUN mkdir -p /app \
&& echo "Downloading data..." \
&& curl -s "https://api.example.com/data" | jq '.some_field' > /app/data.txt
RUN echo "Processing data..." \
&& grep "specific_pattern" /app/data.txt
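This looks pretty standard, right? We’re using && to chain commands, so if one of them fails, the entire RUN instruction fails and the build stops (with one caveat: in a pipeline such as curl -s ... | jq, the shell reports only the exit status of the last command, so a failed download can slip through). More importantly, each RUN instruction runs in a fresh shell and produces its own layer: files written to disk carry over to later instructions, but shell state such as variables or a cd does not. And if apt-get install in the first RUN fails, the curl command in the next RUN instruction never even attempts to execute.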
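The pipeline caveat is easy to reproduce outside Docker. This minimal sketch assumes that a RUN instruction in shell form hands its command line to /bin/sh -c, and uses false as a stand-in for a failed curl and cat as a stand-in for jq; run_step and run_step_pipefail are hypothetical helpers, not Docker features.

```shell
#!/usr/bin/env bash
# Minimal sketch: run command strings the way a RUN instruction's shell form does.
# `false` stands in for a failed curl and `cat` for jq (illustrative substitutions).

# Hypothetical helper: default shell form, i.e. /bin/sh -c "<command line>".
run_step() {
  sh -c "$1"
}

# Hypothetical helper: bash with pipefail, which is what
# SHELL ["/bin/bash", "-o", "pipefail", "-c"] sets up for later RUN lines.
run_step_pipefail() {
  bash -o pipefail -c "$1"
}

status=0
run_step 'false | cat' || status=$?
echo "default shell: exit $status"   # prints 0: the failed "download" is masked

status=0
run_step_pipefail 'false | cat' || status=$?
echo "with pipefail: exit $status"   # prints 1: the failure now stops the step
```

In a Dockerfile, placing SHELL ["/bin/bash", "-o", "pipefail", "-c"] before the download step is the usual way to get the second behavior; it requires an image that ships bash, which ubuntu:22.04 does.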
The problem becomes more apparent when you have complex setup scripts or conditional logic. Imagine needing to configure a service, but the configuration requires multiple steps, and each step needs to be in sequence.
FROM alpine:latest
RUN apk add --no-cache python3 py3-requests && \
mkdir -p /etc/myapp && \
echo "Setup complete."
RUN python3 -c "
import requests
response = requests.get('http://example.com/config')
with open('/etc/myapp/config.json', 'w') as f:
    f.write(response.text)
print('Config written.')
"
Here, we’re trying to run a Python script directly within a RUN command, but the build never gets as far as running Python. A Dockerfile instruction ends at the first newline that isn’t escaped with a trailing backslash, so the parser reads import requests as a new, unknown instruction and rejects the file. Adding backslashes doesn’t help either: they join the lines into one, which breaks Python’s indentation-based syntax. And even when a long inline script does parse, each RUN is a distinct layer, so if the script fails at run time you may have to dig into that layer’s build logs to find which part caused the issue.
Now, let’s introduce heredocs. A heredoc lets you pass a block of text as standard input to a command. Dockerfiles support heredocs in RUN and COPY instructions when building with BuildKit and the Dockerfile syntax 1.4 or later, which you can request with a # syntax=docker/dockerfile:1 line at the top of the file. The key is that the entire block is treated as a single input to the shell.
Here’s the same example, but using a heredoc for the multi-line Python script:
# syntax=docker/dockerfile:1
FROM alpine:latest
RUN apk add --no-cache python3 py3-requests && \
mkdir -p /etc/myapp && \
echo "Setup complete."
RUN python3 <<EOF
import requests
response = requests.get('http://example.com/config')
with open('/etc/myapp/config.json', 'w') as f:
    f.write(response.text)
print('Config written.')
EOF
Notice the RUN python3 <<EOF ... EOF part. This tells BuildKit and the shell to:
- Start a "here document" (<<EOF).
- Read all lines until it encounters a line containing only EOF.
- Pass those lines as standard input to the python3 command, which runs them as a script.
Nesting the heredoc inside a command substitution, as in python3 -c "$(cat <<EOF ... EOF)", works in an interactive shell but not here: the heredoc marker is buried inside a quoted string, so the Dockerfile parser doesn’t recognize it and ends the instruction at the first newline.
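Inside a shell, there are two common ways to hand a multi-line script to an interpreter: as standard input (cmd <<EOF) or through command substitution into a -c argument (cmd -c "$(cat <<EOF ... EOF)"). This minimal sketch shows that both deliver the same script; it uses sh in place of python3 so it runs without extra packages, and the function names and script body are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch: two ways a shell can pass a multi-line script to an interpreter.
# `sh` stands in for python3 so the example has no extra dependencies.

# Variant 1: the heredoc becomes the interpreter's standard input.
via_stdin() {
  sh <<'EOF'
greeting="Config"
echo "$greeting written."
EOF
}

# Variant 2: cat reads the heredoc, and command substitution turns its
# output into the single -c argument.
via_command_substitution() {
  sh -c "$(cat <<'EOF'
greeting="Config"
echo "$greeting written."
EOF
)"
}

via_stdin                  # prints: Config written.
via_command_substitution   # prints: Config written.
```

Both behave identically in a shell. The stdin form is the one that lines up with Dockerfile heredoc syntax, since the <<EOF marker then sits directly on the RUN line.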
The benefit here is twofold:
- Readability: The script is clearly delineated within the Dockerfile.
- Error Handling: If the Python script has an error, the error message will be directly tied to the execution of that script within the RUN command, making debugging easier. The entire block is processed by the shell as one unit.
Let’s make it even more robust with a more complex setup. Suppose you need to create a directory, set permissions, and then place a file there.
# syntax=docker/dockerfile:1
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y sudo && rm -rf /var/lib/apt/lists/*
RUN useradd -m appuser && \
echo "appuser ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
RUN mkdir -p /app/data \
&& chown appuser:appuser /app/data \
&& echo "initial data" > /app/data/config.txt
RUN echo "Running setup script for appuser..." && sudo -u appuser bash <<'EOF'
set -e
echo "Creating user directory..."
mkdir -p /home/appuser/.config
echo "Setting permissions for user config..."
chown appuser:appuser /home/appuser/.config
echo "User setup complete."
EOF
In this example, we’re using sudo -u appuser bash to execute commands as appuser, with the heredoc supplying the script on standard input. The heredoc <<'EOF' (with single quotes around EOF) prevents shell variable expansion within the heredoc itself, which is crucial if your script contains characters like $. The set -e line makes bash stop at the first failing command instead of carrying on.
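The effect of quoting the delimiter is easiest to see by handing the same heredoc to cat, which simply prints the text it receives. With an unquoted delimiter the outer shell substitutes values before the inner command runs; with a quoted one, $ references pass through untouched for the inner bash to resolve at run time. HOME_DIR below is an illustrative variable, not one the Dockerfile defines.

```shell
#!/usr/bin/env bash
# Minimal sketch: what text an interpreter receives with an unquoted vs. a
# quoted heredoc delimiter. `cat` just echoes that text back;
# HOME_DIR is an illustrative variable, not one the Dockerfile defines.

HOME_DIR="/home/appuser"

# Unquoted delimiter: the outer shell expands $HOME_DIR before passing the text on.
unquoted() {
  cat <<EOF
dir=$HOME_DIR
EOF
}

# Quoted delimiter: the text is passed through literally, $ and all.
quoted() {
  cat <<'EOF'
dir=$HOME_DIR
EOF
}

unquoted   # prints: dir=/home/appuser
quoted     # prints: dir=$HOME_DIR
```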
The shell (bash in this case) receives the entire block of text as a single script and executes it. Any commands within the heredoc run in the same shell context, so a variable you create, an environment variable you export, or a directory you cd into persists for subsequent commands within that same heredoc block. This is fundamentally different from separate RUN commands, where each RUN starts a fresh shell environment.
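Here is a minimal sketch of that difference, assuming each separate sh -c call is a fair stand-in for its own RUN instruction. APP_ENV is an illustrative variable name.

```shell
#!/usr/bin/env bash
# Minimal sketch: shell state inside one heredoc script vs. separate shells.
# Each separate `sh -c` call stands in for its own RUN instruction;
# APP_ENV is an illustrative variable name.

# One script: the variable and the cd both survive to the last line.
single_script() {
  sh <<'EOF'
APP_ENV=production
cd /
echo "$APP_ENV in $(pwd)"
EOF
}

# Separate shells: the assignment made by the first call is gone in the second.
separate_shells() {
  sh -c 'APP_ENV=production'
  sh -c 'echo "${APP_ENV:-unset}"'
}

single_script     # prints: production in /
separate_shells   # prints: unset
```

Only files written to disk survive between RUN instructions; to carry a value forward in a Dockerfile you would write it to a file or declare it with ENV.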
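The key point is that a heredoc in a RUN instruction executes as one shell script, which allows complex scripting and state management within a single RUN instruction, even though it is often mistaken for a series of independent commands. One consequence: unless a shell script in a heredoc starts with set -e, a failing line does not stop the lines after it, and only the exit status of the last command decides whether the RUN step fails.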
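A quick way to check the set -e behavior is to run two script bodies through sh -c, which is roughly how BuildKit executes a RUN instruction whose only content is a heredoc. exit_status_of is a hypothetical helper, and false stands in for any failing command.

```shell
#!/usr/bin/env bash
# Minimal sketch: a multi-line script run via `sh -c`, roughly how a bare
# RUN <<EOF body is executed. `false` stands in for any failing command.

# Hypothetical helper: run a script body and print only its exit status.
exit_status_of() {
  status=0
  sh -c "$1" >/dev/null || status=$?
  echo "$status"
}

without_set_e='false
echo "still running"'

with_set_e='set -e
false
echo "never reached"'

exit_status_of "$without_set_e"   # prints 0: the failure is ignored
exit_status_of "$with_set_e"      # prints 1: the script stops at `false`
```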
The next logical step is to manage secrets within these multi-line commands, which often involves combining heredocs with Docker secrets or build-time arguments.