The HEALTHCHECK instruction in a Dockerfile isn’t reporting a healthy status, meaning Docker’s orchestrator (like Swarm or Kubernetes) is incorrectly assuming your container is unhealthy and potentially restarting it.
The core issue is that the command specified by HEALTHCHECK is failing to exit with a status code of 0 within the configured interval, signaling to Docker that the container’s primary process is not in a ready state.
Here are the common culprits and how to fix them:
1. The Healthcheck Command Itself is Flawed
- Diagnosis: Manually run the
HEALTHCHECKcommand inside the container. Get a shell into a running instance of your container (e.g.,docker exec -it <container_id> sh) and execute the command exactly as it’s written in your Dockerfile. - Cause: The command might have syntax errors, be looking for a file that doesn’t exist, or have incorrect logic. For example, if your
HEALTHCHECKisCMD curl http://localhost:8080/health, and the web server isn’t actually running or listening on port 8080 inside the container,curlwill fail. - Fix: Correct the command. For the
curlexample, ensure your application is configured to listen onlocalhost:8080and that the/healthendpoint returns a2xxor3xxstatus code. If it’s a script, debug the script line by line. - Why it works: Docker’s
HEALTHCHECKrelies on the exit code of the specified command. A0exit code means success; anything else means failure. Fixing the command ensures it returns0only when the service is truly healthy.
2. Incorrect INTERVAL, TIMEOUT, RETRIES, or START PERIOD
- Diagnosis: Examine your
HEALTHCHECKinstruction in the Dockerfile and compare it to your application’s startup time and responsiveness. - Cause:
INTERVAL: Too short for your application to initialize and become ready. If your app takes 30 seconds to start, butINTERVALis 5 seconds, it will be checked before it’s ready.TIMEOUT: Too short for the healthcheck command itself to execute. If your healthcheck command (e.g., a complex database query) takes 10 seconds, butTIMEOUTis 2 seconds, the check will fail even if the service is healthy.RETRIES: Too low. The service might be intermittently failing during startup, but with enough retries, it would eventually pass.START PERIOD: Not set or too short. This is a grace period where Docker doesn’t count failures against theRETRIEScount, allowing your application to start up without immediately being marked unhealthy.
- Fix: Adjust these parameters. For an application with a slow startup:
Here, we give it 60 seconds (HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 CMD curl -f http://localhost:8080/health || exit 1START PERIOD) to get going, check every 30 seconds (INTERVAL), with a 5-second timeout for the check itself (TIMEOUT), and allow 3 failures (RETRIES) before declaring it unhealthy. The|| exit 1is crucial if the command itself doesn’t exit non-zero on failure (likecurlwithout-f). - Why it works: These parameters control the timing and tolerance of the health check. Properly tuning them accounts for real-world application startup times and command execution durations, preventing premature failures.
3. Port Not Accessible or Listening
- Diagnosis: Use
docker psto see your container’s ports. Then, try tocurlthe healthcheck endpoint from the Docker host or another container on the same network. If yourHEALTHCHECKcommand useslocalhostor127.0.0.1, it’s checking from within the container’s network namespace. - Cause: The application inside the container is not listening on the expected network interface or port, or a firewall is blocking access. If your
HEALTHCHECKisCMD curl http://localhost:8080/health, but your application is configured to listen on0.0.0.0:8080or127.0.0.1:8080and Docker’s network setup makeslocalhostinaccessible to the healthcheck process (less common but possible), it will fail. - Fix: Ensure your application is configured to listen on
0.0.0.0(all interfaces) or the specific IP Docker assigns to the container if usinglocalhostin the healthcheck. If checking from the host, ensure the container’s published port is correctly mapped. For an internal check,localhostor127.0.0.1is usually correct, but verify the application’s binding. Example: In your application’s configuration, ensure it’s binding to0.0.0.0:8080instead of a specific internal IP. - Why it works: The healthcheck command needs to be able to reach the service. If the service isn’t listening on an accessible address/port from where the
HEALTHCHECKcommand is executed, the connection will fail.
4. Application Logic Error (Not Just Startup)
- Diagnosis: Observe the container’s logs (
docker logs <container_id>) and the Docker daemon logs for any repeated errors related to the healthcheck command or the application’s core functionality. - Cause: The application has started successfully but has entered an unhealthy state due to a bug, resource exhaustion (memory leak, disk full), or an external dependency failure (database down, API unreachable). The
HEALTHCHECKcommand correctly reflects this. - Fix: Debug the application itself. Analyze logs for runtime errors. If the healthcheck endpoint relies on other services, ensure those services are healthy and accessible. For example, if
/healthchecks database connectivity, and the database is down, the healthcheck will fail. - Why it works: The
HEALTHCHECKis doing its job by reporting the actual state of the application. Fixing the underlying application logic or its dependencies resolves the reported unhealthiness.
5. Incorrect CMD or ENTRYPOINT in Dockerfile
- Diagnosis: Review your
CMDandENTRYPOINTinstructions in the Dockerfile. - Cause: If your
HEALTHCHECK CMDis defined using theexecform (e.g.,HEALTHCHECK CMD ["my-healthcheck-script.sh"]) but yourENTRYPOINTis also a shell form (e.g.,ENTRYPOINT my-app), theHEALTHCHECKcommand might not be executed as expected, or it might be trying to run in an environment where its dependencies aren’t available. IfENTRYPOINTis a script that doesn’texecthe final command, it can interfere. - Fix: Use consistent forms for
CMDandENTRYPOINTandHEALTHCHECK. Prefer theexecform (JSON array) for all. Ensure yourENTRYPOINTscript properlyexecs the final process, or that yourHEALTHCHECKcommand runs in the same context as your main application. Example:
IfENTRYPOINT ["/app/run.sh"] CMD ["--foreground"] # This is the command passed to run.sh HEALTHCHECK --interval=5s CMD ["/usr/local/bin/my-healthcheck"] # Exec formmy-healthcheckneeds to run commands that are only available in the shell spawned byENTRYPOINT, you might need to call it from withinrun.shor ensuremy-healthcheckis a standalone binary. - Why it works: The
execform runs the command directly, without an intervening shell, which is generally more reliable and predictable for Docker’s health checks. It ensures the command runs with the correct environment and arguments.
6. Docker Daemon or Orchestrator Issues
- Diagnosis: Check Docker daemon logs (
journalctl -u docker.serviceon systemd systems) or the logs of your orchestrator (Kubernetes controller-manager, Swarm manager). - Cause: In rare cases, the Docker daemon itself or the orchestrator’s agent responsible for health checks might be experiencing issues, failing to process health check results correctly.
- Fix: Restart the Docker daemon (
sudo systemctl restart docker). If using Kubernetes, checkkubeletlogs on the node and the controller-manager logs. For Swarm, check manager logs. - Why it works: Resolves transient bugs or resource contention within the Docker or orchestrator components responsible for monitoring container health.
After fixing any of these, you should see your container transition to a "healthy" state. The next error you’ll likely encounter is related to resource limits or application-specific errors that the healthcheck doesn’t cover.