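A Dockerfile HEALTHCHECK that never reports healthy leaves the container stuck in starting or marked unhealthy. Plain Docker only reports that status, but Swarm replaces unhealthy tasks, so a failing check can trigger restarts of a container that is actually fine. (Kubernetes ignores the Dockerfile HEALTHCHECK and relies on its own liveness and readiness probes.)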

The core issue is that the command specified by HEALTHCHECK is not exiting with a status code of 0 before its timeout expires. After the configured number of consecutive failures (retries), Docker marks the container unhealthy.

Here are the common culprits and how to fix them:

1. The Healthcheck Command Itself is Flawed

  • Diagnosis: Manually run the HEALTHCHECK command inside the container. Get a shell into a running instance of your container (e.g., docker exec -it <container_id> sh) and execute the command exactly as it’s written in your Dockerfile.
  • Cause: The command might have syntax errors, reference a file that doesn’t exist, or contain faulty logic. A common case is a missing binary: slim and Alpine-based images often ship without curl, so CMD curl http://localhost:8080/health fails before it ever contacts the server. The same command also fails if the web server isn’t running or isn’t listening on port 8080 inside the container.
  • Fix: Correct the command. For the curl example, make sure curl is installed in the image, the application listens on port 8080 inside the container, and the /health endpoint returns a 2xx status code. Pass -f so that curl exits non-zero on HTTP errors (400 and above); without it, curl exits 0 on any HTTP response. If it’s a script, debug the script line by line.
  • Why it works: Docker judges health solely by the exit code of the specified command. 0 means healthy and 1 means unhealthy (2 is reserved and should not be used); in practice any non-zero code counts as a failure. Fixing the command ensures it returns 0 only when the service is truly healthy.
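
To see the exit-code contract in isolation, here is a minimal sketch of a wrapper that runs any probe command and collapses its result to the two codes Docker documents. The function name probe and the example URL are illustrative assumptions, not part of Docker.

```shell
#!/bin/sh
# probe: run the given command and map its result onto Docker's documented
# health check exit codes: 0 = healthy, 1 = unhealthy.
# Exit code 2 is reserved by Docker, so this function never returns it.
probe() {
    if "$@"; then
        return 0
    fi
    return 1
}

# Illustrative call, run from a shell inside the container:
#   probe curl -fsS http://localhost:8080/health; echo "exit code: $?"
```

Running the probe by hand this way shows the same exit code Docker would see, which is usually the quickest way to tell a broken command from a broken service.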

2. Incorrect INTERVAL, TIMEOUT, RETRIES, or START PERIOD

  • Diagnosis: Examine your HEALTHCHECK instruction in the Dockerfile and compare it to your application’s startup time and responsiveness.
  • Cause:
    • INTERVAL: The time between checks; the first check runs one interval after the container starts. If your app takes 30 seconds to start but INTERVAL is 5 seconds and no start period is set, three failed checks (the default RETRIES) mark it unhealthy after about 15 seconds, before it is ready.
    • TIMEOUT: Too short for the healthcheck command itself to finish. If your healthcheck command (e.g., a complex database query) takes 10 seconds but TIMEOUT is 2 seconds, the check fails even when the service is healthy.
    • RETRIES: Too low. This is the number of consecutive failures that marks the container unhealthy, so a service that fails intermittently can trip a low value even though the next check would have passed.
    • START PERIOD: Not set (the default is 0s) or too short. During this grace period failed checks don’t count toward RETRIES, while the first successful check ends the grace period and marks the container healthy.
  • Fix: Adjust these parameters. For an application with a slow startup:
    HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 CMD curl -f http://localhost:8080/health || exit 1
    
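    Here, we give the application 60 seconds (START PERIOD) to get going, check every 30 seconds (INTERVAL), allow each check 5 seconds (TIMEOUT), and declare the container unhealthy after 3 consecutive failures (RETRIES). The -f flag makes curl exit non-zero on HTTP errors (400 and above); without it, curl exits 0 on any HTTP response, even a 500. The || exit 1 then maps curl’s various error codes (such as 22 for an HTTP error or 7 for a refused connection) to 1, the exit code Docker documents for unhealthy.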
  • Why it works: These parameters control the timing and tolerance of the health check. Properly tuning them accounts for real-world application startup times and command execution durations, preventing premature failures.
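
The arithmetic behind these settings can be sketched as a small helper. It estimates when a container whose check never passes would be marked unhealthy. This is a rough approximation under stated assumptions: it ignores how long each check runs, assumes --start-interval is not used, and the function name unhealthy_window is made up for this example.

```shell
#!/bin/sh
# unhealthy_window START_PERIOD INTERVAL RETRIES   (whole seconds)
# Prints an approximate "earliest-latest" window, in seconds after container
# start, in which a never-passing check flips the container to unhealthy.
# Failures only count once the start period is over, and RETRIES consecutive
# counted failures are needed; where the first counted check falls depends on
# how the interval ticks line up with the end of the start period.
unhealthy_window() {
    start_period=$1
    interval=$2
    retries=$3
    earliest=$(( start_period + (retries - 1) * interval ))
    latest=$(( start_period + retries * interval ))
    echo "${earliest}-${latest}"
}

# Values from the HEALTHCHECK line above: prints 120-150
unhealthy_window 60 30 3
```

If the application can take longer than the lower end of that window to become ready, raise --start-period rather than --retries, since retries also delay detection of real failures later on.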

3. Port Not Accessible or Listening

  • Diagnosis: The healthcheck runs inside the container’s network namespace, so published ports (the PORTS column of docker ps) and host firewalls play no part in it. Check from the inside instead: docker exec into the container, list the listening sockets (e.g., ss -ltn or netstat -ltn, if the image provides them), and run the healthcheck command by hand.
  • Cause: The application inside the container is not listening on the address or port the check targets. The port may differ from the one in the HEALTHCHECK, or the application may be bound to a specific non-loopback IP, in which case a check against localhost cannot reach it. Less commonly, localhost resolves to the IPv6 address ::1 while the application listens only on IPv4, so the connection is refused.
  • Fix: Bind the application to 0.0.0.0 (all interfaces), which makes it reachable both by the in-container check and through published ports, and make sure the port matches the one in the HEALTHCHECK. If name resolution is the problem, use 127.0.0.1 instead of localhost in the check. Example: In your application’s configuration, ensure it’s binding to 0.0.0.0:8080 instead of a specific internal IP.
  • Why it works: The healthcheck command needs to be able to reach the service. If the service isn’t listening on an accessible address/port from where the HEALTHCHECK command is executed, the connection will fail.
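
As a sketch of how to read the listening-socket output, the helper below classifies a local address as printed by ss -ltn or netstat -ltn. The function name classify_bind and its labels are assumptions for illustration, and IPv6 or wildcard notations are deliberately lumped into "other" because their IPv4 reachability depends on the socket’s configuration.

```shell
#!/bin/sh
# classify_bind ADDR
# ADDR is a local address such as 0.0.0.0:8080, taken from `ss -ltn` or
# `netstat -ltn` run inside the container. Prints one of:
#   any       listening on all IPv4 interfaces; a check against 127.0.0.1 works
#   loopback  listening on loopback only; the in-container check works,
#             but published ports will not reach the service
#   specific  bound to one non-loopback IPv4 address; a check against
#             localhost or 127.0.0.1 is refused
#   other     IPv6 or wildcard notation; inspect by hand
classify_bind() {
    case "$1" in
        0.0.0.0:*)        echo any ;;
        127.*:*)          echo loopback ;;
        [0-9]*.*.*.*:*)   echo specific ;;
        *)                echo other ;;
    esac
}

# Illustrative way to obtain the addresses (requires ss or netstat in the image):
#   docker exec <container_id> sh -c 'ss -ltn 2>/dev/null || netstat -ltn'
```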

4. Application Logic Error (Not Just Startup)

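  • Diagnosis: Read the container’s logs (docker logs <container_id>) and the recorded check results (docker inspect --format '{{json .State.Health}}' <container_id>), which include the exit code and output of the most recent checks. Look for repeated errors tied to the healthcheck command or the application’s core functionality.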
  • Cause: The application has started successfully but has entered an unhealthy state due to a bug, resource exhaustion (memory leak, disk full), or an external dependency failure (database down, API unreachable). The HEALTHCHECK command correctly reflects this.
  • Fix: Debug the application itself. Analyze logs for runtime errors. If the healthcheck endpoint relies on other services, ensure those services are healthy and accessible. For example, if /health checks database connectivity, and the database is down, the healthcheck will fail.
  • Why it works: The HEALTHCHECK is doing its job by reporting the actual state of the application. Fixing the underlying application logic or its dependencies resolves the reported unhealthiness.
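
When a check covers several dependencies, it helps to know which one failed. The sketch below runs a list of checks and names the failing ones; Docker stores the check’s output alongside its exit code, so these lines show up in docker inspect. The function name run_checks and the example commands in the comment are hypothetical.

```shell
#!/bin/sh
# run_checks CHECK...
# Runs each argument as a shell command, prints one line per failing check,
# and returns 1 if any check failed (0 otherwise).
run_checks() {
    failed=0
    for check in "$@"; do
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "FAILED: $check"
            failed=1
        fi
    done
    return "$failed"
}

# Illustrative call; replace the commands with checks that suit your service:
#   run_checks 'curl -fsS http://localhost:8080/health' 'test -w /tmp'
```

Keeping each dependency as a separate check makes it clear whether the application itself or something it depends on is the source of the unhealthy status.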

5. Wrong HEALTHCHECK Form or Reliance on ENTRYPOINT State

  • Diagnosis: Review how the HEALTHCHECK command is written (shell form or exec form) and what your ENTRYPOINT script sets up at runtime.
  • Cause: The healthcheck runs as a separate process inside the container, much like docker exec; it is not a child of your ENTRYPOINT. Two mismatches follow. First, the form matters: the shell form (e.g., HEALTHCHECK CMD my-healthcheck-script.sh || exit 1) is run through /bin/sh -c, which fails in images without a shell (scratch, distroless), while the exec form (e.g., HEALTHCHECK CMD ["my-healthcheck-script.sh"]) skips the shell, so variable expansion, pipes, and || do not work, and a bare script name must be executable and on PATH. Second, anything the ENTRYPOINT script sets up at runtime, such as exported variables, is not visible to the healthcheck.
  • Fix: Pick the form that matches the command. Use the exec form (JSON array) for a standalone binary or script, and the shell form only when you need shell features and the image contains /bin/sh. Make the healthcheck self-contained so it does not rely on state created by the ENTRYPOINT script. Example:
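    ENTRYPOINT ["/app/run.sh"]
    # CMD supplies the default arguments passed to run.sh.
    # (Dockerfile comments must sit on their own line; a trailing # is not a comment.)
    CMD ["--foreground"]
    
    # Exec form: runs the binary directly, without a shell
    HEALTHCHECK --interval=5s CMD ["/usr/local/bin/my-healthcheck"]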
    
    If my-healthcheck needs variables or files that run.sh creates at startup, have run.sh write them somewhere the check can read (or set the variables with ENV), or make my-healthcheck a standalone binary that needs none of them.
  • Why it works: The exec form runs the command directly, without an intervening shell, so there is no dependency on /bin/sh and no surprise from shell quoting or expansion. A self-contained check behaves the same no matter how the main process was started.
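
A self-contained check could look like the sketch below, which might be saved as /usr/local/bin/my-healthcheck. It takes its settings from environment variables defined on the image or container (ENV or docker run -e) with built-in defaults, instead of relying on anything the ENTRYPOINT script exports. HEALTH_PORT and HEALTH_PATH are hypothetical variable names, and the script assumes curl is present in the image.

```shell
#!/bin/sh
# health_url: build the probe URL from HEALTH_PORT and HEALTH_PATH
# (hypothetical names), falling back to port 8080 and /health.
health_url() {
    echo "http://127.0.0.1:${HEALTH_PORT:-8080}${HEALTH_PATH:-/health}"
}

# healthcheck: probe the URL; returns 0 when healthy and 1 otherwise.
# -f fails on HTTP errors, -sS hides progress but keeps error messages,
# --max-time bounds the request so it finishes before Docker's own timeout.
healthcheck() {
    curl -fsS --max-time 3 "$(health_url)" >/dev/null || return 1
}

# In a real script the last line would simply be:
#   healthcheck
```

Using 127.0.0.1 rather than localhost sidesteps name resolution inside the container, and keeping the defaults in the script means the check still works when neither variable is set.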

6. Docker Daemon or Orchestrator Issues

  • Diagnosis: Check the Docker daemon logs (journalctl -u docker.service on systemd systems) or, under Swarm, the manager logs. Kubernetes does not run the Dockerfile HEALTHCHECK; its probes are executed by the kubelet, so check the kubelet logs and the pod’s events there.
  • Cause: In rare cases, the Docker daemon itself or the orchestrator’s agent responsible for health checks might be experiencing issues, failing to process health check results correctly.
  • Fix: Restart the Docker daemon (sudo systemctl restart docker); note that this stops running containers unless live-restore is enabled. If using Kubernetes, check the kubelet logs on the node and the pod’s events (kubectl describe pod). For Swarm, check manager logs.
  • Why it works: Resolves transient bugs or resource contention within the Docker or orchestrator components responsible for monitoring container health.

After fixing any of these, the container should move from starting to healthy; confirm with docker ps, which shows the health status in the STATUS column. If it reports healthy but the application still misbehaves, look at problems the healthcheck doesn’t cover, such as resource limits or application-specific errors.
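
To confirm the transition from a script, for example in a deployment step, a polling loop along these lines is one option. It is a sketch: the function name wait_healthy, its return codes, and the WAIT_SLEEP variable are assumptions for this example, and it presumes the container defines a HEALTHCHECK.

```shell
#!/bin/sh
# wait_healthy CONTAINER [TRIES]
# Polls the container's health status until it settles.
# Returns 0 if healthy, 1 if unhealthy, 2 if still undecided after TRIES polls.
# WAIT_SLEEP (hypothetical) sets the pause between polls, default 2 seconds.
wait_healthy() {
    container=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status=unknown
        case "$status" in
            healthy)   return 0 ;;
            unhealthy) return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "${WAIT_SLEEP:-2}"
    done
    return 2
}

# Illustrative call:
#   wait_healthy <container_id> 60 && echo "container is healthy"
```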
