A non-zero exit code is not a sign that the Docker daemon has failed. It means the primary process running inside the container terminated with an error. By default, Docker simply records the exit code and stops the container. The daemon itself is functioning, but the application within the container isn’t.
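The exit code itself is often the first clue. The sketch below maps the codes Docker conventionally reports to a likely cause. `describe_exit_code` is a hypothetical helper, not part of any Docker tooling, and its descriptions are rules of thumb rather than guarantees.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return a rough, human-readable guess at what a container exit code means."""
    if code == 0:
        return "success"
    # Codes that `docker run` reserves for failures to start the command.
    special = {
        125: "the docker command itself failed (for example, a bad flag)",
        126: "the container command was found but could not be invoked",
        127: "the container command was not found",
    }
    if code in special:
        return special[code]
    # By shell convention, 128 + N means the process was killed by signal N.
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "application-defined error"
```

For instance, 137 is 128 + 9, so it decodes to SIGKILL, the signal sent by the kernel OOM killer and by `docker kill`.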
Here are the most common reasons why a container might exit with a non-zero status code, and how to diagnose and fix them:
1. Application Crashes Due to Unhandled Exceptions
This is the most frequent culprit. The application code inside your container encountered an error it didn’t anticipate or know how to recover from, leading to a crash.
- Diagnosis: The first step is always to inspect the container’s logs.
  ```shell
  docker logs <container_id_or_name>
  ```
  Look for stack traces, error messages, or any output that indicates an unhandled exception in your application’s language (e.g., `Segmentation fault`, `panic:`, `UncaughtError`, `NullPointerException`).
- Fix:
  - Development: Fix the bug in your application code. Add error handling, validate inputs, or correct the logic that caused the crash.
  - Production: If this is an unexpected bug, you might need to redeploy with a fixed image. If it’s a known issue, ensure your application is robust enough to handle edge cases. For example, if your Python app crashes on an `IndexError`:
    ```python
    # Original (buggy)
    my_list = [1, 2]
    print(my_list[5])

    # Fixed
    my_list = [1, 2]
    try:
        print(my_list[5])
    except IndexError:
        print("Index out of bounds!")
    ```
    This works because the `try`/`except` block gracefully catches the `IndexError`, preventing the program from terminating abruptly.
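Wrapping every risky call in its own `try`/`except` is not always practical. One common pattern, sketched here on the assumption of a Python application with a single entry function (the names `run` and `main` are illustrative), is a top-level handler that writes the traceback to stderr, where `docker logs` can pick it up, and then returns a deliberate non-zero exit code.

```python
import sys
import traceback
from typing import Callable


def run(main: Callable[[], None]) -> int:
    """Run main() and translate any uncaught exception into an exit code."""
    try:
        main()
    except Exception:
        # Docker captures stderr, so the traceback shows up in `docker logs`.
        traceback.print_exc(file=sys.stderr)
        return 1
    return 0


def main() -> None:
    """Stand-in for the real application; fails the same way as the example above."""
    my_list = [1, 2]
    print(my_list[5])  # raises IndexError


# In the real entry point you would finish with: sys.exit(run(main))
```

The container still exits with a non-zero status, which is what you want for a genuine failure, but the logs now say why.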
2. Missing or Incorrectly Configured Dependencies
Your application might rely on libraries, configuration files, or environment variables that are not present, are corrupted, or are set up incorrectly within the container’s filesystem or environment.
- Diagnosis:
  - Check container logs for errors like `FileNotFoundError`, `ModuleNotFoundError`, `ImportError`, or messages indicating a configuration file couldn’t be read or parsed.
  - Inspect the container’s filesystem (if it’s still running briefly or if you can exec into a temporary instance of the same image).
    ```shell
    # If the container exited, you can't exec. Try running a temporary one from the same image:
    docker run -it --rm <image_name> bash
    # Then navigate to where your app expects dependencies and check their presence/permissions.
    ```
- Fix:
  - Dockerfile: Ensure all necessary dependencies are installed correctly in your `Dockerfile` (e.g., `RUN pip install -r requirements.txt`, `RUN apt-get update && apt-get install -y ...`).
  - Volumes/Mounts: Verify that any volumes or bind mounts containing configuration files or data are correctly specified in your `docker run` command or `docker-compose.yml` and that the paths inside the container match your application’s expectations.
  - Environment Variables: Ensure all required environment variables are passed into the container (e.g., `docker run -e MY_VAR=value ...`). If a variable is missing, the application might fail to initialize. For example, if your application needs `DATABASE_URL` and it’s not set:
    ```shell
    # Incorrect (missing env var)
    docker run my_app_image

    # Correct (providing the env var)
    docker run -e DATABASE_URL="postgresql://user:pass@host:port/dbname" my_app_image
    ```
    This works because the application can now access the configuration it needs to connect to its database or other services.
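A missing variable is easier to diagnose when the application checks for it at startup instead of failing later with an unrelated error. The following is a minimal sketch of such a check, assuming a Python application. The helper names `missing_variables` and `check_environment` are made up for illustration.

```python
import os
import sys
from typing import Iterable, List, Mapping


def missing_variables(required: Iterable[str], environ: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def check_environment(required: Iterable[str]) -> None:
    """Exit with a clear message if any required variable is missing."""
    missing = missing_variables(required, os.environ)
    if missing:
        print(
            f"Missing required environment variables: {', '.join(missing)}",
            file=sys.stderr,
        )
        sys.exit(1)
```

Calling `check_environment(["DATABASE_URL"])` as the first step of startup turns a confusing connection error into a one-line message in `docker logs`.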
3. Insufficient Resources (Memory/CPU)
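The application within the container might need more memory than the container or the host machine allows, in which case the kernel’s OOM killer terminates the process (typically reported as exit code 137). CPU limits behave differently: a container that hits its CPU limit is throttled rather than killed, so CPU starvation usually shows up as timeouts or slow startup rather than an immediate exit.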
- Diagnosis:
  - Check host system logs (e.g., `/var/log/syslog` or `dmesg` on Linux) for Out-Of-Memory (OOM) killer events (`oom-killer`).
  - Monitor container resource usage using `docker stats <container_id_or_name>` while it’s running (if you can start it long enough to observe). If it crashes immediately, this is harder to diagnose directly from `docker stats`.
  - Container logs might show cryptic errors if the application itself tries to allocate too much memory and fails.
- Fix:
  - Increase Host Resources: If your host machine is under-resourced, add more RAM or CPU.
  - Limit Container Resources: In your `docker run` command or `docker-compose.yml`, set resource limits for the container to prevent it from consuming too much.
    ```shell
    # Example with Docker run
    docker run --memory="512m" --cpus="0.5" my_app_image
    ```
    ```yaml
    # Example with docker-compose.yml
    services:
      my_app:
        image: my_app_image
        deploy:
          resources:
            limits:
              cpus: '0.5'
              memory: 512M
    ```
    This works by telling the Docker daemon to enforce these limits. A container that exceeds its memory limit is killed on its own instead of destabilizing the host, so if an existing limit is what is killing your application, raise it to match the application’s real needs.
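To confirm an OOM kill without searching host logs, you can look at the container state that `docker inspect` reports. The sketch below assumes the `State.ExitCode` and `State.OOMKilled` fields of that JSON output. `diagnose_exit` is a hypothetical helper and `SAMPLE` is a trimmed, made-up example of the output.

```python
import json


def diagnose_exit(inspect_output: str) -> str:
    """Summarize why a container stopped, given `docker inspect` JSON output."""
    # `docker inspect` prints a JSON array with one object per container.
    state = json.loads(inspect_output)[0]["State"]
    if state.get("OOMKilled"):
        return "killed by the OOM killer; raise the memory limit or reduce usage"
    code = state.get("ExitCode", 0)
    if code == 137:
        return "killed with SIGKILL; possibly a host-level OOM kill or `docker kill`"
    if code != 0:
        return f"exited with code {code}; check the container logs"
    return "exited cleanly"


# Trimmed, made-up sample of what `docker inspect <container>` prints.
SAMPLE = '[{"State": {"Status": "exited", "ExitCode": 137, "OOMKilled": true}}]'
```

In practice the input would come from running `docker inspect <container_id_or_name>` and passing its standard output to the function.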
4. Incorrect Entrypoint or CMD
The ENTRYPOINT or CMD instruction in your Dockerfile might be misconfigured, pointing to a non-existent executable, or executing a command that exits immediately with an error.
- Diagnosis:
  - Examine your `Dockerfile` for the `ENTRYPOINT` and `CMD` instructions.
  - Try running the container with an interactive shell to manually execute the intended command.
    ```shell
    # Replace CMD/ENTRYPOINT with bash or sh
    docker run -it --entrypoint bash <image_name>
    # Inside the container, try running the command specified in your original CMD/ENTRYPOINT
    # For example, if your CMD was ["python", "app.py"]:
    python app.py
    ```
  - Check container logs for errors like "executable file not found" or the output of the command you manually ran.
- Fix:
  - Correct the `ENTRYPOINT` or `CMD` in your `Dockerfile`. Ensure the executable path is correct and that the command is valid.
  - If you intended to run a script, make sure it has execute permissions (`chmod +x`).
  - Example `Dockerfile` fix:
    ```dockerfile
    # Original (typo or wrong path)
    # CMD ["/usr/local/bin/myapp"]

    # Corrected (assuming /usr/local/bin/my_app is the correct path)
    CMD ["/usr/local/bin/my_app"]
    ```
    This works by ensuring the correct, executable application is launched when the container starts.
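The manual checks above can also be scripted. The sketch below takes the argument list from a `CMD` or `ENTRYPOINT` and reports the most common problems with its first element. `check_command` is a hypothetical helper, and it is meant to be run inside the container (for example from the interactive shell), since it inspects the local filesystem and `PATH`.

```python
import os
import shutil
from typing import List, Optional


def check_command(argv: List[str]) -> Optional[str]:
    """Return a description of the problem with argv[0], or None if it looks runnable."""
    if not argv:
        return "empty command"
    executable = argv[0]
    if os.sep in executable:
        # An explicit path: check it directly instead of searching PATH.
        if not os.path.isfile(executable):
            return f"{executable}: no such file"
        if not os.access(executable, os.X_OK):
            return f"{executable}: not executable (try chmod +x)"
        return None
    if shutil.which(executable) is None:
        return f"{executable}: not found on PATH"
    return None
```

A result of `None` only means the executable can be started. The command may still exit immediately with its own error.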
5. Permissions Issues
The user running the process inside the container might not have the necessary read, write, or execute permissions for files or directories it needs to access.
- Diagnosis:
  - Check container logs for `Permission denied` errors.
  - If possible, run an interactive shell into the container (as described in point 4) and check the ownership and permissions of relevant files and directories using `ls -l`.
    ```shell
    # Inside container
    ls -l /path/to/file_or_directory
    ```
  - Identify the user the process is running as (often `root` by default, but many images use a non-root user like `node`, `www-data`, or a custom user ID).
- Fix:
  - Dockerfile: Use `RUN chown` and `RUN chmod` commands in your `Dockerfile` to set the correct permissions for files and directories.
  - User Context: If your application needs to write to specific locations, ensure the user specified in the `USER` instruction in your `Dockerfile` (or the default user) has write permissions.
  - Volumes: If using volumes, ensure the user ID inside the container matches the owner of the files on the host system, or grant appropriate permissions.
    ```dockerfile
    # Example Dockerfile fix
    # Temporarily switch to root to change permissions
    USER root
    RUN chown -R appuser:appgroup /app/data
    RUN chmod -R 755 /app/data
    # Switch back to the application user
    USER appuser
    CMD ["python", "app.py"]
    ```
    This works because the application process, now running as `appuser`, has the necessary permissions to read from or write to `/app/data`. Note that a Dockerfile only treats `#` as a comment at the start of a line, so the comments sit on their own lines.
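When `ls -l` output is hard to compare against the running user, a short script can print both side by side. This is a sketch that assumes a Unix-like container. `describe_access` is a hypothetical helper, and `/app/data` in the usage note is only the example path from above.

```python
import os
import stat


def describe_access(path: str) -> str:
    """Report owner, mode, and whether the current user can write to path."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return f"{path}: does not exist"
    except PermissionError:
        return f"{path}: permission denied while inspecting"
    mode = stat.filemode(info.st_mode)  # e.g. "drwxr-xr-x"
    writable = os.access(path, os.W_OK)
    return (
        f"{path}: {mode} uid={info.st_uid} gid={info.st_gid} "
        f"process uid={os.getuid()} writable={writable}"
    )
```

Running `print(describe_access("/app/data"))` as the application user shows whether the file owner and the process UID actually match, which is the usual source of trouble with bind mounts.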
6. Health Check Failures Triggering Restart Policies
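A failing health check is not a direct cause of a non-zero exit code. On a standalone Docker Engine, it only marks the container as `unhealthy`; restart policies such as `on-failure` react to the process exiting, not to its health status. Orchestrators such as Docker Swarm, however, stop and replace unhealthy containers, and a container stopped that way often reports a non-zero status code (for example 137 or 143). If the application consistently fails its health check, the result is a restart loop.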
- Diagnosis:
  - Check `docker inspect <container_id_or_name>` for the `RestartPolicy`.
  - Check `docker inspect <container_id_or_name>` for `Health` status.
  - Examine the health check command itself in your `Dockerfile` or compose file.
  - Check container logs for output related to the health check failing.
- Fix:
  - Address the underlying issue causing the health check to fail (which will likely be one of the other points above).
  - Adjust the health check configuration (e.g., `interval`, `timeout`, `retries`) if it’s too aggressive for your application’s startup time.
    ```dockerfile
    # Example Dockerfile healthcheck
    HEALTHCHECK --interval=5m --timeout=3s --start-period=10s --retries=3 CMD curl -f http://localhost:80/health || exit 1
    ```
    This works by ensuring the health check accurately reflects the application’s readiness and isn’t overly sensitive to transient issues.
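Slim images often do not ship `curl`, in which case the health check itself fails with "command not found". One alternative, sketched here on the assumption that the image already contains Python, is a small script that performs the same request. `check_health` is a hypothetical helper, and the `/health` endpoint is the one from the example above.

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 3.0) -> int:
    """Return 0 if url answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return 1
```

A script that ends with `sys.exit(check_health("http://localhost:80/health"))` can then be used as the `HEALTHCHECK` command, since Docker treats exit code 0 as healthy and 1 as unhealthy.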
The next error you’ll likely encounter after fixing these is a container that starts successfully but then fails to perform its intended function due to application logic errors or external service unavailability.