A container exiting with a non-zero status code is Kubernetes’ way of saying "something went wrong in there." It’s not just a generic failure; that specific number is a direct clue from your application, telling you what kind of error it encountered before it gave up.
Let’s see this in action. Imagine a simple Python script that tries to open a file that doesn’t exist.
import sys

try:
    with open("non_existent_file.txt", "r") as f:
        content = f.read()
        print(content)
except FileNotFoundError:
    print("Error: File not found!")
    sys.exit(1)  # Explicitly exit with code 1 for FileNotFoundError
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    sys.exit(2)  # Exit with code 2 for other exceptions

sys.exit(0)  # Success
If we build this into a Docker image and run it, locally or in Kubernetes, the container reports this exit code.
# Dockerfile
FROM python:3.9-slim
COPY your_script.py /app/
WORKDIR /app
CMD ["python", "your_script.py"]
# Build and run locally (simulating Kubernetes behavior)
docker build -t my-app-exit-codes .
docker run --rm my-app-exit-codes
Output:
Error: File not found!
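And if we check the container’s exit code, we’d see it exited with 1. Locally, docker run returns the container’s exit code, so echo $? right after the command prints 1 (docker ps -a also shows Exited (1) in the STATUS column if the container was started without --rm). In Kubernetes, kubectl describe pod <pod-name> reports Exit Code: 1 under the container’s State or Last State, and kubectl logs <pod-name> --previous shows the output of the crashed container. This 1 is the signal that the FileNotFoundError was the culprit.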
The beauty of container exit codes is that they’re a convention, inherited from the Unix philosophy where a non-zero exit status signifies an error. Your application code is responsible for translating its internal error states into these specific numerical codes. This allows Kubernetes, and you, to quickly understand the nature of the failure without needing to parse complex logs immediately.
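To make the convention concrete, here is a minimal sketch of how a parent process reads a child's exit status, which is essentially what a container runtime does for your container's main process. The helper name run_and_report is hypothetical and the child commands are throwaway one-liners chosen only for illustration.

```python
import subprocess
import sys


def run_and_report(snippet: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


if __name__ == "__main__":
    # A child that succeeds and one that fails on purpose.
    ok = run_and_report("import sys; sys.exit(0)")
    failed = run_and_report("import sys; sys.exit(1)")
    print(f"success exit code: {ok}")
    print(f"failure exit code: {failed}")
```

The parent never parses the child's output here; the number alone tells it whether the run succeeded, which is the same shortcut Kubernetes relies on.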
Common Container Exit Codes and Their Meanings
While your application can exit with any non-zero value from 1 to 255, some codes are conventionally used for specific error types. Understanding these common ones is key to rapid debugging.
- Exit Code 1: General Application Error
  - Diagnosis: This is the most common catch-all. Your application encountered an error it recognized but didn’t have a more specific code for.
  - Fix: Examine your application logs for the precise error message. This often points to invalid input, configuration issues, or unhandled exceptions. For example, if your application expected a JSON configuration file and it was malformed, it might exit with 1.
  - Why it works: It signals a problem within the application’s logic or data handling, requiring a deeper dive into the application’s output.
- Exit Code 126: Command Invoked Cannot Execute
  - Diagnosis: The entrypoint script or command specified in your container’s CMD or ENTRYPOINT is not executable.
  - Check: Run ls -l /path/to/your/entrypoint against the image and verify the execute permissions (x). Also, check if the file exists at all. A container that fails this way is not running, so kubectl exec <pod-name> -- ls -l /path/to/your/entrypoint has nothing to attach to; it is usually easier to start the image with an overridden entrypoint, for example docker run --rm --entrypoint ls <image> -l /path/to/your/entrypoint.
  - Fix: Ensure your Dockerfile has RUN chmod +x /path/to/your/entrypoint for your script.
  - Why it works: The operating system within the container refused to run the specified program because it lacked the necessary execute permissions.
- Exit Code 127: Command Not Found
  - Diagnosis: The command specified in your CMD or ENTRYPOINT does not exist in the container’s PATH.
  - Check: Inside a running container, if possible, or in a fresh container started from the same image, run which <command> or type <command>.
  - Fix: Ensure the executable is installed in the container image and that its directory is in the PATH environment variable. For example, if you tried to run my-custom-tool and it wasn’t installed or its bin directory wasn’t in the PATH, you’d get this. You might need to add ENV PATH="/opt/my-tool/bin:${PATH}" in your Dockerfile.
  - Why it works: The shell couldn’t find the executable file to launch, similar to typing a non-existent command on your local machine.
- Exit Code 137: SIGKILL (9) + 128
  - Diagnosis: The container was forcefully terminated by the system. This is often due to the container exceeding its memory limit: the kernel’s out-of-memory killer sends SIGKILL (signal 9) to the container’s process, and the exit code is 128 + signal_number. The same code also appears when a container ignores SIGTERM and is killed after its termination grace period runs out.
  - Check: Examine kubectl describe pod <pod-name> for OOMKilled (Out Of Memory Killed), which appears as the Reason in the container’s Last State. Check kubectl top pod <pod-name> or your cluster’s monitoring for memory usage.
  - Fix: Increase the resources.limits.memory in your pod’s YAML definition. You might also need to optimize your application’s memory usage.
  - Why it works: The memory limit is enforced on the node, so a container that consumes more than it is allowed is stopped immediately and ungracefully, with no chance to clean up.
- Exit Code 139: SIGSEGV (11) + 128
  - Diagnosis: The container’s main process experienced a Segmentation Fault. This usually indicates a bug in the application itself or in a native library it uses, often related to memory corruption, null pointer dereferences, or accessing invalid memory locations.
  - Fix: This requires deep debugging of your application. Use a debugger (like GDB, ideally against a binary built with debug symbols) or add extensive logging to pinpoint the exact line of code causing the fault.
  - Why it works: The CPU detected an attempt by the program to access memory it wasn’t allowed to, causing the operating system to terminate the process.
- Exit Code 143: SIGTERM (15) + 128
  - Diagnosis: The container’s main process was terminated by a termination signal (SIGTERM). This is the preferred way for Kubernetes to stop a pod: SIGTERM is sent first so the application has time to shut down cleanly before a SIGKILL follows.
  - Fix: If it’s happening during scaling events, rollouts, or node maintenance, it’s usually expected. If it is happening unexpectedly and you didn’t intend for the container to terminate, check kubectl describe pod <pod-name> for Terminating status and related events to find out what asked the container to stop. If the shutdown itself is the problem (dropped requests, lost work), it might be that your application isn’t correctly handling SIGTERM.
  - Why it works: The container process received a polite request to shut down. An application with a SIGTERM handler can finish in-flight work and choose its own exit code, often 0; a plain 143 commonly means no handler was installed and the default signal action ended the process.
- Custom Application Exit Codes
  - Diagnosis: Your application explicitly uses specific codes for business logic errors. For example, an API service might exit with 10 if a required external service is unavailable, or 20 if a specific validation rule fails.
  - Fix: This is entirely application-dependent. You need to consult your application’s documentation or source code to understand what these custom codes signify.
  - Why it works: These are designed by the application developer to provide granular control over error reporting.
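The codes listed above follow a simple pattern that can be sketched in a few lines: a handful of fixed conventions, plus 128 + signal_number for processes ended by a signal. The function name describe_exit_code and the KNOWN_CODES table are hypothetical, written for illustration only; the sketch assumes a Unix-like system where Python's signal module knows the signal names.

```python
import signal

# Conventional meanings from the list above; anything else is application-specific.
KNOWN_CODES = {
    0: "success",
    1: "general application error",
    126: "command invoked cannot execute",
    127: "command not found",
}


def describe_exit_code(code: int) -> str:
    """Return a short human-readable description of a container exit code."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if 128 < code < 256:
        # Codes above 128 conventionally mean "terminated by signal (code - 128)".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name} (128 + {signum})"
    return "application-defined code; check the application's documentation"


if __name__ == "__main__":
    for code in (1, 126, 127, 137, 139, 143, 10):
        print(code, "->", describe_exit_code(code))
```

A helper like this is only a first-pass triage aid. It cannot tell an OOM kill from any other SIGKILL, so the pod's status and events remain the authoritative source.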
The Next Error You’ll See
Once you’ve fixed the root cause, the next "error" you’ll likely encounter is the absence of one: your container will exit with code 0, signifying successful execution, and Kubernetes will record the container as having completed successfully.