Add Health and Readiness Endpoints to FastAPI for Kubernetes (2026)

FastAPI applications, when deployed to Kubernetes, often need to expose health and readiness endpoints so the orchestrator can properly manage their lifecycle.

Imagine a FastAPI app running inside a Kubernetes pod. Kubernetes needs to know whether your app is alive and ready to serve traffic. It finds out by periodically sending HTTP requests to specific URLs (endpoints) on your application. If your app doesn’t respond in time or responds with an error, Kubernetes restarts the container (liveness probe) or stops routing Service traffic to the pod (readiness probe).

Here’s a simple FastAPI app:

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    return {"Hello": "World"}

@app.get("/status")
async def get_status():
    # In a real app, this would check database connections, external services, etc.
    return {"status": "ok"}

Now, let’s add proper health and readiness checks. These are typically just HTTP endpoints that return a success or failure status.

from fastapi import FastAPI, HTTPException
import logging

app = FastAPI()

# Simulate a dependency that might fail
DATABASE_CONNECTED = True
EXTERNAL_SERVICE_AVAILABLE = True

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.get("/")
async def read_root():
    return {"Hello": "World"}

@app.get("/health")
async def health_check():
    """
    Liveness probe endpoint.
    Checks if the application process is running and responsive.
    """
    logger.info("Health check requested.")
    # A basic liveness check might just ensure the process is alive.
    # For a more robust check, you could ping a local resource or
    # check internal state that indicates the application is running
    # but not necessarily ready for traffic.
    return {"status": "up"}

@app.get("/ready")
async def readiness_check():
    """
    Readiness probe endpoint.
    Checks if the application is ready to serve traffic.
    This means all its dependencies are met (e.g., database, cache, external services).
    """
    logger.info("Readiness check requested.")
    if not DATABASE_CONNECTED:
        logger.error("Database connection failed.")
        raise HTTPException(status_code=503, detail="Database not available")
    if not EXTERNAL_SERVICE_AVAILABLE:
        logger.error("External service unavailable.")
        raise HTTPException(status_code=503, detail="External service not available")

    # In a real application, you'd perform actual checks here. Because this is
    # an async endpoint, use async clients (names below are placeholders) and
    # short timeouts so a slow dependency cannot block the event loop:
    # try:
    #     await db_connection.execute("SELECT 1")
    # except Exception as e:
    #     logger.error(f"Database check failed: {e}")
    #     raise HTTPException(status_code=503, detail="Database connection error")
    #
    # try:
    #     async with httpx.AsyncClient(timeout=2.0) as client:
    #         response = await client.get("http://external-service.com/health")
    #         response.raise_for_status()
    # except Exception as e:
    #     logger.error(f"External service check failed: {e}")
    #     raise HTTPException(status_code=503, detail="External service error")

    return {"status": "ready"}

# Simulate dependency failures for testing
@app.post("/simulate_db_failure")
async def simulate_db_failure():
    global DATABASE_CONNECTED
    DATABASE_CONNECTED = False
    logger.warning("Simulating database failure.")
    return {"message": "Database connection simulated as failed"}

@app.post("/simulate_db_recovery")
async def simulate_db_recovery():
    global DATABASE_CONNECTED
    DATABASE_CONNECTED = True
    logger.warning("Simulating database recovery.")
    return {"message": "Database connection simulated as recovered"}

@app.post("/simulate_external_service_failure")
async def simulate_external_service_failure():
    global EXTERNAL_SERVICE_AVAILABLE
    EXTERNAL_SERVICE_AVAILABLE = False
    logger.warning("Simulating external service failure.")
    return {"message": "External service simulated as unavailable"}

@app.post("/simulate_external_service_recovery")
async def simulate_external_service_recovery():
    global EXTERNAL_SERVICE_AVAILABLE
    EXTERNAL_SERVICE_AVAILABLE = True
    logger.warning("Simulating external service recovery.")
    return {"message": "External service simulated as available"}

In your Kubernetes deployment YAML, you would configure these endpoints as probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi
  template:
    metadata:
      labels:
        app: fastapi
    spec:
      containers:
      - name: app
        image: your-docker-image # Replace with your actual image
        ports:
        - containerPort: 8000
        livenessProbe:
          httpGet:
            path: /health # This hits our /health endpoint
            port: 8000
          initialDelaySeconds: 15 # Wait 15 seconds before first probe
          periodSeconds: 20      # Probe every 20 seconds
          timeoutSeconds: 5      # Fail if no response within 5 seconds
          failureThreshold: 3    # Consider the container unhealthy after 3 failures
        readinessProbe:
          httpGet:
            path: /ready # This hits our /ready endpoint
            port: 8000
          initialDelaySeconds: 5  # Wait 5 seconds before first probe
          periodSeconds: 10       # Probe every 10 seconds
          timeoutSeconds: 5       # Fail if no response within 5 seconds
          failureThreshold: 3     # Consider the container not ready after 3 failures

The core idea is that /health (liveness) is a quick check that the process is running and the application can still respond to requests. It should not depend on external services: if a database outage made /health fail, Kubernetes would restart containers that a restart cannot fix. /ready (readiness), on the other hand, is a more thorough check. It verifies that the application has initialized its critical dependencies, such as connecting to a database, loading configuration, or establishing connections to essential services. If /ready fails, Kubernetes stops sending new traffic to that pod until the endpoint returns a success code again (for example, 200 OK).
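To make the readiness side more concrete, here is a minimal sketch of how several dependency checks could be combined into a single ready/not-ready answer. It uses only the standard library. The names run_readiness_checks, database_ok, and external_service_down are hypothetical and not part of FastAPI or the app above; real checks would talk to your actual database or services.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Hypothetical check signature: an async callable that raises on failure.
Check = Callable[[], Awaitable[None]]


async def run_readiness_checks(
    checks: Dict[str, Check], timeout_seconds: float = 2.0
) -> Tuple[bool, Dict[str, str]]:
    """Run every check concurrently and report "ok" or a failure reason per check."""

    async def run_one(name: str, check: Check) -> Tuple[str, str]:
        try:
            # Bound each check so one slow dependency cannot stall the probe.
            await asyncio.wait_for(check(), timeout=timeout_seconds)
            return name, "ok"
        except asyncio.TimeoutError:
            return name, "timeout"
        except Exception as exc:
            return name, f"error: {exc}"

    pairs = await asyncio.gather(*(run_one(n, c) for n, c in checks.items()))
    results = dict(pairs)
    ready = all(status == "ok" for status in results.values())
    return ready, results


# Stand-in checks for illustration only.
async def database_ok() -> None:
    await asyncio.sleep(0)


async def external_service_down() -> None:
    raise ConnectionError("connection refused")


if __name__ == "__main__":
    checks = {"database": database_ok, "external": external_service_down}
    print(asyncio.run(run_readiness_checks(checks)))
```

In a /ready handler, one option would be to await this function and raise HTTPException(status_code=503, detail=...) with the per-check results whenever ready is False. Keeping the per-check timeout below the probe’s timeoutSeconds means the endpoint can answer with a 503 instead of letting the probe itself time out.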

The initialDelaySeconds setting is crucial. It gives your application time to start up and initialize before Kubernetes begins probing. Without a long enough delay on the liveness probe, Kubernetes might restart the container before it has had a chance to finish starting. periodSeconds defines how often the probe runs, timeoutSeconds is how long Kubernetes waits for a response, and failureThreshold is how many consecutive failures it takes before Kubernetes acts: restarting the container for a liveness probe, or marking the pod as not ready for a readiness probe.
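The way these settings interact can be worked through with a small calculation. The sketch below is a simplified model, not kubelet’s actual scheduling: it assumes probe number i fires at initialDelaySeconds + i * periodSeconds, and it ignores jitter and response latency. The function name seconds_until_probe_fails is made up for illustration.

```python
from typing import Iterable, Optional


def seconds_until_probe_fails(
    results: Iterable[bool],
    initial_delay_seconds: int,
    period_seconds: int,
    failure_threshold: int,
) -> Optional[int]:
    """Approximate time after container start at which the probe is declared failed.

    Simplified model: probe i fires at initial_delay + i * period, and any
    success resets the failure streak. Returns None if the threshold is
    never reached.
    """
    consecutive_failures = 0
    for i, succeeded in enumerate(results):
        if succeeded:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return initial_delay_seconds + i * period_seconds
    return None


# Liveness settings from the manifest above: 15 s delay, 20 s period, threshold 3.
print(seconds_until_probe_fails([False, False, False], 15, 20, 3))  # 55
# A success in between resets the streak, so the threshold is never reached here.
print(seconds_until_probe_fails([False, False, True, False], 15, 20, 3))  # None
# Readiness settings: 5 s delay, 10 s period, threshold 3.
print(seconds_until_probe_fails([True, False, False, False], 5, 10, 3))  # 35
```

Under this model, a container that never answers /health would be restarted roughly 55 seconds after it starts, which is the budget your application has to begin responding.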

A subtle but important point is how Kubernetes interprets FastAPI’s HTTPException with a 503 Service Unavailable status code. For HTTP probes, Kubernetes treats any status code from 200 through 399 as success and anything else (like 503) as a probe failure. This is exactly what you want for a readiness probe when a dependency is down. The application itself is running, but it’s not ready to serve traffic.
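The rule Kubernetes documents for HTTP probes is that any status code greater than or equal to 200 and less than 400 counts as success. As a small illustration, the helper below (http_probe_succeeds is a hypothetical name, not a Kubernetes or FastAPI API) applies that rule to a few status codes a probe endpoint might return.

```python
def http_probe_succeeds(status_code: int) -> bool:
    """Apply the documented HTTP probe rule: 200 <= status < 400 is a success."""
    return 200 <= status_code < 400


for code in (200, 204, 302, 404, 500, 503):
    outcome = "success" if http_probe_succeeds(code) else "failure"
    print(code, outcome)
```

Note that a redirect such as 302 still counts as a success, so a probe path that redirects (for example, to a login page) can pass even though the handler you meant to check never ran.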

The next thing you’ll likely encounter is managing the lifecycle of these dependencies more dynamically, perhaps using background tasks or FastAPI’s lifespan (startup and shutdown) events to signal readiness changes.