Deploy FastAPI with Zero Downtime Using Rolling Updates (2026)

FastAPI applications can be updated with zero downtime using rolling updates, a technique that gradually replaces old instances of your application with new ones so that some instances are always serving traffic.

Let’s see this in action. Imagine we have a simple FastAPI app running in a Kubernetes cluster.

# main.py
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    # Simulate some work without blocking the event loop
    await asyncio.sleep(2)
    return {"message": "Hello from version 1!"}
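Because read_root is declared with async def, it runs directly on the event loop, so a blocking call inside it (such as time.sleep) stalls every other request that worker is handling until it returns. The sketch below uses only asyncio, not FastAPI, to illustrate the difference; the handler names and the 0.1-second delay are illustrative assumptions.

```python
import asyncio
import time


async def blocking_handler(delay: float) -> None:
    # time.sleep blocks the whole event loop, so other coroutines cannot run.
    time.sleep(delay)


async def non_blocking_handler(delay: float) -> None:
    # asyncio.sleep yields control back to the event loop while waiting.
    await asyncio.sleep(delay)


async def serve_concurrently(handler, requests: int, delay: float) -> float:
    """Run `requests` handler calls concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    blocking = asyncio.run(serve_concurrently(blocking_handler, 3, 0.1))
    non_blocking = asyncio.run(serve_concurrently(non_blocking_handler, 3, 0.1))
    print(f"blocking: {blocking:.2f}s, non-blocking: {non_blocking:.2f}s")
```

Three concurrent "requests" take roughly three times as long in the blocking version. In the endpoint itself, await asyncio.sleep(...) avoids the stall, as does declaring the handler with plain def, which FastAPI runs in a threadpool.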

We’ve deployed this using a Kubernetes Deployment object.

# deployment-v1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
      - name: fastapi-app
        image: your-docker-repo/fastapi-app:v1
        ports:
        - containerPort: 80

And a Service to expose it.

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: fastapi-app-service
spec:
  selector:
    app: fastapi-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

Now, we want to update to version 2. Our new app code looks like this:

# main.py (version 2)
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    # Simulate some work without blocking the event loop
    await asyncio.sleep(1)  # Faster response
    return {"message": "Hello from version 2!"}

We build a new Docker image, your-docker-repo/fastapi-app:v2, and update our Deployment manifest.

# deployment-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
      - name: fastapi-app
        image: your-docker-repo/fastapi-app:v2 # Updated image
        ports:
        - containerPort: 80
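Applying this manifest starts a rollout because the change sits inside spec.template: Kubernetes rolls out new pods only when the Pod template changes, while edits elsewhere (for example to replicas) just scale the existing pods. The sketch below models both manifests as plain Python dicts to make that rule concrete; make_deployment and triggers_rollout are hypothetical helpers, not the controller's actual implementation.

```python
def make_deployment(image: str, replicas: int = 3) -> dict:
    """Build a dict mirroring the fastapi-app Deployment manifest."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "fastapi-app"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "fastapi-app"}},
            "template": {
                "metadata": {"labels": {"app": "fastapi-app"}},
                "spec": {
                    "containers": [
                        {
                            "name": "fastapi-app",
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


def triggers_rollout(old: dict, new: dict) -> bool:
    # Hypothetical helper: a rollout happens only if the Pod template differs.
    return old["spec"]["template"] != new["spec"]["template"]


v1 = make_deployment("your-docker-repo/fastapi-app:v1")
v2 = make_deployment("your-docker-repo/fastapi-app:v2")
scaled = make_deployment("your-docker-repo/fastapi-app:v1", replicas=5)

print(triggers_rollout(v1, v2))      # True: the image is part of the template
print(triggers_rollout(v1, scaled))  # False: replicas is outside the template
```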

When we apply the updated manifest (kubectl apply -f deployment-v2.yaml), Kubernetes doesn’t immediately kill all the old pods and start new ones. Instead, it performs a rolling update. Here’s how it works internally:

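Create New Pods: Kubernetes starts creating pods from the updated template. Two settings bound this: maxSurge, the number of pods allowed above the desired replica count, and maxUnavailable, the number of pods allowed to be unavailable. Both default to 25%; maxSurge rounds up and maxUnavailable rounds down, so for replicas: 3 the defaults resolve to maxSurge: 1 and maxUnavailable: 0. Kubernetes therefore creates one new pod at a time while all three old pods keep serving.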
Wait for Readiness: Kubernetes waits for the new pod to become "ready." Readiness is determined by the readinessProbe defined in your container spec. If you don’t define one, the container counts as ready as soon as it has started, even if the application inside isn’t accepting connections yet. For a web application, you should always define a readiness probe; for FastAPI, this is typically an HTTP GET request to a health-check endpoint.
Terminate Old Pods: Once a new pod is ready, Kubernetes terminates one of the old pods.
Repeat: This process repeats: create a new pod, wait for it to be ready, terminate an old pod. It continues until all old pods are replaced by new ones.
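The surge and unavailability limits can be absolute numbers or percentages of replicas. The helper below sketches the rounding rule described in the first step (maxSurge rounds up, maxUnavailable rounds down, both defaulting to 25%); the function names and the "25%" string format are assumptions for illustration.

```python
from typing import Tuple, Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string such as "25%" to pods."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas
    # Integer arithmetic avoids floating-point surprises when rounding.
    return -(-scaled // 100) if round_up else scaled // 100


def rolling_limits(replicas: int,
                   max_surge: Union[int, str] = "25%",
                   max_unavailable: Union[int, str] = "25%") -> Tuple[int, int]:
    """Return (maxSurge, maxUnavailable) as pod counts."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return surge, unavailable


print(rolling_limits(3))        # (1, 0): the defaults for replicas: 3
print(rolling_limits(3, 1, 1))  # (1, 1): explicit values
print(rolling_limits(10))       # (3, 2)
```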

Zero downtime here depends on two things: the readiness probe and the maxUnavailable setting in the Deployment’s strategy.
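To see how these settings bound a rollout, the toy simulation below steps through one, assuming every new pod passes its readiness probe one step after it is created and old pods terminate instantly. It is a simplified model, not the deployment controller’s algorithm, but it enforces the same two limits: total pods never exceed replicas + maxSurge, and ready pods never drop below replicas - maxUnavailable.

```python
from typing import List, Tuple


def simulate(replicas: int, max_surge: int,
             max_unavailable: int) -> List[Tuple[int, int, int]]:
    """Return (old_pods, new_ready, new_starting) after each step of a toy rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    max_total = replicas + max_surge            # never run more pods than this
    min_available = replicas - max_unavailable  # never serve with fewer ready pods
    old, ready, starting = replicas, 0, 0
    history = [(old, ready, starting)]
    while old > 0 or starting > 0:
        # Assumption: pods created in the previous step pass their probe now.
        ready += starting
        starting = 0
        # Remove old pods only while enough ready pods would remain.
        while old > 0 and old + ready - 1 >= min_available:
            old -= 1
        # Start new pods up to the surge cap and the desired replica count.
        while old + ready + starting < max_total and ready + starting < replicas:
            starting += 1
        history.append((old, ready, starting))
    return history


for step in simulate(3, max_surge=1, max_unavailable=1):
    print(step)
```

With maxSurge: 1 and maxUnavailable: 1, the model removes one old pod in the very first step, before any new pod is ready, yet ready pods never drop below two. Running simulate(3, 1, 0) instead keeps three ready pods throughout, at the cost of one more step.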

A Deployment with an explicit rolling-update strategy and a readiness probe for FastAPI might look like this:

# deployment-v2.yaml (with readinessProbe)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # Can tolerate one pod being down
      maxSurge: 1       # Can have one extra pod running
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
      - name: fastapi-app
        image: your-docker-repo/fastapi-app:v2
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /health # You'd add a /health endpoint to your app
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
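For an httpGet probe, Kubernetes treats any response status from 200 up to (but not including) 400 as success, and anything else, including a connection error, as failure. The stdlib sketch below imitates that check against a throwaway local server; the probe function, the HealthHandler class, and the use of an ephemeral port instead of port 80 are assumptions for illustration, not kubelet code.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers /health with 200 and every other path with 503."""

    def do_GET(self):
        status = 200 if self.path == "/health" else 503
        body = b'{"status": "ok"}' if status == 200 else b""
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def probe(url: str, timeout: float = 1.0) -> bool:
    """Return True if the URL answers with a status in [200, 400)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError as err:
        return 200 <= err.code < 400
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(probe(f"http://127.0.0.1:{port}/health"))   # True
    print(probe(f"http://127.0.0.1:{port}/missing"))  # False (503)
    server.shutdown()
    server.server_close()
```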

And in your FastAPI app, you’d add a health check endpoint:

# main.py (version 2, with health check)
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    await asyncio.sleep(1)
    return {"message": "Hello from version 2!"}

@app.get("/health")
async def health_check():
    # Used by the readinessProbe; returns 200 once the app is serving
    return {"status": "ok"}

With maxUnavailable: 1 and 3 replicas, at least two pods (3 - 1) are always available to serve traffic during the update, and maxSurge: 1 caps the total at four pods. The readinessProbe determines when a new pod counts as available, so Kubernetes won’t remove old pods beyond that limit until new pods pass their probe. Services (like our fastapi-app-service) automatically route traffic only to ready pods, so requests aren’t sent to a pod that’s still starting up. Terminating pods are removed from the endpoints too, but that removal propagates asynchronously, so the application should also finish in-flight requests when it receives SIGTERM.
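The Service-side filtering can be modelled in a few lines: a pod receives traffic only if its labels match every key in the Service’s selector and it is ready. The Pod dataclass and pod names below are hypothetical; in a real cluster the EndpointSlice controller does this bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)
    ready: bool = False
    terminating: bool = False


def matches(selector: dict, labels: dict) -> bool:
    # Equality-based selector: every key/value in the selector must be present.
    return all(labels.get(key) == value for key, value in selector.items())


def ready_endpoints(selector: dict, pods: list) -> list:
    """Names of pods that should receive traffic from the Service."""
    return [
        pod.name
        for pod in pods
        if matches(selector, pod.labels) and pod.ready and not pod.terminating
    ]


selector = {"app": "fastapi-app"}
pods = [
    Pod("fastapi-app-v1-a", {"app": "fastapi-app"}, ready=True),
    Pod("fastapi-app-v1-b", {"app": "fastapi-app"}, ready=True, terminating=True),
    Pod("fastapi-app-v2-a", {"app": "fastapi-app"}, ready=True),
    Pod("fastapi-app-v2-b", {"app": "fastapi-app"}, ready=False),
    Pod("other-app", {"app": "other"}, ready=True),
]
print(ready_endpoints(selector, pods))  # ['fastapi-app-v1-a', 'fastapi-app-v2-a']
```

Because the selector app: fastapi-app matches both versions, clients can hit either v1 or v2 while the rollout is in progress, so the two versions need to be compatible for that window.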

What most people don’t realize is that the default RollingUpdate behavior is robust, but it depends entirely on a correctly configured readinessProbe. Without one, Kubernetes treats each new pod as available the moment its container starts, so it can remove old pods and route requests to new pods before FastAPI is actually listening, and those requests fail. initialDelaySeconds delays the first probe so the pod isn’t probed while the framework and any initialization code are still starting; until a probe succeeds, the pod simply stays out of the Service’s endpoints.
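How initialDelaySeconds, periodSeconds, and the app’s startup time interact comes down to simple arithmetic: the first probe fires after initialDelaySeconds, later probes follow every periodSeconds, and with the default successThreshold of 1 the pod turns ready at the first probe that finds the app up. This sketch assumes probes are instantaneous and ignores jitter and timeoutSeconds; the startup times are made-up examples.

```python
import math


def seconds_until_ready(startup_s: float, initial_delay_s: int = 5,
                        period_s: int = 5) -> float:
    """Time from container start to the first successful readiness probe."""
    if startup_s <= initial_delay_s:
        return initial_delay_s  # the very first probe already succeeds
    # Extra periods needed after the first probe before the app is up.
    periods = math.ceil((startup_s - initial_delay_s) / period_s)
    return initial_delay_s + periods * period_s


for startup in (2, 5, 7, 12):
    print(startup, seconds_until_ready(startup))  # 5, 5, 10, 15 seconds
```

Raising initialDelaySeconds well above the app’s real startup time only delays readiness; keeping it near the typical startup time with a short periodSeconds gets new pods into rotation sooner.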

Once the rolling update is complete, all pods will be running the new version of your application. The next problem you’ll likely encounter is managing configuration changes alongside code updates.