Scale Django Horizontally on Kubernetes with HPA (2026)

Using the Horizontal Pod Autoscaler (HPA) with Django on Kubernetes is less about "scaling out" your Django app once and more about reacting to load automatically by adjusting the number of Django pods your cluster runs.

Let’s see it in action. Imagine you have a Django app deployed on Kubernetes. When requests start pouring in, your app’s CPU usage climbs.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-django-app
spec:
  replicas: 1 # Start with one replica
  selector:
    matchLabels:
      app: my-django-app
  template:
    metadata:
      labels:
        app: my-django-app
    spec:
      containers:
      - name: django
        image: your-django-image:latest
        ports:
        - containerPort: 8000
        resources:
          requests:
            cpu: "100m" # Request 0.1 CPU
          limits:
            cpu: "200m" # Limit to 0.2 CPU

This deployment starts with a single Django pod. Now, let’s add the HPA to automatically adjust the number of my-django-app pods based on CPU utilization.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-django-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-django-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Target 50% CPU utilization

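When the average CPU utilization across all my-django-app pods exceeds 50% of their CPU requests, the HPA raises the Deployment’s replica count and more pods are created. Conversely, when utilization drops far enough below 50%, it scales down and removes pods. Two defaults keep this from flapping: the controller ignores deviations within 10% of the target, and it waits out a 5-minute stabilization window before scaling down.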

The problem HPA solves is the manual and often slow process of adjusting the number of application instances. Instead of you logging in and changing replicas in your Deployment manifest by hand, HPA continuously monitors metrics and makes those changes automatically. It decouples the decision to scale from the action of scaling.

Internally, the HPA runs as a control loop inside kube-controller-manager. On each sync (every 15 seconds by default) it fetches metrics for the pods behind the scaleTargetRef (your Deployment) from the metrics API, which for CPU and memory is typically served by metrics-server. It computes the desired number of replicas from the ratio of the current metric value to the target, then clamps the result between minReplicas and maxReplicas. If the desired number differs from the current number, it updates the replica count on the target Deployment through its scale subresource. This is why your Django pods themselves don’t "know" about HPA; they are managed by the Deployment, whose replica count the HPA adjusts.
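The replica calculation can be made concrete with the formula from the Kubernetes documentation. The numbers here are assumptions chosen for illustration: 4 running pods averaging 80% CPU utilization, measured against the target of 50 from the manifest above.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 4 \times \frac{80}{50} \right\rceil
  = \lceil 6.4 \rceil
  = 7
\]
```

Since 7 lies between minReplicas: 1 and maxReplicas: 10, the Deployment would be scaled to 7 pods. A result above 10 would be capped at 10.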

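The exact levers you control are minReplicas, maxReplicas, and the target metric. Each metric target has a type. Utilization (set with averageUtilization) is a percentage of the pods’ resource requests and is the common choice for CPU. AverageValue (set with averageValue) is an absolute per-pod average, such as 150m of CPU or a custom requests-per-second figure. Value (set with value) compares a single raw number and applies to Object and External metrics rather than per-pod ones. For memory, you’d use name: memory with the same kind of target configuration.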
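As a sketch of how these levers combine, the HPA from earlier could carry a memory metric alongside CPU. The 256Mi figure is an assumed value for illustration, not a recommendation; pick one based on your app’s observed memory footprint. When several metrics are listed, the HPA computes a replica count for each and uses the largest.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-django-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-django-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Percentage of requests.cpu
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 256Mi # Assumed value: absolute per-pod average, independent of requests
```

Using AverageValue for memory means this metric works even though the Deployment above sets no memory request. A Utilization target for memory would need resources.requests.memory to be defined.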

The most surprising thing is how HPA’s "utilization" target actually works. When you set averageUtilization: 50 for CPU, the HPA is not comparing against absolute CPU usage, node capacity, or your CPU limit. Instead, it calculates current_cpu_usage / requested_cpu. So, if your Django pod requests 100m of CPU and is currently using 75m, its utilization is 75%. Because the limit in the Deployment above is 200m, utilization can even reach 200%. The HPA then uses this ratio to determine scaling. This means the resources.requests.cpu value in your Deployment is critical for HPA to function correctly and predictably. If you don’t set a CPU request, the HPA cannot compute utilization and takes no action on that metric. If the request is too high, utilization looks low and scale-up comes late; if it is too low, the HPA scales out under modest load.
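To see how much the request matters, here is the utilization arithmetic for the same 75m of actual usage under two different requests. The 200m request in the second line is a hypothetical alternative, not a value from the Deployment above.

```latex
\[
\text{utilization}
  = \frac{\text{current CPU usage}}{\text{requested CPU}} \times 100\%
\]
\[
\text{request } 100\text{m}: \quad \frac{75\text{m}}{100\text{m}} \times 100\% = 75\%
\qquad
\text{request } 200\text{m}: \quad \frac{75\text{m}}{200\text{m}} \times 100\% = 37.5\%
\]
```

With a target of 50, the first case is above target and triggers a scale-up, while the second is below target and does not, even though the pod is doing exactly the same amount of work.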

Once your HPA is scaling correctly based on CPU, you’ll likely want to consider implementing readiness and liveness probes.