Configure AKS Cluster Autoscaler (2026)

The AKS Cluster Autoscaler doesn’t resize your nodes or react to CPU graphs; it changes the number of nodes in a node pool, asking Azure to provision a node when pods are stuck Pending and to deprovision one when it sits underutilized.

Let’s see it in action. Imagine you have a deployment with 3 replicas, and you’ve set a resource request for each pod: cpu: 500m, memory: 1Gi. Your cluster currently has two nodes, each with cpu: 2, memory: 4Gi.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"

Initially, the scheduler can place all three pods. Each pod requests 0.5 CPU / 1 Gi RAM, and each node offers 2 CPU / 4 Gi RAM. (For simplicity, this walkthrough treats the full node capacity as schedulable; real nodes reserve part of it for system components.) Pod 1 and Pod 2 land on Node 1, leaving 1 CPU / 2 Gi RAM. Pod 3 lands on Node 2, leaving 1.5 CPU / 3 Gi RAM.
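The placement arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming node capacity is fully schedulable; the Resources class and the remaining and headroom helpers are hypothetical names used only for illustration, not part of any Kubernetes or Azure API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millis: int  # CPU in millicores (500m -> 500)
    mem_mib: int     # memory in MiB (1Gi -> 1024)


def remaining(capacity: Resources, request: Resources, pods: int) -> Resources:
    """Capacity left on a node after placing `pods` identical pods."""
    return Resources(
        capacity.cpu_millis - pods * request.cpu_millis,
        capacity.mem_mib - pods * request.mem_mib,
    )


def headroom(free: Resources, request: Resources) -> int:
    """How many more identical pods fit; the scarcer resource decides."""
    return min(free.cpu_millis // request.cpu_millis,
               free.mem_mib // request.mem_mib)


NODE = Resources(cpu_millis=2000, mem_mib=4096)  # 2 CPU / 4Gi
POD = Resources(cpu_millis=500, mem_mib=1024)    # 500m / 1Gi

node1_free = remaining(NODE, POD, pods=2)  # Pod 1 and Pod 2
node2_free = remaining(NODE, POD, pods=1)  # Pod 3
total_headroom = headroom(node1_free, POD) + headroom(node2_free, POD)

print(node1_free.cpu_millis, node1_free.mem_mib)  # 1000 2048
print(node2_free.cpu_millis, node2_free.mem_mib)  # 1500 3072
print(total_headroom)                             # 5
```

Under these assumptions the two nodes have room for five more pods of this size before anything goes Pending.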

Now, you scale up your deployment to 10 replicas.

kubectl scale deployment my-app --replicas=10

The scheduler tries to place the seven new pods, each needing 0.5 CPU / 1 Gi RAM. Node 1 (1 CPU / 2 Gi RAM left) takes two of them and Node 2 (1.5 CPU / 3 Gi RAM left) takes three, so Pods 4 through 8 are scheduled and both nodes are now full. Pod 9 and Pod 10 can’t fit on either node.

At this point, Pod 9 and Pod 10 are in a Pending state because the scheduler can’t find a node with sufficient resources. This is where the Cluster Autoscaler wakes up. It sees these pending pods and calculates whether adding a new node would allow them to be scheduled.

The Cluster Autoscaler is configured with a min-nodes and max-nodes value for each node pool (set with --min-count and --max-count in the Azure CLI). Let’s say your node pool is configured with min-nodes=1 and max-nodes=5. The autoscaler checks whether adding a node would satisfy the pending pods and whether the pool is still below max-nodes. If a new node is provisioned (e.g., a Standard_DS2_v2 with 2 vCPU and 7 GiB RAM), Pod 9 and Pod 10 can be scheduled onto it. Once they are scheduled and running, the autoscaler considers its job done for now.
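The scale-up decision for two pending pods can be sketched as below. This is a simplified illustration, not the autoscaler's actual algorithm: nodes_to_add is a hypothetical helper, and the node template is assumed to be 2 CPU / 4 GiB with every pod requesting 500m / 1Gi.

```python
import math


def nodes_to_add(pending_pods: int, pods_per_new_node: int,
                 current_nodes: int, max_nodes: int) -> int:
    """Nodes to request so the pending pods fit, capped by the pool maximum."""
    if pending_pods <= 0 or pods_per_new_node <= 0:
        return 0  # nothing pending, or a new node would not help
    wanted = math.ceil(pending_pods / pods_per_new_node)
    allowed = max(0, max_nodes - current_nodes)
    return min(wanted, allowed)


# Assumed template: 2000m CPU / 4096 MiB; each pod requests 500m / 1024 MiB.
pods_per_node = min(2000 // 500, 4096 // 1024)  # 4 pods per new node

print(nodes_to_add(2, pods_per_node, current_nodes=2, max_nodes=5))   # 1
print(nodes_to_add(10, pods_per_node, current_nodes=4, max_nodes=5))  # 1
print(nodes_to_add(2, pods_per_node, current_nodes=5, max_nodes=5))   # 0
```

The second and third calls show the effect of max-nodes: once the pool is at its ceiling, pods stay Pending no matter how many are waiting.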

Conversely, if a node becomes underutilized for a configurable period (the default is 10 minutes), the autoscaler will check if all pods on that node can be rescheduled onto other nodes. If they can, and the node count would remain above min-nodes, the autoscaler will signal Azure to deprovision that node.

The core problem the Cluster Autoscaler solves is ensuring your applications have the compute resources they need without you manually managing node counts. It bridges the gap between your application’s dynamic resource demands and the static nature of virtual machines.

Internally, it works by watching the Kubernetes API for pods that are Pending due to insufficient CPU or memory. It also watches for nodes that are underutilized. For pending pods, it simulates adding a new node (based on your node pool’s VM size and scale set configuration) and checks if the pending pods would fit. If they would, it triggers Azure to create a new node. For underutilized nodes, it checks if removing a node would cause any existing pods to become unschedulable. If not, it triggers Azure to delete the node.
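The scale-down side can be sketched the same way. The version below is CPU-only and rests on stated assumptions: utilization is taken as the sum of pod requests divided by the node's allocatable CPU, compared against a 0.5 threshold (the upstream default for scale-down-utilization-threshold), and rescheduling is approximated with a first-fit pass. can_remove_node is a hypothetical name; the real autoscaler also weighs memory, PodDisruptionBudgets, affinity rules, and more.

```python
from typing import List


def can_remove_node(pod_requests: List[int], allocatable: int,
                    other_nodes_free: List[int], current_nodes: int,
                    min_nodes: int, threshold: float = 0.5) -> bool:
    """CPU-only sketch of the scale-down check (all values in millicores)."""
    if current_nodes - 1 < min_nodes:
        return False  # removal would drop the pool below min-nodes
    if sum(pod_requests) / allocatable >= threshold:
        return False  # node is not considered underutilized
    free = list(other_nodes_free)  # copy so the caller's list is untouched
    for request in sorted(pod_requests, reverse=True):
        for i, room in enumerate(free):
            if request <= room:
                free[i] -= request  # "move" the pod onto this node
                break
        else:
            return False  # this pod has nowhere to go
    return True


# One 500m pod on a 2000m node; the other node has 1000m free.
print(can_remove_node([500], 2000, [1000], current_nodes=2, min_nodes=1))  # True
# Same node, but the only other node has just 250m free.
print(can_remove_node([500], 2000, [250], current_nodes=2, min_nodes=1))   # False
# Pool already at its minimum size.
print(can_remove_node([500], 2000, [1000], current_nodes=1, min_nodes=1))  # False
```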

The key levers you control are the min-nodes and max-nodes parameters for each node pool. These define the boundaries within which the autoscaler operates. You also influence it indirectly through your pod resource requests: requests, not limits, dictate what the scheduler and the autoscaler consider "sufficient resources" when making their decisions.

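The autoscaler never looks at actual CPU or memory usage. Scale-up is triggered only by pods the scheduler has marked unschedulable, and the simulation that follows works from their resource requests, so a cluster whose nodes are running hot but still have unrequested capacity will not grow. Scale-down uses requests as well: a node counts as underutilized when the sum of its pods’ requests falls below a threshold fraction of the node’s allocatable resources (50% by default). This means a node whose pods request a lot but use little will not be scaled down, while a node full of busy pods that declare no requests looks empty. Even an underutilized node stays in place if its pods can’t be safely moved, for example because of a PodDisruptionBudget or a pod annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: "false".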

The next concept to explore is how to configure node pool profiles, including VM sizes and taints/tolerations, to work effectively with the Cluster Autoscaler.