Spot node pools in Azure Kubernetes Service (AKS) can slash your compute costs by leveraging Azure’s spare capacity.

Let’s see it in action. Imagine you have a stateless web application that can tolerate occasional interruptions. You’ve deployed it to a standard AKS node pool.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

This deployment runs on regular, on-demand VMs, which are the most expensive option. Now, let’s create a spot node pool for the same application.

First, we define the spot node pool. This tells AKS to provision VMs from Azure’s spot market.

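az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name spotpool \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1 \
    --node-count 2 \
    --node-vm-size Standard_DS2_v2 \
    --labels purpose=batch \
    --no-wait

The key flag is --priority Spot. It is what makes this a spot pool; without it, AKS creates a regular pool and the spot settings don't apply. --eviction-policy Delete (the default) deletes evicted nodes instead of deallocating them. --spot-max-price -1 means you pay the current spot price, capped at the on-demand price, which is typically a steep discount. --node-vm-size specifies the VM size, and --labels attaches purpose=batch so we can target workloads at this pool. AKS also adds the label kubernetes.azure.com/scalesetpriority=spot and the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule to every spot node, so a pod can only be scheduled there if it carries a matching toleration.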

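Next, we modify our deployment so it runs only on the spot node pool. We use a required node affinity rule to target the purpose=batch label, plus a toleration for the taint AKS places on spot nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      # AKS taints every spot node; this toleration lets the pods in.
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      # Hard requirement: only schedule onto nodes labeled purpose=batch.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: purpose
                operator: In
                values:
                - batch

The nodeAffinity rule requires the purpose label to have the value batch, so Kubernetes schedules this deployment's pods only onto nodes labeled purpose=batch. Since we applied that label to our spot node pool, the pods land there. The toleration matters just as much: without it, the NoSchedule taint on the spot nodes would turn the pods away, and because the affinity rule is a hard requirement, they would sit in Pending.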
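If you only need an exact label match, the same placement can be written with the simpler nodeSelector field. The fragment below is a sketch of the Deployment's spec.template under that approach, assuming the purpose=batch label from the node pool above; the toleration is still required for the spot taint.

```yaml
# Fragment: an alternative spec.template for the my-web-app Deployment.
template:
  metadata:
    labels:
      app: web
  spec:
    # Exact-match label selector; equivalent to the required
    # nodeAffinity rule with operator In and a single value.
    nodeSelector:
      purpose: batch
    # Still needed: AKS taints spot nodes with this key and value.
    tolerations:
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
    containers:
    - name: nginx
      image: nginx:latest
      ports:
      - containerPort: 80
```

You could also select on the kubernetes.azure.com/scalesetpriority: spot label that AKS adds, which targets any spot pool in the cluster rather than only the one labeled purpose=batch. If a pod sets both nodeSelector and a required node affinity, a node must satisfy both.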

The core problem spot instances solve is the high cost of persistent, always-on compute. For workloads that can handle preemption (eviction with only about 30 seconds' notice when Azure needs the capacity back), spot VMs offer discounts of up to 90% compared with pay-as-you-go pricing. The trick is to isolate these interruptible workloads on dedicated spot node pools and use node selectors or affinity, together with tolerations for the spot taint, so that only suitable applications run there.
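To see how the isolation works from the node's side, here is roughly what the scheduling-relevant fields of a node in the spot pool look like. This is an abbreviated, illustrative excerpt, not full output: the node name is hypothetical and most fields are omitted. You can inspect a real node with kubectl get node <name> -o yaml.

```yaml
# Abbreviated, illustrative Node excerpt (most fields omitted).
apiVersion: v1
kind: Node
metadata:
  name: aks-spotpool-00000000-vmss000000   # hypothetical node name
  labels:
    purpose: batch                                # from --labels
    kubernetes.azure.com/scalesetpriority: spot   # added by AKS
spec:
  taints:
  # Added by AKS to every spot node: pods without a matching
  # toleration are not scheduled here.
  - key: kubernetes.azure.com/scalesetpriority
    value: spot
    effect: NoSchedule
```

The two mechanisms are complementary. The taint keeps workloads that lack the toleration off spot nodes, but a toleration only permits placement there; it is the node affinity or nodeSelector that pulls the spot-tolerant pods onto the pool.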

When Azure needs the capacity back from a spot VM, AKS cordons the node (marks it unschedulable) and drains its pods. Each container receives a termination signal (SIGTERM by default), giving it a short window to shut down gracefully before the VM is removed. For stateless applications, the Deployment's ReplicaSet creates replacement pods on any remaining node that satisfies their scheduling rules, and the Service keeps routing traffic to the healthy replicas. For stateful applications, you'd typically add persistent storage that can be reattached on another node, along with pod disruption budgets and anti-affinity rules that limit how many replicas go down at once and keep them spread across nodes.
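A minimal sketch of those safeguards for my-web-app, assuming the app: web labels, purpose=batch label, and spot toleration used above. The PodDisruptionBudget name my-web-app-pdb, the 20-second grace period, the 5-second preStop sleep, and minAvailable: 2 are illustrative assumptions, not recommended values.

```yaml
# Sketch: my-web-app with shutdown and spreading safeguards, plus a PDB.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Total time allowed for shutdown (preStop plus stop-signal handling),
      # kept well inside the roughly 30-second eviction notice.
      terminationGracePeriodSeconds: 20
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        lifecycle:
          preStop:
            exec:
              # Runs before the container gets its stop signal, giving
              # endpoint updates time to steer traffic away from the pod.
              command: ["sleep", "5"]
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: purpose
                operator: In
                values:
                - batch
        # Prefer placing replicas on different nodes so a single
        # eviction takes down as few of them as possible.
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: web
---
# The eviction API rejects voluntary evictions (such as those issued by a
# node drain) that would leave fewer than 2 healthy app=web pods.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

Keep in mind that a PDB only paces the drain during the notice window. It cannot stop Azure from reclaiming the VM, so once the notice runs out, any pods still on the node go with it.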

The surprise is how little configuration is needed to achieve massive savings. You don't need a separate Kubernetes cluster; you just add another node pool to your existing AKS cluster, and the az aks nodepool add command is your primary tool. The -1 value for --spot-max-price is worth understanding: it doesn't mean "free," it means "cap my price at the on-demand price." You pay the current spot price, which fluctuates and is usually well below on-demand, and your nodes are never evicted for price reasons, only when Azure needs the capacity back.

What most people miss is that node pools are the fundamental unit of compute isolation within an AKS cluster. You can have multiple node pools with different VM sizes, operating systems, and crucially, pricing models (on-demand vs. spot) all within the same cluster. You then use Kubernetes scheduling primitives (node selectors, taints, tolerations, affinity/anti-affinity) to direct workloads to the appropriate pools.
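As an example of combining pools, a workload can prefer spot nodes but fall back to a regular pool when spot capacity runs out. This sketch assumes the cluster also has at least one regular (non-spot) pool that accepts user workloads. The toleration permits spot placement, and a preferred rather than required affinity makes spot a preference instead of a hard rule.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      # Allows (but does not force) scheduling onto tainted spot nodes.
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      affinity:
        nodeAffinity:
          # Soft rule: the scheduler favors spot nodes when they have
          # room, but places pods on regular nodes otherwise.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: kubernetes.azure.com/scalesetpriority
                operator: In
                values:
                - spot
```

Because the rule is IgnoredDuringExecution, pods that fell back to regular nodes are not moved back automatically when spot capacity returns. That takes a fresh rollout (for example, kubectl rollout restart deployment/my-web-app) or a tool such as the Kubernetes descheduler.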

The next step is to understand how to manage spot interruptions gracefully using Pod Disruption Budgets and graceful termination.
