Burst AKS Workloads to Azure Container Instances (2026)

Azure Kubernetes Service (AKS) workloads can burst to Azure Container Instances (ACI) to handle sudden spikes in demand.

Let’s see it in action. Imagine an e-commerce site running on AKS that experiences a flash sale. Traffic surges, the existing pods saturate, and the node pool has no room left for more replicas. Instead of dropping requests while new nodes come online, the application can scale out onto ACI.

Here’s a simplified view of a deployment manifest that allows for bursting:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecomm-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ecomm-app
  template:
    metadata:
      labels:
        app: ecomm-app
    spec:
      containers:
      - name: app-container
        image: your-dockerhub-username/ecomm-app:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"
      # This is where the magic for ACI bursting happens.
      # Tolerating the virtual node's taint lets overflow pods be scheduled onto ACI.
      # Assumes the AKS virtual nodes add-on is enabled and that the taint key
      # below matches your virtual node (check with `kubectl describe node`).
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      # The replica count is managed separately by an autoscaler. A common pattern
      # is the KEDA (Kubernetes Event-driven Autoscaling) operator. KEDA has no ACI
      # trigger and does not provision ACI itself: it monitors a metric (e.g., queue
      # length, HTTP request rate) and raises this Deployment's replica count, and
      # the scheduler can then place the extra pods on the virtual node.
      # A ScaledObject for that might look like this (a separate manifest, not part
      # of the Deployment; the Prometheus address and query are placeholders):
      # apiVersion: keda.sh/v1alpha1
      # kind: ScaledObject
      # metadata:
      #   name: ecomm-app-aci-scaler
      # spec:
      #   scaleTargetRef:
      #     name: ecomm-app # The name of the Deployment
      #   pollingInterval: 30  # seconds
      #   cooldownPeriod:  300 # seconds
      #   minReplicaCount: 3   # Baseline replicas
      #   maxReplicaCount: 20  # Total replicas, including those running on ACI
      #   triggers:
      #   - type: prometheus
      #     metadata:
      #       serverAddress: http://prometheus.monitoring.svc:9090  # placeholder
      #       query: sum(rate(http_requests_total{app="ecomm-app"}[1m]))  # placeholder
      #       threshold: "1000"  # target requests per second, averaged per replica

The core problem this solves is elasticity. AKS provides a robust, managed Kubernetes environment, but absorbing a spike by adding VM-backed nodes typically takes minutes, and sizing the node pool to match fluctuating demand is hard to get right. ACI, on the other hand, starts individual containers in seconds with no VM to provision. By bursting to ACI, you get the best of both worlds: the managed orchestration of AKS for your baseline load and the rapid, on-demand capacity of ACI for unpredictable peaks.

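Internally, two components cooperate. The AKS virtual nodes add-on, built on Virtual Kubelet, registers ACI with the cluster as a node, and the Kubernetes scheduler can place any pod that tolerates that node’s taint onto it. An autoscaler, often KEDA, decides how many replicas should exist: it monitors specific metrics (e.g., queue length in Azure Service Bus, custom application metrics exposed via Prometheus, or Azure Monitor metrics) and raises the Deployment’s replica count when they exceed predefined thresholds. KEDA does not provision ACI itself; the additional replicas can be scheduled onto the virtual node, where each one starts as an ACI container group. Because they come from the same pod template, they run the same container image as your other AKS pods, ensuring consistency. And because they are ordinary pods as far as the Kubernetes API is concerned, Services and ingress controllers discover them through the usual endpoint mechanism.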
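To make the scheduling side concrete, here is a minimal sketch of the stanza that could sit in the Deployment’s pod template (under spec.template.spec). It assumes the virtual node carries the label type=virtual-kubelet and the taint keys shown, which is how AKS virtual nodes are commonly set up; verify both against your own cluster before relying on them.

```yaml
# Fragment of a pod template spec (hypothetical values; verify the label
# and taint keys with `kubectl describe node <virtual-node-name>`).
affinity:
  nodeAffinity:
    # Soft preference: use VM-backed nodes first, so ACI is only
    # chosen when the regular nodes are a worse fit or full.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: type
          operator: NotIn
          values:
          - virtual-kubelet
tolerations:
# Allow (but do not force) scheduling onto the tainted virtual node.
- key: virtual-kubelet.io/provider
  operator: Exists
- key: azure.com/aci
  effect: NoSchedule
```

The preference is soft on purpose. A hard nodeSelector for the virtual node would send every replica to ACI, which is a migration rather than a burst.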

The specific levers you control are primarily around the autoscaler’s configuration: the metrics it watches, the thresholds that trigger scaling, the minimum and maximum number of replicas (the maximum caps AKS and ACI replicas combined), and the scale-down behavior that prevents rapid flapping. Placement is a second lever: the pod template’s tolerations and node affinity decide whether pods may run on ACI and whether VM-backed nodes are preferred. You also need to ensure your application is designed to handle being scaled out to potentially many independent ACI instances and that your networking allows traffic to reach them. For virtual nodes, that means a cluster using Azure CNI networking with a dedicated subnet for the ACI-hosted pods.
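As a sketch of those autoscaler levers in one place, the manifest below uses KEDA’s Prometheus trigger. The Prometheus address, the query, and the metric name http_requests_total are placeholders, not values from a real deployment. Note that, as far as KEDA’s documentation describes it, cooldownPeriod governs scaling down to zero, so with a non-zero minimum the anti-flapping control is the HPA scale-down behavior shown here.

```yaml
# Hypothetical ScaledObject for the ecomm-app Deployment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ecomm-app-scaler
spec:
  scaleTargetRef:
    name: ecomm-app            # Deployment to scale
  pollingInterval: 30          # seconds between metric checks
  minReplicaCount: 3           # baseline replicas
  maxReplicaCount: 20          # ceiling across VM nodes and ACI combined
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          # Wait for 5 minutes of lower load before shrinking.
          stabilizationWindowSeconds: 300
          policies:
          # Then remove at most half of the replicas per minute.
          - type: Percent
            value: 50
            periodSeconds: 60
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090   # placeholder
      query: sum(rate(http_requests_total{app="ecomm-app"}[1m]))   # placeholder
      threshold: "1000"        # target requests per second, averaged per replica
```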

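What many people miss is how traffic reaches the burst pods. With virtual nodes there is no separate list of ACI endpoints to maintain: each ACI-hosted pod gets an IP address in the cluster’s virtual network and carries the same labels as its siblings, so any Service selecting those labels includes it once it reports ready. A front end such as Azure Application Gateway with its WAF, or an in-cluster ingress controller, routes through that Service and needs no ACI-specific configuration. If you rely on a service mesh like Istio, verify separately that its sidecar can run inside ACI. What still needs care is readiness probing and graceful shutdown, so that pods added and removed during a burst join and leave the traffic flow without interrupting existing connections.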
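For illustration, a plain Service is enough to spread traffic across both kinds of pod, assuming the burst pods are created from the same Deployment and therefore carry the app: ecomm-app label. The Service name here is a hypothetical choice.

```yaml
# Hypothetical Service in front of the ecomm-app Deployment.
apiVersion: v1
kind: Service
metadata:
  name: ecomm-app
spec:
  type: ClusterIP
  selector:
    app: ecomm-app     # matches pods on VM nodes and on the virtual node alike
  ports:
  - port: 80           # port exposed by the Service
    targetPort: 80     # containerPort of app-container
```

An Ingress or gateway that targets this Service then follows the endpoint list as replicas come and go.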

The next concept to explore is how to manage stateful workloads that need to burst.