AKS runs your Kubernetes workloads, but it also runs its own system components on the cluster’s nodes to keep the cluster working. Separating the two into distinct node pools is the key to a more stable and predictable environment.
Let’s see this in action. Imagine we have a basic AKS cluster. By default, all pods, including your application pods and AKS system pods, run on the same set of nodes.
Here’s a typical kubectl get nodes output from a cluster with a single node pool shared by system and application pods:
NAME STATUS ROLES AGE VERSION
aks-agentpool-12345678-vmss000000 Ready agent 90d v1.27.7
aks-agentpool-12345678-vmss000001 Ready agent 90d v1.27.7
aks-agentpool-12345678-vmss000002 Ready agent 90d v1.27.7
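In this scenario, system pods like coredns, metrics-server, and kube-proxy are scheduled alongside your application pods. If your application experiences a sudden surge in resource consumption, it can starve the system pods, leading to cluster instability, DNS resolution failures, or missing metrics. The API server itself runs in the Azure-managed control plane rather than on these nodes, so it is not starved directly, but in-cluster services it depends on, such as the metrics API served by metrics-server, can become unavailable.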
The solution is to create dedicated node pools. We’ll have one for system components and another for user applications.
First, create a node pool specifically for system components. This node pool should be configured with a taint. A taint is set on the nodes and keeps pods that do not tolerate it from being scheduled there; the matching toleration is set on the system pods and allows them to run there.
Here’s how you’d create a system node pool using Azure CLI:
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name syspool \
--node-count 1 \
--mode System \
--node-vm-size Standard_DS2_v2 \
--labels aks-system=true \
--node-taints CriticalAddonsOnly=true:NoSchedule
The crucial parts here are --mode System, which designates this as a system node pool, and --node-taints CriticalAddonsOnly=true:NoSchedule. This taint ensures that only pods that tolerate the CriticalAddonsOnly taint can be scheduled on these nodes. AKS system components are automatically configured with this toleration.
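To make the matching rule concrete, here is a minimal Python sketch of how a scheduler decides whether a pod's tolerations cover a node's taints. It is a simplified model for illustration, not the Kubernetes scheduler's code. The class and function names are hypothetical, and the system pod's toleration is assumed to be key CriticalAddonsOnly with operator Exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # an empty key is only meaningful with operator "Exists"
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches any taint effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # "Exists" ignores the value; an empty key matches every taint key.
        return toleration.key in ("", taint.key)
    # "Equal" requires both key and value to match.
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(node_taints, pod_tolerations) -> bool:
    """A pod fits only if every scheduling-blocking taint is tolerated."""
    blocking = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in pod_tolerations) for t in blocking)


syspool_taint = Taint("CriticalAddonsOnly", "true", "NoSchedule")
system_pod = [Toleration(key="CriticalAddonsOnly", operator="Exists")]  # assumed toleration
app_pod = []  # an ordinary application pod with no tolerations

print(can_schedule([syspool_taint], system_pod))  # True
print(can_schedule([syspool_taint], app_pod))     # False
```

The system pod is admitted because its toleration matches the taint's key, while the application pod is rejected because nothing in its empty toleration list covers the NoSchedule taint.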
Next, create your user node pool. This is where your application pods will run.
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name userpool \
--node-count 3 \
--mode User \
--node-vm-size Standard_DS3_v2 \
--labels app=true
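The app=true label set above can also be used to pin workloads to the user pool explicitly with a nodeSelector. The following Python sketch builds such a Deployment manifest as JSON, which kubectl accepts alongside YAML. The function name, the Deployment name web, and the nginx:1.25 image are placeholders chosen for illustration, not part of the original setup.

```python
import json


def deployment_for_user_pool(name: str, image: str, replicas: int = 2) -> dict:
    """Build a Deployment manifest whose pods only run on nodes labelled app=true."""
    labels = {"app.kubernetes.io/name": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes carrying the label app=true (the userpool nodes) qualify.
                    "nodeSelector": {"app": "true"},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = deployment_for_user_pool("web", "nginx:1.25")  # placeholder name and image
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to kubectl apply -f would create the Deployment. The nodeSelector is optional, since the taint already keeps these pods off the system pool, but it makes the intended placement explicit.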
Now, if you check your nodes again, the pool name embedded in each node name tells you which pool it belongs to (only the two new pools are shown here):
NAME STATUS ROLES AGE VERSION
aks-syspool-12345678-vmss000000 Ready agent 5m v1.27.7
aks-userpool-12345678-vmss000000 Ready agent 2m v1.27.7
aks-userpool-12345678-vmss000001 Ready agent 2m v1.27.7
aks-userpool-12345678-vmss000002 Ready agent 2m v1.27.7
You can verify the taints on the system node pool:
kubectl describe node aks-syspool-12345678-vmss000000 | grep -i taints
Output:
Taints: CriticalAddonsOnly=true:NoSchedule
And check where your system pods are running:
kubectl get pods -n kube-system -o wide
You’ll see pods like coredns and metrics-server running on nodes from the syspool. DaemonSet pods such as kube-proxy are the exception: they run on every node, including the userpool nodes. Your application pods, deployed without a toleration for CriticalAddonsOnly=true:NoSchedule, will only be scheduled on the userpool nodes.
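Scanning the wide output by eye gets tedious on larger clusters. As a rough sketch, the Python below groups pods by node pool from the JSON form of the same listing (kubectl get pods -n kube-system -o json). It assumes node names follow the aks-<pool>-<id>-vmss<n> pattern seen in the outputs above. The function names and the trimmed sample data are hypothetical.

```python
from collections import defaultdict


def pool_of(node_name: str) -> str:
    """Extract the pool name from an 'aks-<pool>-<id>-vmss<n>' node name."""
    parts = node_name.split("-")
    if len(parts) >= 4 and parts[0] == "aks":
        return parts[1]
    return "unknown"


def pods_by_pool(pod_list: dict) -> dict:
    """Group pod names by the node pool of the node each pod runs on."""
    grouped = defaultdict(list)
    for item in pod_list.get("items", []):
        node_name = item.get("spec", {}).get("nodeName")
        pool = pool_of(node_name) if node_name else "unscheduled"
        grouped[pool].append(item["metadata"]["name"])
    return {pool: sorted(names) for pool, names in grouped.items()}


# Hypothetical, heavily trimmed sample of the JSON pod list.
sample = {
    "items": [
        {"metadata": {"name": "coredns-abc"},
         "spec": {"nodeName": "aks-syspool-12345678-vmss000000"}},
        {"metadata": {"name": "kube-proxy-x1"},
         "spec": {"nodeName": "aks-syspool-12345678-vmss000000"}},
        {"metadata": {"name": "kube-proxy-x2"},
         "spec": {"nodeName": "aks-userpool-12345678-vmss000000"}},
    ]
}
print(pods_by_pool(sample))
# {'syspool': ['coredns-abc', 'kube-proxy-x1'], 'userpool': ['kube-proxy-x2']}
```

Against a real cluster, the dictionary passed to pods_by_pool would come from parsing the kubectl JSON output with json.load. Any pool other than syspool that lists non-DaemonSet system pods is worth a closer look.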
This separation gives system-critical services dedicated resources, insulated from the resource demands of your applications. It’s like having a dedicated lane for emergency vehicles on a highway: essential services can still get through, even during peak traffic.
The primary benefit is enhanced stability. By isolating system components, you prevent application resource spikes from impacting cluster health. This means fewer unexpected outages and a more reliable Kubernetes environment. It also allows for independent scaling and configuration of system versus user workloads, optimizing resource utilization and cost.
Note that az aks nodepool add creates a User mode pool unless you pass --mode System. Setting --mode System alone is also not enough to isolate system components: without --node-taints CriticalAddonsOnly=true:NoSchedule, AKS still schedules system pods on the pool, but the scheduler is free to place application pods there too, so the two end up competing for the same resources.