AKS Azure CNI overlay mode is actually a pretty neat trick to get the best of both worlds: the simplicity of Kubernetes networking with the power of Azure’s networking infrastructure.

Let’s see it in action. Imagine you have a simple AKS cluster with a few pods. In overlay mode, your pods get IP addresses from a private CIDR range that is logically separate from the VNet subnet hosting your AKS nodes. Only the nodes take addresses from the subnet, so the pod IPs aren’t directly routable on your VNet.

Here’s a snapshot of what that looks like from inside a pod. You’d see an IP address from the pod CIDR, say 10.244.1.5 if the cluster uses 10.244.0.0/16, rather than an address from the node subnet, and your ip route command would show a default route leaving through eth0.

# Run from your workstation; each command executes inside the pod
kubectl exec my-pod -- ip addr show eth0
# Output will show an IP from the pod CIDR, such as 10.244.1.5
# (the prefix length shown depends on the CNI version)

kubectl exec my-pod -- ip route show
# Output will show a default route out of eth0; the gateway address
# and any additional routes depend on the CNI version

The magic of overlay mode is that pod IPs live in their own address space, layered on top of the VNet. When a pod on Node A wants to talk to a pod on Node B, the packet has to cross a VNet that has no subnet for the pod CIDR. Many overlay networks solve this by wrapping (encapsulating) the pod-to-pod packet inside an outer packet addressed from Node A’s IP to Node B’s IP, commonly with VXLAN (Virtual Extensible LAN), and stripping the outer header on arrival. Azure CNI Overlay works differently: according to Microsoft’s documentation, a separate routing domain for the pod CIDR is created in the Azure networking stack, so pods on different nodes reach each other directly, without node-level tunnels or custom routes on the cluster subnet.

On each node, the Azure CNI plugin handles the pod side of this. It’s responsible for assigning IP addresses from the overlay network to pods and for setting up the necessary routing rules.

The core problem overlay mode solves is the IP address exhaustion issue that can plague traditional Azure CNI deployments when you have a very large number of pods. In traditional, non-overlay Azure CNI (often called flat or node subnet networking), each pod gets an IP address directly from the VNet’s subnet. This can quickly consume all available IPs if your pod density is high. Overlay mode sidesteps this by giving pods a dedicated address space that consumes no VNet IPs. Pod traffic that leaves the cluster is NATted behind the node’s IP.
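To make the difference concrete, here is a rough back-of-the-envelope sketch. The node count and per-node pod limit are hypothetical, and the calculation assumes the flat mode reserves one VNet IP per potential pod on each node up front, so treat the numbers as illustrative rather than as sizing guidance.

```shell
# Hypothetical cluster size and per-node pod limit (assumptions, not from a real cluster)
nodes=100
max_pods=30

# Flat mode (assumed): each node takes 1 VNet IP plus one per potential pod
flat_ips=$(( nodes * (max_pods + 1) ))

# Overlay mode: only the nodes take VNet IPs; pods draw from the pod CIDR
overlay_ips=$nodes

echo "flat: $flat_ips VNet IPs, overlay: $overlay_ips VNet IPs"
# prints: flat: 3100 VNet IPs, overlay: 100 VNet IPs
```

Under these assumptions the same cluster needs roughly thirty times fewer VNet addresses in overlay mode.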

When you configure your AKS cluster to use overlay mode, you’re telling Azure CNI to hand out pod IPs from a separate pod CIDR instead of the VNet subnet. You do this during cluster creation or by updating an existing cluster. The key parameters are --network-plugin azure together with --network-plugin-mode overlay. You also define a pod CIDR range, distinct from your VNet subnet.
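Before picking address ranges, it can help to check that the pod CIDR really is distinct from the other ranges in play. The following is a minimal sketch in plain shell arithmetic; the helper names (ip_to_int, cidr_start, cidr_end, cidrs_overlap) are made up for this illustration, and the node subnet 10.240.0.0/16 is an assumed value, not something taken from a real cluster.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  _old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$_old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first address of a CIDR block as an integer.
cidr_start() {
  _size=$(( 1 << (32 - ${1#*/}) ))
  echo $(( $(ip_to_int "${1%/*}") & ~($_size - 1) ))
}

# Print the last address of a CIDR block as an integer.
cidr_end() {
  _size=$(( 1 << (32 - ${1#*/}) ))
  echo $(( $(cidr_start "$1") + $_size - 1 ))
}

# Succeed (exit status 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  [ "$(cidr_start "$1")" -le "$(cidr_end "$2")" ] &&
    [ "$(cidr_start "$2")" -le "$(cidr_end "$1")" ]
}

pod_cidr="10.244.0.0/16"
service_cidr="10.0.0.0/24"
node_subnet="10.240.0.0/16"   # assumed node subnet, for illustration only

for other in "$service_cidr" "$node_subnet"; do
  if cidrs_overlap "$pod_cidr" "$other"; then
    echo "overlap: $pod_cidr and $other"
  else
    echo "ok: $pod_cidr and $other do not overlap"
  fi
done
```

The check only compares address ranges. It says nothing about peered VNets or on-premises ranges, which would need to be added to the list by hand.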

For example, when creating a cluster with overlay, you might specify:

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --vnet-subnet-id <your-vnet-subnet-id> \
  --pod-cidr 10.244.0.0/16 \
  --service-cidr 10.0.0.0/24 \
  --dns-service-ip 10.0.0.10

Here, 10.244.0.0/16 is the overlay IP space for pods, completely separate from your VNet’s primary IP range. The Azure CNI plugin will then manage assigning IPs from 10.244.0.0/16 to your pods.
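The size of the pod CIDR also bounds how large the cluster can grow. As a rough sketch, assuming each node is handed its own /24 block out of the pod CIDR (the block size Microsoft documents for Azure CNI Overlay; the variable names here are illustrative), the arithmetic looks like this:

```shell
pod_prefix=16    # from --pod-cidr 10.244.0.0/16
node_prefix=24   # assumed per-node block size

# Number of per-node blocks that fit in the pod CIDR
node_blocks=$(( 1 << (node_prefix - pod_prefix) ))

# Addresses available inside each per-node block
ips_per_block=$(( 1 << (32 - node_prefix) ))

echo "node blocks: $node_blocks, addresses per block: $ips_per_block"
# prints: node blocks: 256, addresses per block: 256
```

Under that assumption, a /16 pod CIDR leaves room for 256 nodes, so a cluster expected to grow beyond that would need a larger pod CIDR up front.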

The internal routing within the cluster is handled by the CNI. For traffic outside the cluster, the node’s IP address is used. This means that from the perspective of your VNet, all traffic originating from pods appears to come from the IP address of the node the pod is running on. This is a form of Source Network Address Translation (SNAT).

The biggest surprise is that while you are using an overlay network, it’s not entirely isolated. Azure CNI in overlay mode still leverages the underlying Azure networking for packet transport: pod traffic between nodes is carried by Azure’s network fabric. Pods can still reach other Azure services and peered VNets, with their traffic SNATed to the node IP on the way out. The reverse direction is different. The VNet doesn’t "know" about the pod IPs and only sees the node IPs, so endpoints outside the cluster can’t connect to a pod IP directly and need a Kubernetes Service, such as a load balancer, in front of the pods.
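One way to see the two address spaces side by side is to list each pod’s IP next to the IP of the node hosting it. The helper below is a hypothetical wrapper (the function name is made up for this sketch) around a standard kubectl query. It assumes kubectl is already pointed at the cluster.

```shell
# Hypothetical helper: list each pod's IP next to the IP of its node.
# .status.podIP and .status.hostIP are standard Pod status fields.
show_pod_and_node_ips() {
  kubectl get pods --all-namespaces \
    -o custom-columns=POD:.metadata.name,POD_IP:.status.podIP,NODE_IP:.status.hostIP
}

# Usage (requires access to a cluster):
#   show_pod_and_node_ips
```

In an overlay cluster you would expect POD_IP values from the pod CIDR and NODE_IP values from the VNet subnet. Pods that use host networking are the exception and report the node’s IP in both columns.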

The next logical step after understanding overlay mode is to explore how Network Policies work in this environment and how they apply to overlay pod traffic.
