Terraform’s azurerm_kubernetes_cluster resource is a beast, and getting it right often feels like wrestling an octopus.

Your AKS cluster isn’t starting, and the error message is cryptic. Usually that means the Azure Resource Manager API, which Terraform calls on your behalf, refused to create the cluster because a required setting is missing or a value is invalid. Terraform itself is rarely the problem; it’s just the messenger reporting Azure’s rejection.

Here’s why Azure might be saying "no":

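  • Resource Group Doesn’t Exist (or Isn’t Where You Expect): Azure needs a place to put your AKS cluster. If the resource_group_name in your Terraform config doesn’t exist in the subscription you’re deploying to, or Terraform tries to create the cluster before it has created the group, creation fails with a ResourceGroupNotFound error. Azure itself allows a resource group to hold resources in other regions, so a group whose location differs from the cluster’s is not fatal on its own. It is often a sign of a typo or copy-paste mistake, though, and organizational policies (such as an "allowed locations" Azure Policy) may reject it.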

    • Diagnosis:
      az group show --name <your-resource-group-name> --query "location" -o tsv
      
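      Compare this output to the location you’ve set in your azurerm_kubernetes_cluster resource. If the command reports ResourceGroupNotFound instead, the group doesn’t exist in the subscription your CLI is pointed at; run az account show to confirm that the CLI and Terraform are targeting the same subscription.
    • Fix: Ensure resource_group_name points to an existing resource group. If Terraform manages the group as well, declare it with resource "azurerm_resource_group" "main" { ... } and reference azurerm_resource_group.main.name and azurerm_resource_group.main.location from the cluster instead of hard-coding strings. Terraform then creates the group first, and the two locations can’t drift apart.
    • Why it works: Azure requires every resource to live in an existing resource group, and Terraform can only create things in the right order when it can see that dependency. The group’s own location only determines where its metadata is stored; keeping it aligned with the cluster avoids surprises from region-restricting policies.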
  • Invalid Kubernetes Version: AKS supports a specific range of Kubernetes versions in each region. If you request a version that has been retired, isn’t offered in your region, or simply doesn’t exist, Azure will reject the request.

    • Diagnosis:
      az aks get-versions --location <your-location> --output table
      
      Check that the version in your kubernetes_version argument appears in this list.
    • Fix: Set kubernetes_version to a version listed in the az aks get-versions output. Don’t copy a hard-coded value such as "1.27.3" from an older example: AKS retires minor versions on a regular schedule, so a version that worked last year can be rejected today. The azurerm provider also accepts a minor-version alias (major.minor without a patch number), in which case AKS picks the latest generally available patch for that minor version.
    • Why it works: The Azure API validates the requested version against the catalog of versions it supports in that region, and a version taken from that catalog passes the check.
  • Network Plugin Mismatch or Misconfiguration: AKS supports the kubenet and azure (Azure CNI) network plugins, among others. If the network_profile block combines settings the chosen plugin doesn’t accept (for example, setting pod_cidr with network_plugin = "azure" in its default, non-overlay mode), or the underlying VNet/subnet configuration is wrong, creation fails. With Azure CNI, the subnet must also have enough free IP addresses.

    • Diagnosis: Examine the network_profile block in your azurerm_kubernetes_cluster resource. If using azure CNI, check the CIDR ranges and available IPs in your VNet/subnet.
    • Fix:
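      • If using kubenet: Set network_plugin = "kubenet" and make sure pod_cidr doesn’t overlap your VNet or any network it connects to. Keep load_balancer_sku = "standard" (the provider default).
      • If using azure CNI: Set network_plugin = "azure" and point the default node pool’s vnet_subnet_id at your subnet. Critically, that subnet needs one free IP address per node plus one per pod each node can host (max_pods, which defaults to 30 with Azure CNI), with headroom for the extra surge node(s) AKS adds during upgrades. A reasonable estimate is (number of nodes + surge nodes) * (1 + max pods per node), and Azure reserves 5 addresses in every subnet on top of that. For example, 10 nodes with max_pods = 30 and one surge node need 11 * 31 = 341 addresses, more than the 251 usable in a /24, so you’d need at least a /23 (507 usable); bigger clusters may need a /22 or larger. If your subnet is too small, resize it or use a different subnet.
    • Why it works: With kubenet, only nodes get VNet addresses; pods get IPs from the separate pod_cidr range, their outbound traffic is NATed to the node’s IP, and an Azure route table carries pod traffic between nodes. Azure CNI assigns each pod an IP directly from the VNet subnet, which makes pods directly reachable on the VNet but requires far more address space.
  • Service Principal or Managed Identity Permissions: Two identities are involved. The one Terraform authenticates with (your user account, a service principal, or a managed identity) must be allowed to create the cluster and everything else in the configuration. The cluster’s own identity (a system- or user-assigned managed identity, or a legacy service principal) must be allowed to manage the Azure resources AKS touches, such as the VNet, subnet, and load balancer. Insufficient permissions on either one cause the API call to be rejected, typically with an AuthorizationFailed error.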

    • Diagnosis: Check the role assignments of both identities, for example with az role assignment list --assignee <principal-id> --all -o table. Terraform’s identity typically needs Contributor on the resource group (plus Owner or User Access Administrator if Terraform also creates role assignments), and the cluster identity typically needs Network Contributor on the VNet or subnet if the network is managed separately.
    • Fix: Grant the missing roles: Contributor for the deploying identity and Network Contributor for the cluster identity. For example, if Terraform uses a service principal:
      az role assignment create --assignee <service-principal-app-id> \
                                --role "Contributor" \
                                --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group-name>
      az role assignment create --assignee <cluster-identity-principal-id> \
                                --role "Network Contributor" \
                                --scope /subscriptions/<subscription-id>/resourceGroups/<vnet-resource-group-name>/providers/Microsoft.Network/virtualNetworks/<your-vnet-name>
      
    • Why it works: Azure RBAC (Role-Based Access Control) is enforced at the API level. The identity must have the explicit permissions to perform the requested actions (creating VMs, configuring networking, etc.).
  • Missing Required Addons or Configurations: Certain AKS configurations, like enabling the Azure Policy addon or using specific features, might require additional parameters or might be incompatible with other settings.

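    • Diagnosis: Review the add-on settings in your azurerm_kubernetes_cluster resource. Since version 3.0 of the azurerm provider, the addon_profile block no longer exists; add-ons are configured with top-level arguments such as azure_policy_enabled and blocks such as oms_agent. Are you enabling an add-on that needs specific network configuration, permissions, or prerequisites?
    • Fix: Make sure every enabled add-on has its required settings and prerequisites in place. For example, azure_policy_enabled = true requires the Microsoft.PolicyInsights resource provider to be registered in the subscription. For a private cluster the argument is private_cluster_enabled = true; setting private_dns_zone_id = "System" lets AKS create and manage the private DNS zone, while supplying your own zone ID requires the cluster to use a user-assigned identity that holds the Private DNS Zone Contributor role on that zone.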
    • Why it works: Addons often extend the functionality of AKS by integrating with other Azure services, which necessitates specific permissions and configurations for those integrations to function.
  • Azure Subscription Quotas: Hitting a subscription quota, such as the regional vCPU limit for your VM family or limits on public IPs or VNets, can make provisioning fail. vCPU quotas in particular often trip up new or trial subscriptions on the very first create.

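    • Diagnosis: Check the Quotas page in the Azure portal for the relevant region, or list current usage from the CLI with az vm list-usage --location <your-location> -o table (compute) and az network list-usages --location <your-location> -o table (networking). Look for limits on cores, public IPs, or VNets.
    • Fix: Request a quota increase through the Azure portal, or ask for less: fewer nodes or a smaller VM size.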
    • Why it works: Azure enforces resource limits at the subscription level to ensure fair usage and stability.
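
For the resource-group case, one safe Terraform pattern is to let the cluster take both its resource group name and its location from the resource group resource. This is a minimal sketch that assumes an already-configured azurerm provider; the names, region, and VM size are hypothetical placeholders.

```hcl
resource "azurerm_resource_group" "main" {
  name     = "rg-aks-demo" # hypothetical name
  location = "westeurope"  # hypothetical region
}

resource "azurerm_kubernetes_cluster" "main" {
  name = "aks-demo" # hypothetical name

  # Referencing the group's attributes (instead of repeating the strings)
  # creates an implicit dependency: Terraform creates the group first, and
  # the cluster's location always matches the group's.
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "aksdemo"

  default_node_pool {
    name       = "system"
    node_count = 1
    vm_size    = "Standard_D2s_v3" # hypothetical; must be offered in your region
  }

  identity {
    type = "SystemAssigned"
  }
}
```

If the group already exists outside this configuration, swap the resource block for a data "azurerm_resource_group" lookup. A wrong name then fails at plan time with a clear error instead of surfacing later as an Azure rejection.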
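
For the Kubernetes version problem, the azurerm provider’s azurerm_kubernetes_service_versions data source reads the same regional catalog that az aks get-versions shows, so the configuration can pick a version Azure currently offers. A sketch under assumptions: a configured provider, an existing resource group, and hypothetical names and region.

```hcl
# Ask Azure which Kubernetes versions AKS currently offers in this region.
data "azurerm_kubernetes_service_versions" "current" {
  location        = "westeurope" # hypothetical; use your cluster's region
  include_preview = false        # ignore preview releases
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-demo"    # hypothetical
  location            = "westeurope"  # should match the data source's location
  resource_group_name = "rg-aks-demo" # hypothetical, must already exist
  dns_prefix          = "aksdemo"

  # Newest non-preview version offered in the region at plan time.
  kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version

  default_node_pool {
    name       = "system"
    node_count = 1
    vm_size    = "Standard_D2s_v3" # hypothetical
  }

  identity {
    type = "SystemAssigned"
  }
}

# Lists every version returned, handy when choosing one to pin explicitly.
output "available_aks_versions" {
  value = data.azurerm_kubernetes_service_versions.current.versions
}
```

Because latest_version moves whenever Azure publishes a new release, a later plan may propose upgrading the cluster. If you’d rather control upgrades yourself, narrow the lookup with the data source’s version_prefix argument or pin kubernetes_version to a value from the output list.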
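
For Azure CNI, the subnet-size arithmetic can be checked during terraform plan instead of waiting for Azure to reject the request. This sketch uses a lifecycle precondition (Terraform 1.2 or later) to compare (nodes + surge nodes) * (1 + max pods per node) against the subnet’s usable addresses, that is, its size minus the 5 Azure reserves. The subnet, VNet, and resource group names, the node count, and the single surge node are hypothetical assumptions to replace with your own values.

```hcl
# Hypothetical existing subnet for the cluster.
data "azurerm_subnet" "aks" {
  name                 = "snet-aks"
  virtual_network_name = "vnet-aks"
  resource_group_name  = "rg-network"
}

locals {
  node_count  = 10 # hypothetical cluster size
  max_pods    = 30 # Azure CNI default pods per node
  surge_nodes = 1  # assumed extra node during upgrades

  # Every node needs one IP for itself plus one per pod it can host.
  required_ips = (local.node_count + local.surge_nodes) * (1 + local.max_pods)

  # Prefix length of the subnet's first CIDR, e.g. 24 for "10.1.0.0/24".
  prefix_length = tonumber(split("/", data.azurerm_subnet.aks.address_prefixes[0])[1])

  # Azure reserves 5 addresses in every subnet.
  usable_ips = pow(2, 32 - local.prefix_length) - 5
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-demo"    # hypothetical
  location            = "westeurope"  # hypothetical
  resource_group_name = "rg-aks-demo" # hypothetical
  dns_prefix          = "aksdemo"

  default_node_pool {
    name           = "system"
    node_count     = local.node_count
    max_pods       = local.max_pods
    vm_size        = "Standard_D2s_v3" # hypothetical
    vnet_subnet_id = data.azurerm_subnet.aks.id
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
    # Do not set pod_cidr here: it is only valid with kubenet or overlay mode.
    # If the VNet overlaps the default service range (10.0.0.0/16), also set
    # service_cidr and dns_service_ip to a non-overlapping range.
  }

  lifecycle {
    precondition {
      condition     = local.required_ips <= local.usable_ips
      error_message = "Subnet too small for Azure CNI: need ${local.required_ips} addresses, only ${local.usable_ips} usable."
    }
  }
}
```

With the values shown, required_ips is 11 * 31 = 341, so a /24 subnet (251 usable addresses) fails the precondition at plan time, while a /23 (507 usable) passes.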
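
The permission fixes can also live in Terraform rather than in one-off az commands. One common pattern, sketched here with hypothetical names, creates a user-assigned identity for the cluster, grants it Network Contributor on a separately managed subnet, and only then creates the cluster, so the permission exists before AKS needs it. It assumes a configured provider and that Terraform’s own identity is allowed to create role assignments (for example Owner or User Access Administrator).

```hcl
data "azurerm_resource_group" "aks" {
  name = "rg-aks-demo" # hypothetical, must already exist
}

data "azurerm_subnet" "aks" {
  name                 = "snet-aks"   # hypothetical
  virtual_network_name = "vnet-aks"   # hypothetical
  resource_group_name  = "rg-network" # hypothetical
}

# Identity the cluster control plane will run as.
resource "azurerm_user_assigned_identity" "aks" {
  name                = "id-aks-demo" # hypothetical
  location            = data.azurerm_resource_group.aks.location
  resource_group_name = data.azurerm_resource_group.aks.name
}

# Lets the cluster identity join and manage networking in the subnet.
resource "azurerm_role_assignment" "aks_network" {
  scope                = data.azurerm_subnet.aks.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-demo" # hypothetical
  location            = data.azurerm_resource_group.aks.location
  resource_group_name = data.azurerm_resource_group.aks.name
  dns_prefix          = "aksdemo"

  default_node_pool {
    name           = "system"
    node_count     = 1
    vm_size        = "Standard_D2s_v3" # hypothetical
    vnet_subnet_id = data.azurerm_subnet.aks.id
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }

  network_profile {
    network_plugin = "azure"
  }

  # Nothing above references the role assignment, so make the ordering explicit.
  depends_on = [azurerm_role_assignment.aks_network]
}
```

Azure RBAC changes can take a few minutes to propagate, so a retry is occasionally needed when the cluster is created immediately after the assignment.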
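
For the add-on case, here is a sketch of a private cluster with the Azure Policy add-on, written with the argument names used by azurerm 3.x and later rather than the removed addon_profile block. It assumes a configured provider, an existing resource group, and that Microsoft.PolicyInsights is already registered; the names and region are hypothetical.

```hcl
resource "azurerm_kubernetes_cluster" "private" {
  name                = "aks-private-demo" # hypothetical
  location            = "westeurope"       # hypothetical
  resource_group_name = "rg-aks-demo"      # hypothetical, must already exist
  dns_prefix          = "aksprivate"

  # Private API server endpoint (note the argument name).
  private_cluster_enabled = true

  # "System" lets AKS create and manage the private DNS zone. To bring your own
  # zone, put its resource ID here instead and switch to a user-assigned
  # identity that holds "Private DNS Zone Contributor" on that zone.
  private_dns_zone_id = "System"

  # Replaces addon_profile { azure_policy { enabled = true } } from azurerm 2.x.
  azure_policy_enabled = true

  default_node_pool {
    name       = "system"
    node_count = 1
    vm_size    = "Standard_D2s_v3" # hypothetical
  }

  identity {
    type = "SystemAssigned"
  }
}
```

The provider registration is a one-time step per subscription, for example with az provider register --namespace Microsoft.PolicyInsights.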

The next error you’ll likely encounter is an azurerm_kubernetes_cluster_node_pool failure, often caused by running out of IP addresses in the subnet or by requesting a VM size that isn’t available in the region.
