Terraform’s `azurerm_kubernetes_cluster` resource is a beast, and getting it right often feels like wrestling an octopus.

Your AKS cluster isn’t starting, and the error message is cryptic. This means the Azure API, which Terraform is talking to, is refusing to create the Kubernetes cluster because it doesn’t have enough information or the information provided is invalid. Terraform itself isn’t the problem; it’s just the messenger reporting Azure’s rejection.

Here’s why Azure might be saying "no":

- Resource Group Doesn’t Exist or Is in the Wrong Region: Azure needs a place to put your AKS cluster. If the `resource_group_name` you specified in your Terraform config doesn’t exist, creation will fail. A resource group in a different region than the cluster is not rejected by Azure on its own (the group’s location only determines where its metadata is stored), but a mismatch is often a sign that you’re pointing at the wrong group or the wrong subscription.
  - Diagnosis: Run `az group show --name <your-resource-group-name> --query "location" -o tsv` and compare the output to the `location` you’ve set in your `azurerm_kubernetes_cluster` resource.
  - Fix: Ensure the `resource_group_name` in your `azurerm_kubernetes_cluster` resource points to an existing resource group, and keep the group’s location aligned with the `location` argument in your AKS resource unless you have a reason not to. If the resource group doesn’t exist, create it first with `resource "azurerm_resource_group" "main" { ... }` in your Terraform code.
  - Why it works: Azure requires every resource to reside within a resource group, so the group has to exist before the cluster can be created in it.
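
  As a minimal sketch of that fix, the cluster below takes both its name and its location from the resource group, so Terraform creates the group first and the two regions can’t drift apart. Every name and size here (`rg-aks-example`, `eastus`, `aks-example`, `Standard_D2s_v3`) is an illustrative placeholder, not something from your configuration.

  ```hcl
  # Placeholder names and region; substitute your own.
  resource "azurerm_resource_group" "main" {
    name     = "rg-aks-example"
    location = "eastus"
  }

  resource "azurerm_kubernetes_cluster" "main" {
    name       = "aks-example"
    dns_prefix = "aksexample"

    # Referencing the resource group's attributes (instead of repeating
    # string literals) gives Terraform an implicit dependency, so the
    # group is created before the cluster.
    location            = azurerm_resource_group.main.location
    resource_group_name = azurerm_resource_group.main.name

    default_node_pool {
      name       = "default"
      node_count = 1
      vm_size    = "Standard_D2s_v3" # assumed to be available in your region
    }

    identity {
      type = "SystemAssigned"
    }
  }
  ```

  If the resource group is managed outside this configuration, a `data "azurerm_resource_group"` lookup gives you the same attribute references without Terraform owning the group.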
- Invalid Kubernetes Version: AKS supports a specific range of Kubernetes versions. If you try to provision a version that’s deprecated, not yet generally available, or simply doesn’t exist, Azure will reject the request.
  - Diagnosis: Run `az aks get-versions --location <your-location> --output table` and check the output for the version you’ve specified in the `kubernetes_version` argument of your `azurerm_kubernetes_cluster` resource.
  - Fix: Update the `kubernetes_version` argument in your `azurerm_kubernetes_cluster` resource to a valid, supported version from the `az aks get-versions` output. For example, set it to `"1.27.3"` if that version is listed for your region.
  - Why it works: The Azure API validates the requested Kubernetes version against its supported catalog for the region. Using a version from that catalog passes the check.
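
  If you’d rather not hard-code a version, one option is to let Terraform ask Azure which versions the region supports. The sketch below uses the provider’s `azurerm_kubernetes_service_versions` data source; the region `eastus` and the name `current` are placeholders.

  ```hcl
  # Look up the Kubernetes versions AKS currently offers in a region.
  data "azurerm_kubernetes_service_versions" "current" {
    location        = "eastus" # placeholder region
    include_preview = false    # skip preview versions
  }

  # Show the newest supported version after `terraform apply`.
  output "latest_aks_version" {
    value = data.azurerm_kubernetes_service_versions.current.latest_version
  }

  # In the cluster resource you could then write:
  #   kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version
  ```

  The trade-off is that `latest_version` moves over time, so a later plan may propose an upgrade. Pinning an explicit version string keeps upgrades deliberate.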
- Network Plugin Mismatch or Misconfiguration: AKS can use the `kubenet` or `azure` (Azure CNI) network plugin. If you specify a network profile configuration that’s incompatible with the chosen plugin, or if the underlying VNet/subnet configuration is incorrect, creation will fail. For `azure` CNI, the subnet must have enough available IP addresses.
  - Diagnosis: Examine the `network_profile` block in your `azurerm_kubernetes_cluster` resource. If using `azure` CNI, check the CIDR ranges and available IPs in your VNet/subnet.
  - Fix:
    - If using `kubenet`: Ensure `network_plugin = "kubenet"` and `load_balancer_sku = "standard"`.
    - If using `azure` CNI: Ensure `network_plugin = "azure"`. Critically, the subnet you’ve associated with the AKS cluster must have enough free IP addresses for nodes and pods. The minimum required is often `(number of nodes * (1 + max pods per node)) + 1`. If your subnet is too small, you’ll need to resize it or use a different subnet. For example, a `/24` subnet (251 usable addresses once Azure reserves its 5) might be too small for a large cluster. You might need to use a `/22` or larger.
  - Why it works: With `kubenet`, only the nodes take addresses from the subnet; pods get addresses from a separate range and reach the VNet through NAT on the node. `azure` CNI assigns IPs directly from the VNet’s subnet to pods, requiring more IP address space.
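
  The sizing rule above is easy to check in Terraform itself. This sketch applies the formula to an assumed 10-node cluster with 30 pods per node on a `/24` subnet; the node count, pod limit, and CIDR are made-up inputs, and the `- 5` accounts for the addresses Azure reserves in every subnet.

  ```hcl
  locals {
    node_count        = 10              # assumed cluster size
    max_pods_per_node = 30              # assumed pod limit per node
    subnet_cidr       = "10.240.0.0/24" # assumed subnet

    # (number of nodes * (1 + max pods per node)) + 1
    required_ips = local.node_count * (1 + local.max_pods_per_node) + 1

    # Addresses in the subnet, minus the 5 that Azure reserves.
    prefix_length = tonumber(split("/", local.subnet_cidr)[1])
    usable_ips    = pow(2, 32 - local.prefix_length) - 5

    subnet_is_large_enough = local.usable_ips >= local.required_ips
  }

  output "subnet_check" {
    value = {
      required_ips           = local.required_ips
      usable_ips             = local.usable_ips
      subnet_is_large_enough = local.subnet_is_large_enough
    }
  }
  ```

  With these inputs the cluster needs 311 addresses but the `/24` offers 251, so the check comes back `false`. Changing the CIDR to a `/22` raises `usable_ips` to 1019.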
- Service Principal or Managed Identity Permissions: The identity Terraform uses to interact with Azure (either a Service Principal or a User Assigned Managed Identity) needs specific permissions on the resource group and any associated resources (like the VNet or Load Balancer). Insufficient permissions will cause the API call to be rejected.
  - Diagnosis: Check the role assignments for your Service Principal or Managed Identity. It typically needs the `Contributor` role on the resource group, and potentially `Network Contributor` on the VNet if it’s managed separately.
  - Fix: Grant the necessary roles. For example, if using a Service Principal:

    ```bash
    az role assignment create --assignee <service-principal-app-id> \
      --role "Contributor" \
      --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group-name>

    az role assignment create --assignee <service-principal-app-id> \
      --role "Network Contributor" \
      --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.Network/virtualNetworks/<your-vnet-name>
    ```

  - Why it works: Azure RBAC (Role-Based Access Control) is enforced at the API level. The identity must have explicit permission to perform the requested actions (creating VMs, configuring networking, etc.).
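
  If you want the role assignment to live next to the cluster in code, the provider’s `azurerm_role_assignment` resource can express the same grant. This is a sketch under assumptions: the two variables are hypothetical inputs you would supply, and whoever runs Terraform must already be allowed to create role assignments at that scope.

  ```hcl
  # Hypothetical inputs; wire these to your own VNet and identity.
  variable "vnet_id" {
    type        = string
    description = "Resource ID of the VNet the cluster's subnet lives in."
  }

  variable "principal_object_id" {
    type        = string
    description = "Object ID (not the app/client ID) of the identity to grant access to."
  }

  # Grant Network Contributor on the VNet.
  resource "azurerm_role_assignment" "aks_network" {
    scope                = var.vnet_id
    role_definition_name = "Network Contributor"
    principal_id         = var.principal_object_id
  }
  ```

  Note that `principal_id` expects the identity’s object ID, while the `az` command’s `--assignee` also accepts the application ID. Mixing the two up is a common source of "principal not found" errors.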
- Missing Required Addons or Configurations: Certain AKS configurations, like enabling the Azure Policy addon or using specific features, might require additional parameters or might be incompatible with other settings.
  - Diagnosis: Review the addon settings and other configuration blocks in your `azurerm_kubernetes_cluster` resource. In azurerm 3.x and later these are top-level arguments such as `azure_policy_enabled`; older 2.x configurations used an `addon_profile` block. Are you enabling an addon that requires specific network configurations or permissions?
  - Fix: Ensure all required parameters for enabled addons are correctly set. For example, if `azure_policy_enabled = true`, ensure the necessary RBAC roles are assigned to the AKS identity for policy management. If you’re using `private_cluster_enabled = true`, check `private_dns_zone_id`: if you point it at your own private DNS zone rather than letting AKS manage one, the cluster’s identity needs access to that zone.
  - Why it works: Addons often extend the functionality of AKS by integrating with other Azure services, which necessitates specific permissions and configurations for those integrations to function.
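
  For orientation, here is roughly how those settings sit together in a cluster definition. It’s a sketch that assumes azurerm 3.x argument names (check them against the provider version you’ve pinned), and the cluster name, region, resource group, and VM size are placeholders.

  ```hcl
  resource "azurerm_kubernetes_cluster" "private" {
    name                = "aks-private-example" # placeholder
    location            = "eastus"              # placeholder
    resource_group_name = "rg-aks-example"      # placeholder, must already exist
    dns_prefix          = "aksprivate"

    # Azure Policy addon (top-level argument in azurerm 3.x).
    azure_policy_enabled = true

    # Private API server endpoint. "System" asks AKS to manage the
    # private DNS zone; a zone resource ID here means you bring your own.
    private_cluster_enabled = true
    private_dns_zone_id     = "System"

    default_node_pool {
      name       = "default"
      node_count = 1
      vm_size    = "Standard_D2s_v3" # assumed to be available in your region
    }

    identity {
      type = "SystemAssigned"
    }
  }
  ```

  Remember that a private cluster’s API server is only reachable from inside the VNet (or a peered network), so later Terraform runs that talk to Kubernetes need network access to it.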
- Azure Subscription Quotas: While less common for initial creation, hitting subscription quotas for resources like Public IPs, VNets, or even VMs in a region can cause provisioning to fail.
  - Diagnosis: Check your Azure subscription’s Quotas in the Azure portal for the relevant region. Look for limits on compute, networking, or core counts.
  - Fix: Request a quota increase through the Azure portal.
  - Why it works: Azure enforces resource limits at the subscription level to ensure fair usage and stability.

The next error you’ll likely encounter is an `azurerm_kubernetes_cluster_node_pool` failure, often due to insufficient IP addresses in the subnet or a VM size that isn’t available in the region.