Crossplane’s Managed Resources (MRs) get stuck in a "Not Ready" state because the Crossplane control plane can’t confirm the actual cloud resource managed by the MR is in the desired state.
This usually happens because the Crossplane provider (e.g., provider-aws, provider-azure) can’t communicate with the cloud API, or the cloud API is reporting an unexpected state for the resource.
Here’s how to debug:
1. Check the MR Status and Events
The most immediate information is in the Managed Resource’s status.
Diagnosis:
kubectl get managed <your-managed-resource-name> -o yaml
What to look for:
- status.conditions: This array shows the health of the MR. Look for a condition with type: Ready and status: "False". The message and reason fields on this condition are crucial. Also check the Synced condition: Synced: "False" means the provider could not reconcile the resource, and its message usually carries the underlying error.
- status.atProvider: This shows the observed state of the cloud resource as reported by the provider. Compare this to your desired state in the MR’s spec.
- Events: Events are not part of the kubectl get ... -o yaml output. Use kubectl describe managed <your-managed-resource-name> to see recent events. These often contain provider-specific error messages.
Fix: The reason and message fields on the Ready: False condition (or on Synced: False, if present) will often point you directly to the problem. For example, a message like "failed to create RDS instance: DBInstanceAlreadyExists" means the cloud resource already exists and Crossplane is trying to create it. You can delete the existing cloud resource, or have Crossplane adopt it by setting the crossplane.io/external-name annotation on the MR to the existing resource’s name and adjusting the MR’s spec to match it.
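As a rough sketch of the check above, the conditions can be flattened to one line each and filtered for anything that is not "True". The function names failing_conditions and mr_conditions are hypothetical helpers made up for this example, and the kind and name passed to mr_conditions are placeholders you would supply yourself.

```shell
#!/bin/sh
# Print every condition whose status is not "True".
# Input (stdin): one condition per line, TAB-separated:
# type, status, reason, message.
failing_conditions() {
  awk -F '\t' 'NF > 0 && $2 != "True" {
    printf "%s=%s reason=%s message=%s\n", $1, $2, $3, $4
  }'
}

# Hypothetical wrapper: emit the conditions of one managed resource
# in the TAB-separated form expected by failing_conditions.
# $1 is the resource kind, $2 is the resource name.
mr_conditions() {
  kubectl get "$1" "$2" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

A call might look like mr_conditions bucket.s3.aws.upbound.io my-unique-bucket-name | failing_conditions. No output means every reported condition is "True".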
2. Verify Provider Health
The Crossplane provider itself might be unhealthy or unable to communicate with the cloud API.
Diagnosis:
kubectl get pods -l pkg.crossplane.io/provider=<provider-name> -n crossplane-system
kubectl logs <provider-pod-name> -n crossplane-system
kubectl logs <provider-pod-name> -n crossplane-system -c kube2iam # only if your provider pod runs a kube2iam sidecar container (AWS)
What to look for:
- Provider Pod Status: Ensure the provider pod (e.g., provider-aws-abcdef1234-xyz) is Running. If it’s CrashLoopBackOff or Error, check its logs.
- Provider Logs: Look for connection errors, authentication failures, API rate limiting messages, or timeouts when trying to reach the cloud provider’s API. Common messages include "dial tcp: lookup <api.region.amazonaws.com>: no such host", "context deadline exceeded", or "AccessDenied".
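To get a quick picture of which failure class dominates, the provider logs can be piped through a small filter that counts the three messages listed above. This is only a sketch: classify_provider_errors is a hypothetical helper name, and the patterns are limited to the strings quoted in this article, so real logs may need more.

```shell
#!/bin/sh
# Count log lines that match each error class mentioned above.
# Reads log text on stdin and prints a single summary line.
classify_provider_errors() {
  awk '
    /no such host/              { dns++ }      # DNS or network problem
    /context deadline exceeded/ { timeout++ }  # API call timed out
    /AccessDenied/              { auth++ }     # credentials or permissions
    END { printf "dns=%d timeout=%d auth=%d\n", dns, timeout, auth }
  '
}
```

For example: kubectl logs <provider-pod-name> -n crossplane-system | classify_provider_errors. A high auth count points at the credentials fixes, while dns or timeout counts point at the network fixes.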
Fix:
- Incorrect Credentials: Ensure the ProviderConfig referenced by your MR has valid credentials. For AWS, check the IAM role or Secret. For Azure, check the ServicePrincipal secret.
# Example: Check AWS ProviderConfig
kubectl get providerconfig aws-providerconfig -o yaml
# Ensure the 'credentials' secret it points to is correct
kubectl get secret <credentials-secret-name> -n <secret-namespace> -o yaml
- Network Issues: If the provider pod can’t reach the cloud API endpoint, check network policies, security groups, or VPC configurations that might be blocking egress traffic from the Crossplane pods.
- Provider Version: An outdated provider might have bugs or be incompatible with recent cloud API changes. Consider upgrading the provider by pointing the Provider object’s spec.package at a newer version tag.
# Example: Upgrade provider-aws by updating its package reference
kubectl patch provider.pkg.crossplane.io <provider-name> --type merge -p '{"spec":{"package":"<package-image>:<new-version>"}}'
3. Inspect Cloud API Responses
Sometimes, the provider can talk to the cloud API, but the API returns an unexpected state or error that the provider doesn’t handle gracefully.
Diagnosis:
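The provider logs (from step 2) are the primary place to see cloud API responses. Providers log very little by default, so you may need to enable debug logging on the provider (the --debug container argument) to see this level of detail. Look for the specific API call being made (e.g., ec2.CreateVpc, aks.CreateOrUpdate) and the error or response returned by the cloud.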
What to look for:
- Error Codes: Cloud provider APIs return specific error codes (e.g., InvalidParameterValue, ResourceNotFound).
- Unexpected States: The resource might exist but be in a state that Crossplane doesn’t recognize as "Ready" (e.g., a database instance in modifying state for too long).
- Missing Required Fields: The cloud API might have changed its response format, and the provider is expecting a field that’s no longer present or has a different name.
Fix:
- Correct Spec: Ensure your Managed Resource’s spec.forProvider block exactly matches what the cloud provider expects. Refer to the provider’s documentation and the cloud provider’s API reference. For instance, if a required parameter for creating an S3 bucket is missing in your MR spec, add it.
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: my-unique-bucket-name
spec:
  forProvider:
    region: us-east-1
    acl: private
    # Ensure all required fields like 'bucketName' (if applicable) are present
- Resource Drift: If the cloud resource was modified outside of Crossplane, the MR might be out of sync. Crossplane normally tries to bring the resource back in line with the spec on the next reconcile. If that fails, update the MR’s spec.forProvider to match the existing resource, or delete the MR and the cloud resource and recreate them. Patching status.atProvider does not help: the provider overwrites it with the observed state on every reconcile.
- Provider Bugs: If you consistently see errors that seem like provider bugs, check the provider’s GitHub repository for open issues or consider opening a new one.
4. Check Cloud Provider Quotas and Limits
Cloud providers have quotas on resources that can be created in an account or region.
Diagnosis:
- Check your cloud provider’s console (AWS Service Quotas, Azure Subscriptions, GCP Quotas) for relevant limits.
- Provider logs (step 2) might contain messages like "QuotaExceeded" or "LimitExceeded".
Fix: Request a quota increase from your cloud provider for the specific resource type and region.
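Before opening a quota request, it can help to confirm that the provider is really hitting a limit. The sketch below counts log lines containing the two markers mentioned above. count_quota_errors is a hypothetical helper name, and cloud providers use other quota error strings as well, so treat a zero as "not seen" rather than "no quota problem".

```shell
#!/bin/sh
# Count log lines that mention a quota or limit error.
# Reads log text on stdin and prints the number of matching lines.
count_quota_errors() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # function usable in scripts that run with "set -e".
  grep -cE 'QuotaExceeded|LimitExceeded' || true
}
```

For example: kubectl logs <provider-pod-name> -n crossplane-system | count_quota_errors.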
5. Examine Underlying Cloud Resource State
Sometimes, the MR is "Not Ready" because the actual cloud resource has an issue that the provider isn’t surfacing clearly.
Diagnosis: Log into your cloud provider’s console and navigate to the resource managed by the MR (e.g., the RDS instance, the AKS cluster, the GCS bucket). Look for any alerts, error messages, or health indicators associated with that specific resource.
Fix: Address the issue directly in the cloud provider’s console. For example, if an RDS instance shows a "Rebooting" status for an extended period, you might need to investigate its logs or initiate a manual reboot. Once the cloud resource is healthy, Crossplane should eventually reconcile and mark the MR as Ready.
6. Provider Configuration Issues
An incorrectly configured ProviderConfig can cause a wide range of problems, including failures in authentication, region selection, or network settings.
Diagnosis:
kubectl get providerconfig <your-provider-config-name> -o yaml
What to look for:
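- credentials: Ensure the secret referenced here exists and contains the correct keys (e.g., aws_access_key_id and aws_secret_access_key for static credentials, or the correct IAM role ARN for IRSA/assume role).
- region: Verify the region specified is correct and matches where you expect the resource to be created. Some providers, including the AWS providers, take the region from each MR’s spec.forProvider.region rather than from the ProviderConfig, so check both places.
- skipCredsValidation: If true, the provider might not validate credentials on startup, masking potential issues until a reconciliation attempt. The exact name of this option varies by provider and version, so check your provider’s ProviderConfig reference.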
Fix:
Correct the ProviderConfig to accurately reflect your cloud environment and credentials. For example, if you’re using IRSA on EKS, ensure the ProviderConfig is set up to leverage that.
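The credentials check can be sketched as a small script that reads the credentials source from the ProviderConfig and tells you what to inspect next. It assumes the common layout where the source sits at spec.credentials.source and the secret name at spec.credentials.secretRef.name. Both function names are hypothetical, and if several providers are installed you may need the fully qualified kind instead of plain providerconfig.

```shell
#!/bin/sh
# Given a credentials source and a secret name, print what to check next.
# $1 is the value of spec.credentials.source, $2 the secretRef name.
credentials_hint() {
  source_type="$1"
  secret_name="$2"
  case "$source_type" in
    Secret)
      if [ -n "$secret_name" ]; then
        echo "verify secret: $secret_name"
      else
        echo "source is Secret but secretRef.name is empty"
      fi
      ;;
    "")
      echo "no credentials source set"
      ;;
    *)
      echo "source $source_type uses no secret; check the pod identity or role"
      ;;
  esac
}

# Hypothetical wrapper: read both fields from a ProviderConfig by name.
providerconfig_hint() {
  pc="$1"
  src=$(kubectl get providerconfig "$pc" -o jsonpath='{.spec.credentials.source}')
  name=$(kubectl get providerconfig "$pc" -o jsonpath='{.spec.credentials.secretRef.name}')
  credentials_hint "$src" "$name"
}
```

Running providerconfig_hint <your-provider-config-name> prints a single line. Keeping the decision logic in credentials_hint separate from the kubectl calls makes it easy to try with sample values first.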
After fixing these, the next error you’ll likely encounter is related to another resource in your composition, or perhaps a different condition on the same MR if the initial problem was only one of several.