Your Crossplane resources are stuck in a Creating or Deleting state, and their READY and SYNCED conditions stay False. Crossplane’s control plane keeps trying to reconcile the desired state with the actual state of the cloud provider resource, but the two never converge. The underlying cause is either a breakdown in communication with the provider or a mismatch between what Crossplane believes about a resource and what the cloud provider actually reports.
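To see which resources are affected before digging into causes, a small filter like the following may help. It is a minimal sketch: the function name `not_converged` is hypothetical, and it assumes four whitespace-separated columns (KIND, NAME, READY, SYNCED) such as those produced by a `custom-columns` query.

```shell
# Print every row whose READY or SYNCED column is not "True".
# Expects whitespace-separated columns: KIND NAME READY SYNCED.
# Header rows (first column "KIND") are skipped.
not_converged() {
  awk '$1 != "KIND" && ($3 != "True" || $4 != "True") {
    print $1 "/" $2 " ready=" $3 " synced=" $4
  }'
}

# Possible usage against a live cluster (not executed here):
# kubectl get managed -o custom-columns='KIND:.kind,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,SYNCED:.status.conditions[?(@.type=="Synced")].status' | not_converged
```

Resources that have not reported a condition yet show up as `<none>` in that query, so they are listed as not converged too.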
Here are the common culprits and how to diagnose them:
1. Cloud Provider API Throttling or Rate Limiting
The most frequent cause is hitting the API limits of your cloud provider (AWS, GCP, Azure, etc.). Crossplane, like any client, can be throttled if it makes too many requests too quickly.
- Diagnosis: Check the controller logs. Throttling usually shows up as messages such as `rateLimited`, `throttled`, `429 Too Many Requests`, or `exceeded rate limits`. Provider controllers run in their own pods in the same namespace, so check those logs as well as the core Crossplane pod.

      kubectl logs -l app.kubernetes.io/name=crossplane -n crossplane-system -c crossplane

- Fix: This is usually a transient issue, because Crossplane retries with exponential backoff. If it persists, you might need to:
  - Reduce reconciliation frequency: Provider controllers take their polling interval as a command-line argument, which you can set through a `ControllerConfig` (or, on newer Crossplane releases, a `DeploymentRuntimeConfig`) referenced from the `Provider`. The flag name and its default differ between providers (many use `--poll`), so check your provider’s documentation first. For example, for the AWS provider:

        apiVersion: pkg.crossplane.io/v1alpha1
        kind: ControllerConfig
        metadata:
          name: aws-slow-poll  # hypothetical name
        spec:
          args:
            - --poll=15m  # flag name and default vary by provider
        ---
        apiVersion: pkg.crossplane.io/v1
        kind: Provider
        metadata:
          name: provider-aws
        spec:
          package: crossplane/provider-aws:v0.40.0
          controllerConfigRef:
            name: aws-slow-poll

  - Distribute workloads: If you have many resources, consider splitting them across multiple Crossplane instances or using multiple provider packages to spread the API load.
- Why it works: A longer poll interval tells Crossplane to check the cloud provider less often, which reduces the request volume. Spreading the load distributes API calls across separate clients.
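The log check for throttling can be narrowed with a small filter. This is a sketch only: the function name `throttle_lines` is hypothetical, and the patterns are just the messages mentioned above, so extend them for whatever your provider actually logs.

```shell
# Keep only log lines that look like throttling or rate limiting.
# Reads log text on stdin; matching is case-insensitive.
throttle_lines() {
  grep -Ei 'rateLimited|throttl|429 Too Many Requests|exceeded rate limit'
}

# Possible usage against a live cluster (not executed here):
# kubectl logs -l app.kubernetes.io/name=crossplane -n crossplane-system -c crossplane | throttle_lines | tail -n 20
```

If the filter prints nothing, throttling is probably not the cause and the later sections are more likely to apply.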
2. Incorrect Cloud Provider Credentials or Permissions
Crossplane can’t do anything if it can’t authenticate with or authorize itself against your cloud provider. This is especially common after credential rotation or changes in IAM policies.
- Diagnosis: Check `status.conditions` on your `ProviderConfig` and on the managed resources that reference it. Look for `Ready` or `Synced` conditions that are `False` with a reason or message such as `LoginFailed`, `AccessDenied`, `Unauthorized`, or `InvalidCredentials`. Also check the controller logs for authentication or authorization errors.

      kubectl describe providerconfig <your-providerconfig-name>
      kubectl logs -l app.kubernetes.io/name=crossplane -n crossplane-system -c crossplane

- Fix:
  - Verify Secret: Ensure the Kubernetes Secret referenced in your `ProviderConfig` (e.g., `spec.credentials.source: Secret` with a `secretRef`) contains the correct, up-to-date credentials.
  - Check IAM Policies: Log into your cloud provider’s console and verify that the IAM user or role associated with the credentials has the permissions to create, read, update, and delete the specific resource type Crossplane is trying to manage. For example, for an RDS instance, it needs `rds:CreateDBInstance`, `rds:DescribeDBInstances`, etc.
  - Update Secret: If credentials are stale, update the Kubernetes Secret and then `kubectl apply` the `ProviderConfig` again.
- Why it works: Valid credentials and sufficient permissions allow Crossplane to successfully authenticate and perform the required API calls to the cloud provider.
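One way to verify the Secret without printing its contents is to check only that the expected keys are present. The sketch below assumes AWS-style credentials stored as a credentials file. The function name `check_aws_creds`, the Secret name `aws-creds`, and the key `creds` are all hypothetical, so substitute whatever your `ProviderConfig` references.

```shell
# Read a decoded AWS-style credentials file on stdin and report whether
# both expected keys are present. Secret values are never echoed.
check_aws_creds() {
  input=$(cat)
  for key in aws_access_key_id aws_secret_access_key; do
    if ! printf '%s\n' "$input" | grep -Eq "^[[:space:]]*${key}[[:space:]]*="; then
      echo "missing: ${key}"
      return 1
    fi
  done
  echo "credentials file has the expected keys"
}

# Possible usage against a live cluster (not executed here):
# kubectl get secret aws-creds -n crossplane-system -o jsonpath='{.data.creds}' | base64 -d | check_aws_creds
```

This only confirms the shape of the file. Whether the keys are still valid and sufficiently privileged has to be checked on the cloud provider side.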
3. Cloud Provider API Errors (Non-Throttling)
The cloud provider itself might be returning specific errors that prevent resource creation or update. These aren’t about rate limits but about the validity of the request or the state of the cloud environment.
- Diagnosis: Examine `status.conditions` on the specific resource that is stuck (e.g., `RDSInstance`, `Bucket`). Look for `READY` or `SYNCED` conditions that are `False` and check their `Message` and `Reason` fields. These often quote the cloud provider’s error directly. Also check the logs of the provider-specific reconciler (e.g., `provider-aws-rds`, `provider-gcp-compute`) if you can identify it.

      kubectl describe <your-stuck-resource-kind> <your-stuck-resource-name>

  Example error messages: `InvalidParameterValue`, `SubnetNotFound`, `SecurityGroupNotFound`, `BucketAlreadyOwnedByYou`.
- Fix: The fix depends entirely on the error message.
  - `InvalidParameterValue`: Correct the parameters in your Custom Resource (CR) definition. For instance, if you are creating an RDS instance and the error is about `DBSubnetGroupName`, ensure the specified subnet group actually exists and is valid.
  - `SubnetNotFound` / `SecurityGroupNotFound`: Ensure the referenced network resources exist in the correct VPC and region before attempting to create the Crossplane resource.
  - `BucketAlreadyOwnedByYou`: If you’re trying to create a globally unique resource like an S3 bucket and it already exists under your ownership, you might need to adopt it or choose a different name.
- Why it works: You’re aligning the desired state (your CR) with the actual constraints and state of the cloud provider’s environment.
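The mapping from error message to fix can be written down as a small lookup, which is handy when triaging many stuck resources at once. This is an illustrative sketch: the function name `suggest_fix` is hypothetical, and it covers only the example errors listed above.

```shell
# Map a provider error message (passed as the first argument) to the
# corresponding hint from the list above.
suggest_fix() {
  case "$1" in
    *InvalidParameterValue*)
      echo "correct the offending field in the resource spec" ;;
    *SubnetNotFound*|*SecurityGroupNotFound*)
      echo "make sure the network resource exists in the right VPC and region" ;;
    *BucketAlreadyOwnedByYou*)
      echo "adopt the existing bucket or choose a different name" ;;
    *)
      echo "no canned hint; read the full message" ;;
  esac
}

# Possible usage against a live cluster (not executed here):
# suggest_fix "$(kubectl get <kind> <name> -o jsonpath='{.status.conditions[?(@.type=="Synced")].message}')"
```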
4. Network Connectivity Issues
Crossplane pods need to be able to reach the cloud provider’s API endpoints. This can be an issue in restricted network environments (e.g., private clusters, strict firewalls).
- Diagnosis: From a pod in the same network environment as Crossplane, try to `curl` the cloud provider’s API endpoint. The Crossplane image typically ships without a shell or `curl`, so a throwaway pod in the same namespace is usually more reliable than exec’ing into the Crossplane pod itself. Look for connection timeouts, DNS resolution failures, or SSL certificate errors.

      # Run a temporary pod (hypothetical name net-test) and curl the endpoint
      kubectl run net-test --rm -it --restart=Never -n crossplane-system \
        --image=curlimages/curl --command -- curl -v https://s3.amazonaws.com

- Fix:
  - Ensure Egress: Verify that your Kubernetes cluster has egress rules that allow outbound traffic to the cloud provider’s API endpoints (e.g., `*.amazonaws.com`, `*.googleapis.com`, `*.azure.com`).
  - Check DNS: Ensure the nodes running your Crossplane pods can resolve the cloud provider’s API endpoints.
  - Proxy Configuration: If your cluster uses an HTTP proxy for egress, ensure the Crossplane pods are configured to use it (often via environment variables like `HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY` in their deployment).
- Why it works: Establishing reliable network paths allows the Crossplane controller to communicate with the cloud provider’s API.
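When the connectivity test is scripted, curl’s exit code tells you which of the failure modes from the diagnosis you hit. The sketch below maps a few well-known curl exit codes to those categories. The function name `explain_curl_exit` is hypothetical, and the list is not exhaustive.

```shell
# Translate a curl exit code (first argument) into the failure
# categories discussed above.
explain_curl_exit() {
  case "$1" in
    0)     echo "endpoint reachable" ;;
    6)     echo "DNS resolution failed" ;;
    7)     echo "connection refused or blocked" ;;
    28)    echo "connection timed out" ;;
    35|60) echo "TLS handshake or certificate problem" ;;
    *)     echo "other curl failure (exit $1)" ;;
  esac
}

# Possible usage from a pod that has curl (not executed here):
# curl -sS -o /dev/null --max-time 10 https://s3.amazonaws.com; explain_curl_exit $?
```

A DNS failure points at cluster DNS or node resolvers, a timeout at egress rules or a missing proxy, and a certificate problem often at a TLS-intercepting proxy.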
5. Custom Resource Definition (CRD) or Controller Issues
Sometimes, the problem isn’t with the cloud provider but with Crossplane’s own understanding or implementation of the resource. This could be a bug in the provider package or an issue with how Crossplane itself is running.
- Diagnosis:
  - Check CRD Status: Ensure the CRDs for the specific resource type are installed and healthy.

        kubectl get crd <resource.group>.crossplane.io

  - Provider Package Health: Check the status of the `Provider` and `ProviderRevision` resources for the relevant provider. Both should report `INSTALLED` and `HEALTHY` as `True`.

        kubectl get providers.pkg.crossplane.io,providerrevisions.pkg.crossplane.io

  - Controller Logs: Look for panics, repeated errors, or specific messages indicating an internal issue within the Crossplane core or the provider controller.
- Fix:
  - Update Crossplane/Provider: Ensure you are running a recent, stable version of Crossplane and the relevant provider package. Bugs are often fixed in newer releases.
  - Reinstall Provider: If a `Provider` or `ProviderRevision` is stuck, try deleting and reapplying it.

        kubectl delete provider <provider-name>
        kubectl apply -f <your-provider-definition.yaml>

  - Check Crossplane Core Logs: If the issue seems general, examine the main Crossplane controller logs for systemic errors.
- Why it works: Upgrading or reinstalling ensures you have a known good version of the software components that translate your CRs into cloud provider API calls.
6. Resource Adoption/Import Issues
If you’re trying to "adopt" or import an existing cloud resource into Crossplane management, incorrect identifiers or states can lead to perpetual False conditions.
- Diagnosis: When adopting, Crossplane tries to find the existing resource. If it can’t find it (due to a wrong ID, region, or name), or if it finds it in an unexpected state, reconciliation will fail. Check `status.conditions` on the Crossplane resource and the Crossplane controller logs for messages like `ResourceNotFound` or `FailedToAdopt`.
- Fix:
  - Verify Identifiers: Double-check the `name`, `id`, `region`, and any other identifying fields in your Crossplane composite resource (e.g., `CompositeInstance`) or managed resource (e.g., `RDSInstance`) when configuring adoption. On managed resources, the `crossplane.io/external-name` annotation must match the identifier of the existing cloud resource.
  - Check Cloud Provider State: Ensure the resource exists in the cloud provider with the exact identifiers you’ve provided.
  - Clear State: If Crossplane previously tried to manage this resource and failed, you might need to manually remove any stale state entries or ensure the resource is truly unmanaged before attempting adoption again.
- Why it works: Accurate information allows Crossplane to correctly identify and associate itself with the existing cloud resource.
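Crossplane identifies an existing external resource through the `crossplane.io/external-name` annotation on the managed resource, so a missing or empty annotation is a common reason adoption fails. A quick pre-flight check on a manifest might look like this sketch. The function name `has_external_name` and the file name in the usage line are hypothetical.

```shell
# Succeed if the manifest file (first argument) sets a non-empty
# crossplane.io/external-name annotation; fail otherwise.
has_external_name() {
  grep -Eq '^[[:space:]]+crossplane\.io/external-name:[[:space:]]*[^[:space:]]' "$1"
}

# Possible usage (not executed here):
# has_external_name rds-instance.yaml || echo "external-name annotation is missing"
```

The check only confirms that the annotation is set. Whether its value matches a real resource in the right region still has to be verified against the cloud provider.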
Once these are resolved, the next error you’re likely to encounter is a Timeout error, which appears when the underlying cloud provider operation takes longer than Crossplane’s configured timeout for that operation.