Crossplane providers are just controllers that watch for Custom Resources (CRs) and translate them into API calls to external cloud providers. When you upgrade a provider, you’re upgrading that controller together with the CRDs shipped in its package. The core issue is ensuring the new controller version can correctly manage the resources created under the old version.
Common Causes and Fixes for Upgrade Issues
API Version Mismatch Between Provider and CRDs:
- Diagnosis: Check the CRDs installed for your provider’s managed resources and compare their spec.versions (which versions are served, and which one is the storage version) with the API group/version the running provider controller watches. Note that providers.pkg.crossplane.io is the CRD for Crossplane’s own Provider package type, not for the provider’s managed resources. Query those by their <plural>.<api-group> name, and confirm which provider image is actually running:

      kubectl get crd <plural>.<api-group> -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'
      kubectl get pods -n crossplane-system -l app.kubernetes.io/name=provider-aws -o yaml | grep "image:"

  Look for discrepancies between the versions the CRD serves and the API versions the provider binary is built to handle.
- Fix: If the CRDs are ahead of the provider binary (e.g., they define new API versions the provider does not yet support), you might need to manually pin the CRD to an older, supported version or wait for a provider update. If the provider is updated but the CRDs haven’t been, apply the CRDs that ship with the new provider release. Crossplane’s package manager owns the CRDs it installs and can overwrite manual changes, so pinning the Provider package itself to a matching version is the more durable fix.

      # Example: the CRD defines v1beta1, but the provider only supports v1.
      # Find the matching CRD YAML in the provider's release assets.
      kubectl apply -f <path-to-old-crd.yaml>

- Why it works: The Kubernetes API server uses CRDs to decide which versions of a custom resource it serves and how to validate them. The provider controller does not advertise versions to the API server. Instead, it watches one specific group/version through its Kubernetes client. If the CRD does not serve that version, the controller’s watches fail (typically with a "no matches for kind" error). If the schema differs from what the controller expects, the controller may misread the resource’s structure.
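The comparison in the Diagnosis step can be scripted. The following is a minimal POSIX sh sketch. It assumes you pipe in the one-line-per-version output of the kubectl jsonpath query shown in its comments. The names is_version_served and served_versions are hypothetical helpers, not part of kubectl or Crossplane.

```shell
# Hypothetical helpers for checking which API versions a CRD serves.
# Expected stdin: one line per CRD version, as printed by
#   kubectl get crd <plural>.<api-group> \
#     -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'
# e.g. "v1beta1 served=true storage=true"

# Print the names of all versions marked served=true.
served_versions() {
  awk '$2 == "served=true" { print $1 }'
}

# Exit 0 if the version given as $1 is served, 1 otherwise.
is_version_served() {
  awk -v want="$1" '
    $1 == want && $2 == "served=true" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Piping the jsonpath output into is_version_served v1beta1 exits with status 1 when the CRD does not serve v1beta1. Because the result is an exit status rather than printed text, the check is easy to use as a pre-upgrade gate in a CI job.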
Changes in Provider’s Controller Logic for Existing Resources:
- Diagnosis: Examine the provider’s release notes for the new version. Look for "breaking changes" or modifications to how specific managed resources (e.g., RDSInstance, Bucket) are reconciled. Save the logs of the old provider pod before you upgrade, then follow the logs of the new pod afterwards. If the new pod is crash-looping, --previous shows the output of its last terminated container.

      kubectl logs <old-provider-pod-name> -n crossplane-system > provider-before-upgrade.log
      kubectl logs <new-provider-pod-name> -n crossplane-system -f
      kubectl logs <new-provider-pod-name> -n crossplane-system --previous

- Fix: If the new provider logic requires changes to the spec of your existing managed resources, update those resources before or immediately after the provider upgrade. This might involve removing deprecated fields or adding new, required ones.

      # Hypothetical example: engineVersion replaced by a structured engine block
      apiVersion: rds.aws.upbound.io/v1beta1
      kind: Instance
      metadata:
        name: my-db
      spec:
        forProvider:
          # Removed (deprecated): engineVersion: "13.3"
          engine:
            name: "postgres"
            version: "13.3"

- Why it works: Managed resources are defined by their spec. If the controller’s interpretation of the spec changes, it might try to update or delete external resources based on a misunderstanding of their desired state. Aligning the resource spec with the controller’s new expectations prevents this.
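After the upgrade, a quick way to see which managed resources the new controller is struggling with is to look for ones that are no longer synced. This is a minimal sketch, assuming the default table output of kubectl get managed (Crossplane groups managed resources under the managed category), where each resource kind prints its own table with a NAME header. The name unsynced_resources is a hypothetical helper. It also assumes every row has values in the columns up to SYNCED.

```shell
# Hypothetical helper: print managed resources whose SYNCED column is not "True".
# Reads the table output of `kubectl get managed` on stdin. When several kinds are
# listed, kubectl prints one table per kind, so the SYNCED column position is
# re-detected on every NAME header line.
# Assumption: no empty cells before the SYNCED column (true once a resource
# has been reconciled at least once).
unsynced_resources() {
  awk '
    $1 == "NAME" {
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "SYNCED") col = i
      next
    }
    NF == 0 { next }                      # blank line between tables
    col > 0 && $col != "True" { print $1 }
  '
}
```

Running kubectl get managed | unsynced_resources gives you the names to pass to kubectl describe. The status conditions and events that describe prints usually contain the reconcile error.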
Incompatible Underlying Dependencies (e.g., Go Modules, Cloud SDKs):
- Diagnosis: Check the go.mod file in the provider’s source repository for updated dependency versions. If the provider is built against a specific version of a cloud provider’s SDK and that SDK has breaking changes, the provider might fail. Look for errors in the provider logs related to authentication, API calls, or specific SDK functions:

      kubectl logs <provider-pod-name> -n crossplane-system -f | grep -iE "sdk|client|error"

- Fix: This is usually fixed by updating the provider itself. If you’re building a custom provider, update the SDK versions in its go.mod and make sure your code is compatible with them. For official providers, you must wait for a new release, or fork the provider and build it yourself.
- Why it works: Cloud SDKs are the direct interface to cloud APIs. If a provider uses an older SDK version and the cloud provider deprecates an API endpoint or changes its behavior, the SDK calls fail, and so does Crossplane’s reconciliation.
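Comparing go.mod files between two provider releases is a quick way to spot SDK bumps. This is a minimal POSIX sh sketch, assuming you have the old and new go.mod files locally (for example, checked out from the two release tags). The names gomod_requires and gomod_diff are hypothetical helpers. The sketch reads only require directives, so it does not report replace directives or modules that appear in just one of the files.

```shell
# Hypothetical helper: print "module version" for every require entry in a go.mod
# file ($1), covering both single-line "require mod v1" statements and
# "require ( ... )" blocks. Output is sorted for use with join.
gomod_requires() {
  awk '
    $1 ~ /^\/\// { next }                         # whole-line comments
    $1 == "require" && $2 == "(" { inblock = 1; next }
    inblock && $1 == ")" { inblock = 0; next }
    inblock && NF >= 2 { print $1, $2; next }     # trailing "// indirect" is ignored
    $1 == "require" && NF >= 3 { print $2, $3 }
  ' "$1" | LC_ALL=C sort
}

# Hypothetical helper: print "module old -> new" for modules required by both
# go.mod files ($1 = old, $2 = new) whose versions differ.
gomod_diff() {
  old_reqs=$(mktemp); new_reqs=$(mktemp)
  gomod_requires "$1" > "$old_reqs"
  gomod_requires "$2" > "$new_reqs"
  # join on the module path (field 1): each output line is "module old-version new-version"
  LC_ALL=C join "$old_reqs" "$new_reqs" | awk '$2 != $3 { print $1, $2, "->", $3 }'
  rm -f "$old_reqs" "$new_reqs"
}
```

Running gomod_diff old/go.mod new/go.mod prints one line per changed dependency. If a cloud SDK module appears in that list, start there when the logs show API or client errors.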
RBAC/Permissions Changes in the New Provider Version:
- Diagnosis: The provider’s ClusterRole and ClusterRoleBinding might have changed. If you deploy the provider from its own manifests, check the rbac.yaml or deploy/rbac.yaml file in its repository for the new version. In a standard Crossplane installation, Crossplane’s RBAC manager generates these ClusterRoles from the CRDs in the provider package instead. Either way, compare the new requirements with the ClusterRole currently in your cluster:

      kubectl get clusterroles | grep "crossplane:provider"
      kubectl get clusterrole <provider-clusterrole-name> -o yaml
      # Compare this with the ClusterRole definition for the new provider version

  Look for missing permissions, or for attempts to access resources the existing ClusterRole doesn’t grant. These show up as "forbidden" errors in the provider logs.
- Fix: Apply the updated ClusterRole and ClusterRoleBinding from the new provider version. If the RBAC manager generates them, make sure it is running and allowed to grant them.

      kubectl apply -f <path-to-new-rbac.yaml>

- Why it works: Crossplane providers need specific Kubernetes RBAC permissions to watch and list managed resources and to create, update, and delete them. The new provider version might require additional permissions (e.g., to manage a new resource type) or change how it interacts with existing ones. In that case, it fails with authorization errors until the ClusterRole is updated.
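Rather than diffing ClusterRole YAML by eye, you can ask the API server directly whether the provider’s service account has the permissions the new version needs. This is a minimal sketch, assuming you supply the verb/resource pairs yourself (for example, from the new version’s documentation or from forbidden errors in its logs). The name missing_permissions is a hypothetical helper built on kubectl auth can-i with --as impersonation. Your own credentials must be allowed to impersonate service accounts.

```shell
# Hypothetical helper: print each "verb resource" pair (read from stdin, one per
# line) that the given service account is NOT allowed to perform.
# $1 = namespace of the service account, $2 = service account name.
missing_permissions() {
  sa="system:serviceaccount:$1:$2"
  while read -r verb resource; do
    [ -n "$verb" ] || continue            # skip blank lines
    # --quiet suppresses the yes/no output; the exit status carries the answer.
    # </dev/null keeps kubectl from consuming the loop's stdin.
    if ! kubectl auth can-i "$verb" "$resource" --as="$sa" --quiet </dev/null; then
      echo "$verb $resource"
    fi
  done
}
```

You can read the provider’s service account name from its pod with kubectl get pod <provider-pod-name> -n crossplane-system -o jsonpath='{.spec.serviceAccountName}'.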
Configuration or Environment Variable Differences:
- Diagnosis: Review the provider’s deployment manifest (e.g., deploy/deployment.yaml, if the provider ships one) and the Deployment running in your cluster. Look for changes in the env variables or args passed to the controller container. These often control things like region, endpoint overrides, or feature flags.

      kubectl get deployments -n crossplane-system
      kubectl get deployment <provider-deployment-name> -n crossplane-system -o yaml | grep -A 5 "env:"

- Fix: Update the environment variables or arguments to reflect the new version’s requirements. When Crossplane’s package manager creates the provider Deployment, it reverts direct edits. Set env and args through a DeploymentRuntimeConfig referenced from the Provider’s spec.runtimeConfigRef (or a ControllerConfig on older Crossplane versions) rather than editing the Deployment.

      # Example: a new required env var is introduced
      env:
        - name: MY_NEW_CONFIG_VAR
          value: "some-value"

- Why it works: Providers often rely on environment variables for configuration. If a new version expects a different configuration or requires a new setting (e.g., to point to a different AWS endpoint for GovCloud), the controller won’t initialize or function correctly without it.
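To see exactly which environment variables changed across an upgrade, snapshot them before and after with kubectl set env --list and compare the names. This is a minimal sketch, assuming each snapshot is the raw --list output saved to a file. The names env_names and env_diff are hypothetical helpers. They compare only variable names, not values.

```shell
# Hypothetical helpers for comparing env var names between two snapshots taken with
#   kubectl set env deployment/<provider-deployment-name> -n crossplane-system --list > before.env
# (repeat after the upgrade into after.env).

# Print the sorted, unique variable names from one snapshot file, skipping
# comment lines (kubectl prints container headers starting with "#") and blanks.
env_names() {
  grep -v -e '^#' -e '^$' "$1" | cut -d= -f1 | LC_ALL=C sort -u
}

# Print "+NAME" for variables only in the new snapshot ($2)
# and "-NAME" for variables that disappeared from the old one ($1).
env_diff() {
  old_names=$(mktemp); new_names=$(mktemp)
  env_names "$1" > "$old_names"
  env_names "$2" > "$new_names"
  LC_ALL=C comm -13 "$old_names" "$new_names" | sed 's/^/+/'
  LC_ALL=C comm -23 "$old_names" "$new_names" | sed 's/^/-/'
  rm -f "$old_names" "$new_names"
}
```

Running env_diff before.env after.env shows which variables the new release added or dropped. You can then check each added variable against the release notes before carrying it into your runtime configuration.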
CRD Schema Validation Changes:
- Diagnosis: Kubernetes CRDs include schema validation. The provider upgrade might change a CRD so that existing managed resources no longer satisfy the new schema. Writes to those resources then start failing, typically when the controller next tries to update them, and reconciliation fails. Check the CRD’s openAPIV3Schema and compare it with the spec of a failing managed resource:

      kubectl get crd <plural>.<api-group> -o yaml
      # Then inspect the spec of a failing resource
      kubectl get <resource-kind> <resource-name> -o yaml

- Fix: Update the spec of your managed resources to conform to the new CRD schema. This might involve adding or changing fields as dictated by the updated validation rules.
- Why it works: The Kubernetes API server enforces CRD schema validation on every create and update. If a managed resource’s spec violates the schema (e.g., a required field is missing, a field has the wrong type, or a value is outside an allowed enum), the API server rejects the write. A new or modified resource therefore never reaches the controller for reconciliation. Objects already stored are not re-validated when the CRD changes, which is why these failures often appear only on the next update.
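A server-side dry run lets the API server check your manifests against the new CRD schemas without persisting anything. This is a minimal sketch, assuming your managed resource manifests are .yaml files in one directory and the upgraded CRDs are already installed in the target cluster (ideally a staging cluster). The name validate_manifests is a hypothetical helper around kubectl apply --dry-run=server.

```shell
# Hypothetical helper: dry-run every *.yaml file in directory $1 against the API
# server, print the path of each file it rejects, and return 1 if any file failed.
# Nothing is persisted: --dry-run=server runs validation and admission only.
# kubectl's own error messages still go to stderr so you can see why a file failed.
validate_manifests() {
  failed=0
  for f in "$1"/*.yaml; do
    [ -e "$f" ] || continue             # the glob matched nothing
    if ! kubectl apply --dry-run=server -f "$f" >/dev/null; then
      echo "$f"
      failed=1
    fi
  done
  return "$failed"
}
```

Clusters with CRD validation ratcheting enabled may accept an update that leaves an already-invalid field unchanged. For the strictest check, run the dry run against a cluster where the resources do not exist yet.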
The next error you’ll likely encounter after fixing these issues is a NoResourcesFound error from the Crossplane core, indicating that the provider itself is not successfully registered with the Crossplane API, usually due to the provider controller not starting correctly.