Crossplane providers are controllers that watch Custom Resources (CRs) and translate them into API calls to external cloud providers. When you upgrade a provider, you’re upgrading that controller along with the CRDs packaged with it, and the core issue is ensuring the new controller version can correctly manage the resources created by the old version.

Common Causes and Fixes for Upgrade Issues

  1. API Version Mismatch Between Provider and CRDs:

    • Diagnosis: Check which versions the provider’s installed CRDs serve (and which one is the storage version), then confirm which provider package, and therefore which controller release, is actually running.
      kubectl get crd <plural>.<api-group> -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'
      kubectl get providerrevisions
      
      Look for a mismatch between the versions the CRD serves and the versions the running release expects (its release notes list them), and confirm that the active revision’s image is the version you intended.
    • Fix: Crossplane’s package manager installs a provider’s CRDs from the same package as its controller, so the two normally move together. If they have drifted (for example, a failed upgrade left newer CRDs next to an older controller, or the reverse), point the Provider’s spec.package at one consistent version, rolling back to the previous tag if needed, and let the package manager reconcile the CRDs. Hand-applying CRD YAML from the provider’s release assets is a last resort, because the package manager may overwrite it.
      # Example: roll the Provider back to the last known-good package version
      # (<provider-name>, <registry> and <previous-version> are placeholders)
      kubectl patch provider <provider-name> --type merge -p '{"spec":{"package":"<registry>/provider-aws:<previous-version>"}}'
      
    • Why it works: The Kubernetes API server uses CRDs to decide which versions of a custom resource it serves, how it validates them, and which version it stores. The provider controller doesn’t register with the API server; it watches specific API groups and versions as a client. If the version it watches isn’t served by the installed CRD, its watches fail and those resources are never reconciled, and if the CRD schema and the controller’s types disagree, fields can be dropped or misread.
  2. Changes in Provider’s Controller Logic for Existing Resources:

    • Diagnosis: Examine the provider’s release notes for the new version. Look for "breaking changes" or modifications to how specific managed resources (e.g., RDSInstance, Bucket) are reconciled. Capture the old provider pod’s logs before the upgrade, because the old pod, and its logs, typically go away once the new revision takes over (--previous only shows an earlier, crashed container in the same pod). Then follow the new provider pod’s logs after the upgrade.
      kubectl logs <old-provider-pod-name> -n crossplane-system > provider-before-upgrade.log
      kubectl logs <new-provider-pod-name> -n crossplane-system -f
      
    • Fix: If the new provider logic requires changes to the spec of your existing managed resources, you’ll need to update those resources before or immediately after the provider upgrade. This might involve removing deprecated fields or adding new, required ones.
      # Hypothetical example: suppose the new version replaced the flat
      # forProvider.engineVersion field with a nested engine block
      apiVersion: rds.aws.upbound.io/v1beta1
      kind: Instance
      metadata:
        name: my-db
      spec:
        forProvider:
          # Removed deprecated field: engineVersion: "13.3"
          engine:
            name: "postgres"
            version: "13.3"
      
    • Why it works: Managed resources are defined by their spec. If the controller’s interpretation of the spec changes, it might try to update or delete resources based on a misunderstanding of their desired state. Aligning the resource spec with the controller’s new expectations prevents this.
  3. Underlying Dependencies (e.g., Go Modules, Cloud SDKs) Incompatibility:

    • Diagnosis: Check the provider’s go.mod file in its source repository for updated dependency versions. If the provider is built using a specific version of a cloud provider’s SDK, and that SDK has breaking changes, the provider might fail. Look for errors in the provider logs related to authentication, API calls, or specific SDK functions.
      kubectl logs <provider-pod-name> -n crossplane-system -f | grep -iE "sdk|client|error"
      
    • Fix: This is usually fixed by updating the provider itself. If you’re building a custom provider, update the SDK versions in your go.mod and make sure your code is compatible. For official providers, roll back to the previous release, wait for a fixed release, or fork the provider and build it yourself.
    • Why it works: Cloud SDKs are the direct interface to cloud APIs. If a provider uses an older SDK version and the cloud provider deprecates an API endpoint or changes its behavior, the SDK calls will fail, and thus the Crossplane reconciliation will fail.
  4. RBAC/Permissions Changes in the New Provider Version:

    • Diagnosis: The permissions a provider needs can change between versions, for example when it starts managing a new resource type. For providers installed as Crossplane packages, the ClusterRole is normally generated by Crossplane’s RBAC manager for each ProviderRevision rather than applied from an rbac.yaml in a Helm chart. Find the ClusterRole for the new revision and compare its rules with the resources the new version manages.
      kubectl get clusterroles | grep "crossplane:provider"
      kubectl get clusterrole <provider-clusterrole-name> -o yaml
      
      Look for "forbidden" errors in the provider logs and for resources the ClusterRole doesn’t grant.
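    • Fix: If Crossplane’s RBAC manager is running, it should create the ClusterRole for the new revision automatically; if none appears, check the RBAC manager’s logs. Only if you manage provider RBAC yourself (for example, with the RBAC manager disabled) should you apply the updated ClusterRole and ClusterRoleBinding by hand.
      kubectl logs deployment/crossplane-rbac-manager -n crossplane-system
      kubectl apply -f <path-to-new-rbac.yaml>   # only when you manage RBAC yourself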
      
    • Why it works: Crossplane providers need specific Kubernetes RBAC permissions to watch and list managed resources and to create/update/delete them. If the new provider version requires additional permissions (e.g., to manage a new resource type) or changes how it interacts with existing ones, it might fail if the ClusterRole isn’t updated.
  5. Configuration or Environment Variable Differences:

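    • Diagnosis: Review the provider’s Deployment. For providers installed as Crossplane packages, the package manager creates this Deployment, and you customize it through a DeploymentRuntimeConfig (Crossplane v1.14 and later) or the older, deprecated ControllerConfig referenced from the Provider. Look for changes in env variables or args passed to the controller container. These often control things like region, endpoint overrides, or feature flags.
      kubectl get deployments -n crossplane-system
      kubectl get deployment <provider-deployment-name> -n crossplane-system -o yaml | grep -A 5 "env:"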
      
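    • Fix: Add any new or changed environment variables or arguments to the DeploymentRuntimeConfig (or ControllerConfig) that the Provider references, or to your Helm values if you deploy the provider with Helm. Editing the generated Deployment directly doesn’t last, because the package manager reconciles it back.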
      # Example: If a new required env var is introduced
      env:
      - name: MY_NEW_CONFIG_VAR
        value: "some-value"
      
    • Why it works: Providers often rely on environment variables for configuration. If a new version expects a different configuration or requires a new setting (e.g., to point to a different AWS endpoint for GovCloud), the controller won’t initialize or function correctly without it.
  6. CRD Schema Validation Changes:

    • Diagnosis: Kubernetes CRDs include OpenAPI schema validation. If the provider upgrade changes a CRD so that existing managed resources no longer satisfy the new schema, writes to those resources (your own kubectl apply or the controller’s updates) are rejected. Check the CRD’s openAPIV3Schema and compare it to the spec of your failing managed resources.
      kubectl get crd <plural>.<api-group> -o yaml
      # Then inspect the spec of a failing resource
      kubectl get <resource-kind> <resource-name> -o yaml
      
    • Fix: Update the spec of your managed resources to conform to the new CRD schema. This might involve adding or changing fields as dictated by the updated validation rules.
    • Why it works: The Kubernetes API server enforces CRD schema validation on create and update. A new managed resource whose spec violates the schema (e.g., a required field is missing, a field has the wrong type, or a value is outside an allowed enum) is rejected outright, so the controller never sees it. Existing objects stay stored as they are, but later updates to them, including the controller’s own writes, can be rejected until the spec conforms.
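For cause 1, a safer upgrade path is to stop a new provider version from taking over automatically. The manifest below is a sketch assuming Crossplane’s pkg.crossplane.io/v1 Provider API; the Provider name and the <new-version> tag are placeholders. With revisionActivationPolicy set to Manual, Crossplane creates the new ProviderRevision as Inactive and the current revision keeps running, which leaves time to review the release notes and the new revision’s status before switching.

```yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-rds   # hypothetical Provider name
spec:
  # Pin an exact tag; <new-version> is a placeholder
  package: xpkg.upbound.io/upbound/provider-aws-rds:<new-version>
  # New revisions start Inactive; the current revision keeps running
  revisionActivationPolicy: Manual
  # Keep an older revision around for a quick rollback
  revisionHistoryLimit: 2
```

When you’re ready, set spec.desiredState: Active on the new ProviderRevision (kubectl get providerrevisions lists it). To roll back, point spec.package at the previous tag again.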
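For cause 2, you can limit what a changed reconcile loop is able to do while you verify a new version. This is a sketch that assumes your Crossplane and provider versions support management policies; it’s a fragment to merge into the existing spec of a critical managed resource, not a complete manifest.

```yaml
# Fragment: merge into the spec of an existing critical managed resource
spec:
  # Observe-only: the provider refreshes status.atProvider but does not
  # create, update, or delete the external resource
  managementPolicies: ["Observe"]
```

After the upgrade, compare status.atProvider with what you expect, then set managementPolicies back to ["*"] to resume full management.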
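For cause 3, SDK-level failures are easier to trace with verbose provider logs. The sketch below assumes Crossplane v1.14 or later (for DeploymentRuntimeConfig), the default package-runtime container name, and a provider that accepts a --debug flag, as many providers, including Upbound’s, do; check your provider’s flags. The object names are hypothetical.

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-debug            # hypothetical name
spec:
  deploymentTemplate:
    spec:
      selector: {}                # required by the Deployment schema
      template:
        spec:
          containers:
            - name: package-runtime
              args:
                - --debug         # assumes the provider supports this flag
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-rds          # hypothetical Provider name
spec:
  package: xpkg.upbound.io/upbound/provider-aws-rds:<version>
  # Attach the runtime config above to this provider's Deployment
  runtimeConfigRef:
    name: provider-debug
```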
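For cause 4, you only need to write provider RBAC yourself if Crossplane’s RBAC manager is disabled. This sketch grants the permissions for a single managed resource kind to the provider’s ServiceAccount; the role names and the ServiceAccount placeholder are hypothetical, and a real provider also needs rules for things like events and connection secrets.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: provider-aws-rds-extra    # hypothetical name
rules:
  # Watch and reconcile one managed resource kind and its status
  - apiGroups: ["rds.aws.upbound.io"]
    resources: ["instances", "instances/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: provider-aws-rds-extra    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: provider-aws-rds-extra
subjects:
  - kind: ServiceAccount
    # Placeholder: read it from the provider pod's spec.serviceAccountName
    name: "<provider-service-account>"
    namespace: crossplane-system
```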
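For cause 5, the environment variable from that section’s example belongs in the runtime config rather than in the generated Deployment. A sketch, assuming Crossplane v1.14 or later and the default package-runtime container name; MY_NEW_CONFIG_VAR and the object name are placeholders.

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-env              # hypothetical name
spec:
  deploymentTemplate:
    spec:
      selector: {}                # required by the Deployment schema
      template:
        spec:
          containers:
            - name: package-runtime
              env:
                # Placeholder variable; use the one the release notes name
                - name: MY_NEW_CONFIG_VAR
                  value: "some-value"
```

Point the Provider at it with spec.runtimeConfigRef.name: provider-env. A Provider references a single runtime config, so if you also need extra args, put env and args in the same DeploymentRuntimeConfig.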
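For cause 6, this hypothetical CRD excerpt (not taken from a real provider) shows the kind of schema change that breaks existing resources: forProvider.region becomes required and storageType gains an enum. An existing Instance without region, or with storageType: standard, would be rejected on its next update, and a new one would be rejected at creation.

```yaml
# Hypothetical CRD excerpt (spec.versions) after an upgrade
versions:
  - name: v1beta1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              forProvider:
                type: object
                # Newly required: objects without region fail validation
                required: ["region"]
                properties:
                  region:
                    type: string
                  # Newly restricted: only these values are accepted
                  storageType:
                    type: string
                    enum: ["gp2", "gp3", "io1"]
```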

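If the provider still misbehaves after these fixes, check whether the provider package itself is healthy. When the package fails to install, kubectl get providers reports INSTALLED or HEALTHY as False, and applying managed resources can fail with a "no matches for kind" error because the provider’s CRDs were never created. When the controller itself doesn’t start, HEALTHY stays False, and the provider pod’s logs and events show why.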
