Crossplane’s upgrade process is designed to be non-disruptive, but a recent change in how certain providers handle their reconciliation loops can lead to brief interruptions if not managed carefully.
The core issue arises from Crossplane’s control plane components (like the crossplane deployment itself and its associated RBAC) being updated while the provider controllers (e.g., provider-aws, provider-azure) are still actively reconciling your managed resources. If the new Crossplane version expects a different API version or field for a resource that a provider hasn’t yet adapted to, or if a provider’s reconciliation logic times out during this brief window, your managed resources might enter an unready state for a short period. This isn’t a failure of Crossplane itself, but a temporary desync between the control plane’s expectations and the provider’s current state.
Here’s how to navigate Crossplane upgrades with minimal to zero impact:
1. Stagger Provider Upgrades First
Diagnosis: Check the currently installed Crossplane and provider versions:
kubectl get deployment crossplane -n crossplane-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="crossplane")].image}'
kubectl get providers.pkg.crossplane.io -o custom-columns=NAME:.metadata.name,PACKAGE:.spec.package
Fix: Before upgrading the crossplane deployment, upgrade each installed provider to the latest version that is compatible with your current Crossplane core. Providers are installed as Crossplane packages rather than Helm releases, so you upgrade one by pointing its Provider object at the new package tag. This ensures that the provider controllers are already running the newest logic that aligns with potential upcoming changes in the Crossplane core.
# Example for provider-aws (the Provider name and package path are placeholders; match them to your installation)
kubectl patch providers.pkg.crossplane.io provider-aws --type merge \
  -p '{"spec":{"package":"xpkg.upbound.io/upbound/provider-aws:v1.17.0"}}'
Why it works: By upgrading providers first, you give them a chance to adapt to any subtle API shifts or new reconciliation strategies before the core Crossplane components that interact with them are updated. This pre-empts the "expectations mismatch" scenario.
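Before moving on to the core upgrade, it can help to block until every provider package reports healthy again. The following is a minimal sketch, assuming the providers are installed as Provider packages in the pkg.crossplane.io API group; the function name wait_for_providers and the default timeout are illustrative choices, not part of Crossplane.

```shell
#!/usr/bin/env bash
# Sketch: gate the core upgrade on provider health.
# wait_for_providers is a hypothetical helper; the 300s default is arbitrary.
wait_for_providers() {
  # Each Provider package reports a Healthy condition once its
  # active revision is installed and its controller is running.
  kubectl wait providers.pkg.crossplane.io --all \
    --for=condition=Healthy --timeout="${1:-300s}"
}
```

A possible use is `wait_for_providers 600s && echo "providers healthy"`, which only prints the message if every provider became healthy within ten minutes.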
2. Upgrade Crossplane Core Components
Diagnosis: Verify the current Crossplane core image version.
Fix: Once providers are updated, upgrade the crossplane deployment. The recommended method is via Helm.
helm upgrade --install crossplane crossplane-charts/crossplane \
  --namespace crossplane-system \
  --version 1.15.2
Why it works: This updates the core Crossplane pods, meaning the core controllers and the RBAC manager, to the new version. Because the providers are already on their latest compatible versions, they are best equipped to handle any new API expectations or reconciliation behaviors introduced by the core.
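To confirm the core upgrade actually landed, you can wait for the rollout and compare the running image tag with the one you expected. This is a sketch under assumptions: image_tag and core_is_at are hypothetical helper names, the deployment and container are both assumed to be called crossplane, and the image tag is assumed to be the chart version with a leading v.

```shell
#!/usr/bin/env bash
# image_tag prints everything after the last ':' of an image reference.
# Note: for a digest-pinned image (…@sha256:…) this prints the digest hex, not a tag.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# core_is_at (hypothetical helper) succeeds only if the crossplane
# deployment has finished rolling out and runs the expected tag.
core_is_at() {
  kubectl rollout status deployment/crossplane -n crossplane-system \
    --timeout=120s >/dev/null || return 1
  image="$(kubectl get deployment crossplane -n crossplane-system \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="crossplane")].image}')"
  [ "$(image_tag "$image")" = "$1" ]
}
```

For the upgrade shown above, `core_is_at v1.15.2 || echo "core is not on the expected version"` would flag a rollout that stalled or picked up a different image.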
3. Monitor Provider Health Post-Upgrade
Diagnosis: After upgrading Crossplane core, closely watch the status of your provider pods and any associated logs for errors.
kubectl get providers.pkg.crossplane.io
kubectl get pods -n crossplane-system -w
kubectl logs -n crossplane-system <provider-pod-name>
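Fix: If a provider pod restarts or shows errors, check its logs for specific reconciliation failures. Often this points to a resource that was in a transitional state during the upgrade, and it clears on a later reconcile; deleting the provider pod so that its deployment recreates it makes the provider re-queue every resource it manages. Treat deleting the managed resource itself (e.g., an RDS instance in us-east-1) as a last resort. With the default deletionPolicy of Delete, Crossplane deletes the external cloud resource too (for an RDS instance, the database), so set spec.deletionPolicy to Orphan first unless the external resource is disposable. In Crossplane 1.x, managed resources are cluster-scoped, so no namespace flag is needed. Do NOT delete the claim or composite resource.
# Example: an RDS instance that keeps failing; orphan it first so the database survives the delete
kubectl patch instance.rds.aws.upbound.io my-database-instance --type merge -p '{"spec":{"deletionPolicy":"Orphan"}}'
kubectl delete instance.rds.aws.upbound.io my-database-instance
Why it works: When a composite owns the managed resource, deleting it makes Crossplane recreate it, and with both Crossplane core and the provider updated, the new object is reconciled with the latest desired state and logic. Whether the recreated resource adopts the orphaned external resource or provisions a new one depends on its external name, so confirm that before you delete anything.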
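To see which managed resources actually need attention after the upgrade, you can list the ones whose Ready condition is not True. This is a minimal sketch: list_unready_managed is a hypothetical helper name, and it assumes your providers register their resources under the managed category so that `kubectl get managed` returns them.

```shell
#!/usr/bin/env bash
# list_unready_managed (hypothetical helper) prints Kind/name for every
# managed resource whose Ready condition is missing or not "True".
list_unready_managed() {
  kubectl get managed \
    -o jsonpath='{range .items[*]}{.kind}/{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | awk -F'\t' '$2 != "True" { print $1 }'
}
```

Running it a few minutes apart shows whether the list is shrinking on its own, which is usually a better signal than a single snapshot taken right after the rollout.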
4. Leverage kubectl debug for Deep Dives
Diagnosis: If a managed resource remains stuck or shows persistent errors after the upgrade, you might need to inspect the provider’s internal state or logs more directly.
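Fix: Use kubectl debug to attach an ephemeral debugging container to the failing provider pod. The name of the provider's container differs between Crossplane versions, so look it up first and pass it as the target.
kubectl get pod <provider-pod-name> -n crossplane-system -o jsonpath='{.spec.containers[*].name}'
kubectl debug -n crossplane-system -it <provider-pod-name> --image=nicolaka/netshoot --target=<container-name>
Once inside, you can use tools like curl to query any metrics or health endpoint the provider exposes, or tcpdump to analyze network traffic if you suspect connectivity issues with the cloud provider API.
Why it works: The ephemeral container joins the running pod without restarting it and shares the target container's process namespace, allowing for detailed inspection of the provider's operational environment and its interactions with the cloud API even when the provider image ships no shell.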
5. Understand Provider-Specific Upgrade Notes
Diagnosis: Review the release notes for each provider you use.
Fix: Pay close attention to any sections detailing breaking changes, API version updates, or specific upgrade instructions. Some providers might have specific prerequisites or post-upgrade checks recommended.
Why it works: Cloud provider APIs evolve, and Crossplane providers must keep pace. These notes are the authoritative source for any provider-specific nuances that could affect your upgrade strategy beyond the general Crossplane process.
6. Graceful Shutdown and Leader Election
Diagnosis: Observe Crossplane and provider pod readiness and liveness probes during the upgrade.
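Fix: Ensure your Crossplane and provider deployments have an appropriate terminationGracePeriodSeconds set (the Kubernetes default is 30s). This allows existing reconciliation loops to finish gracefully before pods are terminated. Provider deployments are generated by the package manager, so on Crossplane 1.14 and later set this through a DeploymentRuntimeConfig rather than by editing the deployment directly. Leader election, which the Helm chart enables by default for the core pod, keeps the old and new replicas from reconciling at the same time during the rollout.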
# Example in crossplane deployment manifest
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60 # Increased from the 30s default
Why it works: A longer grace period gives active reconciliation processes more time to complete, reducing the chance of them being abruptly interrupted mid-operation when the new version of the deployment is rolled out.
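If you want to try a longer grace period on the core deployment without editing the manifest by hand, a merge patch is one option. This is a sketch only: set_core_grace_period is a hypothetical helper name, and the deployment is assumed to be called crossplane in the crossplane-system namespace. Keep in mind that patching the pod template triggers a rollout of its own, and a later helm upgrade may overwrite the value unless your chart values carry it.

```shell
#!/usr/bin/env bash
# set_core_grace_period (hypothetical helper) patches the pod template's
# terminationGracePeriodSeconds; defaults to 60 when no argument is given.
set_core_grace_period() {
  patch="$(printf '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":%d}}}}' "${1:-60}")"
  kubectl patch deployment crossplane -n crossplane-system \
    --type merge -p "$patch"
}
```

Calling `set_core_grace_period 90` ahead of the upgrade window gives in-flight reconciles up to 90 seconds to finish once the old pod receives its termination signal.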
By following these steps, you can systematically upgrade Crossplane and its providers, ensuring that your managed cloud resources remain stable and available throughout the process. The most common next hurdle after a successful upgrade is often ensuring that newly provisioned resources adhere to updated compliance policies, which might require adjustments to your Composite Resource Definitions (XRDs).