Crossplane’s providers are surprisingly unforgiving of misconfiguration: they often fail silently, or with generic errors that mask the real issue.
Let’s see how a Crossplane provider actually runs. Imagine we have a Provider resource and a ProviderConfig for AWS.
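```yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/upbound/provider-aws:v0.48.0 # Example package
  runtimeConfigRef:
    name: aws-runtime-config # This is where we'll tune
---
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-credentials
      key: credentials
```

The Provider resource tells Crossplane which provider package to pull and run. Its `runtimeConfigRef` points to a `DeploymentRuntimeConfig` that defines how the provider’s Pods run, and that runtime config is where the tuning happens. (`DeploymentRuntimeConfig` arrived in Crossplane 1.14; earlier releases used the now-deprecated `ControllerConfig`, referenced through `controllerConfigRef`.) The `ProviderConfig` is a different resource with a different job: it tells the provider’s controllers which credentials to use, and managed resources select it through `providerConfigRef`. In Upbound’s AWS provider the region is not part of it; you set it on each managed resource under `spec.forProvider.region`.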
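The `secretRef` in the ProviderConfig expects a Secret in `crossplane-system` named `aws-credentials`, whose `credentials` key holds an AWS shared-credentials file. A minimal sketch, assuming static access keys; the key values are placeholders, and in practice you would create the Secret from a local file rather than commit keys to Git:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials          # matches secretRef.name
  namespace: crossplane-system   # matches secretRef.namespace
type: Opaque
stringData:
  # The key name matches secretRef.key; the value uses the INI format
  # of the AWS shared credentials file. Placeholder values only.
  credentials: |
    [default]
    aws_access_key_id = <ACCESS_KEY_ID>
    aws_secret_access_key = <SECRET_ACCESS_KEY>
```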
Here’s what the aws-runtime-config might look like:
```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: aws-runtime-config
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          containers:
            - name: package-runtime # the provider's own container
              resources:
                requests:
                  cpu: "100m"
                  memory: "256Mi"
                limits:
                  cpu: "200m"
                  memory: "512Mi"
              args:
                - --debug            # verbose logging
                - --leader-election  # high availability
```

This DeploymentRuntimeConfig is the key: it defines the runtime environment of the provider’s controller.

- `deploymentTemplate`: a partial Kubernetes Deployment. Crossplane layers the fields you set here onto the Deployment it generates for the provider, so Deployment and Pod settings such as replicas, node selectors, tolerations, and volumes are all tuned here. It names no package: the link runs one way, from the Provider’s `runtimeConfigRef`, so several providers can share one runtime config.
- `containers`: settings for the provider’s own container go in the entry named `package-runtime`; Crossplane matches on that name.
- `resources`: plain Kubernetes resource requests and limits. Setting these correctly is crucial for preventing resource starvation or runaway consumption. In production, size them from the expected load and the provider’s complexity. For `provider-aws`, with its many services, you might start with `requests: { cpu: "200m", memory: "512Mi" }` and `limits: { cpu: "500m", memory: "1Gi" }`, then adjust based on monitoring.
- `args`: command-line arguments passed directly to the provider’s controller binary. Flag names vary between providers and versions; the ones shown are those of Upbound’s and other crossplane-runtime based providers, so confirm them against your provider’s `--help` output.
  - `--debug`: switches on verbose debug logging. Leave it out for normal, quieter operation.
  - `--leader-election`: for high availability. Replicas compete for a Lease and only the current leader reconciles, so running several replicas is safe. The Lease lives in the provider’s own namespace (typically `crossplane-system`) under an ID built into the provider, so neither needs a flag.

Metrics and TLS need no entries here. Crossplane-built providers typically already serve Prometheus metrics on port 8080, and there is no `tlsConfig` section: Crossplane provisions the TLS certificates its providers need, such as those for their webhooks, itself. If a provider needs extra certificate material, such as a private CA bundle, mount its Secret through the Pod template’s `volumes` and the container’s `volumeMounts`.
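Expressed as a Crossplane `DeploymentRuntimeConfig` (`pkg.crossplane.io/v1beta1`, the kind a Provider’s `runtimeConfigRef` points to), the production starting point for `provider-aws` of 200m CPU and 512Mi memory requested, with limits of 500m and 1Gi, might look like this sketch. The name `aws-runtime-config-prod` is hypothetical, and the numbers are a baseline to refine from monitoring, not a recommendation for every workload:

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: aws-runtime-config-prod   # hypothetical name
spec:
  deploymentTemplate:
    spec:
      selector: {}                # left empty, as in Crossplane's documented examples
      template:
        spec:
          containers:
            - name: package-runtime   # settings apply to the provider's own container
              resources:
                requests:
                  cpu: "200m"
                  memory: "512Mi"
                limits:
                  cpu: "500m"
                  memory: "1Gi"
```

To use it, set the Provider’s `spec.runtimeConfigRef.name` to `aws-runtime-config-prod`; changing the runtime config rolls out a new provider Pod.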
The mental model here is that each Provider resource becomes a running deployment of that provider’s controller. Crossplane doesn’t just run a generic container: it creates a Kubernetes Deployment for each provider it manages, and the DeploymentRuntimeConfig referenced by the Provider shapes that Deployment, including its Pod template.
What often gets overlooked is the interaction between how much work a provider does concurrently and its Kubernetes resource limits. A provider managing many resources, or complex ones such as large fleets of EC2 instances or RDS databases, can become CPU- or memory-bound. If its limits are too low, the container is CPU-throttled or even OOMKilled. Conversely, if its requests sit far below its real usage while its limits are high, the scheduler reserves too little capacity and the Pod can crowd out its neighbors on the node. Leader election matters for production because it makes running more than one replica safe: only the elected leader reconciles, and if its Pod dies, a standby replica takes over once the lease expires. With a single replica, reconciliation simply stops until Kubernetes restarts the Pod, leaving the control plane briefly unable to act on your resources.
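Leader election only buys availability when there is a standby to take over, so in practice it is paired with more than one replica. A sketch, assuming the provider accepts `--leader-election` (Upbound’s providers do; check yours); the name and replica count are illustrative:

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: aws-runtime-config-ha   # hypothetical name
spec:
  deploymentTemplate:
    spec:
      replicas: 2               # one leader plus one standby
      selector: {}
      template:
        spec:
          containers:
            - name: package-runtime
              args:
                - --leader-election   # only the replica holding the lease reconciles
              resources:
                requests:
                  cpu: "200m"
                  memory: "512Mi"
                limits:
                  cpu: "500m"
                  memory: "1Gi"
```

Each standby reserves its full requests even while idle, so budget cluster capacity for every replica, not just the leader.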
The `--debug` flag is your primary tool for understanding what a provider is doing. For instance, if a resource is stuck reconciling, adding `--debug` to the args in the provider’s runtime config and watching the provider Pod’s logs will show the operations it attempts and the errors the cloud provider returns.
A common, subtle issue arises when a provider package is updated and the new version has different resource requirements or behavior. You may need to adjust the `resources` in your runtime config to match the new version’s needs, or temporarily add `--debug` to diagnose unexpected behavior after the upgrade.
If you’re seeing intermittent reconciliation failures with leader election enabled, check the provider logs for leader election messages. Frequent reports of lost leadership, or errors acquiring or renewing the lease, can point to network issues or slow API server and etcd responses in the Kubernetes control plane, which in turn undermine the provider’s HA.
The most impactful tuning often comes from understanding the cloud provider’s API and its rate limits. AWS, for example, rate-limits API requests: if a provider sends too many too quickly, it starts receiving `ThrottlingException` errors. There is no per-request throttle to configure, but you can shape how much traffic the provider generates. These are provider flags, not a Crossplane-wide setting: crossplane-runtime based providers such as Upbound’s expose `--poll`, which sets how often each resource is checked for drift, and `--max-reconcile-rate`, which caps how fast resources are reconciled. You can also limit the number of resources a single provider is responsible for.
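A sketch of lowering request volume for a provider that keeps hitting AWS throttling, assuming it accepts `--poll` and `--max-reconcile-rate` as crossplane-runtime based providers generally do. The name and values are illustrative, not tuned recommendations:

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: aws-runtime-config-throttled   # hypothetical name
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          containers:
            - name: package-runtime
              args:
                - --poll=30m               # check each resource for drift less often
                - --max-reconcile-rate=5   # slower reconciles, so fewer API calls per second
```

Slower polling also means drift introduced outside Crossplane is noticed later, so this trades freshness for fewer API calls.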
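The leader election ID is not just a label: it names the Lease object that the provider’s replicas compete for. Replicas of the same provider are meant to share it, so one holds the lease and the others stand by. The real failure mode is two different provider deployments using the same ID in the same namespace: only one of them can hold the lease, so the other never reconciles anything. Crossplane-built providers ship with a provider-specific ID, so this mostly bites custom or forked builds that reuse one; make sure the ID is unique to each provider.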