KEDA scales Kubernetes workloads based on external events. It does not replace the Horizontal Pod Autoscaler; it builds on it, feeding it event-driven metrics and adding the ability to scale down to zero.
Let’s watch KEDA in action with a simple Kafka-based scaling scenario. Imagine we have a Kafka topic named my-topic and we want to scale a deployment called kafka-consumer to zero replicas when there are no messages, and to a maximum of 10 replicas when there are messages.
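First, we need the KEDA operator installed in our cluster. This is typically done via Helm, after adding the kedacore chart repository:
    helm repo add kedacore https://kedacore.github.io/charts
    helm install keda kedacore/keda --namespace keda --create-namespace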
Next, we define a ScaledObject to tell KEDA how to scale our kafka-consumer deployment. This ScaledObject links our deployment to the Kafka topic.
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: kafka-consumer-scaler
      namespace: default
    spec:
      scaleTargetRef:
        name: kafka-consumer  # the deployment to scale
      pollingInterval: 30     # check the trigger every 30 seconds
      cooldownPeriod: 300     # wait 300 seconds after the last active trigger before scaling to zero
      minReplicaCount: 0      # scale down to zero
      maxReplicaCount: 10     # scale up to 10
      triggers:
        - type: kafka
          metadata:
            bootstrapServers: "kafka-broker-1:9092,kafka-broker-2:9092"  # Your Kafka broker addresses
            consumerGroup: "my-consumer-group"  # The consumer group ID
            topic: "my-topic"  # The topic to monitor
            # Optional: offset reset policy if consumer group is new or doesn't exist
            # offsetResetPolicy: "latest"
Now, let’s create a dummy kafka-consumer deployment for the ScaledObject to target. The deployment will have 0 replicas initially.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kafka-consumer
      namespace: default
    spec:
      replicas: 0  # Start with zero replicas
      selector:
        matchLabels:
          app: kafka-consumer
      template:
        metadata:
          labels:
            app: kafka-consumer
        spec:
          containers:
            - name: consumer
              image: ubuntu:latest  # Replace with your actual consumer image
              command: ["/bin/sh", "-c", "sleep infinity"]  # Dummy command to keep container running
With this setup, KEDA checks the lag of my-consumer-group on my-topic every 30 seconds. While the lag stays at zero, KEDA keeps the kafka-consumer deployment at 0 replicas, or scales it down to 0 once the cooldown period has elapsed. When messages start arriving, the lag increases and KEDA scales the deployment up in proportion to the lag, to at most maxReplicaCount (10 in this case). By default, the Kafka scaler also does not scale beyond the number of partitions in the topic, since extra consumers in the group would sit idle.
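To make the scale-up arithmetic concrete, here is a minimal Python sketch of how a replica count could follow from the lag. It is a simplified model, not KEDA's implementation: it assumes the HPA targets an average lag per replica (the Kafka trigger's lagThreshold setting), ignores the HPA's tolerance and stabilization behavior, and caps replicas at the partition count. The function name desired_replicas, the threshold of 10, and the 6 partitions are all hypothetical values chosen for illustration.

```python
import math

def desired_replicas(total_lag, lag_threshold, partitions,
                     min_replicas=0, max_replicas=10):
    """Simplified model of the replica count for a given consumer lag."""
    if total_lag <= 0:
        # Nothing to consume: fall back to the configured minimum.
        return min_replicas
    # Aim for roughly lag_threshold messages of lag per replica.
    wanted = math.ceil(total_lag / lag_threshold)
    # Assumption: consumers beyond the partition count would be idle.
    wanted = min(wanted, partitions)
    # Clamp to the ScaledObject's boundaries.
    return max(min_replicas, min(wanted, max_replicas))

for lag in (0, 5, 35, 500):
    print(lag, desired_replicas(lag, lag_threshold=10, partitions=6))
```

Under these assumptions, a lag of 35 asks for 4 replicas, while a lag of 500 is limited to 6 by the partition count rather than by maxReplicaCount.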
The magic here is that KEDA doesn’t just rely on CPU or memory. It uses event sources (like Kafka, RabbitMQ, Azure Service Bus, AWS SQS, etc.) to determine the workload. The ScaledObject acts as the bridge, defining which deployment to scale, the scaling boundaries (minReplicaCount, maxReplicaCount), and the specific event source configuration.
Internally, KEDA’s core component, the keda-operator, watches for ScaledObject resources. For each ScaledObject, it creates and manages a standard Kubernetes Horizontal Pod Autoscaler (HPA) that targets the deployment. KEDA also runs a single metrics server (keda-operator-metrics-apiserver) that implements the Kubernetes External Metrics API. When the HPA needs metrics to decide whether to scale, it queries this metrics server, and KEDA in turn queries the configured event sources (e.g., Kafka brokers) to get the relevant metric (e.g., queue length, lag). The HPA handles scaling between 1 and maxReplicaCount; the operator itself handles activation and deactivation, that is, scaling between 0 and 1.
The pollingInterval determines how often KEDA checks the event source. The cooldownPeriod applies to scaling to zero: after the last time a trigger reported as active, KEDA waits for this duration before scaling the deployment from its last replica down to 0. If new events arrive during that window, the deployment stays up. Scaling between 1 and maxReplicaCount is governed by the HPA, which has its own scale-down stabilization window (300 seconds by default). Together, these settings help prevent rapid scaling oscillations and unnecessary churn.
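The interaction between the polling interval and the cooldown period can be sketched as a small Python timeline. This is an illustrative model only: the helper should_scale_to_zero is hypothetical, times are plain seconds, and the exact boundary comparison KEDA uses is an assumption here.

```python
def should_scale_to_zero(now, last_active, cooldown_period=300):
    """True once no trigger has been active for a full cooldown period."""
    return (now - last_active) >= cooldown_period

# Poll every 30 seconds; assume the trigger was last active at t=60.
polling_interval = 30
last_active = 60
for now in range(60, 421, polling_interval):
    if should_scale_to_zero(now, last_active):
        print(f"t={now}s: scale to 0")
        break
    print(f"t={now}s: keep running")
```

In this model, any new activity simply moves last_active forward, which restarts the wait.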
Scaling to zero is handled by KEDA itself rather than by the HPA, because a standard HPA does not, by default, scale a workload below one replica. When every trigger has been inactive for the cooldownPeriod, the keda-operator sets the deployment’s replica count to 0 and keeps polling the event source. When new events arrive, the operator scales the deployment back up from zero, and the HPA takes over from there. This "scale-to-zero" capability is a key differentiator from the standard HPA, allowing for significant cost savings by not running idle pods.
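One subtlety is how KEDA determines the "number of events." For Kafka, it’s based on the consumer group’s lag for a given topic: the difference between each partition’s latest offset and the group’s committed offset, summed across partitions. A lag greater than zero (more precisely, above activationLagThreshold, which defaults to 0) indicates there are messages to be processed and activates the deployment. Beyond the first replica, the HPA aims for an average lag per replica set by lagThreshold. When the lag returns to zero, it signals that all messages have been consumed, and KEDA can scale down. The exact metric KEDA uses depends on the trigger type; for example, for a queue, it might be the number of messages in the queue.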
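As a rough illustration of what "lag" means here, the following Python sketch sums per-partition lag from two offset maps. The function name consumer_group_lag and the sample offsets are hypothetical, and skipping partitions without a committed offset is a simplifying assumption; in KEDA, that case depends on settings such as offsetResetPolicy.

```python
def consumer_group_lag(end_offsets, committed_offsets):
    """Sum per-partition lag: latest offset minus the group's committed offset."""
    total = 0
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            # Assumption: ignore partitions with no committed offset yet.
            continue
        # Guard against negative values from stale offset reads.
        total += max(end - committed, 0)
    return total

# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 120, 1: 80, 2: 45}
committed_offsets = {0: 100, 1: 80, 2: 40}
print(consumer_group_lag(end_offsets, committed_offsets))  # 25
```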
The next step after mastering event-driven scaling is understanding how to integrate KEDA with your CI/CD pipelines for automated deployment and scaling configuration.