Cluster Autoscaler on EKS is a fairly simple way to match your Kubernetes cluster's compute capacity to demand. Things tend to get complicated in how it interacts with the EC2 Auto Scaling Groups (ASGs) underneath.
Let’s see it in action. Imagine you have a Kubernetes deployment like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          resources:
            limits:
              cpu: "500m"
              memory: "128Mi"
            requests:
              cpu: "250m"
              memory: "64Mi"
```
When you apply this, the Kubernetes scheduler tries to place the pods. A pod stays in the Pending state if no existing node has enough unreserved capacity to satisfy its CPU and memory requests.
This is where Cluster Autoscaler (CA) comes in. It watches for Pending pods that failed to schedule and simulates whether they would fit on a new node from one of your node groups. If they would, it asks the underlying infrastructure (in this case, EC2 Auto Scaling Groups) for more nodes. After the new nodes join the cluster, the Kubernetes scheduler places the Pending pods on them; CA itself does not do the scheduling. Scale-down works in the other direction. If a node stays underutilized for a configurable period and its pods can run elsewhere, CA drains it and has the ASG terminate it.
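In CA's configuration, the "configurable period" and the cutoff for "underutilized" are command-line flags on the cluster-autoscaler container. The fragment below lists the main ones with what we understand to be their upstream default values. Treat it as a sketch and check the Cluster Autoscaler FAQ for the version you run. In practice these lines would be appended to the container's command list in the Deployment manifest.

```yaml
# Fragment of the cluster-autoscaler container spec (values shown are the defaults)
command:
  - ./cluster-autoscaler
  - --scan-interval=10s                     # how often CA re-evaluates the cluster
  - --scale-down-utilization-threshold=0.5  # below 50% of allocatable (by requests) a node becomes a candidate
  - --scale-down-unneeded-time=10m          # how long a node must stay unneeded before it is removed
  - --scale-down-delay-after-add=10m        # how long scale-down evaluation pauses after a scale-up
```

Raising `--scale-down-unneeded-time` makes CA slower to remove nodes, which trades some idle cost for fewer node churns when load is bursty.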
Here's a simplified view of the CA manifest. CA itself runs as an ordinary Deployment with a single pod in the kube-system namespace.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler # Carries the IAM role via IRSA
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.1 # Match the minor version to your Kubernetes version
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            # Find the cluster's ASGs by tag; replace your-cluster-name with your EKS cluster name
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/your-cluster-name
            # Alternative to auto-discovery: list each ASG explicitly as min:max:asg-name
            # - --nodes=1:5:your-asg-name
            - --skip-nodes-with-local-storage=false
          env:
            - name: AWS_REGION
              value: "us-east-1" # Replace with your AWS region
          volumeMounts:
            - name: ssl-certs # CA bundle from the node, for TLS calls to AWS APIs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
      volumes:
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs/ca-bundle.crt # Path on Amazon Linux nodes
```
The cluster-autoscaler command-line arguments are crucial:
- --cloud-provider=aws: tells CA it is talking to AWS.
- --node-group-auto-discovery: this is how CA finds the EC2 Auto Scaling Groups that belong to your cluster. It selects ASGs that carry both tag keys, k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/your-cluster-name. EKS managed node groups apply these tags to their ASGs automatically, but for self-managed node groups you have to add them yourself. For each discovered ASG, CA uses the ASG's own minSize and maxSize as its scaling limits.
- --nodes (commented out above): the explicit alternative to auto-discovery. Its format is min:max:asg-name. The ASG name is required, so a bare --nodes=1:5 is not valid.
- --skip-nodes-with-local-storage=false: controls whether CA may scale down nodes whose pods use local storage (more on this below).
- AWS_REGION: the region whose Auto Scaling and EC2 APIs CA calls. The cluster name needs no separate setting because it is already part of the auto-discovery tag.
CA also needs AWS permissions. It must be able to read Auto Scaling Group and EC2 instance-type information and to change ASG capacity, for example with autoscaling:DescribeAutoScalingGroups, autoscaling:SetDesiredCapacity, and autoscaling:TerminateInstanceInAutoScalingGroup. On EKS the usual way to grant these is IAM roles for service accounts (IRSA). You create an IAM role with those permissions, allow the cluster-autoscaler service account to assume it, and annotate that service account with the role's ARN. The pod then receives temporary credentials automatically, so you don't need a Secret holding static AWS keys.
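With IRSA, the annotation on the Kubernetes ServiceAccount is what ties the pod to the IAM role. Below is a minimal sketch. It assumes the cluster already has an IAM OIDC provider and that the role's trust policy allows this service account. The account ID and role name in the ARN are hypothetical placeholders. The CA pod spec then references the account with `serviceAccountName: cluster-autoscaler`. The ClusterRole and bindings that CA needs for Kubernetes API access are not shown here.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    # Hypothetical ARN: substitute the IAM role you created for Cluster Autoscaler
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cluster-autoscaler
```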
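The Autoscaler re-evaluates the cluster on a fixed interval (every 10 seconds by default, set by --scan-interval). On each pass it looks for Pending pods and for underused nodes, then calls the AWS API to adjust the ASGs associated with your EKS cluster. For scale-down, "utilization" means the sum of the pods' resource requests on a node divided by the node's allocatable capacity. It is not the CPU or memory the pods actually consume. CA never manages EC2 instances directly. To scale up, it raises the ASG's desired capacity and lets the ASG provision the instances. To scale down, it asks the ASG to terminate a specific instance and lower the desired capacity.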
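CA judges a node by the resource requests of its pods, not by live usage. A worked example makes this concrete. The numbers are hypothetical: a node with 2000m of allocatable CPU and 7168Mi of allocatable memory, running three replicas of the nginx pod from earlier (250m CPU and 64Mi memory requested each). For simplicity, assume no DaemonSet or system pods are on the node. CA takes the larger of the CPU and memory ratios and compares it with the scale-down utilization threshold, which is 0.5 by default.

```latex
\begin{aligned}
u_{\text{cpu}} &= \frac{3 \times 250\,\text{m}}{2000\,\text{m}} = 0.375 \\
u_{\text{mem}} &= \frac{3 \times 64\,\text{Mi}}{7168\,\text{Mi}} \approx 0.027 \\
u &= \max(u_{\text{cpu}},\, u_{\text{mem}}) = 0.375 < 0.5
\end{aligned}
```

So the node counts as underutilized, even though each pod's 500m CPU limit would let actual usage run much higher. Whether CA removes it still depends on those three pods fitting on other nodes.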
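When CA decides to remove a node, it first taints it (ToBeDeletedByClusterAutoscaler) so no new pods are scheduled there, which has the same effect as cordoning. It then evicts the node's pods through the Kubernetes eviction API. Eviction respects PodDisruptionBudgets and gives each pod its termination grace period. Only after the pods are gone does CA have the ASG terminate the instance. This keeps disruption to your running applications to a minimum.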
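Each pod's `terminationGracePeriodSeconds` sets how long its graceful shutdown may take. CA caps the total wait with its `--max-graceful-termination-sec` flag, which is 600 seconds by default according to the upstream FAQ. The sketch below shows both sides as fragments. The 60-second value for nginx is an arbitrary example, not a recommendation.

```yaml
# Fragment of the nginx Deployment: per-pod grace period
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60 # example value; the Kubernetes default is 30
---
# Fragment of the cluster-autoscaler container command: upper bound on how long CA waits
command:
  - ./cluster-autoscaler
  - --max-graceful-termination-sec=600 # default value
```

If a pod needs longer than CA's cap to shut down cleanly, raise the CA flag as well. Otherwise the node may be removed before the pod finishes its shutdown work.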
The --skip-nodes-with-local-storage=false setting is important. When the flag is true (the default), CA will not scale down any node running a pod that uses local storage, such as emptyDir or hostPath volumes, because that data is tied to the node's lifecycle. Setting it to false lets CA consider such nodes for scale-down. The catch is that the contents of those pods' emptyDir volumes are lost when the pods are evicted and the node is terminated.
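If you'd rather leave the flag at true cluster-wide, CA also honors a per-pod annotation, `cluster-autoscaler.kubernetes.io/safe-to-evict`. The sketch below adds it to the nginx pod template together with an emptyDir scratch volume. The volume name and mount path are hypothetical, chosen for illustration.

```yaml
# Fragment of the nginx Deployment's pod template
spec:
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        # Tells CA this pod may be evicted during scale-down despite its emptyDir volume
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - name: cache
              mountPath: /var/cache/nginx
      volumes:
        - name: cache # hypothetical scratch volume; its contents are lost on eviction
          emptyDir: {}
```

Setting the annotation to `"false"` has the opposite effect. CA will then not evict the pod at all, which keeps the node it runs on from being scaled down.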
One often-overlooked detail is how the ASG's minSize and maxSize interact with CA's --nodes flag. When you use --nodes, the limits are declared in two places, and the effective ceiling is the lower of the two. If your ASG's maxSize is 3 and CA's --nodes range is 1:5, CA can only reach 3, because AWS rejects a desired capacity above the ASG's maxSize. If the --nodes range is 1:2 and the ASG's maxSize is 5, CA stops at 2. It's best to keep these values aligned, or to make CA's range deliberately narrower when you want finer control.
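If CA discovers ASGs by tag (its `--node-group-auto-discovery` flag) instead of using `--nodes`, it reads the bounds directly from each ASG, so there is only one set of limits to maintain. The sketch below sets those limits where the node group is defined, assuming you manage node groups with eksctl. The node group name and instance type are hypothetical.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: your-cluster-name
  region: us-east-1
managedNodeGroups:
  - name: ng-general        # hypothetical node group name
    instanceType: m5.large  # hypothetical instance type
    minSize: 1              # becomes the ASG's minSize, CA's lower bound
    maxSize: 5              # becomes the ASG's maxSize, CA's upper bound
    desiredCapacity: 2      # starting size; CA adjusts it afterwards
```

With an explicit `--nodes` flag instead, the matching value for this group would be `1:5` followed by the ASG's name.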
The next thing you’ll likely grapple with is how to configure pod disruption budgets to prevent CA from scaling down nodes too aggressively and disrupting your applications.
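As a preview, here is a minimal PodDisruptionBudget for the nginx Deployment from the start of this post; the budget's name is hypothetical. With `minAvailable: 1`, any eviction that would leave fewer than one available nginx pod is refused. CA also skips a node whose drain would violate the budget.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: nginx
```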