EC2 Auto Scaling doesn’t scale on any single instance’s CPU utilization; it scales on the average CPU utilization across all instances in the group.
Let’s see this in action. Imagine you have an Auto Scaling Group (ASG) named my-app-asg and you want it to scale out when the average CPU utilization hits 70%.
First, you define a Target Tracking policy. This tells the ASG what metric to watch and what target value to maintain.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-app-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    }
  }'
Here, TargetValue: 70.0 is the desired average CPU utilization. ASGAverageCPUUtilization is the predefined metric type for the group’s average CPU utilization. When the average CPU across all instances in my-app-asg moves away from 70%, the ASG reacts. If the average rises above 70%, it adds instances. If the average stays well below 70% for a sustained period, it removes them; scale-in is deliberately more conservative than scale-out, so capacity isn’t dropped on a brief dip.
The magic here is that the ASG doesn’t just look at a single instance’s CPU. It queries CloudWatch for the Average statistic of the CPUUtilization metric for that specific ASG. If you have 5 instances and their CPU is 100%, 50%, 50%, 50%, 50%, the average is 60%. The ASG sees 60% and thinks "we’re doing great, no need to scale." If the average jumps to 80%, then it’s time to add more capacity.
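To make the arithmetic concrete, here is a minimal sketch in shell. The helper names group_average and estimate_capacity are hypothetical, and the proportional formula (current instances × average ÷ target, rounded up) is only a rough mental model for how much capacity gets added; it is an assumption for illustration, not AWS’s published algorithm.

```shell
#!/bin/sh
# Average the per-instance CPU readings passed as arguments.
group_average() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f", sum / NR }'
}

# Rough estimate of desired capacity: ceil(instances * average / target).
# ASSUMPTION: a proportional mental model, not the exact AWS behavior.
# $1 = current instance count, $2 = average CPU, $3 = target CPU
estimate_capacity() {
  awk -v n="$1" -v avg="$2" -v target="$3" 'BEGIN {
    raw = n * avg / target
    cap = int(raw)
    if (cap < raw) cap++   # round up to a whole instance
    print cap
  }'
}

echo "average: $(group_average 100 50 50 50 50)"          # prints: average: 60.0
echo "capacity at 80% average: $(estimate_capacity 5 80 70)"  # prints: capacity at 80% average: 6
```

With five instances averaging 60%, the estimate stays at five; at an 80% average it rises to six, which matches the intuition in the example above.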
This approach keeps the ASG from reacting to noise on any one instance: a single busy instance doesn’t trigger a scale-out, while sustained load across the group does. Because target tracking aims for a specific group-wide level, scaling is steadier than it would be if it chased individual instances. You can also define target tracking policies on other predefined metrics, such as ALBRequestCountPerTarget (average request count per target behind an Application Load Balancer) or ASGAverageNetworkIn (average inbound network traffic per instance).
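As a sketch of what a request-count policy might look like: ALBRequestCountPerTarget needs a ResourceLabel that identifies the load balancer and target group, in the form app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>. The helper name alb_tracking_config, the label value, the policy name, and the target of 1000 requests are all made up for illustration.

```shell
#!/bin/sh
# Print a target tracking configuration for ALBRequestCountPerTarget.
# $1 = target request count per target, $2 = resource label
alb_tracking_config() {
  cat <<EOF
{
  "TargetValue": $1,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ALBRequestCountPerTarget",
    "ResourceLabel": "$2"
  }
}
EOF
}

# Hypothetical label; substitute the IDs of your own load balancer and target group.
RESOURCE_LABEL="app/my-alb/0123456789abcdef/targetgroup/my-targets/fedcba9876543210"

# Show the JSON that would be sent.
alb_tracking_config 1000.0 "$RESOURCE_LABEL"

# To apply it (requires AWS credentials), something along these lines:
# aws autoscaling put-scaling-policy \
#   --auto-scaling-group-name my-app-asg \
#   --policy-name alb-request-count-tracking \
#   --policy-type TargetTrackingScaling \
#   --target-tracking-configuration "$(alb_tracking_config 1000.0 "$RESOURCE_LABEL")"
```

Generating the JSON in a function keeps the policy definition reviewable before anything is sent to AWS.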
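The ASG doesn’t add or remove instances instantaneously. The group has a default cooldown (300 seconds, or 5 minutes, unless you change it) that makes it wait after a scaling activity completes before starting another one, which prevents over-scaling or under-scaling on transient metric fluctuations. You can adjust it using aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-app-asg --default-cooldown 120. Note that the default cooldown governs simple scaling policies. For target tracking policies, the comparable setting is the instance warmup, which controls how long a newly launched instance is left out of the group’s aggregated metrics; set it per policy with --estimated-instance-warmup on put-scaling-policy, or for the whole group with --default-instance-warmup on update-auto-scaling-group.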
Many people assume that setting a CPU target means the ASG will try to keep each instance at that CPU level. This is incorrect; the policy only ever looks at the average across the group, which is why you can have some instances at 100% CPU and others at 0% and the ASG might not do anything, as long as the average is still at or below the target.
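The sketch below shows how a healthy-looking average can hide saturated instances. The helper count_hot_instances and the 90% threshold are hypothetical, and the readings are invented sample values rather than real CloudWatch data.

```shell
#!/bin/sh
# Count instances whose CPU reading exceeds a threshold.
# $1 = threshold, remaining arguments = per-instance CPU percentages
count_hot_instances() {
  threshold=$1
  shift
  printf '%s\n' "$@" | awk -v t="$threshold" '$1 > t { hot++ } END { print hot + 0 }'
}

# Invented sample readings: two saturated instances, three idle ones.
set -- 100 100 0 0 0

avg=$(printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.1f", s / NR }')
hot=$(count_hot_instances 90 "$@")

# The 40% average is well under a 70% target, so no scale-out would occur,
# even though two instances are pegged.
echo "average=$avg hot_instances=$hot"   # prints: average=40.0 hot_instances=2
```

If uneven load like this matters for your application, the usual fix is on the load-distribution side, since the scaling policy itself only ever sees the group average.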
The next step is often configuring scaling policies for other metrics, like request count, to handle different types of load.