Auto-Scaling Policies: Reactive vs. Predictive

Target tracking scaling is the simplest way to keep a metric at a specific value, but it can be surprisingly rigid.

Let’s see it in action. Imagine you have a fleet of web servers behind a load balancer, and you want to keep CPU utilization around 50%.

Here’s how you’d set up target tracking for CPU utilization in AWS:

{
  "AutoScalingGroupName": "my-web-fleet",
  "PolicyName": "cpu-target-50",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "DisableScaleIn": false
  },
  "EstimatedInstanceWarmup": 60
}

When the average CPU utilization across all instances in my-web-fleet drifts away from 50%, Auto Scaling adjusts the number of instances: it adds instances when CPU rises above the target and removes them when CPU stays below it. Target tracking creates and manages the underlying CloudWatch alarms for you, and it scales in more gradually than it scales out, which damps rapid, thrashing changes. EstimatedInstanceWarmup tells Auto Scaling how long a new instance takes before it should count toward the group’s aggregated metrics, preventing premature scaling actions based on unrepresentative data. Separate ScaleOutCooldown and ScaleInCooldown settings belong to target tracking in Application Auto Scaling (used for services such as ECS and DynamoDB), not to EC2 Auto Scaling groups.
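To make the "keep the metric at the target" idea concrete, here is a minimal sketch of a proportional capacity estimate. It is an assumption for illustration, not AWS’s published algorithm: it treats total load as current capacity times the average metric and asks how many instances would bring the average back to the target. The function name and the min/max values are hypothetical.

```python
import math


def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float, min_size: int,
                             max_size: int) -> int:
    """Estimate the capacity that would return the average metric to target.

    Simplified proportional model (an assumption, not the real service logic).
    """
    if current_capacity <= 0 or target_value <= 0:
        raise ValueError("capacity and target must be positive")
    # Total load is roughly capacity * per-instance metric; spreading that
    # load so each instance sits at the target gives the raw capacity.
    raw = current_capacity * metric_value / target_value
    desired = math.ceil(raw)
    # The group's MinSize and MaxSize always bound the result.
    return max(min_size, min(max_size, desired))


# 4 instances at 75% CPU with a 50% target -> 6 instances
print(target_tracking_capacity(4, 75.0, 50.0, min_size=2, max_size=20))
```

The real service also scales in more cautiously than this symmetric formula suggests, so treat the scale-in direction of the sketch as an upper bound on how quickly capacity is removed.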

This strategy is great for predictable, metric-driven scaling. It’s reactive and aims to maintain a steady state.

But what if your scaling needs aren’t so linear? What if you have peak traffic hours, or sudden, dramatic spikes that require a more aggressive or scheduled response? That’s where step and scheduled scaling come in.

Step Scaling allows you to define custom scaling policies where the number of instances added or removed is based on the magnitude of the metric breach, not just a single target value.

Consider a scenario where CPU just crossing 10% warrants adding only one instance, but CPU at 30% requires adding five.

Here’s a step scaling policy:

{
  "AutoScalingGroupName": "my-web-fleet",
  "StepScalingPolicies": [
    {
      "PolicyName": "CPU-Step-Scale-Out",
      "AdjustmentType": "ChangeInCapacity",
      "StepAdjustments": [
        {
          "MetricIntervalLowerBound": 0,
          "MetricIntervalUpperBound": 10,
          "ScalingAdjustment": 1
        },
        {
          "MetricIntervalLowerBound": 10,
          "MetricIntervalUpperBound": 20,
          "ScalingAdjustment": 3
        },
        {
          "MetricIntervalLowerBound": 20,
          "ScalingAdjustment": 5
        }
      ],
      "Cooldown": 300,
      "MetricAlarm": {
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 10.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Dimensions": [
          {
            "Name": "AutoScalingGroupName",
            "Value": "my-web-fleet"
          }
        ]
      }
    }
  ]
}

In this CPU-Step-Scale-Out policy, the MetricAlarm defines the condition that triggers scaling: average CPU at or above 10%. The step bounds are not absolute CPU values; they are offsets from that alarm threshold. The first step (offsets 0 up to 10) covers CPU from 10% up to 20% and adds one instance. The second (offsets 10 up to 20) covers 20% up to 30% and adds three. The third has no upper bound, so any CPU of 30% or more adds five. The bounds are tricky in one more way: MetricIntervalLowerBound is inclusive and MetricIntervalUpperBound is exclusive, so a CPU of exactly 20% falls in the second step, not the first. The steps must not overlap or leave gaps. The alarm is shown inline here for readability; in practice it is a separate CloudWatch alarm whose action invokes the policy.
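The bound rules are easier to check in code than in prose. The sketch below assumes the intended steps are offsets 0 to 10, 10 to 20, and 20 and above, relative to an alarm threshold of 10, and applies the inclusive-lower, exclusive-upper rule. The Step type and pick_adjustment function are hypothetical names for illustration, not part of any AWS SDK.

```python
from typing import NamedTuple, Optional, Sequence


class Step(NamedTuple):
    lower: Optional[float]  # inclusive offset from the alarm threshold; None = no lower bound
    upper: Optional[float]  # exclusive offset from the alarm threshold; None = no upper bound
    adjustment: int         # instances to add when this step matches


def pick_adjustment(metric_value: float, threshold: float,
                    steps: Sequence[Step]) -> int:
    """Return the adjustment of the step whose interval contains the breach."""
    breach = metric_value - threshold  # bounds are relative to the threshold
    for step in steps:
        above_lower = step.lower is None or breach >= step.lower
        below_upper = step.upper is None or breach < step.upper
        if above_lower and below_upper:
            return step.adjustment
    return 0  # metric is below the threshold, so no step applies


# Assumed steps: +1 for 10-20% CPU, +3 for 20-30%, +5 for 30% and above.
STEPS = [Step(0, 10, 1), Step(10, 20, 3), Step(20, None, 5)]

for cpu in (10.0, 19.9, 20.0, 30.0):
    print(cpu, pick_adjustment(cpu, threshold=10.0, steps=STEPS))
```

A CPU of exactly 20% lands in the second step because the first step’s upper bound excludes it.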

Step scaling is more nuanced than target tracking, allowing for more granular control over scaling actions based on the severity of metric deviations.

Scheduled Scaling is for predictable, time-based changes. Think Black Friday sales, or daily traffic patterns.

If you know your application receives 50% more traffic between 9 AM and 5 PM on weekdays, you can schedule Auto Scaling to increase capacity before the rush hits.

Here’s a scheduled scaling action:

{
  "AutoScalingGroupName": "my-web-fleet",
  "ScheduledActions": [
    {
      "ScheduledActionName": "Weekday-Morning-RampUp",
      "StartTime": "2023-10-27T08:00:00Z",
      "Recurrence": "0 8 * * MON-FRI",
      "MinSize": 5,
      "MaxSize": 20,
      "DesiredCapacity": 10
    }
  ]
}

This Weekday-Morning-RampUp action runs every weekday at 8 AM UTC. Each run sets MinSize to 5, MaxSize to 20, and DesiredCapacity to 10. Those values are not limited to a time window: they stay in effect until something else changes them, such as another scheduled action or a manual update, so scaling back down at 5 PM needs a second scheduled action. The Recurrence uses cron syntax. StartTime is the earliest time the action can run, and an optional EndTime is the point after which the recurrence stops. EndTime is omitted here because setting it to 5 PM on the same day would end the schedule after a single run.
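Because a scheduled action only changes the group’s sizes at the moment it fires, it helps to model which action fired most recently at any given time. The sketch below pairs the ramp-up action with a hypothetical Weekday-Evening-RampDown action at 17:00 UTC; that second action and its sizes are assumptions added for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional, Sequence


class Action(NamedTuple):
    name: str
    hour: int      # UTC hour; fires at minute 0, Monday through Friday
    min_size: int
    max_size: int
    desired: int


ACTIONS = [
    Action("Weekday-Morning-RampUp", 8, 5, 20, 10),
    Action("Weekday-Evening-RampDown", 17, 2, 20, 4),  # hypothetical counterpart
]


def last_fired(now: datetime,
               actions: Sequence[Action] = ACTIONS) -> Optional[Action]:
    """Return the action that most recently fired at or before `now` (UTC)."""
    best, best_time = None, None
    for days_back in range(8):  # 8 days always spans at least one weekday
        day = (now - timedelta(days=days_back)).date()
        if day.weekday() >= 5:  # Saturday or Sunday: nothing fires
            continue
        for action in actions:
            fired = datetime(day.year, day.month, day.day, action.hour,
                             tzinfo=timezone.utc)
            if fired <= now and (best_time is None or fired > best_time):
                best, best_time = action, fired
    return best


# Saturday noon: the sizes in effect are still those set on Friday at 17:00.
print(last_fired(datetime(2023, 10, 28, 12, tzinfo=timezone.utc)).name)
```

The weekend case shows why the ramp-down action matters: without it, the group would hold the daytime sizes until something else changed them.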

The surprising part about scheduled scaling is how often it’s overlooked in favor of reactive strategies, even when traffic patterns are highly predictable. Many teams rely solely on metric-based scaling, missing the opportunity to proactively provision resources and avoid latency during known peak periods. This proactive approach can significantly improve user experience and reduce the load on your reactive scaling mechanisms.

Combining these strategies offers the most robust Auto Scaling solution. You might use scheduled scaling to handle predictable daily or weekly patterns, step scaling to react aggressively to sudden, large metric spikes, and target tracking for fine-grained, steady-state metric maintenance.
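When several policies are active at once they can disagree about capacity. EC2 Auto Scaling documents that it follows the policy yielding the largest capacity, and the group’s MinSize and MaxSize (which a scheduled action may have just changed) still bound the result. Here is a minimal sketch of that resolution; the function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def resolve_capacity(proposals: Sequence[int], min_size: int,
                     max_size: int) -> int:
    """Combine capacities proposed by concurrent policies into one value."""
    if not proposals:
        raise ValueError("at least one proposed capacity is required")
    # Favor availability: follow the policy asking for the most capacity.
    largest = max(proposals)
    # The group's size limits always win over any single policy.
    return max(min_size, min(max_size, largest))


# Target tracking proposes 6, step scaling proposes 9; limits are 5..20.
print(resolve_capacity([6, 9], min_size=5, max_size=20))  # 9
```

Picking the largest proposal means an aggressive step policy can override a calmer target tracking policy during a spike, while the scheduled limits keep either from overshooting.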

The next step beyond these basic Auto Scaling strategies is often to integrate them with CloudWatch Alarms for more complex event-driven scaling or to explore custom metrics for even more precise control.