AWS Auto Scaling policies do more than react to load: dynamic policies adjust capacity as metrics change, and scheduled actions provision capacity ahead of demand you can predict.

Imagine you’ve got a fleet of EC2 instances running your web application behind a load balancer. You want to make sure it stays responsive even when traffic spikes, but you don’t want to pay for idle servers during lulls. Auto Scaling is your tool for this. Let’s see it in action with a Target Tracking policy.

Here’s a simplified setup:

{
  "AutoScalingGroupName": "my-web-app-asg",
  "LaunchTemplate": {
    "LaunchTemplateId": "lt-012345abcdef67890",
    "Version": "$Latest"
  },
  "MinSize": 2,
  "MaxSize": 10,
  "DesiredCapacity": 2,
  "VPCZoneIdentifier": "subnet-xxxxxxxxxxxxxxxxx,subnet-yyyyyyyyyyyyyyyyy",
  "TargetGroupARNs": [
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-web-app-tg/abcdef1234567890"
  ],
  "HealthCheckType": "ELB",
  "Tags": [
    {
      "Key": "Name",
      "Value": "my-web-app-instance"
    }
  ]
}

And here’s a Target Tracking policy attached to this Auto Scaling group:

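{
  "AutoScalingGroupName": "my-web-app-asg",
  "PolicyName": "my-target-tracking-policy",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ALBRequestCountPerTarget",
      "ResourceLabel": "app/my-web-app-alb/0123456789abcdef/targetgroup/my-web-app-tg/abcdef1234567890"
    },
    "DisableScaleIn": false
  },
  "EstimatedInstanceWarmup": 300
}

With this configuration, Auto Scaling monitors the ALBRequestCountPerTarget metric for your Application Load Balancer’s target group and creates the CloudWatch alarms it needs. If the average number of requests per target rises above 60, Auto Scaling launches new instances. If it stays well below 60 for a sustained period, it terminates instances; scale-in is deliberately more gradual than scale-out to protect availability. The ResourceLabel is crucial here; it tells Auto Scaling which load balancer and target group to monitor, in the form app/<load-balancer-name>/<load-balancer-id>/targetgroup/<target-group-name>/<target-group-id> (the load balancer name and ID in this example are placeholders). EstimatedInstanceWarmup keeps a newly launched instance out of the group’s aggregated metrics for 300 seconds (5 minutes), so a single spike does not trigger repeated scale-outs while instances are still booting. Note that EC2 Auto Scaling target tracking policies do not accept ScaleOutCooldown or ScaleInCooldown; those fields belong to the separate Application Auto Scaling API.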
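To build intuition for how a target value translates into instance counts, here is a minimal Python sketch of a proportional estimate. It is a mental model only: AWS does not publish its exact algorithm, and the function name `estimate_desired_capacity` and its parameters are hypothetical, not part of any AWS SDK.

```python
import math


def estimate_desired_capacity(current_capacity, metric_value, target_value,
                              min_size, max_size):
    """Rough proportional estimate of the capacity needed to return a
    per-target metric to its target value.

    Assumption: total load stays constant, so the per-target metric scales
    inversely with the number of instances. This is NOT the AWS algorithm.
    """
    if current_capacity <= 0 or target_value <= 0:
        raise ValueError("current_capacity and target_value must be positive")
    # Total load = current_capacity * metric_value; divide it by the target
    # to get the number of instances that would each sit at the target.
    needed = math.ceil(current_capacity * metric_value / target_value)
    # Never go outside the group's MinSize / MaxSize bounds.
    return max(min_size, min(max_size, needed))


# 2 instances at 90 requests per target, target 60, group bounds 2..10
print(estimate_desired_capacity(2, 90, 60, min_size=2, max_size=10))   # 3
# Heavy load is clamped at MaxSize
print(estimate_desired_capacity(4, 300, 60, min_size=2, max_size=10))  # 10
```

The clamp at the end mirrors the MinSize and MaxSize settings of the group: no policy can push capacity outside those bounds.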

This system solves the problem of maintaining application performance under variable load without manual intervention. It achieves this by decoupling the decision-making process (the policy) from the execution (the Auto Scaling group modifying instance counts). The "brain" is the policy, which watches specific metrics and decides when and how to adjust the "body" – the Auto Scaling group.

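Three scaling approaches cover most use cases. The first two are dynamic scaling policies; the third is technically a scheduled action rather than a policy: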

  1. Target Tracking Scaling: This is what we saw above. You define a target value for a specific metric (like CPU utilization, request count per target, or network in/out). Auto Scaling automatically creates and manages the CloudWatch alarms and scaling policies needed to keep the metric at or near your target value. It’s the simplest to set up and often the most effective for general load-based scaling.

  2. Step Scaling: This policy allows for more granular control. Instead of a single target value, you define a series of "steps" that are evaluated when a CloudWatch alarm fires. For example, if CPU utilization is between 50% and 70%, add 2 instances. If it’s between 70% and 90%, add 5 instances. If it’s above 90%, add 10 instances. Step policies have no per-step cooldown; a single instance warmup time (EstimatedInstanceWarmup) applies to the whole policy. This is useful when you want different scaling actions based on the severity of the load.

    Here’s a Step Scaling policy example:

    {
      "AutoScalingGroupName": "my-web-app-asg",
      "PolicyName": "my-step-scaling-policy",
      "PolicyType": "StepScaling",
      "AdjustmentType": "ChangeInCapacity",
      "MetricAggregationType": "Average",
      "StepAdjustments": [
        {
          "MetricIntervalLowerBound": 0,
          "MetricIntervalUpperBound": 20,
          "ScalingAdjustment": 2
        },
        {
          "MetricIntervalLowerBound": 20,
          "ScalingAdjustment": 5
        }
      ],
      "EstimatedInstanceWarmup": 300
    }

    The step bounds are offsets from the threshold of the CloudWatch alarm that triggers the policy, not absolute metric values. The alarm itself is configured separately and linked to this policy. Assuming a CPUUtilization alarm with a threshold of 50%, this policy scales out by 2 instances when CPU is between 50% and 70% (offsets 0 to 20) and by 5 instances at 70% or above (offset 20 and up; omitting the upper bound means no upper limit). Scaling in belongs in a separate policy with its own low-CPU alarm and negative adjustments, because a single alarm fires in one direction only. The AdjustmentType can be ChangeInCapacity (add/remove a fixed number of instances), PercentChangeInCapacity (add/remove a percentage of the current capacity), or ExactCapacity (set the group to a specific size). In EC2 Auto Scaling, these fields sit at the top level of the policy, and step policies use EstimatedInstanceWarmup rather than Cooldown, which applies only to simple scaling policies.

  3. Scheduled Scaling: This policy allows you to set a predictable scaling schedule. For instance, you know every weekday at 9 AM EST, traffic to your application increases significantly. You can schedule Auto Scaling to increase the DesiredCapacity of your Auto Scaling group to 10 instances at that time, and then decrease it back to 4 instances at 5 PM EST. This is perfect for recurring, predictable traffic patterns.

    Here’s a Scheduled Scaling example:

    {
      "AutoScalingGroupName": "my-web-app-asg",
      "ScheduledActionName": "my-weekday-scaling",
      "StartTime": "2023-10-27T09:00:00Z",
      "Recurrence": "0 9 * * 1-5",
      "MinSize": 4,
      "MaxSize": 10,
      "DesiredCapacity": 10
    }

    This action will run every weekday (Monday to Friday) at 9 AM UTC (note: the Recurrence uses cron syntax, 0 9 * * 1-5 means at minute 0 of hour 9, on any day of the month, any month, Monday through Friday). Each run sets MinSize to 4, MaxSize to 10, and DesiredCapacity to 10. The cron expression is evaluated in UTC unless you add a TimeZone field (for example, "America/New_York"). StartTime marks when the recurring schedule begins. An optional EndTime marks when the recurrence stops altogether; it does not revert capacity at the end of the day. To scale back down at 5 PM, create a second scheduled action with its own Recurrence (for example, 0 17 * * 1-5) and a lower DesiredCapacity.
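Two mechanics from the list above are easy to misread: step bounds in a Step Scaling policy are offsets from the alarm threshold, and the Recurrence field of a scheduled action is a five-field cron expression. The Python sketch below illustrates both under simplifying assumptions. The helper names (`pick_step_adjustment`, `cron_matches`) are hypothetical, the step lookup only models a scale-out alarm, and the cron matcher only supports `*`, single values, ranges, and comma lists.

```python
from datetime import datetime, timezone


def pick_step_adjustment(metric_value, alarm_threshold, steps):
    """Return the ScalingAdjustment of the step containing the breach size.

    Bounds are offsets from the alarm threshold. A missing bound means
    "unbounded". Returns 0 when no step matches (the alarm would not fire).
    """
    breach = metric_value - alarm_threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound")
        upper = step.get("MetricIntervalUpperBound")
        # For a scale-out alarm: lower bound inclusive, upper bound exclusive.
        if (lower is None or breach >= lower) and (upper is None or breach < upper):
            return step["ScalingAdjustment"]
    return 0


def _field_matches(field, value):
    """Match one cron field against a value (supports *, N, A-B, and lists)."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression, when):
    """Check whether a datetime matches a five-field cron expression."""
    minute, hour, day_of_month, month, day_of_week = expression.split()
    cron_weekday = when.isoweekday() % 7  # cron convention: 0 = Sunday
    return (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(day_of_month, when.day)
            and _field_matches(month, when.month)
            and _field_matches(day_of_week, cron_weekday))


steps = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 2},
    {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 5},
]
print(pick_step_adjustment(65, 50, steps))  # 2 (65% is 15 above the 50% threshold)
print(pick_step_adjustment(85, 50, steps))  # 5 (85% is 35 above the threshold)

friday_9am = datetime(2023, 10, 27, 9, 0, tzinfo=timezone.utc)
print(cron_matches("0 9 * * 1-5", friday_9am))  # True
```

Real cron implementations treat the day-of-month and day-of-week fields specially when both are restricted; this sketch simply requires every field to match, which is equivalent whenever one of the two is `*`, as it is here.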

A common pitfall is forgetting to give the group time to stabilize between scaling activities. Without it, you can trigger "flapping", where Auto Scaling rapidly adds and removes instances, leading to instability and unexpected costs. Simple scaling policies handle this with a cooldown period, which makes Auto Scaling wait for a scaling activity to complete and for the system to stabilize before considering another adjustment. Step scaling and target tracking policies use the instance warmup time instead, so newly launched instances are not counted toward the group’s metrics until they have warmed up.
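The effect of a cooldown can be illustrated with a small gate that rejects scaling requests arriving too soon after the previous one. This is a simplified model for intuition, not how AWS implements cooldowns; the class name `CooldownGate` and its method are hypothetical.

```python
class CooldownGate:
    """Allow a scaling action only if the cooldown since the last one has elapsed."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_action_at = None  # timestamp of the last allowed action

    def try_scale(self, now_seconds):
        """Return True and record the action if allowed, otherwise False."""
        if (self._last_action_at is not None
                and now_seconds - self._last_action_at < self.cooldown_seconds):
            return False  # still cooling down; suppress this request
        self._last_action_at = now_seconds
        return True


gate = CooldownGate(cooldown_seconds=300)
# A noisy alarm fires every minute; only requests 300 seconds apart get through.
decisions = [gate.try_scale(t) for t in (0, 60, 120, 300, 360)]
print(decisions)  # [True, False, False, True, False]
```

With no gate, all five requests would have changed capacity within six minutes, which is the flapping pattern described above.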

The next thing you’ll likely grapple with is handling application health checks effectively, as both the load balancer and Auto Scaling rely on them to determine if an instance is ready to receive traffic or should be terminated.
