Auto-Scale ECS Services Based on CPU, Memory, or Custom Metrics (2026)

ECS services can scale in and out with demand, so you neither pay for idle capacity nor run short of it during peaks.

Let’s see it in action. Imagine a web service running on ECS that experiences fluctuating traffic. We want it to automatically add more containers when traffic spikes and remove them when it dips.

Here’s a simplified task-definition.json for our web service:

{
  "family": "my-web-service",
  "containerDefinitions": [
    {
      "name": "web-app",
      "image": "nginx:latest",
      "cpu": 256,
      "memory": 512,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80
        }
      ],
      "essential": true
    }
  ]
}

And here’s a basic service.json configuration:

{
  "cluster": "my-ecs-cluster",
  "serviceName": "my-web-service",
  "taskDefinition": "my-web-service:1",
  "desiredCount": 1,
  "launchType": "EC2",
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-web-tg/abcdef1234567890",
      "containerName": "web-app",
      "containerPort": 80
    }
  ],
  "schedulingStrategy": "REPLICA"
}

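When this service runs, ECS launches and maintains desiredCount copies of the task definition (one, in this example). Note that with the EC2 launch type, the default bridge network mode, and a fixed hostPort of 80, only one copy of the task fits on each container instance, so scaling out also requires enough instances or a dynamic host port (hostPort 0). To make the service scale, we need to hook it up to Application Auto Scaling.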

The core idea is that Application Auto Scaling watches metrics, and when those metrics cross a defined threshold, it tells ECS to adjust the desiredCount of the service. This adjustment is done by creating or removing tasks from the service.

To set this up, you first register the service as a scalable target with Application Auto Scaling, giving it a minimum and maximum task count. You then attach a Scaling Policy to that target. The policy specifies what metric to watch and how to react.
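
A minimal sketch of the registration step with boto3 might look like the following. The capacity bounds (1 and 10) are assumptions chosen for illustration, and the helper names are hypothetical; only the call to register_scalable_target touches AWS.

```python
def scalable_target_params(cluster, service, min_tasks, max_tasks):
    """Build the keyword arguments for register_scalable_target."""
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,  # assumed lower bound
        "MaxCapacity": max_tasks,  # assumed upper bound
    }


def register_service(params):
    """Send the registration to Application Auto Scaling (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3

    client = boto3.client("application-autoscaling")
    return client.register_scalable_target(**params)


params = scalable_target_params("my-ecs-cluster", "my-web-service", 1, 10)
print(params["ResourceId"])  # service/my-ecs-cluster/my-web-service
# register_service(params)  # uncomment to apply against a real account
```

Application Auto Scaling never moves desiredCount outside the MinCapacity and MaxCapacity range, so these two values act as the guard rails for every policy attached later.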

There are two main types of scaling policies:

Target Tracking Scaling: This is the most common and easiest to set up. You define a target value for a metric (e.g., "keep average CPU utilization at 60%"). Application Auto Scaling then continuously adjusts the desiredCount to keep the metric at that target.

For example, to scale based on average CPU utilization: You’d create a target tracking scaling policy for your ECS service. The ServiceNamespace would be ecs. The ScalableDimension would be ecs:service:DesiredCount. The ResourceId would be service/my-ecs-cluster/my-web-service. The PredefinedMetricType (inside PredefinedMetricSpecification) would be ECSServiceAverageCPUUtilization. And importantly, TargetValue would be 60.0 (for 60%).

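Application Auto Scaling will then create and manage the CloudWatch alarms that monitor this metric. If average CPU rises above 60%, it will increase desiredCount. If it stays well below 60%, it will decrease desiredCount, scaling in more conservatively than it scales out in order to protect availability.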
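
The CPU policy described above could be sketched with boto3 roughly as follows. The policy name and the two cooldown values are assumptions for illustration, and the sketch presumes the service is already registered as a scalable target.

```python
def cpu_target_tracking_policy(cluster, service, target_percent=60.0):
    """Build the keyword arguments for put_scaling_policy (target tracking)."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds, assumed value
            "ScaleInCooldown": 300,   # seconds, assumed value
        },
    }


def apply_policy(policy):
    """Create or update the policy (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3

    client = boto3.client("application-autoscaling")
    return client.put_scaling_policy(**policy)


policy = cpu_target_tracking_policy("my-ecs-cluster", "my-web-service")
print(policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])  # 60.0
# apply_policy(policy)  # uncomment to apply against a real account
```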

Step Scaling: This allows for more granular control. You define specific thresholds and corresponding scaling actions. For instance, "if CPU is above 70%, add 2 tasks; if CPU is above 90%, add 4 tasks."

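With step scaling, you create the CloudWatch alarm yourself and attach the scaling policy as its alarm action. For example, an alarm might trigger when the service's CPUUtilization metric (in the AWS/ECS namespace) averages above 70% for 5 minutes. The policy's StepAdjustments, with AdjustmentType set to ChangeInCapacity, would then add 2 tasks when CPU is between 70% and 90% and 4 tasks when it is at or above 90%. Because step bounds are expressed as offsets from the alarm threshold, a single alarm at 70% can drive both steps, although a second alarm at 90% with its own policy would also work.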
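
As a rough sketch of the step scaling setup in boto3, assuming a single alarm at 70% drives both steps: the policy name, alarm name, and cooldown are hypothetical, and tasks_added_at is only a local helper that mimics how the step bounds are read, not an AWS API.

```python
ALARM_THRESHOLD = 70.0  # percent CPU at which the alarm fires


def step_scaling_policy(cluster, service):
    """Build the keyword arguments for put_scaling_policy (step scaling)."""
    return {
        "PolicyName": f"{service}-cpu-step-out",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Average",
            "Cooldown": 60,  # seconds, assumed value
            "StepAdjustments": [
                # Bounds are offsets from the alarm threshold: 70% to 90%
                {"MetricIntervalLowerBound": 0.0,
                 "MetricIntervalUpperBound": 20.0,
                 "ScalingAdjustment": 2},
                # 90% and above
                {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 4},
            ],
        },
    }


def cpu_alarm(cluster, service, policy_arn):
    """Build the keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{service}-cpu-high",  # hypothetical name
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # five 1-minute periods = 5 minutes
        "EvaluationPeriods": 5,
        "Threshold": ALARM_THRESHOLD,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],
    }


def tasks_added_at(cpu_percent, policy):
    """Local illustration: which step would apply at a given CPU level."""
    offset = cpu_percent - ALARM_THRESHOLD
    if offset <= 0:
        return 0  # alarm not breached, no scaling action
    for step in policy["StepScalingPolicyConfiguration"]["StepAdjustments"]:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= offset < upper:
            return step["ScalingAdjustment"]
    return 0


def apply(cluster, service):
    """Create the policy, then the alarm that triggers it (needs credentials)."""
    import boto3  # imported lazily so the builders above run without boto3

    autoscaling = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    response = autoscaling.put_scaling_policy(**step_scaling_policy(cluster, service))
    cloudwatch.put_metric_alarm(**cpu_alarm(cluster, service, response["PolicyARN"]))


policy = step_scaling_policy("my-ecs-cluster", "my-web-service")
print(tasks_added_at(75.0, policy), tasks_added_at(95.0, policy))  # 2 4
```

A matching scale-in policy, with negative ScalingAdjustment values and a low-CPU alarm, would be needed as well; step scaling does not remove tasks on its own.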

Custom metrics are where things get really powerful. You can use CloudWatch custom metrics to represent application-specific load. For example, if your web service processes requests from a queue, you might want to scale based on the number of messages in that queue.

To do this, your application would need to emit a custom metric to CloudWatch. For instance, a Python application using boto3 might do:

import boto3

cloudwatch = boto3.client('cloudwatch')

def get_current_queue_depth():
    # Placeholder: replace with a real lookup against your queue
    return 0

def send_queue_depth_metric():
    """Publish the current queue depth to CloudWatch as a custom metric."""
    queue_depth = get_current_queue_depth()

    cloudwatch.put_metric_data(
        Namespace='MyApp/Queue',
        MetricData=[
            {
                'MetricName': 'QueueDepth',
                'Value': queue_depth,
                'Unit': 'Count'
            },
        ]
    )
    print(f"Sent metric: QueueDepth={queue_depth}")

# In a real app, call this periodically (for example, once a minute)
# send_queue_depth_metric()
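
The article does not say which queue is in use. If it happened to be Amazon SQS (an assumption made here purely for illustration), the depth lookup could be sketched like this. The queue URL is supplied by the caller, and parse_queue_depth is a hypothetical helper that only reads the response dictionary.

```python
def parse_queue_depth(attributes_response):
    """Extract the visible-message count from a get_queue_attributes response."""
    # SQS returns attribute values as strings, so convert to int.
    return int(attributes_response["Attributes"]["ApproximateNumberOfMessages"])


def get_current_queue_depth(queue_url):
    """Ask SQS for the approximate number of visible messages (needs credentials)."""
    import boto3  # imported lazily so parse_queue_depth runs without boto3

    sqs = boto3.client("sqs")
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return parse_queue_depth(response)


# Shape of a real response, with a made-up value for demonstration
sample = {"Attributes": {"ApproximateNumberOfMessages": "42"}}
print(parse_queue_depth(sample))  # 42
```

For SQS specifically, the service already publishes ApproximateNumberOfMessagesVisible to CloudWatch in the AWS/SQS namespace, so emitting a custom metric is mainly useful for other queue systems or for derived values.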

Once this metric is flowing into CloudWatch, you can configure Application Auto Scaling to use it. For a custom metric like QueueDepth in the MyApp/Queue namespace, you would create a target tracking policy whose CustomizedMetricSpecification contains:

MetricName: QueueDepth
Namespace: MyApp/Queue
Statistic: Average (or Maximum, depending on what makes sense for your queue)

The policy's TargetValue would be 100 (meaning, try to keep the average queue depth at 100 messages).

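The ResourceId would still be service/my-ecs-cluster/my-web-service. Application Auto Scaling will then watch this QueueDepth metric and adjust your ECS service’s desiredCount to maintain the target. Target tracking works best when the metric falls as tasks are added; raw queue depth does so only loosely, so a per-task figure such as backlog per task often tracks capacity more closely.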
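
A sketch of that custom-metric policy in boto3 might look like this. The policy name is hypothetical, and the specification carries no Dimensions because the metric in this article is published without any.

```python
def queue_depth_policy(cluster, service, target_depth=100.0, statistic="Average"):
    """Build put_scaling_policy arguments for the custom QueueDepth metric."""
    return {
        "PolicyName": f"{service}-queue-depth-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_depth,
            "CustomizedMetricSpecification": {
                "MetricName": "QueueDepth",
                "Namespace": "MyApp/Queue",
                "Statistic": statistic,  # "Average" or "Maximum"
                "Unit": "Count",         # must match the unit used when publishing
            },
        },
    }


def apply_policy(policy):
    """Create or update the policy (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3

    client = boto3.client("application-autoscaling")
    return client.put_scaling_policy(**policy)


policy = queue_depth_policy("my-ecs-cluster", "my-web-service")
spec = policy["TargetTrackingScalingPolicyConfiguration"]["CustomizedMetricSpecification"]
print(spec["Namespace"], spec["MetricName"])  # MyApp/Queue QueueDepth
# apply_policy(policy)  # uncomment to apply against a real account
```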

One subtle but crucial point is how the metric is aggregated across tasks. When you set a TargetValue for ECSServiceAverageCPUUtilization, Application Auto Scaling is not looking at the CPU of individual containers. Instead, the metric is a service-level figure: the CPU in use across all running tasks as a share of the CPU they reserve, which for identically sized tasks is simply the average. If you have 4 tasks, and 3 are at 80% CPU and 1 is at 20%, the average is (80+80+80+20)/4 = 65%. Application Auto Scaling uses this average to decide whether to scale out or in. This is why a target value like 60% or 70% is often a good starting point: it leaves some buffer for individual tasks to spike without immediately triggering a scale-out.
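
The arithmetic can be checked in a few lines of Python. The second function is a simple proportional estimate of how many tasks would bring the average back to the target; it is an illustration only and not the algorithm Application Auto Scaling documents or guarantees.

```python
import math


def average_cpu(task_cpu_percents):
    """Average CPU across tasks, assuming all tasks reserve the same CPU."""
    return sum(task_cpu_percents) / len(task_cpu_percents)


def rough_desired_count(current_tasks, metric_value, target_value):
    """Proportional estimate of the task count needed to reach the target."""
    return math.ceil(current_tasks * metric_value / target_value)


tasks = [80, 80, 80, 20]
avg = average_cpu(tasks)
print(avg)                                         # 65.0
print(rough_desired_count(len(tasks), avg, 60.0))  # 5
```

With a 60% target, the 65% average sits just above the threshold, so one extra task is enough to pull it back under in this estimate.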

The next step after mastering scaling is understanding how to manage cooldown periods and scaling policies in conjunction with other AWS services like Lambda for event-driven scaling.