Managed Cluster Auto Scaling for ECS is an automated way to adjust the number of EC2 instances in your ECS cluster based on the resource demands of your running tasks.
Let’s see it in action with a sample ECS cluster and service.
Imagine you have an ECS cluster named my-ecs-cluster running a web service. This service has a desired count of 2 tasks, each requiring 2 vCPU and 4GiB of memory. Your cluster currently has 2 EC2 instances, each with 4 vCPU and 8GiB of memory.
Here’s how you’d configure Managed Scaling:
First, associate a capacity provider with the cluster and make it the cluster’s default.
aws ecs put-cluster-capacity-providers \
  --cluster my-ecs-cluster \
  --capacity-providers ECS_MANAGED_SPOT_CAPACITY_PROVIDER \
  --default-capacity-provider-strategy capacityProvider=ECS_MANAGED_SPOT_CAPACITY_PROVIDER,weight=1,base=1
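This command associates the capacity provider ECS_MANAGED_SPOT_CAPACITY_PROVIDER with my-ecs-cluster and makes it the cluster’s default. The name here is a placeholder for a capacity provider you have already created on top of an EC2 Auto Scaling group. It is not a built-in provider, and because ECS documents that provider names cannot be prefixed with aws, ecs, or fargate, a real one would likely need a different name. The base and weight parameters control task placement, not instance counts: a base of 1 means at least one task is placed on this provider before weights are considered, and weight sets the provider’s relative share of the remaining tasks. With a single provider in the strategy, every task lands on it regardless of the weight value.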
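To confirm that the association took effect, one option is to read the cluster back. This is a minimal sketch, assuming the AWS CLI is installed and configured for the right account and region; the helper name show_cluster_capacity_providers is hypothetical, while capacityProviders and defaultCapacityProviderStrategy are fields of the describe-clusters response.

```shell
# Hypothetical helper: print the capacity providers attached to a cluster
# and the cluster's default capacity provider strategy.
# Arg: cluster name (for example, my-ecs-cluster)
show_cluster_capacity_providers() {
  aws ecs describe-clusters \
    --clusters "$1" \
    --query 'clusters[0].{providers: capacityProviders, defaultStrategy: defaultCapacityProviderStrategy}' \
    --output json
}
```

Calling show_cluster_capacity_providers my-ecs-cluster should list the provider name and the base and weight values set above.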
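Next, configure managed scaling itself. These settings belong to the capacity provider (under autoScalingGroupProvider), not to the service, so there is no managedScaling block in a service definition.
{
  "name": "ECS_MANAGED_SPOT_CAPACITY_PROVIDER",
  "autoScalingGroupProvider": {
    "managedScaling": {
      "status": "ENABLED",
      "targetCapacity": 100,
      "instanceWarmupPeriod": 300
    }
  }
}
You would apply this with aws ecs update-capacity-provider --cli-input-json, or supply the same managedScaling block when first creating the provider with aws ecs create-capacity-provider. status: ENABLED turns managed scaling on. targetCapacity is the target utilization of the Auto Scaling group as a percentage from 1 to 100 (0 is not a valid value); 100 means ECS aims to keep just enough instances to run your tasks, while a lower value such as 80 keeps some headroom. instanceWarmupPeriod: 300 gives a newly launched instance 300 seconds (5 minutes) before it contributes to the Auto Scaling group’s metrics. The service, my-web-service, needs no scaling settings of its own: it uses the cluster’s default capacity provider strategy when created without an explicit launch type, or it can set a strategy through the --capacity-provider-strategy option of aws ecs update-service.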
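To check which managed scaling settings are actually in effect, one option is to read them back from the capacity provider, since that is where ECS stores them. This is a minimal sketch, assuming the AWS CLI is installed and configured; the helper name show_managed_scaling is hypothetical, and the query path follows the describe-capacity-providers response shape.

```shell
# Hypothetical helper: print the managedScaling settings of a capacity provider.
# Arg: capacity provider name
show_managed_scaling() {
  aws ecs describe-capacity-providers \
    --capacity-providers "$1" \
    --query 'capacityProviders[0].autoScalingGroupProvider.managedScaling' \
    --output json
}
```

Calling it with the provider name used in this walkthrough should return the status, targetCapacity, and instanceWarmupPeriod values.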
Now, what happens when your web service gets busy? Let’s say you deploy a new version of your service, or traffic spikes, and the desired count increases to 5 tasks. Each task needs 2 vCPU and 4GiB RAM.
ECS will look at your current cluster’s capacity. You have 2 instances, each with 4 vCPU and 8GiB RAM, totaling 8 vCPU and 16GiB RAM. Your 5 tasks need 10 vCPU and 20GiB RAM. Clearly, you don’t have enough capacity.
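The arithmetic above can be sketched in a few lines of shell. This is only an illustration under simplifying assumptions: all tasks are identical, all instances are identical, and the small amount of memory that the operating system and ECS agent reserve on each instance is ignored. The function names are made up for this example.

```shell
# Ceiling division for positive integers: ceil(a / b).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Tasks that fit on one instance, limited by the scarcer resource.
# Args: instance_vcpu instance_mem_gib task_vcpu task_mem_gib
tasks_per_instance() {
  by_cpu=$(( $1 / $3 ))
  by_mem=$(( $2 / $4 ))
  if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

# Instances needed to place a number of identical tasks.
# Args: task_count instance_vcpu instance_mem_gib task_vcpu task_mem_gib
instances_needed() {
  per_instance=$(tasks_per_instance "$2" "$3" "$4" "$5")
  ceil_div "$1" "$per_instance"
}

needed=$(instances_needed 5 4 8 2 4)
echo "instances needed: $needed"            # prints "instances needed: 3"
echo "instances to add: $(( needed - 2 ))"  # prints "instances to add: 1"
```

Two tasks fit on each instance, so five tasks need three instances, one more than the cluster currently has.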
Managed scaling reacts to this deficit. ECS publishes a CloudWatch metric, CapacityProviderReservation, for the capacity provider and maintains a target tracking scaling policy on the associated Auto Scaling group; when tasks cannot be placed, the metric rises above the target and the group launches more EC2 instances. ECS does not choose instance types itself. The instance types, and whether they are Spot or On-Demand, come from the Auto Scaling group’s launch template and purchase options, so a provider named for Spot only uses Spot if its group is configured that way. Instances are added until the cluster can accommodate the service’s desired task count.
The targetCapacity in the managedScaling configuration is not a number of instances, and ECS does not compute it for you. It is a percentage you set, and it is the value that the CapacityProviderReservation metric is steered towards. What ECS calculates is the metric: it estimates how many instances are needed to run all current and pending tasks and compares that with how many instances are running. If your tasks need a total of 20 vCPU and 40GiB of memory, and your current instances provide 10 vCPU and 20GiB, the cluster needs roughly twice its current instance count, so the metric reads about 200. That is well above a target of 100, and the Auto Scaling group adds instances until the metric returns to the target.
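AWS describes the metric behind this calculation, CapacityProviderReservation, as the number of instances needed (M) divided by the number of instances running (N), times 100. The sketch below reproduces that ratio with integer arithmetic. The function name is hypothetical, and the two zero-instance special cases follow AWS’s published description as best understood here, so treat them as assumptions rather than a specification.

```shell
# CapacityProviderReservation as an integer percentage.
# Args: instances_needed (M) instances_running (N)
capacity_provider_reservation() {
  if [ "$2" -eq 0 ]; then
    # Assumed special cases: nothing needed and nothing running reads as 100;
    # instances needed but none running reads as 200.
    if [ "$1" -eq 0 ]; then echo 100; else echo 200; fi
    return 0
  fi
  echo $(( $1 * 100 / $2 ))
}

capacity_provider_reservation 3 2   # prints 150: above a target of 100, scale out
capacity_provider_reservation 2 3   # prints 66: below a target of 100, scale in
```

With targetCapacity set to 100, a reading above 100 means more instances are needed than are running, and a reading below 100 means the group has spare instances.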
The instanceWarmupPeriod matters because a freshly launched EC2 instance is not immediately usable: it has to boot, start the ECS agent, and register with the cluster before tasks can be placed on it. The value is the number of seconds after launch before the instance contributes to the Auto Scaling group’s CloudWatch metrics. This helps avoid scaling decisions based on instances that are still starting up.
The appeal of managed scaling is that you no longer write and tune the scaling policies yourself. You still provide the Auto Scaling group and its launch template, but ECS creates and manages the scaling policy that drives the group. You define your tasks’ resource needs, and ECS continuously compares them with the cluster’s capacity and adjusts the group’s desired instance count accordingly, so your applications get the resources they need without manual intervention.
When tasks are stopped or scaled down, Managed Scaling will detect that the cluster has excess capacity and will signal Auto Scaling to terminate idle EC2 instances, saving you costs.
The most notable aspect of Managed Cluster Auto Scaling is that it is driven by the resource demands of your tasks rather than just their count. If your tasks are resource-intensive (for example, requiring many vCPUs or large amounts of memory), fewer of them fit on each instance, so managed scaling adds more instances even if the number of tasks has not changed much. It does not pick larger instance types on its own; instance sizes are whatever the Auto Scaling group is configured to launch.
The next step is to integrate this with Fargate capacity providers to get the best of both worlds.