ECS services can achieve zero-downtime deployments through rolling updates: tasks running the new container version are phased in while tasks running the old version are phased out, and a failed rollout can be rolled back.

Let’s see this in action. Imagine you have a web service running on ECS, version 1.0. You want to deploy version 1.1.

{
  "serviceName": "my-web-service",
  "cluster": "my-cluster",
  "desiredCount": 3,
  "taskDefinition": "my-web-service:1",
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-web-service-tg/abcdef1234567890",
      "containerName": "web-app",
      "containerPort": 80
    }
  ],
  "deploymentConfiguration": {
    "minimumHealthyPercent": 50,
    "maximumPercent": 200
  }
}

This configuration tells ECS to keep at least 50% of the desiredCount healthy and allows up to 200% of the desiredCount to run during the deployment. The lower bound is rounded up and the upper bound is rounded down. For a desiredCount of 3, this means at least 2 tasks must stay healthy (50% of 3 is 1.5, rounded up), and up to 6 tasks can be running temporarily.
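The arithmetic behind those two limits can be sketched in a few lines of Python. The function name deployment_bounds is made up for this illustration; only the rounding rules (lower bound up, upper bound down) come from ECS.

```python
def deployment_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_healthy_tasks, max_running_tasks) for a rolling update."""
    # Integer arithmetic avoids floating-point surprises.
    # Lower bound: percentage of desired_count, rounded UP.
    min_healthy = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound: percentage of desired_count, rounded DOWN.
    max_running = desired_count * maximum_percent // 100
    return min_healthy, max_running


# The service above: desiredCount 3, minimumHealthyPercent 50, maximumPercent 200.
print(deployment_bounds(3, 50, 200))  # (2, 6)
```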

When you update the service’s taskDefinition to my-web-service:2 (the revision that runs version 1.1), ECS starts a new deployment. It doesn’t just kill the old tasks and start new ones. Instead, it follows these steps:

  1. Starts new tasks: ECS launches one or more new tasks (version 1.1), as many as maximumPercent allows, and registers them with the load balancer’s target group.
  2. Waits for health checks: The load balancer’s health checks must pass for each new task.
  3. Stops old tasks: Once new tasks are healthy, ECS deregisters old tasks (version 1.0) from the target group, lets in-flight connections drain, and stops them, never dropping below minimumHealthyPercent.
  4. Repeats: This cycle of launching new tasks, waiting for health, and stopping old tasks continues until the desiredCount is met with the new version.
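To make the cycle concrete, here is a toy simulation of the steps above. It is a simplified model, not the actual ECS scheduler: it assumes every new task passes its health checks immediately and that ECS acts in discrete rounds. The function name and the (old, new) tuples are invented for this sketch.

```python
def simulate_rolling_update(desired, min_healthy_pct, max_pct):
    """Return the (old_tasks, new_tasks) counts after each round of a rollout."""
    min_healthy = -(-desired * min_healthy_pct // 100)  # rounded up
    max_running = desired * max_pct // 100              # rounded down
    old, new = desired, 0
    rounds = []
    while old > 0 or new < desired:
        # Step 1: launch as many new tasks as the upper limit allows.
        launch = max(0, min(desired - new, max_running - (old + new)))
        new += launch  # Step 2 (assumption): all of them become healthy.
        # Step 3: stop old tasks without dropping below the lower limit.
        stop = max(0, min(old, (old + new) - min_healthy))
        old -= stop
        if launch == 0 and stop == 0:
            raise ValueError("deployment cannot make progress with these limits")
        rounds.append((old, new))
    return rounds


print(simulate_rolling_update(3, 50, 200))  # [(0, 3)]
print(simulate_rolling_update(3, 50, 100))  # [(2, 0), (1, 1), (0, 2), (0, 3)]
```

With 50/200 there is enough headroom to replace all three tasks in one round. With 50/100 the model has to stop an old task first, so capacity dips to 2 for most of the rollout.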

The minimumHealthyPercent and maximumPercent are your primary levers. If minimumHealthyPercent is 100, ECS will wait for a new task to be fully healthy before stopping an old one, so serving capacity never drops below the desiredCount. If maximumPercent is 100, ECS cannot run any extra tasks, so it must stop an old task before starting a new one; capacity dips during the rollout, and minimumHealthyPercent has to be below 100 for the deployment to progress at all. The default values of 100% for minimumHealthyPercent and 200% for maximumPercent are generally safe for zero-downtime.

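ECS can also handle rollbacks automatically, but only if you enable the deployment circuit breaker with its rollback option in the deploymentConfiguration. With it enabled, when new tasks repeatedly fail to start or fail their health checks, ECS marks the deployment as failed and rolls the service back to the last deployment that completed successfully. Without it, ECS keeps retrying the failing tasks while the remaining old, healthy tasks continue to serve traffic. You can also manually trigger a rollback by updating the service to a previous task definition revision.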
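Automatic rollback depends on the deployment circuit breaker, which is switched on through the deploymentCircuitBreaker field of the deploymentConfiguration. The sketch below only builds the request parameters for an UpdateService call and prints them. The helper name rollback_enabled_update is hypothetical, and the 100/200 limits are an assumption (the defaults) rather than the 50/200 used earlier.

```python
import json


def rollback_enabled_update(cluster, service, task_definition):
    """Build UpdateService parameters that enable automatic rollback."""
    return {
        "cluster": cluster,
        "service": service,
        "taskDefinition": task_definition,
        "deploymentConfiguration": {
            "minimumHealthyPercent": 100,
            "maximumPercent": 200,
            # Stop a failing deployment and return to the last good one.
            "deploymentCircuitBreaker": {"enable": True, "rollback": True},
        },
    }


params = rollback_enabled_update("my-cluster", "my-web-service", "my-web-service:2")
print(json.dumps(params, indent=2))
# With boto3 installed and AWS credentials configured, these parameters
# could be sent as: boto3.client("ecs").update_service(**params)
```

A manual rollback uses the same call with the previous revision, for example "my-web-service:1", as the task definition.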

The load balancer is critical. Its health check configuration dictates when ECS considers a new task "ready" to receive traffic. A poorly configured health check (e.g., only checking if the container is running, not if the application inside is responding to requests) can lead to traffic being sent to a non-functional application, even if ECS thinks the deployment is successful. Ensure your load balancer health checks are robust and accurately reflect application health.
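As an illustration of a health check that reflects application health rather than mere process liveness, here is a minimal sketch using only the Python standard library. The /health path, the check_dependencies function, and its "database" and "cache" entries are all assumptions for this example; a real service would replace them with its own probes and point the target group’s health check at the same path.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """Hypothetical probes; replace with real checks (database ping, cache, ...)."""
    return {"database": True, "cache": True}


def health_response(checks):
    """Map dependency results to an HTTP status code and a short body."""
    failed = sorted(name for name, ok in checks.items() if not ok)
    if failed:
        return 503, "unhealthy: " + ", ".join(failed)
    return 200, "ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(check_dependencies())
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=80):
    """Serve the health endpoint; port 80 matches containerPort in the example."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint. The load balancer then sees a 503 whenever a dependency check fails, so a task that is running but not functional is never marked healthy.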

The most surprising thing is that ECS doesn’t directly manage the state of the application within the container during a deployment; it relies on health checks (the load balancer’s and, if you define one in the task definition, the container health check) to determine if a new task is ready. This means you can have a task definition that passes ECS’s basic validation but is functionally broken, and ECS will happily deploy it if the health checks are too permissive.

The next step in managing deployments is understanding how to use service discovery with ECS services.
