ECS canary deployments let you roll out new versions of your service to a small subset of users before a full rollout, minimizing the blast radius of potential issues.

Here’s a look at how it works in practice:

{
  "serviceName": "my-app-service",
  "cluster": "my-cluster",
  "desiredCount": 5,
  "taskDefinition": "my-app:3",
  "deploymentConfiguration": {
    "minimumHealthyPercent": 50,
    "maximumPercent": 200
  },
  "deploymentController": {
    "type": "CODE_DEPLOY"
  },
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abcdef1234567890",
      "containerName": "my-app-container",
      "containerPort": 80
    }
  ],
  "serviceRegistries": [],
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": [
        "subnet-0123456789abcdef0",
        "subnet-0fedcba9876543210"
      ],
      "securityGroups": [
        "sg-0abcdef1234567890"
      ],
      "assignPublicIp": "DISABLED"
    }
  },
  "propagateTags": "SERVICE",
  "enableECSManagedTags": true,
  "enableExecuteCommand": false,
  "tags": []
}

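This JSON snippet shows an ECS service whose deployments are managed by AWS CodeDeploy. Notice deploymentController.type: "CODE_DEPLOY". Setting it hands deployment control to CodeDeploy, which is what makes traffic-shifting strategies such as canary available. The canary behavior itself is not defined in the service: it comes from the deployment configuration (for example, CodeDeployDefault.ECSCanary10Percent5Minutes) selected on the CodeDeploy deployment group or on an individual deployment.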

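The minimumHealthyPercent and maximumPercent values need a caveat. On a service that uses the rolling update (ECS) controller, minimumHealthyPercent: 50 means at least half of the desired task count must stay running during a deployment; ECS rounds this up, so that is 3 out of 5 tasks here. maximumPercent: 200 lets ECS temporarily run up to twice the desired count (10 tasks), rounded down. With the CODE_DEPLOY controller, however, these values do not pace the rollout. According to the ECS API documentation, on the EC2 launch type they only bound how many tasks keep running while container instances are draining, and on Fargate they are not used at all, although they are still returned when you describe the service.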
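The rounding is easy to get wrong, so here is a minimal sketch of the arithmetic. It assumes the rounding rules described in the ECS API documentation (lower bound rounded up, upper bound rounded down); the function name task_count_bounds is hypothetical and not part of any AWS SDK.

```python
def task_count_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_running) task counts for a rolling update.

    Assumption: the lower bound is rounded up and the upper bound is rounded
    down, as described in the ECS API documentation.
    """
    # Integer arithmetic avoids floating-point surprises; -(-a // b) is ceil(a / b).
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running


# Values from the service definition above: desiredCount 5, 50% / 200%.
print(task_count_bounds(5, 50, 200))  # (3, 10)
```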

When a new task definition revision (e.g., my-app:4) is deployed to this service, CodeDeploy orchestrates the rollout. It first creates a replacement ("green") task set that runs the new revision at the service’s desired count and registers it with a second target group. Once those tasks are healthy, CodeDeploy shifts a small fraction of production traffic to them through the load balancer and waits for the configured interval. If the deployment stays healthy, it shifts the remaining traffic. If the deployment fails, or a CloudWatch alarm attached to the deployment group fires, CodeDeploy rolls back to the previous version, provided automatic rollback is enabled on the deployment group.
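A CodeDeploy deployment for ECS is described by an AppSpec document that names the new task definition and the container that receives load balancer traffic. The sketch below builds such a document as a Python dict. The helper name build_appspec and the task definition ARN are illustrative assumptions; the container name and port are taken from the service definition above.

```python
import json


def build_appspec(task_definition_arn, container_name, container_port, hooks=None):
    """Build an AppSpec document (as a dict) for an ECS deployment.

    `hooks` is an optional mapping of lifecycle event name to the name or ARN
    of a Lambda function, e.g. {"AfterAllowTestTraffic": "my-validation-fn"}.
    """
    appspec = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    if hooks:
        # AppSpec expects a list of single-entry mappings, one per hook.
        appspec["Hooks"] = [{event: fn} for event, fn in hooks.items()]
    return appspec


# Hypothetical ARN for revision 4 of the task definition used in this article.
spec = build_appspec(
    "arn:aws:ecs:us-east-1:123456789012:task-definition/my-app:4",
    "my-app-container",
    80,
)
print(json.dumps(spec, indent=2))
```

The resulting JSON is what you would supply as the revision content when creating a deployment; how you submit it (console, CLI, or pipeline) is outside the scope of this sketch.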

The gradual shift isn’t just about spinning up new containers; it’s about how traffic is directed. CodeDeploy manages this through the load balancer attached to the service. Canary and linear traffic shifting require an Application Load Balancer (ALB); with a Network Load Balancer (NLB), only the all-at-once configuration is supported. The CodeDeploy application and deployment group, which you create yourself or through tooling, reference the ECS service, a production listener, an optional test listener, and two target groups. Initially, all traffic goes to the "current" or "stable" version of your service, registered with one target group. During a deployment, CodeDeploy registers the new version’s tasks with the second target group and then adjusts the target group weights on the ALB listener to split traffic between the stable and new target groups according to the deployment configuration.
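To make the traffic split concrete, the following sketch builds the kind of weighted forward action that an ALB listener uses to divide requests between two target groups. CodeDeploy manages the listener for you during a deployment, so this is only an illustration of the data shape, not something you would normally apply by hand. The function name and the second target group ARN are hypothetical.

```python
def weighted_forward_action(stable_tg_arn, new_tg_arn, new_weight_percent):
    """Return an ALB 'forward' action splitting traffic between two target groups."""
    if not 0 <= new_weight_percent <= 100:
        raise ValueError("new_weight_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": stable_tg_arn, "Weight": 100 - new_weight_percent},
                {"TargetGroupArn": new_tg_arn, "Weight": new_weight_percent},
            ]
        },
    }


stable = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abcdef1234567890"
# Hypothetical second target group for the new version.
new = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg-2/1234567890abcdef"

action = weighted_forward_action(stable, new, 10)
print([tg["Weight"] for tg in action["ForwardConfig"]["TargetGroups"]])  # [90, 10]
```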

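For instance, with the predefined CodeDeployDefault.ECSCanary10Percent5Minutes configuration, CodeDeploy sends 10% of traffic to the new version and 90% to the old, waits five minutes, and then shifts the remaining 90% in a single step. A canary configuration always shifts traffic in two increments. If you want a series of equal steps (10%, 20%, 30%, and so on), use a linear configuration such as CodeDeployDefault.ECSLinear10PercentEvery1Minutes. Either way, once 100% of traffic is on the new version, the old version’s tasks are terminated after the configured termination wait time.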
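The difference between canary and linear shifting is easiest to see as a schedule. The sketch below models the two patterns as lists of (minute, percent on new version) pairs, where minute 0 is the moment traffic shifting begins. The function names are hypothetical, and the model ignores hook execution time and any wait before the original tasks are terminated.

```python
def canary_schedule(canary_percent, interval_minutes):
    """Two increments: the canary slice first, then everything else."""
    return [(0, canary_percent), (interval_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Equal increments every interval until 100% is reached."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    schedule = []
    minute = 0
    shifted = 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


# Modeled on CodeDeployDefault.ECSCanary10Percent5Minutes
print(canary_schedule(10, 5))  # [(0, 10), (5, 100)]

# Modeled on CodeDeployDefault.ECSLinear10PercentEvery1Minutes
print(linear_schedule(10, 1)[:3])  # [(0, 10), (1, 20), (2, 30)]
```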

What most people don’t realize is that CodeDeploy’s integration with ECS for canary deployments relies on a specific pattern of target group registration and validation. CodeDeploy uses two target groups associated with your load balancer: one for the current version and one for the new version. It registers the replacement task set with the "new" target group and then shifts a portion of traffic to it. Crucially, several signals can fail a deployment: the replacement tasks not becoming healthy, a CloudWatch alarm attached to the deployment group, or a lifecycle event hook reporting failure. For ECS deployments the hooks are Lambda functions that run at BeforeInstall, AfterInstall, AfterAllowTestTraffic, BeforeAllowTraffic, and AfterAllowTraffic (ApplicationStop belongs to EC2/on-premises deployments, not ECS). When a deployment fails and automatic rollback is enabled, CodeDeploy shifts traffic back to the "current" target group and terminates the replacement tasks.
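A lifecycle hook such as AfterAllowTestTraffic is a Lambda function that runs your validation and then reports the result back to CodeDeploy. Here is a minimal sketch of such a handler. The factory make_hook_handler and the validate callable are hypothetical names introduced for illustration; the reporting call is boto3’s put_lifecycle_event_hook_execution_status, and the event keys are the ones CodeDeploy passes to hook functions. The client is injectable so the logic can be exercised without AWS credentials.

```python
def make_hook_handler(validate, codedeploy_client=None):
    """Build a Lambda handler that reports a validation result to CodeDeploy.

    validate: zero-argument callable returning True when the new version
              looks healthy (hypothetical; supply your own checks).
    codedeploy_client: optional client object; defaults to boto3's CodeDeploy client.
    """

    def handler(event, context):
        client = codedeploy_client
        if client is None:
            import boto3  # imported lazily so the sketch can be tested offline

            client = boto3.client("codedeploy")

        try:
            status = "Succeeded" if validate() else "Failed"
        except Exception:
            # Treat an exception in the validation itself as a failed check.
            status = "Failed"

        client.put_lifecycle_event_hook_execution_status(
            deploymentId=event["DeploymentId"],
            lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
            status=status,
        )
        return status

    return handler
```

Reporting "Failed" from a hook fails the deployment, which in turn triggers a rollback when automatic rollback is enabled on the deployment group.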

Understanding the interplay between ECS task health checks and CodeDeploy lifecycle hooks is key to robust canary deployments.

The next step in mastering deployments is exploring the other blue/green traffic-shifting strategies that CodeDeploy offers, linear and all-at-once, to achieve zero-downtime rollouts.
