An EC2 Auto Scaling group can gracefully terminate instances by draining in-flight requests, preventing data loss or broken user sessions.

Let’s see this in action with a simple web application behind an Elastic Load Balancer (ELB).

Imagine this scenario: Your application runs on EC2 instances managed by an Auto Scaling group. A scale-in event occurs, meaning the Auto Scaling group is about to terminate one of your instances. Without any special handling, that instance might be in the middle of processing a user’s request, perhaps a payment transaction or a file upload. When the instance is terminated abruptly, that request fails.

This is where EC2 Auto Scaling Lifecycle Hooks come in. They allow you to pause the termination process for an instance, giving you time to perform specific actions before the instance is actually shut down. For draining in-flight requests, the most common action is to tell the ELB to stop sending new traffic to the instance and wait for existing connections to complete.

Here’s how it works:

  1. Lifecycle Hook Creation: You define a lifecycle hook on your Auto Scaling group. The hook names the lifecycle transition it applies to (here autoscaling:EC2_INSTANCE_TERMINATING), which holds the instance in the Terminating:Wait state. You can also define a NotificationTarget (an SQS queue or SNS topic) and a DefaultResult (CONTINUE or ABANDON).

  2. Instance Termination Trigger: When the Auto Scaling group decides to terminate an instance (due to scale-in, health check failure, etc.), the instance moves to Terminating:Wait and Auto Scaling sends a notification to your NotificationTarget. The instance remains in this state until you complete the lifecycle action or the heartbeat timeout expires.

  3. Draining Logic: Your application (or a separate service) receives the notification. It then needs to perform the draining. This typically involves:

    • De-registering from ELB: Telling the ELB to stop routing new requests to this specific instance.
    • Waiting for Connections: Allowing existing in-flight requests to finish processing. This might be a fixed timeout, or more sophisticated logic that checks for active connections.
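
  4. Completing the Hook: Once the draining is complete, you send a CompleteLifecycleAction API call back to Auto Scaling, specifying the Auto Scaling group, the lifecycle hook name, the instance ID or lifecycle action token, and a result (CONTINUE or ABANDON). For a terminating instance, both results end in termination; ABANDON additionally stops any remaining actions, such as other lifecycle hooks.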

Let’s look at a practical configuration example.

Suppose you have an Auto Scaling group named my-web-asg. You want to drain connections before terminating.

First, create an SQS queue to receive notifications:

aws sqs create-queue --queue-name my-asg-notifications-queue
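
The lifecycle hook needs the ARN of the queue, whereas SQS commands identify a queue by name or URL. As a minimal sketch, assuming the queue is named my-asg-notifications-queue and the AWS CLI is configured with credentials and a default region, a helper such as the hypothetical queue_arn below looks the ARN up:

```shell
# Sketch: look up the ARN of an existing SQS queue by name.
# queue_arn is a hypothetical helper name, not part of the AWS CLI.
queue_arn() {
  local queue_name="$1" queue_url
  # Resolve the queue URL from the queue name.
  queue_url="$(aws sqs get-queue-url \
    --queue-name "$queue_name" \
    --query 'QueueUrl' --output text)" || return 1
  # Ask SQS for the QueueArn attribute of that queue.
  aws sqs get-queue-attributes \
    --queue-url "$queue_url" \
    --attribute-names QueueArn \
    --query 'Attributes.QueueArn' --output text
}
```

Calling queue_arn my-asg-notifications-queue prints the ARN, which can then be passed to the lifecycle hook as its notification target.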

Then, create a lifecycle hook that targets this SQS queue for the Terminating:Wait state:

aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name TerminateWithDrainHook \
    --auto-scaling-group-name my-web-asg \
    --lifecycle-transition "autoscaling:EC2_INSTANCE_TERMINATING" \
    --heartbeat-timeout 300 \
    --default-result CONTINUE \
    --notification-target-arn "arn:aws:sqs:us-east-1:123456789012:my-asg-notifications-queue" \
    --role-arn "<role-arn>" \
    --notification-metadata "Drain logic for my-web-asg"

  • --lifecycle-hook-name: A descriptive name for your hook.
  • --auto-scaling-group-name: The ASG this hook applies to.
  • --lifecycle-transition: Specifies when this hook triggers. autoscaling:EC2_INSTANCE_TERMINATING means it fires when an instance is about to be terminated.
  • --heartbeat-timeout: How long, in seconds, Auto Scaling waits for a CompleteLifecycleAction call (or a heartbeat) before applying the --default-result. 300 seconds (5 minutes) is a common starting point.
  • --default-result: What happens when the heartbeat timeout expires: CONTINUE or ABANDON (the default if you omit the flag). For a terminating instance, both results terminate it; ABANDON also stops any remaining actions, such as other lifecycle hooks.
  • --notification-target-arn: The ARN of the SQS queue or SNS topic that will receive the notification.
  • --role-arn: The ARN of an IAM role that allows Auto Scaling to publish to the notification target. It is required when you create a hook that targets an SQS queue or SNS topic; replace <role-arn> with your own role.
  • --notification-metadata: Arbitrary data you can pass to your notification handler.

Now, when an instance in my-web-asg is scheduled for termination, the ASG will send a message to my-asg-notifications-queue. This message will contain details about the instance and the lifecycle action.

Your SQS queue consumer (e.g., a Lambda function, a worker EC2 instance) reads from this queue. Upon receiving a message, it reads the LifecycleActionToken and the EC2InstanceId from the JSON message body.
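
As a rough sketch of that consumer step, the helpers below long-poll the queue and pull the two fields out of the JSON body. The function names are hypothetical, jq is assumed to be installed, and the sketch assumes Auto Scaling publishes directly to SQS (a message relayed through SNS would be wrapped in an extra envelope). A real consumer would also need to capture the ReceiptHandle and delete the message after handling it, which is omitted here.

```shell
# Long-poll the queue (up to 20 seconds) and print the raw body of one message.
# $1 is the queue URL (a placeholder you supply).
receive_body() {
  aws sqs receive-message \
    --queue-url "$1" \
    --max-number-of-messages 1 \
    --wait-time-seconds 20 \
    --query 'Messages[0].Body' --output text
}

# Read a notification body on stdin and print "<instance-id> <token>".
# Returns non-zero and prints nothing for bodies that are not termination
# notifications, such as the autoscaling:TEST_NOTIFICATION message that
# Auto Scaling sends when a hook is created.
parse_lifecycle_message() {
  jq -er '
    select(.LifecycleTransition == "autoscaling:EC2_INSTANCE_TERMINATING")
    | "\(.EC2InstanceId) \(.LifecycleActionToken)"
  '
}
```

Piping receive_body into parse_lifecycle_message yields the instance ID and token on one line, ready to be split into the placeholders used in the commands that follow.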

To de-register the instance from the ELB, you’ll use the ELB API. Assuming you’re using an Application Load Balancer (ALB) or Network Load Balancer (NLB):

aws elbv2 deregister-targets \
    --target-group-arn "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-target-group/1234567890123456" \
    --targets Id=<instance-id>,Port=80

Replace <instance-id> with the actual instance ID from the SQS message. You’ll also need the correct target-group-arn and Port.

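After de-registering, you wait for the existing connections to drain. The target stays in the draining state for the target group’s deregistration delay (300 seconds by default); during that time the load balancer sends it no new requests but lets in-flight ones finish. A common pattern is to simply sleep for a fixed period, or to poll the target’s health state until it is no longer draining. A 30-60 second wait is often sufficient for many web applications, provided the deregistration delay is configured to match.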
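
One way to implement the polling variant is sketched below. The function name wait_for_drain and its argument order are assumptions made for illustration; the sketch relies on describe-target-health reporting the state unused once a deregistered target has finished draining.

```shell
# Poll the target's health until it reports "unused" or the deadline passes.
# Arguments: target group ARN, instance ID, port, timeout (s), poll interval (s).
wait_for_drain() {
  local tg_arn="$1" instance_id="$2" port="$3"
  local timeout="${4:-300}" interval="${5:-10}"
  local waited=0 state
  while [ "$waited" -lt "$timeout" ]; do
    state="$(aws elbv2 describe-target-health \
      --target-group-arn "$tg_arn" \
      --targets "Id=${instance_id},Port=${port}" \
      --query 'TargetHealthDescriptions[0].TargetHealth.State' \
      --output text 2>/dev/null)"
    # A deregistered target reports "draining" during the deregistration
    # delay and "unused" once the load balancer is done with it.
    if [ "$state" = "unused" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1  # still not drained when the timeout elapsed
}
```

The AWS CLI also ships a built-in waiter, aws elbv2 wait target-deregistered, which does similar polling with fixed settings; the explicit loop is shown because it leaves room for custom timeouts.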

Once you’ve waited, you signal Auto Scaling to proceed:

aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name TerminateWithDrainHook \
    --auto-scaling-group-name my-web-asg \
    --lifecycle-action-token "<lifecycle-action-token>" \
    --instance-id "<instance-id>" \
    --lifecycle-action-result CONTINUE

Again, replace <lifecycle-action-token> and <instance-id> with values from the SQS message. CONTINUE tells Auto Scaling to proceed with termination.
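
Putting the three calls together, a handler might look like the sketch below. It is only an illustration: drain_and_complete is a hypothetical name, the target group ARN and port are supplied by the caller, and the choice to report ABANDON when draining fails is a design assumption rather than a requirement (the instance is terminated either way).

```shell
# Hook and group names follow the examples in this article.
HOOK_NAME="TerminateWithDrainHook"
ASG_NAME="my-web-asg"

# Arguments: instance ID, lifecycle action token, target group ARN, port.
drain_and_complete() {
  local instance_id="$1" token="$2" tg_arn="$3" port="${4:-80}"
  local result="CONTINUE"

  # Stop new traffic, then block until the target is fully deregistered.
  if ! aws elbv2 deregister-targets \
         --target-group-arn "$tg_arn" \
         --targets "Id=${instance_id},Port=${port}" \
     || ! aws elbv2 wait target-deregistered \
         --target-group-arn "$tg_arn" \
         --targets "Id=${instance_id},Port=${port}"; then
    # Draining failed: report ABANDON so remaining lifecycle hooks are skipped.
    result="ABANDON"
  fi

  # Always release the hook so the instance does not sit in
  # Terminating:Wait until the heartbeat timeout expires.
  aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$HOOK_NAME" \
    --auto-scaling-group-name "$ASG_NAME" \
    --lifecycle-action-token "$token" \
    --instance-id "$instance_id" \
    --lifecycle-action-result "$result"
}
```

Completing the action on every path, including failures, keeps a broken drain from holding the instance for the full timeout.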

The heartbeat-timeout on the lifecycle hook isn’t just a passive timer. If draining can take longer than the timeout, your draining process should periodically send a RecordLifecycleActionHeartbeat API call. This tells Auto Scaling, "Hey, I’m still working on this!" and resets the timeout. Heartbeats can extend the wait only up to 48 hours or 100 times the heartbeat timeout, whichever is smaller. If the timeout expires without a heartbeat or a CompleteLifecycleAction call, Auto Scaling applies the hook’s default result and the instance is terminated anyway, so treat the timeout as a fallback rather than the normal exit path.
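
A heartbeat loop can be sketched as follows. The helper names are hypothetical, and the interval (whole seconds, at least 1) is assumed to be comfortably shorter than the heartbeat timeout on the hook.

```shell
# Send one heartbeat for the given hook, group, and lifecycle action token.
send_heartbeat() {
  aws autoscaling record-lifecycle-action-heartbeat \
    --lifecycle-hook-name "$1" \
    --auto-scaling-group-name "$2" \
    --lifecycle-action-token "$3"
}

# Run a drain command in the background and send a heartbeat every
# $interval seconds until it exits.
# Usage: drain_with_heartbeat <hook> <asg> <token> <interval> <command...>
drain_with_heartbeat() {
  local hook="$1" asg="$2" token="$3" interval="$4"
  shift 4
  "$@" &
  local drain_pid=$! elapsed=0
  # Check once per second so a finished drain is noticed promptly.
  while kill -0 "$drain_pid" 2>/dev/null; do
    sleep 1
    elapsed=$((elapsed + 1))
    if [ $((elapsed % interval)) -eq 0 ]; then
      send_heartbeat "$hook" "$asg" "$token" || break
    fi
  done
  # Return the exit status of the drain command.
  wait "$drain_pid"
}
```

For example, drain_with_heartbeat TerminateWithDrainHook my-web-asg "<lifecycle-action-token>" 60 sleep 600 would keep the hook alive during a ten-minute fixed wait.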

The most common pitfall is forgetting to deregister the instance from the ELB before waiting. If you just wait, the ELB will continue sending new requests to the instance until it’s gone, defeating the purpose. Another common issue is setting the heartbeat-timeout too short for your draining process, or not having a robust mechanism to complete the lifecycle action after draining.

Once you’ve successfully drained requests, the next challenge you’ll encounter is managing the state of your application across instances during rolling updates or deployments.
