You can have CloudFormation automatically roll back a stack create or update when an Amazon CloudWatch alarm enters the ALARM state, so a bad deployment does not persist.
Here’s a CloudWatch alarm that can serve as a rollback trigger. The alarm itself carries no rollback action: CloudFormation is told to watch it through the rollback configuration of the stack operation, so AlarmActions can stay empty or hold an SNS topic ARN for notifications:
{
  "AlarmName": "MyWebAppCPUAlarm",
  "AlarmDescription": "Rollback CloudFormation if CPU utilization exceeds 80% for 5 minutes",
  "ActionsEnabled": true,
  "AlarmActions": [],
  "OKActions": [],
  "InsufficientDataActions": [],
  "MetricName": "CPUUtilization",
  "Namespace": "AWS/EC2",
  "Statistic": "Average",
  "Period": 300,
  "EvaluationPeriods": 1,
  "Threshold": 80,
  "ComparisonOperator": "GreaterThanThreshold",
  "Dimensions": [
    {
      "Name": "AutoScalingGroupName",
      "Value": "my-webapp-asg"
    }
  ]
}
This is how it works internally:
- CloudWatch Alarm State Change: When the average CPUUtilization metric for the my-webapp-asg Auto Scaling group exceeds the 80% threshold over one 5-minute period (300 seconds * 1 evaluation period), CloudWatch transitions the alarm to the ALARM state.
- SNS Notification (Optional): CloudWatch alarms typically publish their state changes to an SNS topic listed in AlarmActions, but the rollback does not depend on one. Instead, the stack operation names the alarm in its rollback configuration (the RollbackConfiguration parameter of CreateStack, UpdateStack, or CreateChangeSet) as a trigger with the alarm's ARN and the type AWS::CloudWatch::Alarm.
- CloudFormation Rollback Trigger: CloudFormation monitors the listed alarms while the stack operation runs and for the configured monitoring time afterwards (MonitoringTimeInMinutes, from 0 to 180). If any of them enters the ALARM state in that window, CloudFormation rolls back the entire operation.
- Stack Rollback Execution: CloudFormation reverts the stack to its previous stable state by undoing the resource changes made during the operation. For example, if a new EC2 instance was launched, it would be terminated. If a security group rule was added, it would be removed.
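The steps above can be sketched in Python. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names build_rollback_configuration and update_stack_with_rollback and the 15-minute default monitoring time are illustrative assumptions; RollbackConfiguration, RollbackTriggers, and MonitoringTimeInMinutes are the parameter names used by CloudFormation's UpdateStack API.

```python
from typing import Any


def build_rollback_configuration(alarm_arn: str, monitoring_minutes: int = 15) -> dict[str, Any]:
    """Build the RollbackConfiguration structure for a stack operation.

    The 15-minute default is an arbitrary illustrative choice.
    """
    if not 0 <= monitoring_minutes <= 180:
        raise ValueError("MonitoringTimeInMinutes must be between 0 and 180")
    return {
        "RollbackTriggers": [
            # CloudFormation watches this alarm during the operation
            # and for the monitoring time that follows it.
            {"Arn": alarm_arn, "Type": "AWS::CloudWatch::Alarm"},
        ],
        "MonitoringTimeInMinutes": monitoring_minutes,
    }


def update_stack_with_rollback(stack_name: str, template_body: str, alarm_arn: str) -> str:
    """Start a stack update that rolls back if the alarm enters ALARM.

    Hypothetical helper: a real update may also need Parameters or
    Capabilities, which are omitted here. Returns the stack ID.
    """
    import boto3  # imported lazily so the builder above works without boto3

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        RollbackConfiguration=build_rollback_configuration(alarm_arn),
    )
    return response["StackId"]
```

Keeping the builder separate from the API call makes it possible to check the structure locally before any stack operation is started.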
The problem this solves is preventing cascading failures or prolonged downtime caused by faulty deployments. Instead of manually detecting an issue, logging into the console, and initiating a rollback, the system automates this detection and remediation. The rollback configuration on the stack operation is the key here: it ties CloudWatch alarms to CloudFormation’s rollback mechanism.
A rollback trigger identifies the alarm by its ARN, which has the format arn:aws:cloudwatch:<region>:<account-id>:alarm:<alarm-name>, together with the type AWS::CloudWatch::Alarm. You need to replace <region>, <account-id>, and <alarm-name> with your actual AWS region, account ID, and the name of the alarm that guards the CloudFormation stack you want to protect.
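For reference, the ARN of the alarm itself, which is what a CloudFormation rollback trigger points at, can be assembled as follows. This is a small sketch: the function name cloudwatch_alarm_arn is hypothetical, and it assumes the standard aws partition rather than aws-cn or aws-us-gov.

```python
def cloudwatch_alarm_arn(region: str, account_id: str, alarm_name: str) -> str:
    """Return the ARN of a CloudWatch alarm in the standard 'aws' partition."""
    # AWS account IDs are 12-digit numbers.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    return f"arn:aws:cloudwatch:{region}:{account_id}:alarm:{alarm_name}"


# Example using the alarm name from this article
print(cloudwatch_alarm_arn("us-east-1", "123456789012", "MyWebAppCPUAlarm"))
```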
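A rollback configuration can list up to five triggers, so several alarms can guard the same stack operation, and an alarm entering the ALARM state on any one of them causes the rollback. The alarm’s own AlarmActions field can still hold SNS topic ARNs, so you are notified at the same time.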
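When several alarms guard one stack operation, they go into the RollbackTriggers list of the operation's RollbackConfiguration, which accepts at most five entries. A hedged sketch of building that list follows; the helper name build_rollback_triggers and the second alarm name in the usage example are made up for illustration.

```python
MAX_ROLLBACK_TRIGGERS = 5  # limit on RollbackTriggers per stack operation


def build_rollback_triggers(alarm_arns: list[str]) -> list[dict[str, str]]:
    """Turn alarm ARNs into RollbackTriggers entries for CloudFormation."""
    if len(alarm_arns) > MAX_ROLLBACK_TRIGGERS:
        raise ValueError(
            f"at most {MAX_ROLLBACK_TRIGGERS} rollback triggers are allowed, "
            f"got {len(alarm_arns)}"
        )
    return [{"Arn": arn, "Type": "AWS::CloudWatch::Alarm"} for arn in alarm_arns]


# Usage: the second alarm name is hypothetical.
triggers = build_rollback_triggers([
    "arn:aws:cloudwatch:us-east-1:123456789012:alarm:MyWebAppCPUAlarm",
    "arn:aws:cloudwatch:us-east-1:123456789012:alarm:MyWebApp5xxAlarm",
])
```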
When you list an alarm as a rollback trigger, CloudFormation watches that alarm’s state for the duration of the stack operation and the monitoring time that follows. It doesn’t need a separate Lambda function or complex automation; the integration is built into the AWS services.
The most surprising thing about this integration is how tightly coupled CloudWatch alarms can be to CloudFormation’s lifecycle without explicit code. You’re not writing a script to monitor the alarm and then call the CloudFormation API; you’re declaring the desired behavior directly in the stack operation’s rollback configuration.
The OKActions field is worth configuring too, though it is often overlooked. It plays no part in the rollback itself, but if you want to be notified or trigger other actions when the alarm returns to the OK state (indicating the problem has cleared), add an SNS topic ARN or another supported alarm action to OKActions. This completes the monitoring and remediation loop.
Without this setup, a bad deployment could go unnoticed for a significant period, impacting users and potentially requiring extensive manual intervention to fix.
The next concept you’ll likely encounter is how to define these alarms and CloudFormation stacks declaratively using Infrastructure as Code tools like AWS CloudFormation itself or Terraform, ensuring this rollback protection is part of your repeatable deployment process.