CloudFormation stack updates in production are not atomic: a single resource failure partway through stops the update, and if the automatic rollback that follows also fails, your stack is left in an inconsistent, partially updated state.
Let’s see a simple stack update in action.
Imagine you have a basic EC2 instance managed by CloudFormation.
Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0abcdef1234567890 # Example AMI ID
      InstanceType: t2.micro
      Tags:
        - Key: Name
          Value: MyProductionInstance
Now, you decide to change the InstanceType to t3.micro.
Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0abcdef1234567890 # Example AMI ID
      InstanceType: t3.micro # Changed from t2.micro
      Tags:
        - Key: Name
          Value: MyProductionInstance
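When you run aws cloudformation update-stack --stack-name my-production-stack --template-body file://updated-template.yaml, CloudFormation doesn’t just flip a switch, but it doesn’t show you a preview either: update-stack starts applying the new template immediately. If you want to review the proposed modifications first, you create a change set, inspect it, and then execute it. For this particular change, the AWS::EC2::Instance reference lists InstanceType as "Update requires: Some interruptions" for EBS-backed instances, so CloudFormation stops the instance, changes its type, and starts it again under the same physical ID. An instance store-backed instance would be replaced with a new one instead.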
The core problem CloudFormation solves is state management for your infrastructure. Instead of manually provisioning and configuring resources, you declare their desired state, and CloudFormation makes it so. When you update a stack, you’re telling CloudFormation to reconcile the current state with the new desired state.
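The mental model for updates starts with the "Update requires" behavior documented for each resource property: No interruption, Some interruptions, or Replacement. When a change requires replacement, CloudFormation creates the new resource first, points dependent resources at it, and deletes the old resource only in the cleanup phase, after the rest of the update has succeeded. CloudFormation does not migrate data for you, so a replaced stateful resource such as an RDS instance starts empty unless you arrange otherwise, for example by restoring from a snapshot. Changes that don’t require replacement, like resizing an EBS-backed EC2 instance, are applied in place, sometimes with downtime.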
The safest way to run the update itself is through CloudFormation’s Change Sets. A Change Set is a preview of the changes CloudFormation will make to your stack. This is crucial for production environments because it allows you to review exactly what will be modified, and whether any resource will be replaced, before any actual changes are applied. You can generate a change set using aws cloudformation create-change-set --stack-name my-production-stack --template-body file://updated-template.yaml --change-set-name my-update-changeset and then review it in the AWS console or via aws cloudformation describe-change-set --change-set-name my-update-changeset --stack-name my-production-stack. Once you’re satisfied, you execute the change set with aws cloudformation execute-change-set --change-set-name my-update-changeset --stack-name my-production-stack.
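To make the review step concrete, here is a minimal sketch that scans a parsed describe-change-set response for the changes most worth a second look: removals and replacements. It assumes the response has already been loaded into a Python dict (for example with json.load). The function names, the sample data, and the MyLogBucket resource are hypothetical and exist only for illustration.

```python
from typing import Any


def summarize_changes(change_set: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (logical_id, action, replacement) for each resource change.

    `change_set` is assumed to have the shape of a parsed
    `describe-change-set` JSON response.
    """
    rows = []
    for change in change_set.get("Changes", []):
        resource_change = change.get("ResourceChange")
        if resource_change is None:  # skip entries that are not resource changes
            continue
        rows.append((
            resource_change["LogicalResourceId"],
            resource_change["Action"],
            # Replacement is only reported for Modify actions.
            resource_change.get("Replacement", "N/A"),
        ))
    return rows


def risky_changes(change_set: dict[str, Any]) -> list[str]:
    """Logical IDs that will be removed, or replaced (certainly or conditionally)."""
    return [
        logical_id
        for logical_id, action, replacement in summarize_changes(change_set)
        if action == "Remove" or replacement in ("True", "Conditional")
    ]


# Hand-written sample for illustration, not real CLI output.
sample = {
    "Changes": [
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "MyEC2Instance",
            "ResourceType": "AWS::EC2::Instance", "Replacement": "Conditional"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Add", "LogicalResourceId": "MyLogBucket",
            "ResourceType": "AWS::S3::Bucket"}},
    ]
}
print(risky_changes(sample))
```

A check like this could gate a deployment pipeline: if risky_changes returns anything, require a human to approve before execute-change-set runs.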
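For critical production updates, the most powerful tool is rollback. When an update fails, CloudFormation rolls the stack back to its previous successful state by default, so there is no flag you need to set to turn this on; the --disable-rollback option exists to turn it off, which is rarely what you want in production. What --rollback-configuration adds is rollback triggers: CloudWatch alarms that CloudFormation monitors during the operation and for a configurable number of minutes afterwards, rolling back if any of them goes into the ALARM state. You can configure this when creating the stack (the alarm ARN below is a placeholder):
aws cloudformation create-stack \
  --stack-name my-production-stack \
  --template-body file://my-template.yaml \
  --on-failure ROLLBACK \
  --rollback-configuration '{"RollbackTriggers":[{"Arn":"arn:aws:cloudwatch:us-east-1:123456789012:alarm:my-alarm","Type":"AWS::CloudWatch::Alarm"}],"MonitoringTimeInMinutes":10}'
Alternatively, you can set this on an existing stack:
aws cloudformation update-stack \
  --stack-name my-production-stack \
  --template-body file://my-template.yaml \
  --rollback-configuration '{"RollbackTriggers":[{"Arn":"arn:aws:cloudwatch:us-east-1:123456789012:alarm:my-alarm","Type":"AWS::CloudWatch::Alarm"}],"MonitoringTimeInMinutes":10}'
This ensures that if any resource fails to create or update, or a monitored alarm fires, CloudFormation automatically attempts to revert all changes made during that specific operation, returning your stack to its prior stable state. This is the primary safety net for production deployments.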
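The RollbackConfiguration structure that the CloudFormation API accepts consists of a list of rollback triggers (CloudWatch alarms) and a monitoring time in minutes. As a rough sketch, the helper below builds that structure and checks the documented limits as I understand them: at most five triggers and a monitoring time between 0 and 180 minutes. The function name and the alarm ARN are hypothetical.

```python
from typing import Any

MAX_TRIGGERS = 5              # assumed service limit on rollback triggers
MAX_MONITORING_MINUTES = 180  # assumed upper bound for MonitoringTimeInMinutes


def build_rollback_configuration(
    alarm_arns: list[str], monitoring_minutes: int = 0
) -> dict[str, Any]:
    """Build a RollbackConfiguration dict from CloudWatch alarm ARNs."""
    if len(alarm_arns) > MAX_TRIGGERS:
        raise ValueError(f"at most {MAX_TRIGGERS} rollback triggers are allowed")
    if not 0 <= monitoring_minutes <= MAX_MONITORING_MINUTES:
        raise ValueError(
            f"monitoring_minutes must be between 0 and {MAX_MONITORING_MINUTES}"
        )
    return {
        "RollbackTriggers": [
            {"Arn": arn, "Type": "AWS::CloudWatch::Alarm"} for arn in alarm_arns
        ],
        "MonitoringTimeInMinutes": monitoring_minutes,
    }


# Placeholder ARN for illustration only.
config = build_rollback_configuration(
    ["arn:aws:cloudwatch:us-east-1:123456789012:alarm:my-alarm"],
    monitoring_minutes=10,
)
print(config)
```

The resulting dict could be serialized with json.dumps and passed to --rollback-configuration, or handed to an SDK call that takes a RollbackConfiguration parameter. Validating it locally simply surfaces mistakes before the API call does.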
The one thing most people don’t realize about rollback is that it’s not instantaneous. When an update fails, CloudFormation moves the stack to UPDATE_ROLLBACK_IN_PROGRESS. It then systematically attempts to undo each change it made in the failed update, in reverse order. If the rollback itself fails, the stack gets stuck in an UPDATE_ROLLBACK_FAILED state, which requires manual intervention: fix whatever blocked the rollback, then run aws cloudformation continue-update-rollback --stack-name my-production-stack. Therefore, understanding the dependencies between your resources is vital, as a failed rollback on a dependent resource can cascade.
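The status values above lend themselves to a small decision helper. The sketch below maps a stack status string to a suggested next step. The status names are CloudFormation's, but the function and the wording of the suggestions are illustrative assumptions, not an official runbook.

```python
def next_step(stack_status: str) -> str:
    """Suggest what to do for a given CloudFormation stack status."""
    if stack_status == "UPDATE_ROLLBACK_FAILED":
        # The rollback itself got stuck and needs manual intervention.
        return "fix the blocking resource, then run continue-update-rollback"
    if stack_status.endswith("_IN_PROGRESS"):
        # Covers update, rollback, and cleanup phases that are still running.
        return "wait"
    if stack_status == "UPDATE_ROLLBACK_COMPLETE":
        # The stack is stable again, but the update did not apply.
        return "review stack events to see why the update failed"
    if stack_status in ("CREATE_COMPLETE", "UPDATE_COMPLETE"):
        return "nothing to do"
    return "investigate"


for status in (
    "UPDATE_ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_FAILED",
    "UPDATE_ROLLBACK_COMPLETE",
):
    print(status, "->", next_step(status))
```

A deployment script might poll the stack status and call a helper like this to decide whether to keep waiting, alert someone, or stop.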
The next logical step after mastering updates is understanding how to manage drift detection to ensure your stack’s actual state matches its declared state.