EventBridge is a serverless event bus that connects applications using events from your own applications, SaaS applications, and AWS services. When an event matches a rule, EventBridge delivers it to that rule’s targets. If delivery to a target fails with a retryable error, EventBridge retries, by default for up to 24 hours and up to 185 attempts. Once the retries are exhausted, the event is dropped. To prevent data loss, you can configure a target with a dead-letter queue (DLQ) that receives the events EventBridge could not deliver.
Here’s how to set up an SQS DLQ for EventBridge:
First, create an SQS queue that will act as the DLQ.
aws sqs create-queue --queue-name eventbridge-dlq --attributes '{"VisibilityTimeout": "30"}'
This command creates a standard SQS queue named eventbridge-dlq with a visibility timeout of 30 seconds. The visibility timeout determines how long a message is hidden from other consumers after it’s retrieved.
Next, define the permissions policy for an IAM role that EventBridge will assume to send messages to the SQS DLQ.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:123456789012:eventbridge-dlq"
    }
  ]
}
Replace us-east-1 and 123456789012 with your AWS region and account ID, respectively. This policy grants the sqs:SendMessage permission on the specified SQS queue.
Now, define the trust policy for the role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
This policy allows the EventBridge service to assume this role.
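Create the role with the trust policy, create a managed policy from the permissions document, and attach that policy to the role. The file name eventbridge-dlq-policy.json is a placeholder for wherever you saved the permissions policy.
aws iam create-role --role-name EventBridgeDLQRole --assume-role-policy-document file://eventbridge-dlq-trust-policy.json
aws iam create-policy --policy-name EventBridgeDLQPolicy --policy-document file://eventbridge-dlq-policy.json
aws iam attach-role-policy --role-name EventBridgeDLQRole --policy-arn arn:aws:iam::123456789012:policy/EventBridgeDLQPolicy # Replace with your policy ARN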
Now, configure your EventBridge rule to send undeliverable events to the DLQ. You can do this via the AWS Management Console or the AWS CLI.
Using the AWS CLI, you’ll update the target of your EventBridge rule. Assuming you have an existing rule named MyEventBridgeRule targeting a Lambda function:
aws events put-targets --rule MyEventBridgeRule --targets '[{"Id": "MyLambdaTarget", "Arn": "arn:aws:lambda:us-east-1:123456789012:function:MyLambdaFunction", "DeadLetterConfig": {"Arn": "arn:aws:sqs:us-east-1:123456789012:eventbridge-dlq"}, "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeDLQRole"}]'
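This command attaches the DLQ to the MyLambdaTarget target of your rule. If EventBridge cannot deliver an event to the target (in this case, MyLambdaFunction) after its retries are exhausted, it sends the event to the eventbridge-dlq SQS queue. EventBridge writes to the DLQ as the events.amazonaws.com service principal, so the queue’s own access policy must also allow sqs:SendMessage from that principal. The console adds this statement for you; the CLI does not. If EventBridge rejects the RoleArn field for a Lambda target, which is invoked through a resource-based policy rather than a role, remove that field from the target entry.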
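Because EventBridge delivers to the DLQ as a service principal, the queue needs a resource-based policy that lets events.amazonaws.com send messages to it. The following is a minimal Python sketch of such a policy, assuming the rule lives on the default event bus. The function names and the statement ID are illustrative, not part of any AWS SDK.

```python
import json


def build_dlq_queue_policy(queue_arn: str, rule_arn: str) -> dict:
    """Return an SQS access policy that lets one EventBridge rule send to the queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEventBridgeToSendToDlq",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "events.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict delivery to the one rule that owns this DLQ.
                "Condition": {"ArnEquals": {"aws:SourceArn": rule_arn}},
            }
        ],
    }


def as_queue_attributes(policy: dict) -> dict:
    """Wrap the policy in the attribute map used by SQS SetQueueAttributes."""
    # SQS expects the policy as a JSON string under the "Policy" key.
    return {"Policy": json.dumps(policy)}


policy = build_dlq_queue_policy(
    "arn:aws:sqs:us-east-1:123456789012:eventbridge-dlq",
    "arn:aws:events:us-east-1:123456789012:rule/MyEventBridgeRule",
)
print(json.dumps(policy, indent=2))
```

The map returned by as_queue_attributes can be saved to a file and passed to aws sqs set-queue-attributes through its --attributes option, together with the queue URL.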
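With the DLQ in place, events that EventBridge cannot deliver are retained instead of being silently dropped. When an event fails to be delivered to its intended target, it lands in the SQS DLQ, where it stays until it is consumed or the queue’s message retention period expires (4 days by default, 14 days at most). From there, you can process these undeliverable events, investigate the root cause of the failure, and potentially reprocess them.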
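How long EventBridge keeps retrying before an event reaches the DLQ can be tuned per target with a retry policy. The sketch below builds a target entry that carries both a DeadLetterConfig and a RetryPolicy, and checks the values against the documented limits (0 to 185 attempts, 60 to 86,400 seconds). The helper name build_target and the chosen values are assumptions for illustration.

```python
import json

# Limits documented for the RetryPolicy field of an EventBridge target.
MAX_RETRY_ATTEMPTS = 185
MIN_EVENT_AGE_SECONDS = 60
MAX_EVENT_AGE_SECONDS = 86_400  # 24 hours


def build_target(target_id, target_arn, dlq_arn,
                 retry_attempts=MAX_RETRY_ATTEMPTS,
                 max_event_age_seconds=MAX_EVENT_AGE_SECONDS):
    """Build one entry for the --targets list of `aws events put-targets`."""
    if not 0 <= retry_attempts <= MAX_RETRY_ATTEMPTS:
        raise ValueError("retry_attempts must be between 0 and 185")
    if not MIN_EVENT_AGE_SECONDS <= max_event_age_seconds <= MAX_EVENT_AGE_SECONDS:
        raise ValueError("max_event_age_seconds must be between 60 and 86400")
    return {
        "Id": target_id,
        "Arn": target_arn,
        "DeadLetterConfig": {"Arn": dlq_arn},
        "RetryPolicy": {
            "MaximumRetryAttempts": retry_attempts,
            "MaximumEventAgeInSeconds": max_event_age_seconds,
        },
    }


target = build_target(
    "MyLambdaTarget",
    "arn:aws:lambda:us-east-1:123456789012:function:MyLambdaFunction",
    "arn:aws:sqs:us-east-1:123456789012:eventbridge-dlq",
    retry_attempts=5,            # illustrative value
    max_event_age_seconds=3600,  # illustrative value: give up after one hour
)
# The JSON list printed here is the shape that --targets accepts.
print(json.dumps([target]))
```

Lower limits move failed events into the DLQ sooner, which shortens the time before someone can look at them, at the cost of fewer automatic recovery attempts.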
The most surprising thing about EventBridge DLQs is that EventBridge never retries an event once it is in the DLQ. You are responsible for implementing a mechanism to process and potentially re-deliver these events. This often involves creating a separate consumer for the DLQ, such as a Lambda function or an EC2 instance, that can inspect the failed events, take corrective action, and then delete the message from the DLQ or send it to another target.
A common pattern is to have a Lambda function that polls the DLQ. If it successfully processes an event, it deletes the message. If it fails, it can move the message to another queue for further investigation or simply leave it in the DLQ for manual review.
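A DLQ consumer of this kind could look like the sketch below, written as a Lambda handler fed by an SQS event source mapping. It assumes the mapping has ReportBatchItemFailures enabled, so that only the messages listed in the response stay in the queue and the rest are deleted. EventBridge puts the original event in the message body and describes the failure in message attributes such as ERROR_CODE and ERROR_MESSAGE. The reprocess function is a hypothetical placeholder for your own recovery logic.

```python
import json

# Message attributes EventBridge attaches to events it sends to a DLQ.
DLQ_ATTRIBUTES = ("RULE_ARN", "TARGET_ARN", "ERROR_CODE", "ERROR_MESSAGE", "RETRY_ATTEMPTS")


def describe_failure(record: dict) -> dict:
    """Extract the failure metadata and the original event from one SQS record."""
    attrs = record.get("messageAttributes", {})
    failure = {name: attrs.get(name, {}).get("stringValue") for name in DLQ_ATTRIBUTES}
    failure["event"] = json.loads(record["body"])  # the undelivered event
    return failure


def reprocess(event: dict) -> None:
    """Hypothetical recovery step; replace with real handling."""
    if "detail" not in event:
        raise ValueError("event has no detail field")


def handler(event, context):
    """Process a batch of DLQ messages and report the ones that still fail."""
    still_failing = []
    for record in event.get("Records", []):
        try:
            failure = describe_failure(record)
            # Log why delivery failed, without the full event payload.
            print(json.dumps({k: v for k, v in failure.items() if k != "event"}))
            reprocess(failure["event"])
        except Exception:
            # Keep this message in the queue for a later attempt or manual review.
            still_failing.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": still_failing}
```

Messages that keep failing here can be moved aside by giving the DLQ its own redrive policy, so repeated failures do not block the rest of the queue.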
When you implement a DLQ, you’re not just saving events; you’re creating a system for event failure analysis and recovery. The next step is to build a robust strategy for processing and understanding the events that land in your DLQ.