EventBridge Pipes aren’t a black box: every pipe publishes performance metrics to CloudWatch, so you can see how it is behaving.

Let’s watch a pipe in action. Imagine this pipe: it takes messages from an SQS queue, transforms them with a Lambda function, and then sends the result to a DynamoDB table.

{
  "Source": "arn:aws:sqs:us-east-1:123456789012:MySQSQueue",
  "Target": "arn:aws:dynamodb:us-east-1:123456789012:table/MyDynamoDBTable",
  "Enrichment": {
    "Arn": "arn:aws:lambda:us-east-1:123456789012:function:MyLambdaFunction",
    "InputTemplate": "{\"recordId\": \"<$.event.receiptHandle>\", \"body\": \"<$.event.body>\"}"
  },
  "Name": "SQS-to-DynamoDB-Pipe",
  "CurrentState": "RUNNING",
  "Arn": "arn:aws:pipes:us-east-1:123456789012:pipe/SQS-to-DynamoDB-Pipe",
  "CreationTime": "2023-10-27T10:00:00Z",
  "LastModifiedTime": "2023-10-27T10:05:00Z"
}

When this pipe runs, EventBridge automatically publishes metrics to CloudWatch under the AWS/EventBridge/Pipes namespace, with the pipe name as a dimension. Pipes are observable by default: you don’t need to instrument your Lambda function or DynamoDB table to measure the pipe’s own performance; EventBridge handles that.
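Before relying on any particular metric name, it helps to ask CloudWatch what it actually has for your pipe. Here is a minimal sketch using boto3's list_metrics paginator. The namespace constant and the PipeName dimension are assumptions to check against what CloudWatch shows in your account, and the pipe name is the one from the example configuration.

```python
# Assumptions: confirm the namespace and dimension name against what
# CloudWatch lists for your account. The pipe name comes from the
# example configuration in this article.
NAMESPACE = "AWS/EventBridge/Pipes"
PIPE_NAME = "SQS-to-DynamoDB-Pipe"


def list_metrics_params(namespace=NAMESPACE, pipe_name=PIPE_NAME):
    """Build the keyword arguments for CloudWatch list_metrics."""
    return {
        "Namespace": namespace,
        "Dimensions": [{"Name": "PipeName", "Value": pipe_name}],
    }


def metric_names(metrics):
    """Return sorted, de-duplicated metric names from list_metrics output."""
    return sorted({m["MetricName"] for m in metrics})


def fetch_metric_names(namespace=NAMESPACE, pipe_name=PIPE_NAME):
    """Query CloudWatch. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("cloudwatch")
    paginator = client.get_paginator("list_metrics")
    found = []
    for page in paginator.paginate(**list_metrics_params(namespace, pipe_name)):
        found.extend(page["Metrics"])
    return metric_names(found)
```

Calling fetch_metric_names() returns whatever metric names CloudWatch currently holds for that pipe, which is the most reliable list to build dashboards and alarms from.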

Here are the core metrics you’ll see:

  • Invocations: This is the total number of times the pipe attempted to process an event. If your SQS queue is receiving messages, this metric should be increasing.
  • Succeeded: The number of events successfully processed by the entire pipe, from source to target. This is your primary indicator of a healthy pipe.
  • Failed: The number of events that failed at any stage of the pipe (source retrieval, enrichment, or target delivery). A rising Failed count is your signal to investigate.
  • Throttled: The number of events that were throttled by the target service (e.g., DynamoDB write throttling) or the enrichment service (e.g., Lambda concurrency limits). This indicates downstream bottlenecks.
  • RetryAttempts: The number of times EventBridge retried processing an event after an initial failure. This metric is crucial for understanding transient issues.
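Raw counts are easier to read as a ratio. The sketch below builds a CloudWatch get_metric_data request that divides failures by invocations using metric math. The metric names are the ones from the list above and are passed in as parameters, so swap in whatever names CloudWatch shows for your pipe. The namespace and the PipeName dimension are arguments for the same reason.

```python
from datetime import datetime, timedelta, timezone


def metric_stat(metric_name, namespace, pipe_name, period=300):
    """Describe one metric summed over fixed periods (in seconds)."""
    return {
        "Metric": {
            "Namespace": namespace,
            "MetricName": metric_name,
            "Dimensions": [{"Name": "PipeName", "Value": pipe_name}],
        },
        "Period": period,
        "Stat": "Sum",
    }


def failure_rate_queries(namespace, pipe_name,
                         invocations_metric="Invocations",
                         failed_metric="Failed"):
    """Build MetricDataQueries that return only the failure percentage."""
    return [
        {"Id": "invocations", "ReturnData": False,
         "MetricStat": metric_stat(invocations_metric, namespace, pipe_name)},
        {"Id": "failed", "ReturnData": False,
         "MetricStat": metric_stat(failed_metric, namespace, pipe_name)},
        # Metric math expression referring to the two query ids above.
        {"Id": "failure_pct", "Label": "Failure %", "ReturnData": True,
         "Expression": "100 * failed / invocations"},
    ]


def fetch_failure_rate(namespace, pipe_name, hours=1):
    """Run the query. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builders above work without it

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=failure_rate_queries(namespace, pipe_name),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    results = response["MetricDataResults"]
    return next(r for r in results if r["Id"] == "failure_pct")["Values"]
```

Keeping the request builder separate from the network call makes it easy to inspect or unit-test the query before you run it against a real account.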

Let’s look at the Failed metric. If this metric spikes, it means something went wrong. The failure could be in:

  1. The Source: EventBridge couldn’t read from SQS.
  2. The Enrichment: Your Lambda function threw an error or timed out.
  3. The Target: DynamoDB rejected the write.
  4. Serialization/Deserialization: The data format between stages was incorrect.

EventBridge reports the later stages as metrics of their own in the same namespace rather than as suffixes on the pipe-level metrics. Enrichment failures are counted in EnrichmentStageFailed and target failures in TargetStageFailed, and each of those stages also has a duration metric (EnrichmentStageDuration and TargetStageDuration).

If Failed is high, first check the enrichment failure metric (EnrichmentStageFailed). If that’s high, dive into the Lambda function’s CloudWatch Logs. If the target failure metric (TargetStageFailed) is high instead, check DynamoDB’s CloudWatch metrics for throttling or errors.
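That triage order can be written down as a small decision function. This is only an illustrative sketch: the function name, argument names, and returned labels are made up for this example, and the inputs are whatever pipe-level and stage-level failure counts you have read from CloudWatch for the same time window.

```python
def triage(failed, enrichment_failed, target_failed):
    """Suggest where to look first, given failure counts for one time window."""
    if failed <= 0:
        return "healthy"
    if enrichment_failed > 0:
        # Enrichment errors come first: check the Lambda function's logs.
        return "enrichment"
    if target_failed > 0:
        # Then the target: check its own metrics for throttling or errors.
        return "target"
    # Failures not explained by either stage: suspect reading from the
    # source or the data format passed between stages.
    return "source-or-format"


print(triage(failed=12, enrichment_failed=0, target_failed=12))
```

In this call the enrichment count is zero and the target count is not, so the function points you at the target.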

The RetryAttempts metric is also telling. A high retry count alongside a low Failed count suggests the failures are transient: temporary network issues, brief service unavailability, or rate limiting. If retries and failures climb together, the problem is more likely systemic. EventBridge Pipes have built-in retry policies, and RetryAttempts helps you decide whether to tune them or fix an underlying problem.

You can set up CloudWatch Alarms on these metrics. For instance, an alarm that triggers when Failed is greater than 0 over a 5-minute period notifies you within minutes of the first failure. Alarms on Throttled can warn you that a target service needs more capacity before events start failing.
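Here is roughly what that failure alarm looks like as a boto3 put_metric_alarm call. Treat it as a sketch under assumptions: the alarm name and the SNS topic ARN are hypothetical, and the namespace and metric name are parameters you should set to whatever CloudWatch lists for your pipe.

```python
def failed_alarm_params(pipe_name, topic_arn, namespace, metric_name="Failed"):
    """Build put_metric_alarm arguments: any failure in a 5-minute window."""
    return {
        "AlarmName": f"{pipe_name}-failed",  # hypothetical naming scheme
        "AlarmDescription": f"Pipe {pipe_name} reported failed events",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "PipeName", "Value": pipe_name}],
        "Statistic": "Sum",
        "Period": 300,            # one 5-minute window...
        "EvaluationPeriods": 1,   # ...evaluated once
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # No data points means no failures were reported, so stay in OK.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def create_failed_alarm(pipe_name, topic_arn, namespace):
    """Create the alarm. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above works without it

    params = failed_alarm_params(pipe_name, topic_arn, namespace)
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Setting TreatMissingData to notBreaching is a design choice: a quiet pipe publishes no failure data points, and you probably don’t want the alarm sitting in INSUFFICIENT_DATA during those periods.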

What most people don’t realize is that EventBridge Pipes expose metrics for individual stages, not just for the pipe as a whole, as separate metrics within the AWS/EventBridge/Pipes namespace. Errors from the underlying services (a Lambda error, for example) surface as generic stage-level failures rather than as service-specific metrics, but that is still enough to pinpoint which component is causing the pipe to falter without adding custom metrics to your Lambda or target.

The next challenge you’ll likely face is efficiently handling the events that do fail, perhaps by sending them to a Dead-Letter Queue.
