EventBridge can silently drop events on you if you’re not careful: a throttled PutEvents call is rejected, and the AWS SDK’s default response is to retry with randomized exponential backoff, which can make your system feel like it’s just… not working.

Let’s see this in action. Imagine you have a Lambda function triggered by an EventBridge rule. You’re sending events to a custom event bus at a high rate.

// Event Payload Example
{
  "Source": "com.mycompany.orders",
  "DetailType": "OrderCreated",
  "Detail": {
    "orderId": "12345",
    "customer": "Alice",
    "amount": 99.99
  }
}

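By default, EventBridge enforces API rate limits. This article uses 300 PutEvents requests per second per AWS account as its working figure; the actual quota depends on the Region, so check Service Quotas for your account. If you exceed it, EventBridge throttles the excess requests and rejects them with a ThrottlingException. EventBridge itself doesn’t retry them; the AWS SDK in your application does, with increasing delays between attempts.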

Here’s how you manage it:

1. Understand the Limits

The primary limit for PutEvents to a custom event bus is the request rate: 300 requests per second (RPS) per AWS account is the figure used here, though the quota is set per Region and yours may differ. There’s also a limit on the total size of the entries in a single request (256KB). You can request a quota increase through Service Quotas or AWS Support, but that’s a last resort.

2. Implement Client-Side Retries with Backoff

Your application sending events needs to handle throttling gracefully. The AWS SDKs have built-in retry logic, but you should configure it.

  • Diagnosis: Check your application logs for ThrottlingException. Also inspect FailedEntryCount in each put_events response, because a call can succeed overall while individual entries are rejected.

  • Fix: Configure the AWS SDK’s retry strategy. For example, in Python with boto3, you can set retries:

    import boto3
    from botocore.config import Config
    
    client = boto3.client(
        'events',
        region_name='us-east-1',
        config=Config(
            retries={
                'max_attempts': 10,
                'mode': 'standard'
            }
        )
    )
    
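    # ... later in your code when calling put_events
    try:
        response = client.put_events(...)
        # A call that returns normally can still contain rejected entries.
        if response['FailedEntryCount'] > 0:
            print(f"{response['FailedEntryCount']} entries failed: {response['Entries']}")
    except Exception as e:
        # The SDK has already exhausted its retries by the time this is raised,
        # so log it as a persistent failure.
        print(f"EventBridge put_events failed: {e}")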
    
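  • Why it works: The standard retry mode retries throttled calls with randomized exponential backoff. The ceiling on each delay roughly doubles with every attempt, and the actual wait is a random fraction of that ceiling, so the delays are not a fixed 1s, 2s, 4s, 8s. Spacing retries out this way keeps throttled clients from retrying in lockstep and lets your request rate fall back under the limit.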
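The SDK retries whole requests, but PutEvents can also return normally while reporting individual entries as failed. A minimal sketch of resending only those entries follows. The function name, its parameters, and the backoff constants are illustrative assumptions, not part of boto3; the only SDK behavior relied on is that the response carries FailedEntryCount and an Entries list in request order, where failed results include an ErrorCode.

```python
import random
import time


def put_events_with_entry_retry(client, entries, max_attempts=5,
                                base_delay=0.1, sleep=time.sleep):
    """Send up to 10 entries, resending only the ones reported as failed.

    `client` is anything with a boto3-style put_events(Entries=...) method.
    Returns the entries that still failed after max_attempts (empty on success).
    """
    pending = list(entries)
    for attempt in range(max_attempts):
        response = client.put_events(Entries=pending)
        if response.get("FailedEntryCount", 0) == 0:
            return []
        # Results come back in request order; failed ones carry an ErrorCode.
        pending = [
            entry
            for entry, result in zip(pending, response["Entries"])
            if "ErrorCode" in result
        ]
        if attempt < max_attempts - 1:
            # Full jitter: random wait below an exponentially growing ceiling.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return pending
```

Anything the function returns is a good candidate for a local buffer or queue, since those events never reached the bus. Some entry errors (a malformed Detail, for instance) will never succeed on retry, so it is worth logging the ErrorCode before resending.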

3. Monitor EventBridge Metrics

AWS provides metrics for EventBridge that are crucial for understanding your traffic patterns and identifying throttling.

  • Diagnosis: Navigate to CloudWatch -> Metrics -> All Metrics and open the AWS/Events namespace. On the sending side, look at the PutEvents metrics such as PutEventsApproximateThrottledCount and PutEventsFailedEntriesCount. On the delivery side, look at FailedInvocations and ThrottledRules. Confirm the exact metric names in your console, since the available set has changed over time.
  • Fix: Set up CloudWatch Alarms on the throttled or failed count. If it exceeds a threshold (e.g., > 5 per minute), trigger an alert. This alarm should prompt you to investigate the source of the high traffic.
  • Why it works: These metrics give you a direct view into what EventBridge is experiencing, allowing you to detect throttling before it causes widespread issues.
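As a rough sketch of the alarm described above, the helper below builds the keyword arguments for CloudWatch’s put_metric_alarm call. The helper name, the alarm name, and the default metric name are assumptions for illustration; substitute whichever throttling or failure metric you actually see in the AWS/Events namespace.

```python
def throttle_alarm_params(metric_name="PutEventsApproximateThrottledCount",
                          threshold=5, sns_topic_arn=None):
    """Build put_metric_alarm arguments: alert when the per-minute sum exceeds threshold."""
    params = {
        "AlarmName": f"eventbridge-{metric_name}-high",  # hypothetical naming scheme
        "Namespace": "AWS/Events",
        "MetricName": metric_name,       # assumed metric; verify in your console
        "Statistic": "Sum",
        "Period": 60,                    # one-minute buckets
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttling
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**throttle_alarm_params())
```

Treating missing data as not breaching keeps the alarm quiet during idle periods, when the metric emits no datapoints at all.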

4. Batch Events (Carefully)

The PutEvents API allows you to send up to 10 events in a single request. This can significantly reduce the number of API calls.

  • Diagnosis: If your metrics show throttling and you call put_events once per event, you’re hitting the request-rate limit, not the payload limit. Small events sent one at a time use only a fraction of what each request can carry.

  • Fix: Instead of calling put_events for each event, collect events and send them in batches:

    import json

    # Add 'EventBusName' to each entry to target a custom bus;
    # without it, entries go to the default bus.
    events_to_send = []
    for i in range(25): # 25 events: two full batches plus a remainder of 5
        events_to_send.append({
            'Source': 'com.mycompany.orders',
            'DetailType': 'OrderCreated',
            'Detail': json.dumps({"orderId": f"batch_{i}", "customer": "Bob", "amount": 10.0})
        })
        if len(events_to_send) == 10: # Max batch size
            client.put_events(Entries=events_to_send)
            events_to_send = []
    if events_to_send: # Send any remaining events
        client.put_events(Entries=events_to_send)
    
  • Why it works: Each PutEvents call counts as one request. Batching 10 events into one call cuts your request rate by up to a factor of 10, making it much easier to stay under the 300 RPS limit. Be mindful of the total payload size per request (256KB).
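Batching has to respect both limits at once: at most 10 entries and at most 256KB per request. Here is a sketch of a batcher that does both. The size estimate follows the rule AWS documents for PutEvents entries (UTF-8 bytes of Source, DetailType, Detail and each resource, plus 14 bytes when Time is set), but treat it as an approximation; the function names are illustrative.

```python
MAX_ENTRIES = 10          # entries per PutEvents request
MAX_BYTES = 256 * 1024    # per-request size limit cited in this article


def entry_size(entry):
    """Approximate the billed size of one PutEvents entry in bytes."""
    size = 14 if "Time" in entry else 0
    for key in ("Source", "DetailType", "Detail"):
        size += len(entry.get(key, "").encode("utf-8"))
    for resource in entry.get("Resources", []):
        size += len(resource.encode("utf-8"))
    return size


def batch_entries(entries):
    """Yield lists of entries that fit within both the count and size limits."""
    batch, batch_bytes = [], 0
    for entry in entries:
        size = entry_size(entry)
        if size > MAX_BYTES:
            raise ValueError("single entry exceeds the per-request size limit")
        # Close the current batch if adding this entry would break either limit.
        if batch and (len(batch) == MAX_ENTRIES or batch_bytes + size > MAX_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += size
    if batch:
        yield batch


# Usage with a boto3 EventBridge client:
#   for batch in batch_entries(all_entries):
#       client.put_events(Entries=batch)
```

With small events the count limit dominates and every batch holds 10 entries; with large Detail payloads the size check closes batches early instead of letting the request fail.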

5. Implement Dead-Letter Queues (DLQs)

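When retries fail or you want to capture events that couldn’t be processed, a DLQ is essential. There are two places to put one: on the EventBridge rule target, to catch events EventBridge couldn’t deliver, and on the target itself (a Lambda function, for example), to catch events that were delivered but failed processing. Neither captures events rejected by PutEvents. Those never enter the bus, so the sender has to hold on to them.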

  • Diagnosis: Your downstream systems aren’t receiving all events even though put_events reports success, or the FailedInvocations metric for the rule is non-zero.
  • Fix: For a Lambda target, configure a DLQ in the Lambda function’s configuration:
    • Go to your Lambda function -> Configuration -> Asynchronous invocation and choose Edit.
    • Set the dead-letter queue service to Amazon SQS. (An on-failure destination, configured under Destinations, is an alternative.)
    • Choose or create an SQS queue.
  • Why it works: If EventBridge successfully delivers an event to a target (like Lambda) but the target fails to process it after its own retries, the event is sent to the DLQ. This doesn’t solve PutEvents throttling, but it is critical for overall event processing reliability. For PutEvents throttling at the source, you need to fix the source’s sending rate.
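EventBridge can also attach a DLQ and a retry policy to the rule target itself, which covers events it could not deliver at all. The sketch below builds one target definition for put_targets. The helper name and the rule, bus, and ARN values in the usage comment are hypothetical; DeadLetterConfig and RetryPolicy are real fields of the put_targets API, and the range checks reflect the documented bounds as I understand them.

```python
def target_with_dlq(target_id, target_arn, dlq_arn,
                    max_retries=3, max_age_seconds=3600):
    """Build one put_targets target that retries delivery, then falls back to a DLQ."""
    if not 0 <= max_retries <= 185:
        raise ValueError("MaximumRetryAttempts must be between 0 and 185")
    if not 60 <= max_age_seconds <= 86400:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 86400")
    return {
        "Id": target_id,
        "Arn": target_arn,
        "DeadLetterConfig": {"Arn": dlq_arn},  # SQS queue that receives undelivered events
        "RetryPolicy": {
            "MaximumRetryAttempts": max_retries,
            "MaximumEventAgeInSeconds": max_age_seconds,
        },
    }


# Usage with a boto3 EventBridge client (all names below are placeholders):
#   client.put_targets(
#       Rule="order-created-rule",
#       EventBusName="orders-bus",
#       Targets=[target_with_dlq("orders-fn", lambda_arn, queue_arn)],
#   )
```

The SQS queue needs a resource policy that lets EventBridge send messages to it; without one, failed deliveries are dropped rather than queued.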

6. Adjust Sending Rate

If your application genuinely needs to send events at a rate higher than EventBridge’s default limits, you need to control the source.

  • Diagnosis: After implementing client-side retries, monitoring, and batching, you are still seeing throttling exceptions.

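  • Fix: Implement rate limiting in your sending application. Use a token-bucket library or a simple pacing limiter to ensure you don’t exceed a sustainable rate (e.g., 250 RPS to leave some buffer). The example uses a minimal hand-rolled limiter rather than a third-party package:

    import threading
    import time

    class RateLimiter:
        """Minimal blocking limiter: spaces calls at least 1/max_rate seconds apart."""
        def __init__(self, max_rate):
            self.interval = 1.0 / max_rate
            self.lock = threading.Lock()
            self.next_slot = time.monotonic()

        def __enter__(self):
            with self.lock:
                now = time.monotonic()
                wait = self.next_slot - now
                self.next_slot = max(now, self.next_slot) + self.interval
            if wait > 0:
                time.sleep(wait)

        def __exit__(self, *exc):
            return False

    # Allow roughly 250 put_events calls per second
    rate_limiter = RateLimiter(max_rate=250)

    def send_event_safely(event_data):
        with rate_limiter:
            # Blocks until the next slot is free, then sends
            client.put_events(Entries=[event_data])

    # In your loop:
    # send_event_safely(single_event_payload)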
    
  • Why it works: This proactively prevents you from hitting the API limit by controlling how fast your application can make the put_events calls, rather than relying on EventBridge to reject requests and the SDK to retry them.

7. Request Limit Increases (Rarely)

If your traffic patterns are legitimate and consistently exceed the default limits, you can request an increase.

  • Diagnosis: You’ve exhausted all other options, and your business needs genuinely require a higher throughput than the default 300 RPS.
  • Fix: Request an increase through the Service Quotas console, or open a case with AWS Support. Clearly state your use case, current throughput, and the desired increased throughput. Be prepared to justify the request with data.
  • Why it works: AWS can raise the quota for your account, but requests are reviewed and approval is not guaranteed.
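Before filing a request, it helps to see what your account is currently allowed. The sketch below lists EventBridge quotas through the Service Quotas API and keeps the ones whose name mentions PutEvents. The function name is illustrative, and both the "events" service code and the name filter are assumptions; check the Service Quotas console if nothing matches.

```python
def find_put_events_quotas(quotas_client, service_code="events"):
    """Return (name, value, adjustable) for quotas whose name mentions PutEvents.

    `quotas_client` is a boto3 "service-quotas" client or a compatible stand-in.
    """
    matches = []
    paginator = quotas_client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page["Quotas"]:
            if "putevents" in quota["QuotaName"].lower():
                matches.append(
                    (quota["QuotaName"], quota["Value"], quota["Adjustable"])
                )
    return matches


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   for name, value, adjustable in find_put_events_quotas(
#           boto3.client("service-quotas", region_name="us-east-1")):
#       print(name, value, "adjustable" if adjustable else "fixed")
```

Quotas are reported per Region, so run this against each Region you publish events from.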

The next error you’ll likely encounter if you fix throttling is a downstream resource failing due to high volume, such as a Lambda function hitting its concurrency limits.
