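Events larger than 256KB never make it into EventBridge, and unless your code checks what PutEvents returns, you probably won’t know until downstream services start failing.
The primary issue is that EventBridge enforces a hard 256KB limit on each event entry. That limit covers the event body (Detail) together with fields such as Source, DetailType, and Resources. PutEvents rejects an oversized request rather than quietly accepting it, but the rejection only shows up as an error response or SDK exception in the sending code. Nothing downstream is notified. EventBridge dead-letter queues don’t help either: they capture events that were accepted and then failed delivery to a target, and an oversized event is never accepted in the first place. If your application ignores the PutEvents response or swallows the exception, it will appear to be sending events successfully while they vanish into the ether.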
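The practical defense is to inspect every PutEvents response. The Python sketch below works on a response dict shaped like the one boto3’s `put_events` returns: `FailedEntryCount` plus an `Entries` list whose items carry either an `EventId` or an `ErrorCode`/`ErrorMessage`. It assumes, as the PutEvents API reference describes, that results come back in the same order as the request entries. The function name and the shape of the returned records are illustrative. This only covers per-entry failures. A request that breaks the size limit outright fails the whole call, so make sure that exception is logged rather than swallowed.

```python
from typing import Any, Dict, List


def failed_entries(entries: List[Dict[str, Any]],
                   response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per entry that PutEvents reported as failed.

    Assumes response["Entries"][i] is the result for entries[i].
    """
    if response.get("FailedEntryCount", 0) == 0:
        return []
    failures = []
    for sent, result in zip(entries, response.get("Entries", [])):
        if "ErrorCode" in result:  # successful results carry an EventId instead
            failures.append({
                "source": sent.get("Source"),
                "detail_type": sent.get("DetailType"),
                "error_code": result["ErrorCode"],
                "error_message": result.get("ErrorMessage", ""),
            })
    return failures


# Typical use with boto3 (not executed here):
#   client = boto3.client("events")
#   response = client.put_events(Entries=entries)
#   for failure in failed_entries(entries, response):
#       log.error("PutEvents rejected entry: %s", failure)
```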
Here are the common culprits and how to fix them:
1. Large Event Body
Diagnosis: The most straightforward cause is a single event payload that’s too big. This often happens when you’re sending large documents, images (base64 encoded), or extensive JSON structures.
Check: You can’t directly inspect EventBridge’s internal handling, but you can measure events before they are sent. Log the UTF-8 byte size of each entry (above all its serialized Detail string) just before calling PutEvents. The HTTP Content-Length of the PutEvents request is a coarser but still useful signal.
```bash
# Conceptual check only: builds a PutEvents-style request body locally and measures it.
# Nothing is sent; implement the equivalent in your application code.
# 200,000 random bytes, base64-encoded on a single line (~267KB of text).
B64=$(head -c 200000 /dev/urandom | base64 | tr -d '\n')
# PutEvents expects Detail to be a JSON-encoded *string*, hence the escaped inner quotes.
EVENT_PAYLOAD='{"Entries": [{"Source": "my.app", "DetailType": "MyLargeEvent", "Detail": "{\"large_data\": \"'"$B64"'\"}"}]}'
printf '%s' "$EVENT_PAYLOAD" | wc -c # byte count: roughly 267KB, already over the 256KB limit
```
Fix: The most robust solution is to break large events into smaller chunks. Alternatively, store the large data in S3 and send a reference (S3 object key and bucket name) in the EventBridge event. Downstream consumers can then retrieve the data from S3.
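Here is a minimal Python sketch of the S3 reference approach, often called the claim-check pattern. The threshold, the `s3Bucket`/`s3Key` field names, the `events/` key prefix, and the injected `upload` callable are all hypothetical choices for illustration. In a real service, `upload` would wrap an S3 client call such as boto3’s `put_object`. Consumers need to recognise the reference shape and fetch the object before processing.

```python
import json
import uuid
from typing import Any, Callable, Dict

# Hypothetical threshold: inline anything up to 200 KiB, leaving headroom under
# the 256KB entry limit for Source, DetailType and Resources.
INLINE_LIMIT_BYTES = 200 * 1024


def build_entry(detail: Dict[str, Any], source: str, detail_type: str,
                bucket: str, upload: Callable[[str, bytes], None]) -> Dict[str, Any]:
    """Build a PutEvents entry, offloading a large Detail to S3 via `upload`."""
    body = json.dumps(detail, separators=(",", ":")).encode("utf-8")
    if len(body) <= INLINE_LIMIT_BYTES:
        detail_str = body.decode("utf-8")
    else:
        key = f"events/{uuid.uuid4()}.json"  # hypothetical key layout
        upload(key, body)  # e.g. s3.put_object(Bucket=bucket, Key=key, Body=body)
        detail_str = json.dumps({"s3Bucket": bucket, "s3Key": key})
    return {"Source": source, "DetailType": detail_type, "Detail": detail_str}
```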
Why it works: By reducing the size of the individual event payload, you ensure it stays within EventBridge’s 256KB limit. Sending an S3 reference is a common pattern for large objects, as S3 is designed for bulk storage.
2. Excessive Metadata
Diagnosis: While less common than a large event body, the fields surrounding the body also count toward the limit. AWS’s documented size calculation for a PutEvents entry includes Source, DetailType, Time, and every string in Resources alongside Detail, so a long list of resource ARNs adds up. More often, the bloat comes from custom attributes and context packed into the Detail object before it is stringified.
Check: Again, log the size of the entire event object (including all its fields) in your application before sending it to EventBridge.
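To log a size that matches how EventBridge counts, you can follow AWS’s documented calculation for a PutEvents entry. It sums the UTF-8 byte lengths of `Source`, `DetailType`, `Detail`, and each `Resources` string, and adds a fixed 14 bytes when `Time` is set. The Python sketch below mirrors that calculation for an entry held as a dict. Treating 256KB as 262,144 bytes is an assumption of this sketch.

```python
from typing import Any, Dict

LIMIT_BYTES = 256 * 1024  # assumption: 256KB taken as 262,144 bytes


def _utf8_len(text: str) -> int:
    return len(text.encode("utf-8"))


def entry_size(entry: Dict[str, Any]) -> int:
    """Approximate a PutEvents entry's size following AWS's documented calculation."""
    size = 14 if entry.get("Time") is not None else 0  # Time counts as a fixed 14 bytes
    size += _utf8_len(entry["Source"])
    size += _utf8_len(entry["DetailType"])
    if entry.get("Detail") is not None:
        size += _utf8_len(entry["Detail"])
    for resource in entry.get("Resources") or []:
        if resource is not None:
            size += _utf8_len(resource)
    return size


def check_entry(entry: Dict[str, Any]) -> bool:
    """Log the computed size and report whether the entry fits under the limit."""
    size = entry_size(entry)
    print(f"{entry['DetailType']}: {size} bytes")
    return size <= LIMIT_BYTES
```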
Fix: Audit the Detail object for any unnecessary or redundant fields. If you’re embedding large amounts of context that aren’t strictly necessary for the event’s immediate processing, consider externalizing it or trimming it down.
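When auditing `Detail`, it helps to know which fields are actually expensive. This hypothetical Python helper ranks top-level keys by the size of their compact JSON serialization. Treat it as a rough guide: it ignores the keys themselves and the separators between fields, so it is not an exact breakdown of the entry size.

```python
import json
from typing import Any, Dict, List, Tuple


def largest_fields(detail: Dict[str, Any], top: int = 5) -> List[Tuple[str, int]]:
    """Rank top-level Detail keys by the UTF-8 size of their compact JSON values."""
    sizes = [
        (key, len(json.dumps(value, separators=(",", ":")).encode("utf-8")))
        for key, value in detail.items()
    ]
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top]


# Example: find what to trim before the event is sent.
order = {"id": "o-123", "items": [{"sku": "A1", "qty": 2}], "audit_log": ["..."] * 5000}
for field, size in largest_fields(order, top=3):
    print(f"{field}: {size} bytes")
```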
Why it works: Reducing the size of the Detail field directly decreases the total event size, keeping it under the 256KB limit.
3. Base64 Encoding Overhead
Diagnosis: When you send binary data (like images or serialized protobufs) within a JSON event, you typically base64 encode it. Base64 encoding increases the data size by approximately 33%. A 200KB binary payload becomes roughly 266KB when base64 encoded, pushing it over the limit.
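The overhead is easy to verify. Standard base64 emits 4 characters for every 3 input bytes, rounded up with padding. The Python sketch below shows the 200KB example and the largest raw payload that could fit if nothing else were in the event. Like the other sketches, it assumes 256KB means 262,144 bytes.

```python
import base64

LIMIT_BYTES = 256 * 1024  # assumption: 256KB taken as 262,144 bytes


def base64_len(raw_bytes: int) -> int:
    """Length of padded base64 output: 4 characters per 3-byte group, rounded up."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(budget: int) -> int:
    """Largest binary size whose base64 form fits within `budget` characters."""
    return (budget // 4) * 3


print(base64_len(200_000))                    # 266668: over the limit on its own
print(len(base64.b64encode(bytes(200_000))))  # 266668 again, measured directly
print(max_raw_bytes(LIMIT_BYTES))             # 196608 (192 KiB), with no room left for other fields
```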
Check: Monitor the size of your encoded payloads. If you’re consistently seeing payloads that are around 180-190KB of binary data, the base64 encoding is likely the culprit.
Fix: Similar to large event bodies, the best solution is to store binary data in S3 and send a reference. If you absolutely must send binary data directly and it’s close to the limit, consider if you can compress it before base64 encoding, though this adds complexity.
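If you do inline compressed binary, a sketch using only the Python standard library looks like this. The `payloadEncoding` field name is a hypothetical convention that consumers would have to agree on. Compression only pays off for compressible data such as text, CSV, or JSON. Already-compressed formats like JPEG or PNG, and random bytes, will not shrink and may grow slightly.

```python
import base64
import gzip
import json


def pack_binary(data: bytes) -> str:
    """gzip-compress, then base64-encode so the bytes can live in a JSON string."""
    return base64.b64encode(gzip.compress(data)).decode("ascii")


def unpack_binary(text: str) -> bytes:
    """Reverse of pack_binary, for the consumer side."""
    return gzip.decompress(base64.b64decode(text))


blob = b"sensor,reading\n" * 20_000  # 300,000 bytes of highly repetitive data
detail = json.dumps({"payloadEncoding": "gzip+base64",  # hypothetical convention
                     "payload": pack_binary(blob)})
```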
Why it works: S3 is the intended service for large binary objects. Compression can reduce the encoded size, but it’s a more involved solution.
4. Repeated Event Bus Submissions
Diagnosis: If your application logic has a retry mechanism that resends the exact same large event multiple times, and the initial attempt was dropped, the retries will also be dropped. This can lead to a false sense of a widespread problem when it’s just repeated failures of an already-too-large event.
Check: Implement idempotency keys or sequence numbers in your events and track them in your application logs. If you see the same event being sent repeatedly without success, it’s likely too large.
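One way to get a stable idempotency key is to hash a canonical JSON form of the event. The same logical event then always produces the same key, and repeated attempts stand out in your logs. This is a hypothetical Python helper, not an EventBridge feature. The attempt counter lives in process memory only, so a multi-instance service would need shared storage instead.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict

_attempts: Counter = Counter()  # in-process only; use shared storage across instances


def idempotency_key(source: str, detail_type: str, detail: Dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON form, so dict key order does not change the result."""
    canonical = json.dumps(
        {"source": source, "detailType": detail_type, "detail": detail},
        sort_keys=True, separators=(",", ":"), ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_attempt(key: str, size_bytes: int) -> int:
    """Count send attempts per key and log repeats together with the entry size."""
    _attempts[key] += 1
    if _attempts[key] > 1:
        print(f"attempt {_attempts[key]} for event {key[:12]} ({size_bytes} bytes)")
    return _attempts[key]
```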
Fix: Make your event generation idempotent, and retry only transient failures (such as throttling) with exponential backoff and jitter. A size-related rejection fails identically on every attempt, so retrying it only adds noise. The primary fix here is addressing the size issue first.
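The following Python sketch shows one such retry policy. It applies full-jitter exponential backoff to transient errors and fails immediately on anything else. The `SendError` type is hypothetical, and the retryable codes are assumed here to be `ThrottlingException` and `InternalFailure`. Check the PutEvents API reference for the authoritative list, and adapt the classification to whatever your SDK actually raises. The `sleep` parameter exists so the policy can be tested without waiting.

```python
import random
import time
from typing import Callable

# Assumed transient codes; check the PutEvents API reference for the real list.
RETRYABLE_CODES = {"ThrottlingException", "InternalFailure"}


class SendError(Exception):
    """Hypothetical error type carrying the code reported for a failed send."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def send_with_backoff(send: Callable[[], None], max_attempts: int = 5,
                      base_delay: float = 0.1, max_delay: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it succeeds; return the number of attempts used.

    Transient errors are retried with full-jitter exponential backoff. Any other
    error (for example a size violation) is raised immediately, because resending
    the same oversized event can never succeed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except SendError as err:
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1))))
    raise AssertionError("unreachable")
```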
Why it works: Idempotency prevents duplicate processing. By fixing the underlying size issue, retries become successful.
5. Event Size in API Gateway/Lambda Proxies
Diagnosis: If you’re using API Gateway or a Lambda proxy to ingest events and then forward them to EventBridge, remember that their payload limits are far higher than EventBridge’s. API Gateway accepts request payloads up to 10MB, and a synchronous Lambda invocation accepts up to 6MB. A payload can pass through both comfortably and still be too large for EventBridge, while anything above the Lambda limit is rejected before your function ever runs.
Check: Log the request body size within your Lambda function immediately after it’s invoked.
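Here is a minimal sketch of that logging. It assumes a Python Lambda function behind an API Gateway proxy integration, where the raw request arrives in `event["body"]` and `event["isBase64Encoded"]` says whether it was base64-encoded. The 413 response and the decision to reject at this point are illustrative choices. The body size only approximates the final entry size, since `Source`, `DetailType`, and `Resources` are added when the entry is built.

```python
import base64
from typing import Any, Dict

EVENTBRIDGE_LIMIT = 256 * 1024  # assumption: 256KB taken as 262,144 bytes


def proxy_body_bytes(event: Dict[str, Any]) -> int:
    """Size in bytes of the request body carried in an API Gateway proxy event."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return len(base64.b64decode(body))
    return len(body.encode("utf-8"))


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    size = proxy_body_bytes(event)
    print(f"request body: {size} bytes")  # Lambda sends stdout to CloudWatch Logs
    if size > EVENTBRIDGE_LIMIT:
        return {"statusCode": 413,
                "body": "payload too large for EventBridge; upload to S3 and send a reference"}
    # ... build the PutEvents entry and forward it here (omitted in this sketch) ...
    return {"statusCode": 202, "body": "accepted"}
```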
Fix: If the event is large enough to cause issues in API Gateway or Lambda, you’ll need to address it at the source. The most common fix is to use S3 for large payloads, as described earlier.
Why it works: It ensures the event is manageable before it even reaches the EventBridge ingestion point.
6. EventBridge Schema Registry Large Schemas
Diagnosis: This one is not directly about payload size. The EventBridge Schema Registry stores schemas for discovery and code-binding generation. EventBridge does not validate incoming events against registered schemas, and rules match events using event patterns rather than schemas. An oversized schema therefore won’t cause events to be dropped, but it is a useful warning sign: a sprawling, deeply nested schema usually describes a sprawling event, and that event is what puts the 256KB limit at risk.
Check: Examine the size and complexity of the schemas in your Schema Registry, including any created by schema discovery, since those reflect the events actually flowing through your bus.
Fix: Refactor your schemas to be more concise. Break down complex event structures into smaller, more manageable schemas if possible.
Why it works: A leaner schema usually goes hand in hand with a leaner event. That keeps payloads comfortably under the 256KB limit and makes the generated code bindings easier to work with.
The next error you’ll encounter after fixing event size issues is likely related to downstream service capacity or latency, as your now-successfully-delivered events start hitting services that might not have been prepared for their volume or processing demands.