A DynamoDB item can’t be larger than 400KB, counting attribute names as well as values, and the real trick is that this limit isn’t just about storage; it’s about throughput and performance.

Let’s see DynamoDB in action with a typical item that might push this limit. Imagine we’re storing user profiles with extensive activity logs embedded directly.

{
  "userId": "user-12345",
  "profile": {
    "name": "Alice Wonderland",
    "email": "alice@example.com",
    "settings": {
      "theme": "dark",
      "notifications": true
    }
  },
  "activityLog": [
    {"timestamp": 1678886400, "action": "login", "details": "success"},
    {"timestamp": 1678886500, "action": "view_page", "details": "/dashboard"},
    {"timestamp": 1678886600, "action": "update_profile", "details": "changed email"},
    // ... potentially thousands more entries ...
    {"timestamp": 1679000000, "action": "logout", "details": "normal"}
  ],
  "preferences": {
    "language": "en",
    "timezone": "UTC"
  }
}

If the activityLog array grows large enough, this single item will exceed 400KB. When that happens, PutItem and UpdateItem operations fail with a ValidationException reporting that the item size has exceeded the maximum allowed size. More subtly, even if the item fits, very large items degrade read and write performance and cost, because capacity is charged on the full item size. A strongly consistent read of a 400KB item consumes 100 read capacity units (one RCU per 4KB), and a write consumes 400 write capacity units (one WCU per 1KB), even when you only need or change a single attribute.
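To make that cost concrete, here is a rough sketch of the arithmetic. It assumes the standard provisioned-capacity rules (one RCU per 4KB for a strongly consistent read, half that for an eventually consistent read, one WCU per 1KB written) and uses the compact JSON encoding as a stand-in for DynamoDB's own size accounting, which differs in detail. The function names are illustrative, not part of any SDK.

```python
import json
import math

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's per-item limit


def rough_item_size(item: dict) -> int:
    """Approximate item size as the UTF-8 length of its compact JSON form.

    This is only a proxy: DynamoDB counts attribute names and values with
    its own per-type rules, so treat the result as an estimate.
    """
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


def capacity_units(size_bytes: int) -> dict:
    """Capacity consumed by one read or one write of an item of this size."""
    read_blocks = math.ceil(size_bytes / 4096)  # reads are billed per 4KB
    return {
        "strongly_consistent_rcu": read_blocks,
        "eventually_consistent_rcu": read_blocks / 2,
        "wcu": math.ceil(size_bytes / 1024),  # writes are billed per 1KB
    }


if __name__ == "__main__":
    entry = {"timestamp": 1678886400, "action": "login", "details": "success"}
    item = {"userId": "user-12345", "activityLog": [entry] * 10000}
    size = rough_item_size(item)
    print("estimated bytes:", size, "over limit:", size > MAX_ITEM_BYTES)
    print("cost at the limit:", capacity_units(MAX_ITEM_BYTES))
```

Running an estimate like this against real items before a write gives early warning that an embedded array is approaching the limit.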

DynamoDB’s 400KB limit exists to keep performance predictable and resource use efficient. If items could be arbitrarily large, a single massive item could monopolize network bandwidth and disk I/O, making it impossible to guarantee low latency for other operations on the same table. The limit forces developers to think about data modeling and how to break large logical entities into smaller, manageable pieces.

The most common and often the best solution is document decomposition. Instead of storing a massive activityLog array within the user’s primary item, you break it into separate items.

  1. Separate Table for Log Entries: Create a new table, say UserActivityLog, with a primary key like PK (e.g., USER#user-12345) and SK (e.g., LOG#<timestamp>). Each log entry becomes a separate item in this table.

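    • Diagnosis: Fetch the suspect item with GetItem on its userId key and check the size of activityLog; requesting ReturnConsumedCapacity shows how many capacity units the read cost. A Scan with a filter also finds the item but reads the entire table to do so. If a write has already failed, you’ll have seen the ValidationException.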
    • Fix: Write a script to iterate through existing large activityLog arrays, extract each entry, and insert it as a new item into the UserActivityLog table. For new data, PutItem individual log entries to UserActivityLog.
    • Why it works: Each log entry is now a small item (likely < 1KB), well within the 400KB limit. You can efficiently query specific log entries or a range of entries for a user using the new table’s keys.
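The migration step in the fix above might look roughly like the following sketch. It assumes the caller passes in a boto3 Table resource for the UserActivityLog table, and that no user has two log entries with the same timestamp (a duplicate sort key would overwrite the earlier item). The helper names are hypothetical.

```python
def log_entries_to_items(user_id: str, activity_log: list) -> list:
    """Turn an embedded activityLog array into one small item per entry."""
    items = []
    for entry in activity_log:
        items.append({
            "PK": f"USER#{user_id}",
            # Assumption: at most one entry per user per timestamp.
            "SK": f"LOG#{entry['timestamp']}",
            "action": entry["action"],
            "details": entry["details"],
        })
    return items


def write_items(table, items: list) -> None:
    """Write items through a boto3 Table resource supplied by the caller.

    batch_writer() groups the puts into BatchWriteItem requests and
    resends any items the service reports as unprocessed.
    """
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
```

With boto3 installed, the table argument would typically come from `boto3.resource("dynamodb").Table("UserActivityLog")`. Ten-digit epoch timestamps sort correctly as strings, so a range query on SK returns entries in time order.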
  2. Using a "Sieve" or "Fan-out" Pattern: If you need to access all activity logs for a user frequently, you can still use decomposition but add a small "index" item.

    • Diagnosis: Same as above.
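    • Fix: For each user, create a "metadata" item in the UserActivityLog table (e.g., PK: USER#user-12345, SK: METADATA). This item could contain a list of sort keys for the actual log entries, or simply a count. The actual log entries are stored as PK: USER#user-12345, SK: LOG#<timestamp>. When you need all logs, you first read the METADATA item to get the sort keys, then issue BatchGetItem calls on the UserActivityLog table using those keys, at most 100 keys per call. Keep in mind that the key list grows with the log and is itself subject to the 400KB limit; because the logs share a partition key, a Query on PK with begins_with(SK, 'LOG#') returns them without any index item.
    • Why it works: The metadata item is small as long as the key list stays bounded. The actual log items are small. BatchGetItem retrieves many small items in one round trip.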
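A minimal sketch of the BatchGetItem step, assuming the caller supplies a boto3 DynamoDB service resource and the sort keys read from the METADATA item. It splits the keys into groups of 100 (the per-call cap) and re-requests anything returned in UnprocessedKeys. The function names are illustrative, and a production version would add backoff between retries.

```python
def chunked(keys: list, size: int = 100):
    """Yield successive groups of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def fetch_logs(dynamodb, table_name: str, user_id: str, sort_keys: list) -> list:
    """Fetch log items by key through a boto3 DynamoDB service resource."""
    keys = [{"PK": f"USER#{user_id}", "SK": sk} for sk in sort_keys]
    items = []
    for chunk in chunked(keys):
        request = {table_name: {"Keys": chunk}}
        while request:
            response = dynamodb.batch_get_item(RequestItems=request)
            items.extend(response.get("Responses", {}).get(table_name, []))
            # A throttled or oversized batch comes back partially filled;
            # ask again for whatever was left unprocessed.
            request = response.get("UnprocessedKeys") or None
    return items
```

BatchGetItem does not guarantee result order, so sort the returned items by SK if chronological order matters.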
  3. Attribute Compression (Less Common for this specific problem): For very large single attributes (like a long string or binary blob), you can compress them before storing them in DynamoDB.

    • Diagnosis: You’ll see the ValidationException on PutItem/UpdateItem if the compressed item still exceeds 400KB, or if the uncompressed attribute itself is the culprit.
    • Fix: Use a library like gzip or zlib in your application code to compress the large attribute (e.g., the activityLog JSON string) and store the resulting bytes in a Binary attribute. You’ll need to decompress it on read, and DynamoDB can no longer filter on or update anything inside the compressed value.
    • Why it works: Compression reduces the byte size of the data, potentially fitting it within the 400KB limit. However, this doesn’t solve the RCU/WCU consumption problem if you always read the whole compressed blob. It’s more of a last resort for fitting massive single attributes.
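As an illustration of the compression fix, the sketch below serializes a value to compact JSON and compresses it with Python's standard zlib module. The function names are made up for this example, and the actual savings depend entirely on how repetitive the data is.

```python
import json
import zlib


def compress_attribute(value) -> bytes:
    """Serialize a JSON-compatible value and compress it for a Binary attribute."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=9)


def decompress_attribute(blob: bytes):
    """Reverse compress_attribute: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    entry = {"timestamp": 1678886400, "action": "login", "details": "success"}
    log = [entry] * 10000
    blob = compress_attribute(log)
    print("raw bytes:", len(json.dumps(log).encode("utf-8")), "compressed:", len(blob))
    assert decompress_attribute(blob) == log
```

When reading through the boto3 resource API, a Binary attribute comes back wrapped in a Binary object; its `value` property holds the bytes to pass to `decompress_attribute`.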
  4. S3 Offloading: For truly massive datasets that don’t fit the other patterns, or if you rarely need to access the full data, offload it to Amazon S3.

    • Diagnosis: ValidationException or performance issues.
    • Fix: Store the large data blob (e.g., the JSON string of the activity log) in S3. In your DynamoDB item, store only a pointer to the object, such as its bucket and key or an S3 URI (e.g., s3://your-bucket/user-logs/user-12345.json).
    • Why it works: A single S3 object can be orders of magnitude larger than a DynamoDB item (terabytes rather than kilobytes). DynamoDB items remain small, and you only pay for S3 storage and the small DynamoDB item. You fetch the data from S3 when needed, which is a separate operation.
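A sketch of the offloading flow, assuming the caller passes in a boto3 S3 client and a boto3 Table resource keyed on userId. The attribute name activityLogLocation and the user-logs/ key prefix are assumptions made for this example.

```python
import json


def offload_activity_log(s3, table, bucket: str, user_id: str, activity_log: list) -> str:
    """Upload the log to S3, then replace the embedded array with a pointer."""
    key = f"user-logs/{user_id}.json"
    # Write the object first so the pointer never refers to a missing object.
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(activity_log).encode("utf-8"),
        ContentType="application/json",
    )
    location = f"s3://{bucket}/{key}"
    table.update_item(
        Key={"userId": user_id},
        # activityLogLocation is a hypothetical attribute name.
        UpdateExpression="SET activityLogLocation = :loc REMOVE activityLog",
        ExpressionAttributeValues={":loc": location},
    )
    return location


def load_activity_log(s3, location: str) -> list:
    """Fetch and parse the log from an s3://bucket/key pointer."""
    bucket, _, key = location[len("s3://"):].partition("/")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)
```

The two stores are not updated atomically, so a failure between the upload and the item update leaves an orphaned object rather than a dangling pointer, which is usually the safer failure mode.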
  5. Document Paths in Update Expressions (for specific use cases): If you are primarily working with nested JSON structures (maps and lists) and need to change parts of them without first reading the whole item, you can address nested attributes with document paths in an UpdateExpression. This doesn’t increase the 400KB limit or reduce the write cost, but it removes the read-modify-write round trip when working with large documents.

    • Diagnosis: You’ll hit the 400KB limit.
    • Fix: Use UpdateExpression with path-based updates (e.g., SET profile.settings.theme = :val). This allows you to modify nested attributes. For very large arrays, this still requires careful decomposition as shown above.
    • Why it works: It allows for atomic, partial updates. However, the underlying item still counts towards the 400KB limit and RCU/WCU. This is more about how you update than how much you store.
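For illustration, the helper below builds the arguments for a path-based SET like the one in the fix above. It is a sketch: the helper name is made up, and it routes every path segment through ExpressionAttributeNames placeholders so that a segment which happens to be a DynamoDB reserved word does not break the expression.

```python
def nested_set_kwargs(key: dict, path: list, value) -> dict:
    """Build update_item keyword arguments that SET one nested attribute."""
    # One placeholder per path segment: #p0, #p1, ...
    names = {f"#p{i}": part for i, part in enumerate(path)}
    return {
        "Key": key,
        "UpdateExpression": "SET " + ".".join(names) + " = :val",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": {":val": value},
    }


if __name__ == "__main__":
    kwargs = nested_set_kwargs(
        {"userId": "user-12345"}, ["profile", "settings", "theme"], "light"
    )
    print(kwargs["UpdateExpression"])
```

The result can be passed to a boto3 Table resource as `table.update_item(**kwargs)`. The parent maps (profile and settings here) must already exist on the item, otherwise DynamoDB rejects the update.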
  6. Multiple Items with a "Page" or "Shard" Key: If you want to keep related data in the same table but avoid the 400KB limit, you can shard the data.

    • Diagnosis: ValidationException.
    • Fix: For userId: user-12345, store activity logs as multiple items: PK: USER#user-12345, SK: LOG_PAGE#1, data: [...]; PK: USER#user-12345, SK: LOG_PAGE#2, data: [...]. Each data attribute would hold a portion of the logs, ensuring each item stays under 400KB.
    • Why it works: You are effectively breaking a single logical collection into multiple DynamoDB items, each managed independently and staying within limits. You’d typically query by PK and a begins_with(SK, 'LOG_PAGE#') condition.
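The paging scheme could be sketched as follows. The byte budget is an assumed safety margin below 400KB, entry sizes are estimated from their compact JSON form rather than DynamoDB's exact accounting, and the page number is zero-padded (LOG_PAGE#0001 rather than LOG_PAGE#1) so that pages still sort correctly as strings once there are ten or more.

```python
import json


def paginate_log(user_id: str, activity_log: list, max_page_bytes: int = 300 * 1024) -> list:
    """Split a log into page items whose estimated size stays under a budget."""
    pages, current, current_bytes = [], [], 0
    for entry in activity_log:
        # Estimated size of this entry plus one byte for the list separator.
        entry_bytes = len(json.dumps(entry, separators=(",", ":")).encode("utf-8")) + 1
        if current and current_bytes + entry_bytes > max_page_bytes:
            pages.append(current)
            current, current_bytes = [], 0
        current.append(entry)
        current_bytes += entry_bytes
    if current:
        pages.append(current)
    return [
        {"PK": f"USER#{user_id}", "SK": f"LOG_PAGE#{n:04d}", "data": page}
        for n, page in enumerate(pages, start=1)
    ]
```

Appending new entries then means updating only the last page, and starting a new page item once it nears the budget.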

The next challenge you’ll likely encounter is managing the complexity of these decomposed data models and ensuring efficient querying across multiple items or tables.
