DynamoDB Streams captures every item-level modification to a table as an immutable, time-ordered record, but it’s not actually a stream of data in the way you might think.

Let’s see it in action. Imagine we have a DynamoDB table named Orders with a partition key order_id (string) and a status attribute (string).

// Sample DynamoDB Item
{
  "order_id": "ORD12345",
  "status": "PENDING"
}

When we update this item to status: "PROCESSING", DynamoDB Streams captures the change as an event. Here’s a simplified view of that event as a consumer such as an AWS Lambda function receives it:

{
  "Records": [
    {
      "eventID": "1",
      "eventName": "MODIFY",
      "eventVersion": "1.1",
      "eventSource": "aws:dynamodb",
      "awsRegion": "us-east-1",
      "dynamodb": {
        "Keys": {
          "order_id": { "S": "ORD12345" }
        },
        "NewImage": {
          "order_id": { "S": "ORD12345" },
          "status": { "S": "PROCESSING" }
        },
        "OldImage": {
          "order_id": { "S": "ORD12345" },
          "status": { "S": "PENDING" }
        },
        "SequenceNumber": "1234567890",
        "SizeBytes": 100,
        "StreamViewType": "NEW_AND_OLD_IMAGES"
      },
      "eventSourceARN": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2023-01-01T00:00:00.000"
    }
  ]
}

This event is the foundation for real-time processing. AWS Lambda functions can consume these records directly from the DynamoDB stream. If you instead enable Kinesis Data Streams for DynamoDB, the same changes land in a Kinesis data stream, where Kinesis Data Analytics or Kinesis Data Firehose can consume them. Either way, you can react to changes in your DynamoDB table as they happen, which is invaluable for scenarios like updating search indexes, triggering notifications, or performing complex analytics as data is modified.
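As a rough illustration of a consumer, here is a minimal sketch of a Lambda-style handler that reacts to status changes in the sample event above. The function name `handler` and the choice to return a list of tuples are assumptions made for this example; a real consumer would call a search index, a notification service, or similar.

```python
def handler(event, context):
    """Collect (order_id, old_status, new_status) for every status change in a batch."""
    changes = []
    for record in event.get("Records", []):
        # Only MODIFY events carry both a before and an after state.
        if record.get("eventName") != "MODIFY":
            continue
        change = record["dynamodb"]
        # Images are in DynamoDB JSON, so each value sits under a type tag such as "S".
        old_status = change.get("OldImage", {}).get("status", {}).get("S")
        new_status = change.get("NewImage", {}).get("status", {}).get("S")
        if old_status != new_status:
            order_id = change["Keys"]["order_id"]["S"]
            changes.append((order_id, old_status, new_status))
    return changes
```

The handler depends on both images being present, so it assumes the stream was enabled with NEW_AND_OLD_IMAGES.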

The core problem this solves is bridging the gap between a highly performant NoSQL data store and real-time event-driven architectures. Without a change feed, you’d have to poll DynamoDB for changes, which is inefficient and introduces latency. DynamoDB Streams lets downstream systems receive each change shortly after it happens, enabling near real-time analytics and application updates.

Internally, when you enable Streams on a DynamoDB table, DynamoDB writes a record of each item-level modification (INSERT, MODIFY, REMOVE) to a dedicated stream. That stream is a DynamoDB stream, which retains records for 24 hours; Kinesis Data Streams for DynamoDB is a separate option that sends the same changes to a Kinesis data stream instead. You choose the StreamViewType when enabling the stream: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, or NEW_AND_OLD_IMAGES. NEW_AND_OLD_IMAGES provides the most context, showing you the state of the item before and after the modification, which is crucial for understanding the change itself.
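Enabling a stream with a chosen view type might look like the sketch below, which assumes the boto3 SDK and valid AWS credentials. The helper names `stream_specification` and `enable_stream` are hypothetical; the underlying call is the DynamoDB `update_table` operation with a `StreamSpecification`.

```python
VALID_VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def stream_specification(view_type):
    """Build the StreamSpecification argument for one of the four view types."""
    if view_type not in VALID_VIEW_TYPES:
        raise ValueError(f"unknown StreamViewType: {view_type!r}")
    return {"StreamEnabled": True, "StreamViewType": view_type}


def enable_stream(table_name, view_type="NEW_AND_OLD_IMAGES"):
    """Enable a stream on an existing table and return the stream ARN, if reported."""
    import boto3  # imported lazily so the helper above works without the AWS SDK

    client = boto3.client("dynamodb")
    response = client.update_table(
        TableName=table_name,
        StreamSpecification=stream_specification(view_type),
    )
    return response["TableDescription"].get("LatestStreamArn")
```

For the example table, `enable_stream("Orders")` would request a stream that carries both images.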

The eventSourceARN tells you precisely which DynamoDB table and stream generated the event. The dynamodb object contains the actual data: Keys to identify the item, NewImage for the state after the write, and OldImage for the state before. The eventName (INSERT, MODIFY, REMOVE) tells you what happened.
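To make the eventSourceARN concrete, here is a small sketch that splits it into its parts. The function name `parse_stream_arn` and the returned dictionary keys are assumptions for illustration. Note that the stream label is a timestamp containing colons, so the ARN is split at most five times.

```python
def parse_stream_arn(arn):
    """Split a DynamoDB stream ARN into region, account, table name and stream label."""
    # The stream label contains colons, so stop splitting after the account ID.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "dynamodb":
        raise ValueError(f"not a DynamoDB ARN: {arn!r}")
    segments = parts[5].split("/")
    if len(segments) != 4 or segments[0] != "table" or segments[2] != "stream":
        raise ValueError(f"not a stream ARN: {arn!r}")
    return {
        "region": parts[3],
        "account_id": parts[4],
        "table": segments[1],
        "stream_label": segments[3],
    }
```

Applied to the ARN in the sample event, this yields the table name Orders and the stream label 2023-01-01T00:00:00.000.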

The capacity of your DynamoDB table and the throughput of your stream consumers are critical levers. If your writes exceed the table’s provisioned throughput, you’ll see throttling errors in DynamoDB. If your consumers can’t keep up with the rate of events coming from the stream, they will fall behind, leading to increased processing latency and, once records age past the stream’s retention period, lost changes. You need to scale both for your workload. Note that enabling a stream does not consume extra write capacity on the table: on a table provisioned with 1000 WCU, a standard update of an item up to 1 KB still costs 1 WCU, whatever StreamViewType you choose. Stream usage is billed separately from table capacity.

When you configure DynamoDB Streams, you’re not just enabling a feature; you’re defining the granularity of change capture. The StreamViewType is the most significant configuration. If you only need to know which item changed, KEYS_ONLY is efficient. If you need to reconstruct the item’s state after or before the change, NEW_IMAGE or OLD_IMAGE respectively is necessary. For comprehensive auditing and complex downstream logic, NEW_AND_OLD_IMAGES is the most powerful, but it also generates larger records and gives consumers more data to read and process. The choice directly impacts the cost and complexity of your stream processing.
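One thing only NEW_AND_OLD_IMAGES makes possible is computing exactly which attributes changed. The sketch below shows the idea; `changed_attributes` is a hypothetical helper, and it treats a missing image as empty so that an INSERT reports every attribute as new and a REMOVE reports every attribute as gone.

```python
def changed_attributes(change):
    """Map each attribute that differs to its (old, new) values in DynamoDB JSON.

    `change` is the "dynamodb" object of a stream record. A missing image is
    treated as empty, which is what INSERT (no OldImage) and REMOVE (no NewImage)
    look like under NEW_AND_OLD_IMAGES.
    """
    old = change.get("OldImage", {})
    new = change.get("NewImage", {})
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

With any other view type at least one image is always absent, so the result would list every attribute rather than the real difference.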

The actual data in the NewImage and OldImage fields is in DynamoDB JSON, a format that wraps every value in a type descriptor such as S for string or N for number. Your consumer (like Lambda) often needs to unmarshal it into plain native values for easier manipulation.
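Unmarshaling can be sketched in a few lines of standard-library Python. This hand-rolled version covers only the common type tags (S, N, BOOL, NULL, L, M, SS, NS) and its function names are assumptions for this example; in a real consumer, boto3’s `TypeDeserializer` from `boto3.dynamodb.types` is the usual library route.

```python
from decimal import Decimal


def unmarshal_value(attr):
    """Convert one DynamoDB JSON value, e.g. {"S": "x"}, into a native Python value."""
    # Each attribute value is a single-entry mapping of type tag to payload.
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings; Decimal avoids precision loss
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unmarshal_value(item) for item in value]
    if type_tag == "M":
        return {key: unmarshal_value(item) for key, item in value.items()}
    if type_tag == "SS":
        return set(value)
    if type_tag == "NS":
        return {Decimal(item) for item in value}
    raise ValueError(f"unsupported type tag: {type_tag!r}")


def unmarshal_image(image):
    """Convert a whole NewImage or OldImage into a plain dictionary."""
    return {name: unmarshal_value(attr) for name, attr in image.items()}
```

Calling `unmarshal_image` on the NewImage from the sample event gives a dictionary with order_id set to "ORD12345" and status set to "PROCESSING".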
