DynamoDB tables are surprisingly bad at being databases, but amazing at being data stores.

Let’s say you’ve got a DynamoDB table, my-app-data, with a few million items. You need to run some complex SQL-like queries on it for an analytics report, or maybe you just need to archive it for compliance. Copying data out item by item via Scan or Query is painful, slow, and expensive. You’d be hitting the provisioned throughput limits constantly, and the sheer volume of API calls would be a nightmare.

This is where DynamoDB’s native export to S3 comes in. It’s not just a Scan operation with a different destination; it’s a fundamentally different way of getting data out, designed for bulk operations.

Here’s my-app-data before we export:

// Example DynamoDB item in my-app-data
{
  "userId": "user123",
  "orderTimestamp": 1678886400,
  "orderID": "ORD-98765",
  "items": [
    {"productId": "PROD-A", "quantity": 2},
    {"productId": "PROD-B", "quantity": 1}
  ],
  "totalAmount": 75.50
}

To export this table, you initiate an export task via the AWS CLI or the AWS Management Console. You specify the source table and the destination S3 bucket.

aws dynamodb export-table-to-point-in-time \
    --table-arn arn:aws:dynamodb:us-east-1:123456789012:table/my-app-data \
    --s3-bucket-name my-dynamodb-exports \
    --s3-prefix my-app-data-export/2023-03-15 \
    --export-format DYNAMODB_JSON \
    --time-range-lower-bound Timestamp=1678886400 # Optional: export data as of this timestamp

This command doesn’t immediately copy data. Instead, it triggers an asynchronous export job. DynamoDB handles the heavy lifting, reading your table’s data in parallel and writing it directly to S3. The data lands in your S3 bucket, organized into files.

The DYNAMODB_JSON format means each item is written as a separate JSON object on its own line within a file, which is generally the most straightforward for analysis.

// Example line in an S3 export file (DYNAMODB_JSON format)
{"userId": {"S": "user123"}, "orderTimestamp": {"N": "1678886400"}, "orderID": {"S": "ORD-98765"}, "items": {"L": [{"M": {"productId": {"S": "PROD-A"}, "quantity": {"N": "2"}}}, {"M": {"productId": {"S": "PROD-B"}, "quantity": {"N": "1"}}}]}, "totalAmount": {"N": "75.50"}}

Notice how the DynamoDB data types (S, N, L, M) are preserved. This is crucial for accurate interpretation. If you were exporting to CSV, you’d lose this type information.

Alternatively, you can choose the ION format, which is a more efficient, binary format also developed by Amazon. It’s often preferred for large-scale data processing, especially with services like Amazon Athena, as it can be more performant and cost-effective.

aws dynamodb export-table-to-point-in-time \
    --table-arn arn:aws:dynamodb:us-east-1:123456789012:table/my-app-data \
    --s3-bucket-name my-dynamodb-exports \
    --s3-prefix my-app-data-export/2023-03-15 \
    --export-format ION

When the export is complete, you’ll receive an S3 event notification (if configured) or can check the export task status. The data in S3 is typically sharded into multiple files, each around 1GB in size, to facilitate parallel processing by downstream analytics tools.

Once the data is in S3, you can query it directly using Amazon Athena. Athena is a serverless query service that makes it easy to analyze data directly in S3 using standard SQL.

First, you define an external table in Athena that points to your S3 data.

-- Athena DDL to define the external table
CREATE EXTERNAL TABLE IF NOT EXISTS my_app_data_analytics (
  userId string,
  orderTimestamp bigint,
  orderID string,
  items array<struct<productId:string, quantity:int>>,
  totalAmount double
)
PARTITIONED BY (export_date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' -- Use 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' for ION format
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my-dynamodb-exports/my-app-data-export/';

If you exported using DYNAMODB_JSON, you’ll need a JSON SerDe. For ION, you’d use the corresponding Parquet SerDe. After defining the table, you can run SQL queries.

-- Example Athena query
SELECT
  DATE(FROM_UNIXTIME(orderTimestamp)) as order_date,
  COUNT(DISTINCT userId) as distinct_users,
  SUM(totalAmount) as total_revenue
FROM my_app_data_analytics
WHERE export_date = '2023-03-15'
GROUP BY DATE(FROM_UNIXTIME(orderTimestamp))
ORDER BY order_date DESC
LIMIT 10;

This entire process bypasses the need for custom export scripts, provisioned throughput concerns during export, and complex data transformation pipelines. It leverages DynamoDB’s internal, highly optimized data access mechanisms to efficiently dump the entire table’s state to S3.

The trickiest part for most users is understanding how to properly map the DynamoDB JSON structure, with its type descriptors (S, N, BOOL, L, M, etc.), into a queryable schema in Athena. You have to explicitly define these types in your CREATE EXTERNAL TABLE statement. For instance, the DynamoDB {"N": "75.50"} becomes an Athena double. A DynamoDB {"L": [...]} becomes an Athena array<>. A {"M": {...}} becomes a struct<>. Getting this mapping wrong is the most common reason queries in Athena fail or return incorrect results.

The next step is to consider how to automate this export process for regular backups or ongoing analytics feeds.

Want structured learning?

Take the full Dynamodb course →