DynamoDB doesn’t actually store data in relational tables; it uses a key-value and document model, meaning your "migration" is more of a re-architecting.
Let’s walk through migrating a simple relational database table, say users, to DynamoDB.
Here’s our hypothetical users table in a relational database:
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(255) UNIQUE,
email VARCHAR(255) UNIQUE,
registration_date DATE,
last_login TIMESTAMP,
profile_data JSON
);
And some sample data:
| user_id | username | registration_date | last_login | profile_data | |
|---|---|---|---|---|---|
| 101 | alice | alice@example.com | 2023-01-15 | 2024-03-10 09:30:00 | {"bio": "Loves dogs"} |
| 102 | bob | bob@example.com | 2023-02-20 | 2024-03-11 14:00:00 | {"bio": "Enjoys cats"} |
| 103 | charlie | charlie@test.com | 2023-03-01 | 2024-03-09 11:00:00 | {"bio": "Likes birds"} |
Designing the DynamoDB Table
In DynamoDB, every table needs a Partition Key and optionally a Sort Key. This combination uniquely identifies an item. We also need to consider Secondary Indexes for querying attributes other than the primary key.
For our users table, a good primary key would be user_id. Since user_id is already the primary key in our relational table, this maps directly. We don’t strictly need a sort key if user_id is sufficient to identify each user.
DynamoDB Table Definition:
- Table Name:
Users - Primary Key:
- Partition Key:
userId(Number)
- Partition Key:
Now, how do we query by username or email? These are unique in our relational table. In DynamoDB, we use Global Secondary Indexes (GSIs) for this.
Global Secondary Index 1 (for username lookup):
- Index Name:
UsernameIndex - Partition Key:
username(String) - Sort Key: (None)
- Projected Attributes:
ALL(This means all attributes of the indexed items will be stored in the index, allowing us to retrieve them directly. For performance and cost, you might project only specific attributes.)
Global Secondary Index 2 (for email lookup):
- Index Name:
EmailIndex - Partition Key:
email(String) - Sort Key: (None)
- Projected Attributes:
ALL
Other Attributes: registrationDate (String, ISO 8601 format), lastLogin (String, ISO 8601 format), profileData (Map).
The Migration Process
-
Create the DynamoDB Table: You can do this via the AWS Console, AWS CLI, or SDKs. Using the AWS CLI:
aws dynamodb create-table \ --table-name Users \ --attribute-definitions \ AttributeName=userId,AttributeType=N \ AttributeName=username,AttributeType=S \ AttributeName=email,AttributeType=S \ --key-schema \ AttributeName=userId,KeyType=HASH \ --global-secondary-indexes \ '[ { "IndexName": "UsernameIndex", "KeySchema": [ {"AttributeName": "username", "KeyType": "HASH"} ], "Projection": { "ProjectionType": "ALL" }, "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 } }, { "IndexName": "EmailIndex", "KeySchema": [ {"AttributeName": "email", "KeyType": "HASH"} ], "Projection": { "ProjectionType": "ALL" }, "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 } } ]' \ --provisioned-throughput \ ReadCapacityUnits=5,WriteCapacityUnits=5 \ --region us-east-1Note:
ProvisionedThroughputvalues are examples. You’d set these based on expected load. -
Export Data from Relational Database: Export your
userstable data into a format like CSV or JSON. For CSV, it might look like this:user_id,username,email,registration_date,last_login,profile_data 101,alice,alice@example.com,2023-01-15,2024-03-10T09:30:00Z,"{""bio"": ""Loves dogs""}" 102,bob,bob@example.com,2023-02-20,2024-03-11T14:00:00Z,"{""bio"": ""Enjoys cats""}" 103,charlie,charlie@test.com,2023-03-01,2024-03-09T11:00:00Z,"{""bio"": ""Likes birds""}"Notice the JSON string is escaped for CSV.
-
Import Data into DynamoDB: The most efficient way to do this for large datasets is using the AWS Data Pipeline or AWS Glue. For smaller datasets, you can use a custom script with the AWS SDK. Here’s a conceptual Python example using
boto3:import boto3 import csv import json dynamodb = boto3.resource('dynamodb', region_name='us-east-1') table = dynamodb.Table('Users') with open('users.csv', 'r') as f: reader = csv.DictReader(f) for row in reader: # Convert types and format dates item = { 'userId': int(row['user_id']), 'username': row['username'], 'email': row['email'], 'registrationDate': row['registration_date'], # Store as string, query with date functions if needed 'lastLogin': row['last_login'], # Store as string 'profileData': json.loads(row['profile_data']) # Parse JSON string into a DynamoDB Map } try: response = table.put_item(Item=item) print(f"Put item succeeded for userId: {row['user_id']}") except Exception as e: print(f"Error putting item for userId {row['user_id']}: {e}")This script reads the CSV, transforms the data (converting
user_idto an integer, parsingprofile_dataJSON), and then usesput_itemto insert each user into DynamoDB.
Querying in DynamoDB
-
Get user by
userId(Primary Key):response = table.get_item(Key={'userId': 101}) item = response.get('Item') print(item) -
Get user by
username(using GSIUsernameIndex):response = table.query( IndexName='UsernameIndex', KeyConditionExpression=boto3.dynamodb.conditions.Key('username').eq('alice') ) items = response.get('Items') print(items) -
Get user by
email(using GSIEmailIndex):response = table.query( IndexName='EmailIndex', KeyConditionExpression=boto3.dynamodb.conditions.Key('email').eq('bob@example.com') ) items = response.get('Items') print(items)
Key Differences and Considerations
- Schema: DynamoDB is schema-less (or schema-on-read), but you define attribute types at the table/index level. Items in the same table can have different attributes.
- Relationships: DynamoDB is not designed for complex relational joins. You’ll often denormalize data or use multiple queries to reconstruct related information.
- Cost: DynamoDB is priced based on provisioned or on-demand throughput (reads/writes) and storage. GSIs incur additional costs for storage and throughput.
- Data Types: DynamoDB has specific data types (String, Number, Binary, Boolean, Null, List, Map, Set). You need to map your relational types accordingly. JSON from your relational DB becomes a
Mapin DynamoDB. Dates and timestamps are often stored as ISO 8601 strings. - Scalability: DynamoDB scales horizontally. You manage capacity (provisioned or on-demand). Relational databases typically scale vertically or with more complex sharding strategies.
The most overlooked aspect of relational-to-NoSQL migration is how to model relationships. Instead of joins, you’ll often embed related data or use application-level joins that involve multiple DynamoDB lookups, which requires careful data modeling to avoid performance bottlenecks and excessive costs.
After migrating your primary table and indexes, you’ll likely need to consider how to handle foreign key relationships and implement them using DynamoDB’s patterns.