A DynamoDB "hot key" is not a mysterious failure mode of its own; it is a symptom of a single partition bearing the brunt of your workload.
Let’s watch this happen. Imagine a users table with userId as the partition key. DynamoDB hashes the partition key value to choose a partition, so every write to user:alice lands on the same partition, and user:bob and user:charlie share it too if their keys happen to hash to the same partition. If user:alice suddenly gets a million writes per second, that partition hits its per-partition limit of 3,000 RCU / 1,000 WCU and throttling ensues, even if other partitions are sitting idle.
The solution is write sharding, and it’s not about adding more partitions. It’s about making your partition key smarter. We’ll introduce a shardId to the partition key, making it userId#shardId.
Here’s a sample table definition:
{
  "TableName": "users",
  "KeySchema": [
    {
      "AttributeName": "pk",
      "KeyType": "HASH"
    },
    {
      "AttributeName": "sk",
      "KeyType": "RANGE"
    }
  ],
  "AttributeDefinitions": [
    {
      "AttributeName": "pk",
      "AttributeType": "S"
    },
    {
      "AttributeName": "sk",
      "AttributeType": "S"
    }
  ],
  "ProvisionedThroughput": {
    "ReadCapacityUnits": 10,
    "WriteCapacityUnits": 10
  }
}
Our pk will be userId#shardId, and sk will be something else, like profile.
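To implement write sharding, we need to distribute writes across a set of shards. A common approach is to use a fixed number of shards, say 100. When writing a user’s data, instead of just using userId for the partition key, we’ll use userId#shardId. The shardId has to vary from one write to the next for the same user: a value derived only from the userId (for example hash(userId) % 100) is identical for every write and spreads nothing. The simplest choice is a random suffix per write. If you need to locate a specific item again later, derive the suffix from an attribute that varies per item instead, such as an event or order ID.
import random

user_id = "user:alice"
num_shards = 100
# Pick the shard per write; deriving it from user_id alone would always yield the same shard.
shard_id = str(random.randrange(num_shards))
partition_key = f"{user_id}#{shard_id}"
This distributes writes for user:alice across 100 different partition key values, user:alice#0, user:alice#1, …, user:alice#99, which DynamoDB is free to place on different partitions.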
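To see the spread concretely, here is a small simulation, offered as a sketch rather than production code. It assumes the suffix is chosen at random for each write; the helper names `sharded_key` and `simulate_writes` and the fixed seed are illustrative and not part of any library. It counts how many of 10,000 simulated writes to user:alice land on each sharded key.

```python
import random
from collections import Counter

NUM_SHARDS = 100  # fixed shard count, as in the text


def sharded_key(user_id: str, rng: random.Random, num_shards: int = NUM_SHARDS) -> str:
    """Return a partition key of the form userId#shardId with a random shard."""
    shard_id = rng.randrange(num_shards)  # integer in 0 .. num_shards - 1
    return f"{user_id}#{shard_id}"


def simulate_writes(user_id: str, n_writes: int, seed: int = 0) -> Counter:
    """Count how many simulated writes land on each sharded partition key."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return Counter(sharded_key(user_id, rng) for _ in range(n_writes))


counts = simulate_writes("user:alice", 10_000)
print(len(counts), "distinct keys; busiest key received", max(counts.values()), "writes")
```

With a uniform random suffix, each key should receive roughly n_writes / NUM_SHARDS writes, so no single key carries the whole load.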
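Reads become slightly more complex. If you know the shardId, you can read directly: GetItem with pk='user:alice#5' and sk='profile' (GetItem needs the full primary key). However, if you only know the userId, you’ll need to query across all shards. Because the shard count is fixed, you can enumerate user:alice#0 through user:alice#99 and query each key. The alternative is a Global Secondary Index (GSI) keyed on userId, which returns every shard’s items from a single Query.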
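When only the userId is known and the shard count is fixed, the keys to read can simply be enumerated. The sketch below assumes a caller-supplied `query_shard` function (hypothetical; in practice it would wrap a DynamoDB Query on one pk value) and fans out over all shards with a thread pool. The in-memory `fake_table` exists only to exercise the code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Item = Dict[str, str]


def shard_keys(user_id: str, num_shards: int) -> List[str]:
    """Every partition key that may hold data for this user."""
    return [f"{user_id}#{shard}" for shard in range(num_shards)]


def read_all_shards(
    user_id: str,
    num_shards: int,
    query_shard: Callable[[str], List[Item]],
    max_workers: int = 10,
) -> List[Item]:
    """Query each shard key in parallel and merge the results in shard order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_shard = pool.map(query_shard, shard_keys(user_id, num_shards))
        return [item for items in per_shard for item in items]


# In-memory stand-in for the table, keyed by pk (an assumption for this sketch).
fake_table: Dict[str, List[Item]] = {
    "user:alice#3": [{"pk": "user:alice#3", "sk": "profile"}],
    "user:alice#7": [{"pk": "user:alice#7", "sk": "profile"}],
}
items = read_all_shards("user:alice", 100, lambda pk: fake_table.get(pk, []))
```

The cost is one request per shard, which is why the shard count should not be larger than the write load requires.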
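Let’s create a GSI on userId to facilitate lookups across shards. Expressed as an UpdateTable request, which must also declare the index key attributes, it looks like this:
{
  "TableName": "users",
  "AttributeDefinitions": [
    { "AttributeName": "userId", "AttributeType": "S" },
    { "AttributeName": "shardId", "AttributeType": "S" }
  ],
  "GlobalSecondaryIndexUpdates": [
    {
      "Create": {
        "IndexName": "userId-index",
        "KeySchema": [
          { "AttributeName": "userId", "KeyType": "HASH" },
          { "AttributeName": "shardId", "KeyType": "RANGE" }
        ],
        "Projection": { "ProjectionType": "ALL" },
        "ProvisionedThroughput": {
          "ReadCapacityUnits": 5,
          "WriteCapacityUnits": 5
        }
      }
    }
  ]
}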
For the GSI to index an item, the item must carry userId and shardId as separate top-level attributes alongside pk and sk; DynamoDB cannot derive them from the composite pk string. In the main table, pk is userId#shardId and sk is profile. In the GSI, userId (the original user ID) becomes the partition key, and shardId becomes the sort key.
You do not need to project pk separately. The table’s primary key attributes are always projected into a GSI, and with ProjectionType ALL every other attribute is copied as well. You can therefore query the GSI using userId as the partition key and read pk from each result to get the full partition key for each shard.
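A sketch of what an item might look like so that both the table and the index can key on it. The `build_item` helper is hypothetical; the attribute names follow the schemas above, and storing shardId as a string is an assumption made for this example.

```python
from typing import Dict


def build_item(user_id: str, shard_id: int, sort_key: str = "profile") -> Dict[str, str]:
    """Build an item with the composite table key plus the attributes the GSI indexes."""
    return {
        "pk": f"{user_id}#{shard_id}",  # table partition key
        "sk": sort_key,                 # table sort key
        "userId": user_id,              # GSI partition key
        "shardId": str(shard_id),       # GSI sort key (string type is an assumption)
    }


item = build_item("user:alice", 5)
print(item)
```

Keeping userId and shardId as their own attributes also means readers never have to parse the pk string to recover them.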
When you query the GSI with userId='user:alice', you get items like:
userId: user:alice, shardId: 0, pk: user:alice#0, sk: profile
userId: user:alice, shardId: 1, pk: user:alice#1, sk: profile
…
userId: user:alice, shardId: 99, pk: user:alice#99, sk: profile
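Because this index projects all attributes, the results are already complete items. GSI reads are eventually consistent, however, so if you need strongly consistent data you then iterate through these results and perform GetItem operations on the main table using the pk and sk, with ConsistentRead enabled.
A common misconception is that using a GSI for reads automatically solves the hot key problem. It doesn’t. A GSI has partitions of its own, and an index keyed on userId puts every item for user:alice back under a single index key. If the index cannot absorb those writes, DynamoDB throttles writes to the base table as well. A GSI only distributes traffic across partitions if its own key is also sharded. The primary benefit of write sharding is preventing write hot spots on the table; the GSI is there to enable efficient reads when you don’t know the shard ID.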
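The follow-up reads can be batched rather than issued one at a time. The sketch below turns GSI results into table keys and groups them into chunks of at most 100, the per-request key limit of BatchGetItem. The helper names `table_keys` and `chunked` are illustrative, and the actual DynamoDB call is deliberately left out.

```python
from typing import Dict, Iterable, Iterator, List

Key = Dict[str, str]
MAX_BATCH_KEYS = 100  # BatchGetItem accepts up to 100 keys per request


def table_keys(gsi_items: Iterable[Dict[str, str]]) -> List[Key]:
    """Extract the table primary key (pk, sk) from each GSI result."""
    return [{"pk": it["pk"], "sk": it["sk"]} for it in gsi_items]


def chunked(keys: List[Key], size: int = MAX_BATCH_KEYS) -> Iterator[List[Key]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


# Made-up GSI results, used only to exercise the helpers.
gsi_items = [
    {"userId": "user:alice", "shardId": str(n), "pk": f"user:alice#{n}", "sk": "profile"}
    for n in range(250)
]
batches = list(chunked(table_keys(gsi_items)))
print([len(batch) for batch in batches])
```

Each batch would then be passed to one BatchGetItem request, with any unprocessed keys retried.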
The number of shards is a crucial tuning parameter. Too few, and you still risk hot partitions. Too many, and you increase the overhead of managing the shards and querying across them. Start with at least your expected peak write throughput for the hottest key divided by the maximum WCU per partition (1,000 WCU), rounded up and with some headroom, and adjust based on monitoring.
The real magic of write sharding is that it transforms a single, high-contention partition key into many lower-contention ones. Each userId#shardId combination is a distinct partition key value that DynamoDB hashes independently, so the load can land on different partitions (distinct keys may still share a physical partition). If user:alice is still a hot key, it means one specific shardId for user:alice is seeing disproportionate traffic, which suggests an issue with how shard IDs are chosen or an uneven distribution of access patterns within that specific shard.
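As a worked version of that starting point, the sketch below divides peak write throughput by 1,000 WCU and rounds up. The `headroom` multiplier and its default of 2 are assumptions for illustration, not a DynamoDB recommendation.

```python
import math

MAX_WCU_PER_PARTITION = 1000  # per-partition write limit cited in the text


def min_shards(peak_wcu: float, headroom: float = 2.0) -> int:
    """Smallest shard count whose combined partition limit covers peak_wcu * headroom."""
    if peak_wcu <= 0 or headroom <= 0:
        raise ValueError("peak_wcu and headroom must be positive")
    return max(1, math.ceil(peak_wcu * headroom / MAX_WCU_PER_PARTITION))


print(min_shards(5_000))       # 5,000 WCU with 2x headroom -> 10
print(min_shards(5_000, 1.0))  # 5,000 WCU with no headroom -> 5
```

Treat the result as a floor to begin monitoring from, not a final answer.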
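One way to check for that kind of skew is to compare per-shard write counts against their mean. The sketch below is a hypothetical helper, not a DynamoDB feature; the counts would come from your own metrics, and the threshold factor of 3 is an arbitrary assumption.

```python
from typing import Dict, List


def hot_shards(writes_per_shard: Dict[str, int], factor: float = 3.0) -> List[str]:
    """Return shard keys whose write count exceeds `factor` times the mean."""
    if not writes_per_shard:
        return []
    mean = sum(writes_per_shard.values()) / len(writes_per_shard)
    return sorted(key for key, n in writes_per_shard.items() if n > factor * mean)


# Made-up counts: three shards near 100 writes and one far above them.
sample = {
    "user:alice#0": 100,
    "user:alice#1": 110,
    "user:alice#2": 90,
    "user:alice#3": 2000,
}
print(hot_shards(sample))
```

A shard that shows up here repeatedly points at the suffix-selection logic or at a few items being accessed far more often than the rest.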
The next challenge you’ll face is managing the complexity of querying across shards when you don’t know the shard ID.