DynamoDB Global Secondary Indexes (GSIs) can silently consume a surprising amount of provisioned read and write capacity, often leading to throttling and unexpected costs.
Let’s see this in action. Imagine a Users table with userId as the partition key and email as the sort key. We provision 1000 RCUs and 1000 WCUs for it.
```json
{
  "TableName": "Users",
  "KeySchema": [
    { "AttributeName": "userId", "KeyType": "HASH" },
    { "AttributeName": "email", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "userId", "AttributeType": "S" },
    { "AttributeName": "email", "AttributeType": "S" }
  ],
  "ProvisionedThroughput": {
    "ReadCapacityUnits": 1000,
    "WriteCapacityUnits": 1000
  }
}
```
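Now, let’s add a GSI to query users by username. On an existing table this is an UpdateTable request, and the index’s key attribute has to be declared in AttributeDefinitions:
```json
{
  "TableName": "Users",
  "AttributeDefinitions": [
    { "AttributeName": "username", "AttributeType": "S" }
  ],
  "GlobalSecondaryIndexUpdates": [
    {
      "Create": {
        "IndexName": "UsernameIndex",
        "KeySchema": [
          { "AttributeName": "username", "KeyType": "HASH" }
        ],
        "Projection": { "ProjectionType": "KEYS_ONLY" },
        "ProvisionedThroughput": {
          "ReadCapacityUnits": 100,
          "WriteCapacityUnits": 100
        }
      }
    }
  ]
}
```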
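Here’s the tricky part: the index has its own capacity, separate from the table’s. When you write an item that has a username attribute to the Users table, DynamoDB also writes a corresponding entry to UsernameIndex. That write is charged to the index’s 100 WCUs, not the table’s 1000.

Items without a username are simply left out of the index. An update costs the index nothing unless it changes an attribute the index stores (its key or a projected attribute). Changing username itself costs two index writes: one to remove the old entry and one to add the new one.

Similarly, reads against the GSI consume the index’s provisioned read capacity independently of the base table. And if the index runs out of write capacity, DynamoDB throttles the write to the base table as well.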
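How many index writes a single base-table write costs depends on what that write does to the index’s attributes. The sketch below is a simplified model of the documented GSI write-capacity rules, not DynamoDB’s implementation. The `index_writes` helper and its parameters are hypothetical, items are plain dicts, and an attribute counts as "changed" whenever its value differs.

```python
from typing import Any, Mapping, Optional, Sequence, Union

Item = Mapping[str, Any]
# "KEYS_ONLY", "ALL", or a list of non-key attribute names (INCLUDE)
Projection = Union[str, Sequence[str]]


def index_writes(
    old: Optional[Item],
    new: Optional[Item],
    index_key_attrs: Sequence[str],
    projection: Projection,
) -> int:
    """Estimate the GSI write operations caused by one base-table write.

    old/new: the item before and after the write (None = item absent).
    Hypothetical helper; "changed" simply means the value differs.
    """

    def in_index(item: Optional[Item]) -> bool:
        # Items missing any index key attribute are left out (sparse index).
        return item is not None and all(a in item for a in index_key_attrs)

    was_in, now_in = in_index(old), in_index(new)
    if not was_in and not now_in:
        return 0  # the index never sees this item
    if was_in != now_in:
        return 1  # entry added to, or removed from, the index
    assert old is not None and new is not None
    if any(old[a] != new[a] for a in index_key_attrs):
        return 2  # index key changed: delete old entry, put new one
    if projection == "KEYS_ONLY":
        watched = set()
    elif projection == "ALL":
        watched = set(old) | set(new)
    else:
        watched = set(projection)
    if any(old.get(a) != new.get(a) for a in watched):
        return 1  # a projected attribute changed: entry rewritten
    return 0  # nothing the index stores changed


if __name__ == "__main__":
    # Users item; "lastLogin" is a hypothetical non-key attribute.
    user = {"userId": "u1", "email": "ann@example.com", "username": "ann", "lastLogin": "2024-01-01"}
    touched = dict(user, lastLogin="2024-02-01")
    print(index_writes(user, touched, ["username"], "KEYS_ONLY"))  # 0
    print(index_writes(user, touched, ["username"], "ALL"))  # 1
    print(index_writes(user, dict(user, username="ann2"), ["username"], "KEYS_ONLY"))  # 2
```

With `KEYS_ONLY`, bumping `lastLogin` never touches `UsernameIndex`. Under `ALL` the same update rewrites the index entry, and renaming a user always costs two index writes.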
The problem arises when a GSI is provisioned with far less capacity than the table, while the base table is write-heavy (with writes that touch indexed attributes) or the GSI is read-heavy. The index then throttles even though the table has headroom. Meanwhile, the base table’s own consumed-capacity metrics can look healthy, so the cause is easy to miss unless you check the index’s metrics.
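A back-of-envelope calculation shows how quickly the 100-WCU `UsernameIndex` can fall behind. This sketch assumes each index write is billed in whole 1 KB units of the index entry’s size, as base-table writes are. The `index_wcu_needed` helper is hypothetical, and so are the traffic figures in the usage lines.

```python
import math

WCU_BLOCK_BYTES = 1024  # one WCU = one write per second of an entry up to 1 KB


def index_wcu_needed(writes_per_sec: float, entry_bytes: int, index_writes_per_op: int = 1) -> int:
    """Rough WCUs a GSI needs to absorb a steady stream of base-table writes.

    index_writes_per_op is 1 for a put/delete/projected-attribute change and
    2 when the write changes the index key (old entry removed, new one added).
    """
    units_per_write = max(1, math.ceil(entry_bytes / WCU_BLOCK_BYTES))
    return math.ceil(writes_per_sec * index_writes_per_op * units_per_write)


if __name__ == "__main__":
    # Hypothetical load: 600 sign-ups per second, ~200-byte KEYS_ONLY entries.
    needed = index_wcu_needed(600, 200)
    print(f"UsernameIndex needs ~{needed} WCU but has 100 provisioned")
```

At that rate the table’s 1000 WCUs are fine, but the index needs about six times what it was given.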
Common Causes and How to Fix Them:
- High Write Volume to Base Table: Every base-table write that puts an item into the index, removes it, or changes its index key or a projected attribute also writes to the GSI. If your base table is write-heavy and most writes touch indexed attributes, the GSI’s write capacity is consumed about as fast as the table’s. A GSI provisioned well below the table will therefore throttle first.
  - Diagnosis: Monitor the `WriteThrottleEvents` and `ConsumedWriteCapacityUnits` metrics for your GSI in CloudWatch. Compare them with the `WriteCapacityUnits` provisioned for the GSI.
  - Fix: Increase the provisioned write capacity for the GSI. For instance, if `UsernameIndex` is showing throttling, run:
    ```shell
    aws dynamodb update-table --table-name Users --global-secondary-index-updates '[{"Update": {"IndexName": "UsernameIndex", "ProvisionedThroughput": {"ReadCapacityUnits": 100, "WriteCapacityUnits": 500}}}]'
    ```
    Set `WriteCapacityUnits` to match or exceed your observed write traffic to the GSI. This works because you’re directly allocating more throughput to the GSI to handle the load.
- Inefficient GSI Projections: Using an `ALL` projection when queries against the GSI need only a few attributes makes every index entry larger than necessary. `KEYS_ONLY` is already the smallest projection. The larger entries cost you in three ways:
  - storage costs rise;
  - each GSI query consumes more RCUs;
  - an update to any attribute of the item also rewrites the index entry.
  - Diagnosis: Examine the `Projection` attribute of your GSI. Check whether `ALL` is being used and whether every projected attribute is actually needed by the queries that hit the GSI.
  - Fix: Project only the necessary attributes with `INCLUDE`. A GSI’s projection can’t be changed in place. Instead, create a new index with the narrower projection, move queries to it, then delete the old one:
    ```json
    "Projection": { "ProjectionType": "INCLUDE", "NonKeyAttributes": ["firstName", "lastName"] }
    ```
    This reduces the size of GSI items, lowering the RCUs and WCUs consumed per read or write against the GSI.
- "Hot" Partition Key in the GSI: Your GSI’s partition key (e.g., `username` in `UsernameIndex`) may have very uneven traffic across its values. In that case, a few popular values can receive a disproportionate share of requests and saturate the GSI partition that holds them, even while the index as a whole has spare capacity.
  - Diagnosis: The GSI doesn’t support `GetItem`. Instead, run `Query` operations against it with `ReturnConsumedCapacity` set to `INDEXES`, and compare the consumed capacity reported for specific `username` values. CloudWatch Contributor Insights for DynamoDB can also report the most-accessed keys of a GSI. If a few values consistently dominate, you have a hot partition.
  - Fix: Redesign the GSI’s key schema. Options include spreading a hot value across several partition key values by adding a suffix (write sharding), creating multiple GSIs with different partition keys if possible, or re-evaluating whether a GSI is the right fit for the access pattern. There’s no direct command to fix a hot partition key other than re-architecting your access patterns or GSI.
- High Read Volume Against the GSI: Frequent queries against the GSI, especially those that scan large portions of the index or involve many items, consume its provisioned read capacity independently of the base table.
  - Diagnosis: Monitor the `ReadThrottleEvents` metric for your GSI in CloudWatch. Compare it with the `ReadCapacityUnits` provisioned for the GSI.
  - Fix: Increase the provisioned read capacity for the GSI:
    ```shell
    aws dynamodb update-table --table-name Users --global-secondary-index-updates '[{"Update": {"IndexName": "UsernameIndex", "ProvisionedThroughput": {"ReadCapacityUnits": 500, "WriteCapacityUnits": 100}}}]'
    ```
    This directly allocates more read throughput to the GSI.
- "Stale" or Unused GSIs: You might have GSIs that were created for past use cases and are no longer actively queried. They still consume write capacity on every base-table write that touches their attributes, and read capacity if they are accidentally queried.
  - Diagnosis: Use CloudWatch to check `ConsumedReadCapacityUnits` and `ConsumedWriteCapacityUnits` for each GSI. A GSI whose read consumption stays at zero while it keeps consuming write capacity is probably unused.
  - Fix: Delete unused GSIs. Deleting an index is an `update-table` call; `delete-table` would drop the entire table:
    ```shell
    aws dynamodb update-table --table-name Users --global-secondary-index-updates '[{"Delete": {"IndexName": "UsernameIndex"}}]'
    ```
    This stops all capacity consumption by the GSI.
- Over-provisioning of the Base Table: Sometimes the base table itself is over-provisioned, and its healthy-looking metrics can mask throttling on a smaller, more heavily used GSI.
  - Diagnosis: Make sure you’re looking at CloudWatch metrics for the GSI itself, i.e. the `AWS/DynamoDB` metrics with both the `TableName` and `GlobalSecondaryIndexName` dimensions (for example, `Users` and `UsernameIndex`). If the base table metrics look fine but you’re still getting throttled, the issue is almost certainly within the GSI.
  - Fix: Re-evaluate the provisioned capacity for the base table based on its actual usage, and size each GSI for its own traffic. Once the base table’s capacity is correctly sized, the GSI’s capacity issues become more apparent.
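The diagnoses above can be rolled into a quick triage pass over each index’s CloudWatch numbers. This is a heuristic sketch, not an AWS tool. The `IndexStats` fields and the `triage` function are hypothetical, and you would fill them from the GSI’s metrics (dimensions `TableName` and `GlobalSecondaryIndexName`) over a review window. The 50% threshold for suspecting a hot key is an assumption, not an AWS rule.

```python
from dataclasses import dataclass
from typing import List

HOT_KEY_RATIO = 0.5  # assumption: throttling below 50% average use hints at skew


@dataclass
class IndexStats:
    """One GSI's CloudWatch numbers over a review window (hypothetical shape)."""
    name: str
    provisioned_rcu: float
    provisioned_wcu: float
    consumed_rcu: float   # average ConsumedReadCapacityUnits per second
    consumed_wcu: float   # average ConsumedWriteCapacityUnits per second
    read_throttles: int   # sum of ReadThrottleEvents
    write_throttles: int  # sum of WriteThrottleEvents


def triage(s: IndexStats) -> List[str]:
    """Map one index's metrics to the likely causes listed above."""
    findings: List[str] = []
    checks = (
        ("write", s.write_throttles, s.consumed_wcu, s.provisioned_wcu),
        ("read", s.read_throttles, s.consumed_rcu, s.provisioned_rcu),
    )
    for kind, throttles, used, provisioned in checks:
        if throttles == 0:
            continue
        if used < HOT_KEY_RATIO * provisioned:
            # Throttled while mostly idle on average: skewed keys or short bursts.
            findings.append(f"{s.name}: {kind} throttling at low average use; look for a hot key or bursts")
        else:
            findings.append(f"{s.name}: {kind} throttling near capacity; raise {kind} capacity")
    if s.consumed_rcu == 0 and s.consumed_wcu > 0:
        findings.append(f"{s.name}: no reads but still consuming writes; candidate for deletion")
    return findings


if __name__ == "__main__":
    print(triage(IndexStats("UsernameIndex", 100, 100, 20, 98, 0, 350)))
```

Averages over a long window hide short spikes. A "low average use" finding is a prompt to look closer with Contributor Insights, not proof of a hot key.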
The most common pitfall is not realizing that GSIs have their own, independent capacity. Every base-table write that touches an index’s key or projected attributes also writes to that index, and that write is billed against the index’s capacity rather than the table’s.
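AWS recommends giving each provisioned GSI at least the write capacity of its base table, since table writes typically land in the index too. The sketch below automates that check. It assumes the input is shaped like the `Table` object of a DescribeTable response (for example, boto3’s `describe_table(TableName="Users")["Table"]`), and the function name is hypothetical.

```python
from typing import Any, Dict, List


def under_provisioned_indexes(table: Dict[str, Any]) -> List[str]:
    """Flag GSIs provisioned with less write capacity than their base table.

    `table` is assumed to look like the "Table" object in a DescribeTable
    response. On-demand tables report zero provisioned capacity, so the
    check only makes sense for provisioned tables.
    """
    table_wcu = table["ProvisionedThroughput"]["WriteCapacityUnits"]
    flagged = []
    for gsi in table.get("GlobalSecondaryIndexes", []):
        gsi_wcu = gsi["ProvisionedThroughput"]["WriteCapacityUnits"]
        if gsi_wcu < table_wcu:
            flagged.append(f'{gsi["IndexName"]}: {gsi_wcu} WCU < table {table_wcu} WCU')
    return flagged


if __name__ == "__main__":
    # The Users table from this article, trimmed to the fields the check reads.
    users = {
        "TableName": "Users",
        "ProvisionedThroughput": {"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000},
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "UsernameIndex",
                "ProvisionedThroughput": {"ReadCapacityUnits": 100, "WriteCapacityUnits": 100},
            }
        ],
    }
    print(under_provisioned_indexes(users))  # ['UsernameIndex: 100 WCU < table 1000 WCU']
```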
You’ll hit throttling errors such as ProvisionedThroughputExceededException on GSI queries, and on base-table writes when the GSI’s write capacity is exhausted.