DynamoDB global secondary indexes (GSIs) can cost you a fortune if you’re not careful, but there’s a deeply counterintuitive way they can actually save you money by letting you store less data.
Let’s say you have a Products table with a GSI called ByCategoryAndPrice. This GSI has category and price as partition and sort keys, respectively. You use it to quickly find all products within a specific category and price range.
// Products table schema
{
"TableName": "Products",
"KeySchema": [
{"AttributeName": "productId", "KeyType": "HASH"}
],
"AttributeDefinitions": [
{"AttributeName": "productId", "AttributeType": "S"},
{"AttributeName": "category", "AttributeType": "S"},
{"AttributeName": "price", "AttributeType": "N"}
],
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}
}
// ByCategoryAndPrice GSI schema
{
"IndexName": "ByCategoryAndPrice",
"KeySchema": [
{"AttributeName": "category", "KeyType": "HASH"},
{"AttributeName": "price", "KeyType": "RANGE"}
],
"Projection": {
"ProjectionType": "ALL"
},
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}
}
Now, imagine you want to track product reviews. Each review has a reviewId, a productId it belongs to, and a rating. You create a new table, Reviews, to store these.
// Reviews table schema
{
"TableName": "Reviews",
"KeySchema": [
{"AttributeName": "reviewId", "KeyType": "HASH"}
],
"AttributeDefinitions": [
{"AttributeName": "reviewId", "AttributeType": "S"},
{"AttributeName": "productId", "AttributeType": "S"},
{"AttributeName": "rating", "AttributeType": "N"}
],
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}
}
You can query Reviews by reviewId or productId (if you add a secondary index on productId). But what if you want to find all reviews for a specific product with a rating above 4? You’d need another index on Reviews for productId and rating.
This is where the "sparse index pattern" comes in. Instead of creating a full GSI on Reviews that includes productId and rating for every single review, you can make that GSI sparse.
Let’s say you only care about fetching reviews with a rating of 5. You can create a GSI on Reviews that only includes items where the rating attribute is 5.
// SparseReviewsByProduct GSI schema
{
"IndexName": "SparseReviewsByProduct",
"KeySchema": [
{"AttributeName": "productId", "KeyType": "HASH"},
{"AttributeName": "rating", "KeyType": "RANGE"} // We'll ensure this is only 5
],
"Projection": {
"ProjectionType": "ALL"
},
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}
}
The magic happens when you don’t provision the rating attribute for items where the rating is not 5. When you insert or update an item in the Reviews table, you conditionally add the productId and rating attributes to the GSI.
Here’s how you’d insert a review with a rating of 5, ensuring it appears in the GSI:
aws dynamodb put-item \
--table-name Reviews \
--item '{
"reviewId": {"S": "review123"},
"productId": {"S": "productABC"},
"rating": {"N": "5"},
"comment": {"S": "Loved it!"}
}' \
--return-values ALL_NEW
And here’s how you’d insert a review with a rating of 3. Notice that we omit the productId and rating attributes from the GSI projection when the rating isn’t 5. This is the key.
aws dynamodb put-item \
--table-name Reviews \
--item '{
"reviewId": {"S": "review456"},
"productId": {"S": "productABC"},
"rating": {"N": "3"},
"comment": {"S": "It was okay."}
}' \
--return-values ALL_NEW
When you query the SparseReviewsByProduct GSI for productId = "productABC", DynamoDB will only return items where productId and rating are present in the index. In this case, only review123 will be returned because its productId and rating attributes were provisioned for the index. The review with rating = 3 won’t appear in the GSI results.
This approach is brilliant for scenarios where you frequently query based on a subset of your data. For example, if you only ever need to find "top-rated" items (e.g., rating 5), you can create a sparse GSI that only includes those top-rated items.
The cost savings come from two places:
- Storage: Sparse indexes don’t store attributes for items that don’t match the index key conditions. If only 10% of your reviews are 5-star, your sparse GSI will store data for only those 10%, dramatically reducing your storage costs for the index.
- Write Throughput: When you write an item to DynamoDB, you pay for write capacity units (WCUs) for the main table and for each GSI it’s projected into. By making a GSI sparse, you only consume WCUs for the GSI when the item actually matches the index criteria. If you have a high volume of writes but only a small fraction needs to be indexed, you save significant WCU costs.
The core idea is to design your GSI key schema such that the attributes required for your query are only present in the GSI for the subset of items you actually intend to query. You achieve this by selectively adding those attributes during item insertion/update, often using conditional writes in your application logic.
The most overlooked aspect of sparse indexes is that you’re not just saving money by not storing data; you’re also saving money on write throughput for items that don’t match the index. If you have a high write volume and a low match rate for your GSI, this can be a massive cost saver.
The next thing you’ll run into is understanding how to manage complex sparse index designs with multiple conditions or how to translate your application’s filtering needs into effective sparse index strategies.