DynamoDB Global Secondary Indexes (GSIs) are not just for querying by a different attribute; they are distinct, independently scalable tables that allow you to serve entirely different access patterns from the same underlying data.
Let’s say you have a Products table with a primary key of ProductID (partition key) and SKU (sort key). Your main access pattern is retrieving a product by its ProductID and SKU.
// Example Product Item
{
"ProductID": "XYZ789",
"SKU": "XYZ789-RED-L",
"Name": "Crimson T-Shirt",
"Color": "Red",
"Size": "L",
"Price": 25.99,
"InventoryCount": 150
}
Now, imagine you need to query products by Color and Size, or by Name. You can create GSIs for these.
First, a GSI to query by Color and Size. We’ll call it ColorSizeIndex.
- Partition Key: Color
- Sort Key: Size
// GSI Projection for ColorSizeIndex
// "ALL" copies every attribute into the index; NonKeyAttributes is only valid with "INCLUDE"
{
"ProjectionType": "ALL"
}
Second, a GSI to query by Name. We’ll call it NameIndex.
- Partition Key: Name
// GSI Projection for NameIndex
// KEYS_ONLY projects the index key (Name) plus the table's primary key (ProductID, SKU)
// To carry extra attributes, use "ProjectionType": "INCLUDE" with a "NonKeyAttributes" list instead
{
"ProjectionType": "KEYS_ONLY"
}
When you create these GSIs, DynamoDB maintains each one as a separate data structure. A write to your main Products table is propagated to every GSI whose key or projected attributes it affects, according to that GSI’s projection.
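To make the two index definitions concrete, here is a minimal Python sketch of how they could be expressed in the shape of the GlobalSecondaryIndexes parameter of CreateTable. The helper names are hypothetical, and the sketch assumes an on-demand table (with provisioned capacity, each index would also need its own ProvisionedThroughput entry). It only builds the parameter structures and does not call AWS.

```python
def build_gsi_definitions():
    """Return index definitions shaped like CreateTable's GlobalSecondaryIndexes."""
    return [
        {
            "IndexName": "ColorSizeIndex",
            "KeySchema": [
                {"AttributeName": "Color", "KeyType": "HASH"},   # partition key
                {"AttributeName": "Size", "KeyType": "RANGE"},   # sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        },
        {
            "IndexName": "NameIndex",
            "KeySchema": [{"AttributeName": "Name", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        },
    ]


def attribute_definitions(table_keys, gsis):
    """Collect every table and index key attribute for AttributeDefinitions.

    Assumes all key attributes are strings ("S"), as in the example item.
    """
    names = list(table_keys)
    for gsi in gsis:
        for key in gsi["KeySchema"]:
            if key["AttributeName"] not in names:
                names.append(key["AttributeName"])
    return [{"AttributeName": n, "AttributeType": "S"} for n in names]
```

Only key attributes are declared up front; non-key attributes such as Price or InventoryCount need no definition.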
Here’s how you might query them using the AWS CLI:
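To get all "Red" products in size "L" (Size is a DynamoDB reserved word, so the expression refers to it through the placeholder #s):
aws dynamodb query \
--table-name Products \
--index-name ColorSizeIndex \
--key-condition-expression "Color = :c AND #s = :s" \
--expression-attribute-names '{"#s": "Size"}' \
--expression-attribute-values '{
":c": {"S": "Red"},
":s": {"S": "L"}
}'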
To find products named "Crimson T-Shirt" (Name is also a reserved word, hence the placeholder #n):
aws dynamodb query \
--table-name Products \
--index-name NameIndex \
--key-condition-expression "#n = :n" \
--expression-attribute-names '{"#n": "Name"}' \
--expression-attribute-values '{
":n": {"S": "Crimson T-Shirt"}
}'
The key insight is that each GSI behaves like a separate table, with its own provisioned throughput (or on-demand capacity) and its own copy of the data. You can project all attributes, only the keys, or a specific subset of attributes. Projecting only what you need saves on storage and write costs, especially for large items. A related but distinct technique is the sparse index: an item that lacks the index’s key attributes is not written to that index at all.
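The effect of each projection type can be illustrated with a small local model. This is a sketch in plain Python, not a DynamoDB call, and project_item is a hypothetical helper: it returns the copy of an item that an index would hold, or None when the item lacks an index key attribute and so never appears in the index.

```python
def project_item(item, table_keys, index_keys, projection_type, non_key_attributes=()):
    """Model which attributes of `item` a GSI would store."""
    # An item missing any index key attribute is not written to the index.
    if any(key not in item for key in index_keys):
        return None
    if projection_type == "ALL":
        return dict(item)
    # Table keys and index keys are always projected.
    kept = set(table_keys) | set(index_keys)
    if projection_type == "INCLUDE":
        kept |= set(non_key_attributes)
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in kept}


product = {
    "ProductID": "XYZ789",
    "SKU": "XYZ789-RED-L",
    "Name": "Crimson T-Shirt",
    "Color": "Red",
    "Size": "L",
    "Price": 25.99,
    "InventoryCount": 150,
}
name_index_copy = project_item(product, ["ProductID", "SKU"], ["Name"], "KEYS_ONLY")
```

Here name_index_copy holds only ProductID, SKU, and Name, which is why a KEYS_ONLY index on a table of large items stays small.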
The real power comes from understanding that you can have multiple GSIs on the same table, each serving a unique access pattern, allowing for a highly flexible data model. For example, you could have a GSI for searching by Category, another for Brand, and yet another for PriceRange (though range queries on GSIs require careful design, often involving composite sort keys).
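As one possible shape for the price case, the sketch below builds the parameters for a low-level Query against a hypothetical CategoryPriceIndex whose partition key is Category and whose sort key is a numeric Price. The index, the Category attribute, and the helper name are assumptions for illustration, and the sketch uses a plain numeric sort key rather than a composite one. It relies on the fact that a key condition allows equality on the partition key plus a range operator such as BETWEEN on the sort key.

```python
def price_range_query(category, low, high):
    """Build Query parameters for products in `category` priced in [low, high].

    CategoryPriceIndex is a hypothetical GSI (Category = partition key,
    Price = numeric sort key); it is not defined elsewhere in this article.
    """
    return {
        "TableName": "Products",
        "IndexName": "CategoryPriceIndex",
        # Placeholders avoid any clash with DynamoDB reserved words.
        "KeyConditionExpression": "#cat = :cat AND #price BETWEEN :lo AND :hi",
        "ExpressionAttributeNames": {"#cat": "Category", "#price": "Price"},
        "ExpressionAttributeValues": {
            ":cat": {"S": category},
            # The low-level API transmits numbers as strings.
            ":lo": {"N": str(low)},
            ":hi": {"N": str(high)},
        },
    }


params = price_range_query("Apparel", 10, 30)
```

The resulting dictionary could be passed as keyword arguments to a boto3 DynamoDB client's query method.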
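When designing, consider the read patterns first. What questions do you need to ask of your data that the primary key doesn’t support? Each of those questions likely maps to a GSI. For writes, understand that each GSI adds cost rather than request latency: the write is applied to the base table and then asynchronously replicated to each affected GSI, consuming that index’s write capacity. Because the replication is asynchronous, reads from a GSI are eventually consistent, and strongly consistent reads are not supported on a GSI. On a provisioned table, a GSI with too little write capacity can also cause writes to the base table to be throttled.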
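Because a GSI query cannot request a strongly consistent read, code that reads an index immediately after a write sometimes needs to tolerate a short delay. One way to handle that is a small retry loop with exponential backoff. This is only a sketch: query_until_visible and its parameters are hypothetical, and run_query stands for any zero-argument callable that performs the index query and returns a list of items.

```python
import time


def query_until_visible(run_query, attempts=5, delay_seconds=0.2, sleep=time.sleep):
    """Call run_query() until it returns items or the attempts run out.

    Waits delay_seconds, then doubles the wait after each empty result.
    Returns an empty list if nothing became visible in time.
    """
    for attempt in range(attempts):
        items = run_query()
        if items:
            return items
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(delay_seconds * (2 ** attempt))
    return []
```

The sleep parameter exists so the wait can be replaced in tests. Whether retrying is appropriate at all depends on the workload; many read paths can simply accept slightly stale index results.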
A common mistake is trying to cram too many access patterns into a single GSI by using complex sort keys or overloaded partition keys. Instead, think of each distinct query pattern as a potential candidate for its own GSI. The default quota is 20 GSIs per table.
If you project only a subset of attributes to a GSI and then need an attribute that wasn’t projected, a query on that GSI cannot return it. Unlike a local secondary index, a GSI does not fetch missing attributes from the base table for you; you have to issue a separate read (GetItem or BatchGetItem) against the base table using the keys the index returned. That adds latency and consumes read capacity from the base table, so it’s usually better to project every attribute a given GSI query needs.
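Retrieving attributes that an index does not project takes a second read against the base table. A minimal sketch of that two-step pattern follows, assuming a boto3-style low-level DynamoDB client is passed in. fetch_full_items is a hypothetical helper, and for brevity it ignores pagination (LastEvaluatedKey) and issues one GetItem per match rather than batching.

```python
def fetch_full_items(client, name, table="Products", index="NameIndex"):
    """Query the KEYS_ONLY index by Name, then read full items from the base table."""
    response = client.query(
        TableName=table,
        IndexName=index,
        KeyConditionExpression="#n = :n",
        ExpressionAttributeNames={"#n": "Name"},  # Name is a reserved word
        ExpressionAttributeValues={":n": {"S": name}},
    )
    items = []
    for hit in response["Items"]:
        # The index always carries the base table's primary key.
        key = {"ProductID": hit["ProductID"], "SKU": hit["SKU"]}
        full = client.get_item(TableName=table, Key=key)
        if "Item" in full:  # the item may have been deleted since the index was read
            items.append(full["Item"])
    return items
```

Each match costs an extra base-table read, which is the overhead that projecting the needed attributes into the index avoids.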
The next challenge is understanding and managing the cost implications of GSIs, particularly with high write volumes and large projected attributes.