Composite indexes are the secret sauce for making ORDER BY queries fly in Cosmos DB, especially when you’re sorting on multiple fields.
Let’s see it in action. Imagine you have a collection of orders and you frequently need to fetch orders for a specific customerId sorted by orderDate in descending order, and then by totalAmount in descending order.
// Sample document
{
  "id": "order123",
  "customerId": "cust456",
  "orderDate": "2023-10-27T10:00:00Z",
  "totalAmount": 150.75,
  "items": [...]
}
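Here’s the query. Notice that customerId also appears first in the ORDER BY. It is constant across the filtered rows, so the results are still sorted by orderDate and then totalAmount, but listing it lets the ORDER BY line up path for path with the composite index defined below:
SELECT *
FROM c
WHERE c.customerId = "cust456"
ORDER BY c.customerId ASC, c.orderDate DESC, c.totalAmount DESC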
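Without the right index, this query doesn’t just run slowly, it doesn’t run at all: Cosmos DB requires a matching composite index for any ORDER BY on two or more properties and rejects the query if none exists. The alternative, sorting every matching document on the fly, would be incredibly inefficient for large datasets. It’s like trying to find a specific book in a library by pulling every single book off the shelves and then sorting them.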
The fix is a composite index. You define it in the container’s indexing policy, for example in an index.json file that you apply via the Azure portal or an SDK. For our example, the single-property part of the policy would look like this (orderDate is indexed as a string because date-times are stored as ISO 8601 strings):
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/customerId/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String"
        }
      ]
    },
    {
      "path": "/orderDate/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String"
        }
      ]
    },
    {
      "path": "/totalAmount/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number"
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ]
}
And crucially, the composite index itself, which goes in a compositeIndexes property of that same policy object. The filter property customerId comes first, followed by the two sort properties in the same sequence and direction as the ORDER BY clause:
{
  "compositeIndexes": [
    [
      {
        "path": "/customerId",
        "order": "ascending"
      },
      {
        "path": "/orderDate",
        "order": "descending"
      },
      {
        "path": "/totalAmount",
        "order": "descending"
      }
    ]
  ]
}
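Both fragments above belong to one indexing policy document. As a rough sketch in Python, this is how they might be combined into a single dictionary and serialized. The helper name build_order_policy is hypothetical, and the per-path "indexes" entries are left out on the assumption that the default range indexes on included paths are enough.

```python
import json


def build_order_policy():
    """Return one indexing policy with both included paths and the composite index.

    Hypothetical helper for illustration; property names follow the JSON
    fragments shown above.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [
            {"path": "/customerId/?"},
            {"path": "/orderDate/?"},
            {"path": "/totalAmount/?"},
        ],
        "excludedPaths": [{"path": "/*"}],
        # A list of composite indexes; each one is an ordered list of paths.
        "compositeIndexes": [
            [
                {"path": "/customerId", "order": "ascending"},
                {"path": "/orderDate", "order": "descending"},
                {"path": "/totalAmount", "order": "descending"},
            ]
        ],
    }


policy = build_order_policy()
print(json.dumps(policy, indent=2))
```

With the azure-cosmos Python SDK, a dictionary like this would typically be passed as the indexing_policy argument when creating or replacing a container.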
After applying this index, and once Cosmos DB has finished building it, the engine can read the relevant entries directly in pre-sorted order. It’s like the library now has a catalog that shows you exactly where the books for "cust456" are, sorted by date and then amount, without you having to touch a single volume. The query’s index metrics should confirm that the composite index is being used, and the query should run with low latency and a small request unit (RU) charge even as the container grows.
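The order property in the composite index definition (ascending or descending) must match the direction of the ORDER BY clause in your query for the index to be used. The one exception is a complete reversal: an index also serves an ORDER BY in which every path has the opposite direction. If your query sorts by orderDate ASC and totalAmount DESC, your composite index needs to be defined as orderDate ascending, totalAmount descending (or the exact inverse). An index with both paths ascending won’t help.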
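To make the direction rule concrete, here is a small Python sketch of the matching check. It is a simplified model of the documented behaviour, not Cosmos DB code: the function name supports_order_by and the tuple representation are assumptions made for illustration. It treats an index as usable when the paths appear in the same sequence and the directions are either all identical or all inverted.

```python
FLIP = {"ascending": "descending", "descending": "ascending"}


def supports_order_by(composite_index, order_by):
    """Return True if the ORDER BY terms line up with the composite index.

    Both arguments are lists of (path, direction) tuples, where direction is
    "ascending" or "descending".
    """
    # Paths must be identical and in the same sequence (this also checks length).
    if [path for path, _ in composite_index] != [path for path, _ in order_by]:
        return False
    index_dirs = [direction for _, direction in composite_index]
    query_dirs = [direction for _, direction in order_by]
    inverted_dirs = [FLIP[direction] for direction in index_dirs]
    # Either every direction matches, or every direction is reversed.
    return query_dirs == index_dirs or query_dirs == inverted_dirs


index = [("/orderDate", "ascending"), ("/totalAmount", "descending")]

same = supports_order_by(index, [("/orderDate", "ascending"), ("/totalAmount", "descending")])
inverse = supports_order_by(index, [("/orderDate", "descending"), ("/totalAmount", "ascending")])
mixed = supports_order_by(index, [("/orderDate", "ascending"), ("/totalAmount", "ascending")])

print(same, inverse, mixed)  # True True False
```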
When you define a composite index, Cosmos DB maintains a separate index structure whose entries are ordered by the specified paths, in the specified directions, rather than reordering the documents themselves. The query engine can then traverse this structure directly to satisfy the ORDER BY clause, avoiding costly in-memory sorting or full scans.
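The idea can be illustrated with a toy model in Python. This is not how Cosmos DB implements its index internally; it only shows why a structure kept in (customerId ascending, orderDate descending, totalAmount descending) order lets a query read one contiguous, already-sorted range. The sample orders and the orders_for helper are made up for the example.

```python
from bisect import bisect_left, bisect_right

orders = [
    {"id": "o1", "customerId": "cust456", "orderDate": "2023-10-27T10:00:00Z", "totalAmount": 150.75},
    {"id": "o2", "customerId": "cust123", "orderDate": "2023-10-28T09:00:00Z", "totalAmount": 20.00},
    {"id": "o3", "customerId": "cust456", "orderDate": "2023-10-29T08:30:00Z", "totalAmount": 99.99},
    {"id": "o4", "customerId": "cust456", "orderDate": "2023-10-27T10:00:00Z", "totalAmount": 310.00},
]

# Build the "index" once, at write time. Python's sort is stable, so sorting by
# the least significant key first and the most significant key last yields the
# combined order: customerId ASC, orderDate DESC, totalAmount DESC.
index = sorted(orders, key=lambda d: d["totalAmount"], reverse=True)
index = sorted(index, key=lambda d: d["orderDate"], reverse=True)
index = sorted(index, key=lambda d: d["customerId"])

customer_keys = [d["customerId"] for d in index]


def orders_for(customer_id):
    """Binary-search the customer's range; no sorting happens at query time."""
    lo = bisect_left(customer_keys, customer_id)
    hi = bisect_right(customer_keys, customer_id)
    return index[lo:hi]


result_ids = [d["id"] for d in orders_for("cust456")]
print(result_ids)  # ['o3', 'o4', 'o1']
```

The newest order comes first, and the two orders that share a date are returned with the larger amount first, all without a sort step in orders_for.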
The most surprising thing is how differently the two clauses are treated. The order in which you write predicates in the WHERE clause doesn’t matter. The order of fields in the ORDER BY clause absolutely does: it has to match the composite index path for path. When a query combines the two, put the equality-filter properties first in the composite index, followed by the sort properties. If you have a composite index on /a ascending, /b descending, /c descending, a query with WHERE c.a = 1 uses it when written as ORDER BY c.a ASC, c.b DESC, c.c DESC. Written as just ORDER BY c.b DESC, c.c DESC, the clause no longer lines up with that index, and you would need a separate composite index on /b and /c to serve it.
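One reading of the rule for combining an equality filter with a sort is: filter properties first, then the sort properties, in both the composite index and the ORDER BY clause. The Python sketch below derives both from a query’s shape under that assumption. The function plan_filter_and_sort and its input format are hypothetical, and giving the filter properties an ascending direction is a choice made here for simplicity.

```python
DIRECTIONS = {"ASC": "ascending", "DESC": "descending"}


def plan_filter_and_sort(equality_filters, order_by):
    """Suggest a composite index and a matching ORDER BY clause.

    equality_filters: property names used in equality predicates, e.g. ["a"].
    order_by: list of (property, "ASC" | "DESC") tuples from the original query.
    """
    sort_props = {prop for prop, _ in order_by}
    # Filter properties go first; skip any that the query already sorts on.
    leading = [(prop, "ASC") for prop in equality_filters if prop not in sort_props]
    terms = leading + list(order_by)

    composite_index = [
        {"path": "/" + prop, "order": DIRECTIONS[direction]} for prop, direction in terms
    ]
    clause = "ORDER BY " + ", ".join(f"c.{prop} {direction}" for prop, direction in terms)
    return composite_index, clause


composite_index, clause = plan_filter_and_sort(["a"], [("b", "DESC"), ("c", "DESC")])
print(composite_index)
print(clause)  # ORDER BY c.a ASC, c.b DESC, c.c DESC
```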
The next problem you’ll run into is queries that fail or become expensive because the ORDER BY clause includes fields not covered by any index, or because the order of its fields doesn’t align with an existing composite index.