Cosmos DB’s indexing policy isn’t just about what gets indexed, but how it’s indexed, and that has a massive impact on Request Units (RUs) consumed.
Let’s see it in action. Imagine we have a container orders with a partitionKey of /customerId and we’re running a query:
SELECT * FROM c WHERE c.customerId = 'customer123' AND c.orderStatus = 'shipped'
Without proper indexing, Cosmos DB might have to scan every single document in the orders container for customer123 and then filter for orderStatus = 'shipped'. Each document read costs RUs. If we have millions of orders for customer123, this can get expensive fast.
By default, Cosmos DB indexes every property of every item automatically. This is convenient, but it means paying to index properties you never query.
Here’s the default indexing policy for a new container:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*"
    }
  ],
  "excludedPaths": [
    {
      "path": "/\"_etag\"/?"
    }
  ]
}
With automatic set to true and /* as the only included path, every property of every item is indexed; only the system _etag property is excluded.
Tuning the indexing policy addresses two costs: the RUs spent maintaining indexes that no query uses, and the storage those indexes occupy. By telling Cosmos DB exactly what to index and, more importantly, what not to index, we lower write charges while keeping the queries we care about fast.
The core concept is that for every write operation (create, update, delete), Cosmos DB updates its indexes. If you’re indexing every property on every document, even properties you never query, you’re paying RU costs on writes for those indexes. For queries, if an index exists that can satisfy the query’s filter and sort clauses, Cosmos DB will use it, avoiding costly full scans.
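To make the write-side cost concrete, here is a toy model in Python. It only counts how many property paths of one document would receive an index entry under "index everything" versus a selective list. The functions `scalar_paths` and `indexed_paths` and the sample `order` document are hypothetical illustrations, not Cosmos DB's actual indexing algorithm, and the sketch assumes documents without arrays.

```python
def scalar_paths(doc, prefix=""):
    """Yield a path like /shippingAddress/city for every non-object value in doc."""
    for key, value in doc.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from scalar_paths(value, path)  # descend into nested objects
        else:
            yield path


def indexed_paths(doc, included=None):
    """Return the paths that would get an index entry when doc is written.

    included=None models an "index everything" policy; otherwise only
    the paths listed in `included` are counted. (Simplified assumption.)
    """
    paths = list(scalar_paths(doc))
    if included is None:
        return paths
    return [p for p in paths if p in included]


# Hypothetical order document, for illustration only.
order = {
    "id": "o1",
    "customerId": "customer123",
    "orderStatus": "shipped",
    "orderDate": "2023-02-01",
    "shippingAddress": {"city": "Oslo", "zip": "0150"},
    "notes": "leave at door",
}

print(len(indexed_paths(order)))  # 7 paths indexed on every write
print(len(indexed_paths(order, {"/customerId", "/orderStatus", "/orderDate"})))  # 3
```

The absolute numbers mean nothing in RU terms; the point is only that the index work per write grows with the number of indexed paths.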
Let’s say we only query orderStatus and orderDate for a specific customerId. We can define a more specific indexing policy.
First, we switch from indexing everything to an opt-in policy: exclude the root path /* and list only the paths we query. The automatic flag stays true; setting it to false would stop Cosmos DB from indexing items as they are written unless a request explicitly opts in. We also include the partition key path, because once /* is excluded it is no longer indexed by default. Note that the policy is plain JSON, so it cannot carry comments.
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/customerId/?"
    },
    {
      "path": "/orderStatus/?"
    },
    {
      "path": "/orderDate/?"
    }
  ],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ]
}
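If you maintain several containers, it can help to generate this kind of opt-in policy rather than hand-edit JSON. The sketch below is one way to do that; `build_opt_in_policy` is a hypothetical helper, and it only handles top-level scalar properties. The `automatic` flag is exposed as a parameter so you can set it to whatever your container requires.

```python
import json


def build_opt_in_policy(properties, automatic=True):
    """Build an indexing-policy dict that indexes only the given top-level properties.

    Each property name becomes an included path of the form /name/?,
    and the root path /* is excluded so nothing else is indexed.
    """
    return {
        "indexingMode": "consistent",
        "automatic": automatic,
        "includedPaths": [{"path": f"/{name}/?"} for name in properties],
        "excludedPaths": [{"path": "/*"}],
    }


policy = build_opt_in_policy(["customerId", "orderStatus", "orderDate"])
print(json.dumps(policy, indent=2))
```

The resulting dict serializes to the same shape as the JSON policy above. With the azure-cosmos Python SDK it could presumably be supplied as the indexing policy when creating or replacing a container, but check that against the SDK version you use; the call is deliberately not shown here.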
The /? suffix marks a path that ends in a scalar value (a string or a number); /* would instead match everything below that node. Each included path gets a range index, which serves equality filters such as c.orderStatus = 'shipped', range filters, and single-property sorts such as ORDER BY c.orderStatus, so there is no separate equality index to choose.
With this policy in place, our previous query SELECT * FROM c WHERE c.customerId = 'customer123' AND c.orderStatus = 'shipped' is still served from the index. Cosmos DB uses the index on customerId to find the documents for customer123 and the index on orderStatus to narrow them to 'shipped'. What changes compared with the default policy is the write path: the remaining properties are no longer indexed on every create, update, or delete.
Crucially, any query that filters on a property other than customerId, orderStatus, or orderDate now has to scan documents, because that property is not indexed. An ORDER BY on an excluded property fails outright, since sorting requires an index on that path. This is the trade-off: you pay less on writes and for targeted reads, but potentially much more for queries that touch unindexed properties.
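A quick way to catch this trade-off before it shows up on the bill is to check a query's filter properties against the policy. The following is a minimal sketch: `unindexed_filters` is a hypothetical helper that understands only top-level /name/? paths and the /* wildcard, which is far simpler than Cosmos DB's real path-precedence rules, and totalAmount is an invented property used for illustration.

```python
def unindexed_filters(policy, filter_properties):
    """Return the filter properties that no included path covers.

    Simplified assumption: only top-level '/name/?' paths and the '/*'
    wildcard are understood; nested paths and arrays are ignored.
    """
    included = {p["path"] for p in policy.get("includedPaths", [])}
    excluded = {p["path"] for p in policy.get("excludedPaths", [])}
    missing = []
    for name in filter_properties:
        path = f"/{name}/?"
        # Covered if listed explicitly, or if everything is included
        # and this path is not explicitly excluded.
        covered = path in included or ("/*" in included and path not in excluded)
        if not covered:
            missing.append(name)
    return missing


selective_policy = {
    "includedPaths": [
        {"path": "/customerId/?"},
        {"path": "/orderStatus/?"},
        {"path": "/orderDate/?"},
    ],
    "excludedPaths": [{"path": "/*"}],
}

print(unindexed_filters(selective_policy, ["customerId", "orderStatus"]))  # []
print(unindexed_filters(selective_policy, ["customerId", "totalAmount"]))  # ['totalAmount']
```

Running a check like this over the queries your application actually issues gives an early warning when a new filter would fall back to a scan.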
Besides consistent, indexingMode can be set to none, which disables indexing on the container entirely. That is only useful when the container acts as a pure key-value store and you perform nothing but point reads, never filtering or sorting on other properties. Older material also describes a lazy mode, which updates the index asynchronously at low priority; queries can then return incomplete or stale results, Microsoft’s documentation advises against it for containers you query, and new containers cannot select it.
The one thing most people don’t realize is that you can define composite indexes. If you frequently query on multiple properties together, like WHERE c.customerId = 'customer123' AND c.orderStatus = 'shipped' AND c.orderDate >= '2023-01-01', a composite index can be even more performant than separate indexes.
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/customerId/?"
    },
    {
      "path": "/orderStatus/?"
    },
    {
      "path": "/orderDate/?"
    }
  ],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ],
  "compositeIndexes": [
    [
      { "path": "/customerId", "order": "ascending" },
      { "path": "/orderStatus", "order": "ascending" },
      { "path": "/orderDate", "order": "ascending" }
    ]
  ]
}
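Because the equality properties (customerId, orderStatus) come first and the range property (orderDate) comes last, this composite index lets Cosmos DB satisfy the multi-property WHERE clause by traversing a single, ordered index structure, minimizing the number of document reads. Note that composite index paths are written without the /? suffix.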
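The intuition behind a composite index can be sketched with a sorted list of tuples. This is a toy model, not Cosmos DB's internal structure: entries are ordered by (customerId, orderStatus, orderDate), so two equality conditions plus one range condition map to a single contiguous slice. The data and the function `shipped_since` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy composite index: one entry per document, sorted by
# (customerId, orderStatus, orderDate), with the document id last.
entries = sorted([
    ("customer123", "pending", "2023-03-02", "o4"),
    ("customer123", "shipped", "2022-11-20", "o1"),
    ("customer123", "shipped", "2023-01-15", "o2"),
    ("customer123", "shipped", "2023-02-01", "o3"),
    ("customer456", "shipped", "2023-01-10", "o5"),
])


def shipped_since(index, customer_id, status, min_date):
    """Return ids matching customer_id AND status AND orderDate >= min_date."""
    # First entry at or after (customer, status, min_date).
    lo = bisect_left(index, (customer_id, status, min_date))
    # "\uffff" sorts after any date string, so this bounds the (customer, status) group.
    hi = bisect_right(index, (customer_id, status, "\uffff"))
    return [doc_id for *_, doc_id in index[lo:hi]]


print(shipped_since(entries, "customer123", "shipped", "2023-01-01"))  # ['o2', 'o3']
```

Two binary searches locate the slice, and only the matching entries are touched. If the range property came first in the ordering, the matches would no longer be contiguous, which is why equality properties lead in the composite definition.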
The next step after optimizing your indexing policy is understanding how to leverage computed properties.