Cosmos DB’s indexing strategy is a double-edged sword: it makes queries lightning fast, but it can also be a massive drain on your Request Units (RUs) if you’re not careful.
Let’s see this in action. Imagine a typical order document:
{
  "id": "ORD12345",
  "customerId": "CUST987",
  "orderDate": "2023-10-27T10:00:00Z",
  "items": [
    {
      "productId": "PROD567",
      "quantity": 2,
      "price": 25.50
    },
    {
      "productId": "PROD890",
      "quantity": 1,
      "price": 100.00
    }
  ],
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Anytown",
    "zipCode": "12345"
  },
  "notes": "Customer requested express shipping."
}
By default, Cosmos DB indexes everything. That means it tracks id, customerId, orderDate, every productId, quantity, and price within items, every field within shippingAddress, and notes. Now consider your queries. A query like SELECT * FROM c WHERE c.customerId = 'CUST987' is efficient because customerId is indexed. But what if you never query by notes? Or suppose a document carries a deeply nested audits array that grows into the hundreds of entries, and you only ever care about the id? Every time you write or update an order, Cosmos DB still updates the index entries for all those fields, and that work is charged in RUs.
Tuning the indexing policy addresses a predictable but often overlooked problem: RU overconsumption. When you index everything, you pay for indexing operations on data you'll never query. That can push you into your provisioned RU/s limit, causing throttling (HTTP 429 errors) and degrading application performance.
The mental model is that Cosmos DB is a highly optimized key-value store with an added layer of flexible querying. To make querying fast, it builds an index. That index isn't a single lookup table; it is an inverted index over each document's tree of paths, mapping every path and value to the documents that contain it. When you index a path like /items/[]/price, Cosmos DB creates an index entry for each price in the items array of every document.
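To make the per-element cost concrete, here is a minimal sketch that walks the sample order and emits one (path, value) term per scalar value, collapsing array elements to [] as in Cosmos DB path syntax. It is a simplified model, not Cosmos DB's actual index format, and the term counts are not RU charges; the helper names index_terms and terms_for_item_count are hypothetical.

```python
from typing import Any, Iterator, Tuple


def index_terms(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield one (path, value) term per scalar leaf of a JSON-like document.

    Array elements collapse to "[]", mirroring paths such as /items/[]/price.
    Simplified model only; the real index structure differs.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from index_terms(child, f"{path}/{key}")
    elif isinstance(node, list):
        for child in node:
            yield from index_terms(child, f"{path}/[]")
    else:
        yield path, node


order = {
    "id": "ORD12345",
    "customerId": "CUST987",
    "orderDate": "2023-10-27T10:00:00Z",
    "items": [
        {"productId": "PROD567", "quantity": 2, "price": 25.50},
        {"productId": "PROD890", "quantity": 1, "price": 100.00},
    ],
    "shippingAddress": {"street": "123 Main St", "city": "Anytown", "zipCode": "12345"},
    "notes": "Customer requested express shipping.",
}

terms = list(index_terms(order))
print(len(terms))                  # 13 terms for one write
print(len({p for p, _ in terms}))  # spread over 10 distinct paths


def terms_for_item_count(n: int) -> int:
    """Count terms for a copy of `order` whose items array has n entries."""
    items = [{"productId": f"PROD{i}", "quantity": 1, "price": 1.0} for i in range(n)]
    return sum(1 for _ in index_terms({**order, "items": items}))


print(terms_for_item_count(100))   # 307: each line item adds 3 terms
```

Under the index-everything default, the work per write in this model grows linearly with the array length, which is why large, never-queried arrays are the first candidates for exclusion.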
The exact levers you control are in the indexing policy, specifically its includedPaths and excludedPaths arrays. By default, includedPaths contains /* (meaning all paths) and excludedPaths contains only the system path /"_etag"/?. You can override both.
Let’s say you want to exclude the notes field and the entire items array from indexing because you only ever query by orderDate and customerId. You’d modify your container’s indexing policy.
Here’s how you’d do it using the Azure CLI:
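First, get your current indexing policy. For an API for NoSQL container, az cosmosdb sql container show returns the container definition, and the --query argument narrows it to the policy (myDatabase stands in for your database name):
az cosmosdb sql container show --resource-group myResourceGroup --account-name myCosmosDBAccount --database-name myDatabase --name myContainerName --query resource.indexingPolicy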
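Then, update the policy. A common strategy is opt-out: keep /* in includedPaths and add excludedPaths entries for what you don't need. A leaner approach is opt-in: exclude /* and include only what you do need, so a field added to your documents later is not indexed (or charged for on every write) until you decide it should be. Once you have written the new policy to a file such as policy.json, apply it with:
az cosmosdb sql container update --resource-group myResourceGroup --account-name myCosmosDBAccount --database-name myDatabase --name myContainerName --idx @policy.json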
Let's assume you only need to query by id, customerId, and orderDate. Your updated indexing policy JSON would look something like this (a simplified example; older policies often also carry an indexes array with Hash or Range kinds and a precision setting, which current versions of Cosmos DB no longer need):
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {"path": "/id/?"},
    {"path": "/customerId/?"},
    {"path": "/orderDate/?"}
  ],
  "excludedPaths": [
    {"path": "/*"}
  ]
}
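If you keep the policy in source control, generating the file from code avoids hand-editing mistakes in the JSON. This minimal sketch writes the opt-in policy in the path-only form; the file name policy.json is an assumption, chosen to match the --idx @policy.json argument that az cosmosdb sql container update accepts.

```python
import json
from pathlib import Path

# Opt-in policy: index only the three queried scalar paths and exclude the
# rest of the document tree. Entries are path-only (no "indexes" array).
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {"path": "/id/?"},
        {"path": "/customerId/?"},
        {"path": "/orderDate/?"},
    ],
    "excludedPaths": [{"path": "/*"}],
}

# Write the file that the Azure CLI reads via --idx @policy.json.
Path("policy.json").write_text(json.dumps(policy, indent=2), encoding="utf-8")
```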
Applying this policy means Cosmos DB will only create indexes for id, customerId, and orderDate. Any other field, like items, shippingAddress, or notes, will not be indexed. When you write or update a document, Cosmos DB only performs index updates for the paths specified in includedPaths. This directly reduces the number of index entries generated and updated per write operation, thereby lowering RU consumption for writes.
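To see what the opt-in policy saves, the sketch below walks the sample order with the same simplified one-term-per-scalar model and keeps only the terms whose path the policy includes. Filtering by exact path is valid here only because this particular policy excludes /* and lists scalar (/?) paths; the counts are index terms in the model, not RU charges, and the names index_terms and OPT_IN_SCALAR_PATHS are hypothetical.

```python
from typing import Any, Iterator, Tuple


def index_terms(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    # One (path, value) term per scalar leaf; array elements collapse to "[]".
    if isinstance(node, dict):
        for key, child in node.items():
            yield from index_terms(child, f"{path}/{key}")
    elif isinstance(node, list):
        for child in node:
            yield from index_terms(child, f"{path}/[]")
    else:
        yield path, node


order = {
    "id": "ORD12345",
    "customerId": "CUST987",
    "orderDate": "2023-10-27T10:00:00Z",
    "items": [
        {"productId": "PROD567", "quantity": 2, "price": 25.50},
        {"productId": "PROD890", "quantity": 1, "price": 100.00},
    ],
    "shippingAddress": {"street": "123 Main St", "city": "Anytown", "zipCode": "12345"},
    "notes": "Customer requested express shipping.",
}

# includedPaths "/id/?", "/customerId/?", "/orderDate/?" with the "/?" dropped.
OPT_IN_SCALAR_PATHS = {"/id", "/customerId", "/orderDate"}

all_terms = list(index_terms(order))  # what the index-everything default maintains
opt_in_terms = [(p, v) for p, v in all_terms if p in OPT_IN_SCALAR_PATHS]

print(len(all_terms), len(opt_in_terms))  # 13 3
```

In this model each write maintains 3 terms instead of 13, and the opt-in count stays at 3 no matter how long the items array grows.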
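The one thing most people don't know is how conflicts between includedPaths and excludedPaths are resolved: the more precise path wins, whichever list it appears in. If you have a broad includedPaths entry like /* and a specific excludedPaths entry like /items/[]/price/?, the price field within items will not be indexed. The reverse also holds: exclude /items/* but include /items/[]/price/?, and price is indexed while the rest of items is not. Because the outcome depends on comparing paths across both lists, explicitly listing what you want to include is often clearer and safer than relying solely on exclusions.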
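The precedence rule can be illustrated with a small resolver. This is a simplified model, not the service's implementation: it handles only paths ending in /? or /*, measures precision by path depth, and its tie-breaking choices (/? beats /* at equal depth; exclusion wins an exact tie) are assumptions of this sketch. The function names are hypothetical.

```python
from typing import List, Tuple


def _split(pattern: str) -> Tuple[str, str]:
    """Split "/a/b/?" into ("/a/b", "?") and "/a/*" into ("/a", "*")."""
    prefix, _, wildcard = pattern.rpartition("/")
    return prefix, wildcard


def _matches(pattern: str, path: str) -> bool:
    prefix, wildcard = _split(pattern)
    if wildcard == "?":
        return path == prefix  # only the scalar value exactly at prefix
    if wildcard == "*":
        return path == prefix or path.startswith(prefix + "/")  # prefix and everything below
    raise ValueError(f"index paths must end in /? or /*: {pattern}")


def _precision(pattern: str) -> Tuple[int, int]:
    prefix, wildcard = _split(pattern)
    # Deeper prefixes are more precise; at equal depth, /? counts as more
    # precise than /* (assumption).
    return prefix.count("/"), 1 if wildcard == "?" else 0


def is_indexed(path: str, included: List[str], excluded: List[str]) -> bool:
    """Let the most precise matching pattern decide, whichever list it is in."""
    # Exclusions go first so max() keeps them on an exact tie (assumption).
    candidates = [(_precision(p), False) for p in excluded if _matches(p, path)]
    candidates += [(_precision(p), True) for p in included if _matches(p, path)]
    if not candidates:
        return False  # matched by neither list: treated as not indexed
    return max(candidates, key=lambda c: c[0])[1]


print(is_indexed("/items/[]/price", ["/*"], ["/items/[]/price/?"]))                    # False
print(is_indexed("/items/[]/quantity", ["/*"], ["/items/[]/price/?"]))                 # True
print(is_indexed("/items/[]/price", ["/*", "/items/[]/price/?"], ["/items/*"]))        # True
print(is_indexed("/items/[]/quantity", ["/*", "/items/[]/price/?"], ["/items/*"]))     # False
```

The last two calls show why "exclusions always win" is the wrong mental model: the deeper included path overrides the broader exclusion for price only.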
The next concept you’ll likely encounter is how to handle composite indexes for multi-field queries that are critical for performance.