Cut Cosmos DB Costs: RU Optimization, TTL, and Autoscale (2026)

A Request Unit (RU) in Cosmos DB is not a simple count of requests. It is a normalized cost that blends the CPU, memory, and I/O each operation consumes, and it is the unit your bill is built on.

Let’s see it in action. A point read (a GET by ID and partition key) of a 1 KB document costs 1 RU. Creating that same document costs noticeably more, roughly 5 RU before indexing overhead, and replacing it costs roughly twice what the insert did. Queries are where costs climb: a query that has to load 100 documents to find one match is charged for the scan, not just for the single result it returns, so it can cost many times more than a point read. The more properties you index, the more each write costs as well.

The problem Cosmos DB solves is providing a globally distributed, multi-model database with guaranteed low latency and high availability. It achieves this by abstracting away the underlying hardware and exposing a consistent API. The "cost" of these operations, however, is measured in Request Units (RUs). Every database operation (reads, writes, queries, and the index maintenance that accompanies each write) consumes RUs. You provision RU/s for your container (or database), and if you exceed that provision, your requests get throttled with HTTP 429 responses. The trick is that RUs aren’t a linear measure of "work"; they’re an internally defined metric that accounts for CPU, memory, and I/O. This means a simple read might be cheap, but a complex query that scans many documents, even if it returns only one, can be surprisingly expensive.
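To make the provisioning arithmetic concrete, here is a minimal sketch that adds up the RU/s a workload would need and compares it with what is provisioned. The per-operation charges and request rates are illustrative assumptions, not measurements; substitute the request charges your own operations report.

```python
def required_ru_per_second(ops_per_second, ru_per_op):
    """Sum the RU/s a workload needs: each operation's rate times its RU cost."""
    return sum(ops_per_second[name] * ru_per_op[name] for name in ops_per_second)


# Assumed per-operation charges (hypothetical values for illustration only).
ru_per_op = {"point_read": 1.0, "create": 6.0, "query": 12.0}
# Assumed request rates, in operations per second.
ops_per_second = {"point_read": 200, "create": 20, "query": 10}

needed = required_ru_per_second(ops_per_second, ru_per_op)
provisioned = 400  # RU/s provisioned on the container (assumed)

print(f"needed: {needed:.0f} RU/s, provisioned: {provisioned} RU/s")
print("throttling likely" if needed > provisioned else "within budget")
```

In this made-up workload the ten queries per second cost as much as the twenty writes, which is why query tuning tends to be the first place to look.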

To optimize costs, we need to understand RU consumption and actively manage it.

RU Optimization: Query Tuning

The biggest culprit for RU waste is inefficient queries. A query that scans a large number of documents to find a few is an RU-guzzler.

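Diagnosis: Check the request charge that Cosmos DB returns with every response (the x-ms-request-charge header, also surfaced by the SDKs). In the Azure portal, run the query in Data Explorer and open the Query Stats tab, which shows the request charge alongside the retrieved and output document counts. Queries that retrieve far more documents than they output are the ones to tune.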

Fix: Rewrite your queries to be more selective. For example, SELECT * FROM c WHERE c.partitionKey = 'someValue' returns every document in the partition. If you know the specific document ID, filter on it as well: SELECT * FROM c WHERE c.partitionKey = 'someValue' AND c.id = 'specificDocId'. Better still, skip the query engine entirely and do a point read by ID and partition key through the SDK, which is the cheapest way to fetch a single document.

Why it works: By adding more filters, especially ones that target specific documents or a smaller subset, you instruct the query engine to read fewer documents, thus consuming fewer RUs.
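One way to check this on your own data is to fetch the same document twice, once as a point read and once as a query, and compare the reported charges. The sketch below assumes the azure-cosmos Python SDK (v4), where a container client exposes read_item, query_items, and the last response headers via client_connection; the helper names are hypothetical, and the container is expected to be one you have already created with CosmosClient.

```python
def last_request_charge(container):
    """RU charge of the most recent request made through this container client."""
    headers = container.client_connection.last_response_headers
    return float(headers["x-ms-request-charge"])


def compare_lookup_costs(container, item_id, partition_key):
    """Fetch one document as a point read and as a query; return both RU charges."""
    # Point read: addressed directly by id and partition key, no query engine.
    container.read_item(item=item_id, partition_key=partition_key)
    point_read_ru = last_request_charge(container)

    # Equivalent single-partition query, parameterized on the id.
    results = container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": item_id}],
        partition_key=partition_key,
    )
    list(results)  # drain the iterator so the query actually runs
    # Note: the headers reflect only the last page fetched; a one-document
    # result fits in a single page, so that is the whole charge here.
    query_ru = last_request_charge(container)

    return {"point_read": point_read_ru, "query": query_ru}
```

Calling compare_lookup_costs(container, "specificDocId", "someValue") against a real container returns the two charges side by side, which makes it easy to see how much the query path adds for your document sizes.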

Diagnosis: If you’re using SELECT *, you’re likely reading more data than you need.

Fix: Explicitly select only the properties you require. Change SELECT * FROM c WHERE c.category = 'electronics' to SELECT c.name, c.price FROM c WHERE c.category = 'electronics'.

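Why it works: The size of the result set is one of the inputs to a query’s RU charge. Projecting only the fields you need shrinks the payload, which lowers both the charge and the network transfer. The savings are largest when documents are big and you need only a few properties.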

Diagnosis: Suboptimal indexing strategy can lead to scans even when you think you’re being selective.

Fix: Review your indexing policy. By default Cosmos DB indexes every property, which keeps queries cheap but makes every write more expensive. Make sure the properties you filter or sort on stay indexed, and exclude the paths you never query. For example, if you only query by userId, exclude timestamp unless you run range or ORDER BY queries on it. Keep indexingMode set to "consistent" (the default) for most scenarios.

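Why it works: Indexes allow Cosmos DB to quickly locate relevant documents without scanning the entire collection, drastically reducing RU cost for queries that can leverage them. Every indexed path also has to be maintained on each write, so trimming paths you never query lowers write charges.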
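As an illustration, an indexing policy that keeps everything indexed except a timestamp property might look like the following. It is built as a Python dictionary and printed as JSON so it can be pasted into the portal or passed to an SDK; the property name timestamp is taken from the example above, and the rest is a sketch to adapt rather than a recommended policy.

```python
import json

# Index every path except /timestamp. The "/?" suffix targets the scalar
# value at that path; "/*" would match everything beneath it.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/timestamp/?"}],
}

print(json.dumps(indexing_policy, indent=2))
```

The opposite approach, excluding "/*" and including only the handful of paths you query, tends to suit write-heavy containers with a small, stable set of filters.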

Time to Live (TTL) for Automatic Data Cleanup

Keeping data indefinitely, even if rarely accessed, costs money twice: storage is billed per GB, separately from RUs, and a larger dataset means a larger index and more documents for queries to wade through.

Diagnosis: Browse the container in Data Explorer and check its storage metrics. If you have documents that are no longer needed after a certain period (e.g., logs, audit trails, temporary session data), they are unnecessarily consuming resources.

Fix: Enable TTL on your container, and optionally override it per item. At the container level, set the default time to live (DefaultTimeToLive in the .NET SDK, defaultTtl in the container’s JSON definition) to a value in seconds (e.g., 3600 for 1 hour), or to -1 to turn TTL on without expiring anything by default. For item-level TTL, add a ttl property to your JSON document; it takes effect only when the container has a default TTL set. For instance, to expire a document 24 hours after it was last written:

{
  "id": "log-entry-123",
  "message": "User logged in",
  "ttl": 86400
}

(Note: you never set _ts yourself. Cosmos DB maintains it as the document’s last modified timestamp in seconds since the epoch, and the TTL countdown starts from that value, so updating a document resets its expiry clock.)

Why it works: When TTL is enabled, Cosmos DB automatically deletes expired documents. The deletes run as a background task that uses only leftover RUs, those not consumed by your own requests, so they don’t compete with your workload. If the container is saturated, physical deletion is delayed, but expired documents stop appearing in reads and queries either way. You save on storage and on every query that would otherwise scan the stale data.
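The interaction between the container default and the item-level ttl is easy to get wrong, so here is a small sketch of the rules as a pure function. The function name and signature are hypothetical; it models the documented behavior rather than calling Cosmos DB.

```python
from typing import Optional


def expiry_time(last_modified_ts: int,
                container_default_ttl: Optional[int],
                item_ttl: Optional[int] = None) -> Optional[int]:
    """Return the Unix time at which an item expires, or None if it never does.

    last_modified_ts      -- the item's _ts value (seconds since the epoch)
    container_default_ttl -- None if TTL is off, -1 for "on, no default expiry",
                             or a positive number of seconds
    item_ttl              -- the item's own ttl property, if it has one
    """
    if container_default_ttl is None:
        return None  # TTL disabled on the container: item-level ttl is ignored
    effective = item_ttl if item_ttl is not None else container_default_ttl
    if effective == -1:
        return None  # -1 means "never expire"
    return last_modified_ts + effective


# Container TTL on with no default expiry; the item opts in to 24 hours.
print(expiry_time(1678886400, -1, 86400))    # 1678972800
# Container default of 1 hour, no item override.
print(expiry_time(1678886400, 3600))         # 1678890000
# TTL disabled on the container: the item's ttl has no effect.
print(expiry_time(1678886400, None, 86400))  # None
```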

Autoscale Throughput Provisioning

Manually setting RU/s can lead to over-provisioning during low-traffic periods and under-provisioning during peaks.

Diagnosis: Open your container’s metrics in the Azure portal and compare consumed throughput against provisioned RU/s (the Normalized RU Consumption metric is useful here). If consumption consistently sits far below what you provisioned during off-peak hours, you’re overpaying. Conversely, if requests are frequently throttled (HTTP 429) during peak hours, your manual setting is too low.

Fix: Enable Autoscale throughput. In the Azure portal, navigate to your container’s "Scale & Settings" blade. Under "Throughput," select "Autoscale." Set the "Maximum RU/s" to your peak requirement. For example, if your peak is 4000 RU/s, set the maximum to 4000. Cosmos DB will then scale between 10% of the maximum (400 RU/s in this example) and the maximum, automatically adjusting based on your workload.

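Why it works: Autoscale automatically adjusts your provisioned RU/s between 10% of the maximum and the maximum value you set, based on your actual consumption. You are billed each hour for the highest RU/s the container scaled to in that hour, never less than the 10% floor. Note that the autoscale rate per RU/s is higher than the manual rate (1.5x on single-write-region accounts at the time of writing), so autoscale pays off for spiky or unpredictable traffic, while a steady workload that runs near its peak all day is usually cheaper on manual throughput.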
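A rough way to decide between the two modes is to compare billed RU/s-hours for a typical day. The sketch below uses a simplified model: autoscale bills each hour at the highest RU/s reached, clamped between 10% of the maximum and the maximum, and a price multiplier (assumed to be 1.5 here) stands in for the higher autoscale rate. The traffic profiles are made up for illustration, and the result is in relative units rather than currency.

```python
def autoscale_billed_ru_hours(hourly_peak_ru, max_ru):
    """RU/s-hours billed under autoscale for a list of per-hour peak RU/s."""
    floor = max_ru / 10  # autoscale never bills below 10% of the maximum
    return sum(min(max(peak, floor), max_ru) for peak in hourly_peak_ru)


def compare_modes(hourly_peak_ru, max_ru, manual_ru, autoscale_multiplier=1.5):
    """Relative cost of autoscale vs. manual throughput over the same hours."""
    autoscale = autoscale_billed_ru_hours(hourly_peak_ru, max_ru) * autoscale_multiplier
    manual = manual_ru * len(hourly_peak_ru)
    return {"autoscale": autoscale, "manual": manual}


# Spiky day: 20 quiet hours at 300 RU/s, 4 peak hours at 4000 RU/s.
spiky = [300] * 20 + [4000] * 4
print(compare_modes(spiky, max_ru=4000, manual_ru=4000))

# Steady day: 24 hours at 3500 RU/s.
steady = [3500] * 24
print(compare_modes(steady, max_ru=4000, manual_ru=4000))
```

Under these assumptions the spiky day comes out far cheaper on autoscale, while the steady day is cheaper on manual throughput, matching the rule of thumb that autoscale suits variable traffic.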

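The next problem you’ll likely hit is indexing for newly added properties: once you customize the indexing policy, new fields can land on excluded paths, and queries that filter on them get expensive unless you update the policy as your schema evolves.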