Understand Cosmos DB Request Units and Plan Capacity (2026)

Cosmos DB Request Units (RUs) aren’t a direct measure of CPU, memory, or network, but rather a normalized abstraction of all the resources a database operation consumes.

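Let’s see this in action. Imagine a point read: fetching a single item from a Cosmos DB container by its ID and partition key. The command below is illustrative shorthand for that request; the Azure CLI's az cosmosdb sql commands manage accounts, databases, and containers, so in practice an item read goes through an SDK or the REST API.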

az cosmosdb sql item show \
    --account-name mycosmosdbaccount \
    --resource-group myresourcegroup \
    --database-name mydatabase \
    --container-name mycontainer \
    --item-id myitemid \
    --partition-key-value "somevalue"

When this read executes, Cosmos DB doesn’t just "do a lookup." It performs a series of internal operations: fetching the data from storage, serializing the response, and sending it back over the network (a query would also apply filters and projections). Each of these steps, and many more, has a cost. Cosmos DB bundles all these costs into a single, abstract unit: the Request Unit.

This abstraction is crucial because it allows Cosmos DB to operate across a heterogeneous mix of hardware and storage technologies while presenting a consistent performance model. The same read against the same data costs the same number of RUs every time: if it costs 10 RUs, it costs 10 RUs whether it runs on a dusty old server in Seattle or a shiny new one in Dublin. A provisioned throughput of, say, 400 RUs/sec then means you can execute operations whose charges sum to 400 RUs in any given second.
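To make the "400 RUs in any given second" idea concrete, here is a minimal Python sketch of a per-second budget. The `RuBudget` class and its method names are hypothetical, and the fixed one-second window is a deliberate simplification, not a description of how the service actually meters requests.

```python
class RuBudget:
    """Toy model of a fixed per-second RU budget.

    This is a simplification for illustration, not the service's real
    rate-limiting algorithm.
    """

    def __init__(self, ru_per_sec: float) -> None:
        self.ru_per_sec = ru_per_sec
        self.window = None   # index of the current one-second window
        self.used = 0.0      # RUs charged so far in that window

    def try_consume(self, ru: float, now: float) -> bool:
        """Charge `ru` at time `now` (seconds); return False if over budget."""
        window = int(now)
        if window != self.window:
            # A new second has started, so the budget resets.
            self.window = window
            self.used = 0.0
        if self.used + ru > self.ru_per_sec:
            return False     # over budget for this second
        self.used += ru
        return True


budget = RuBudget(400)
# Forty-one reads of 10 RUs each arrive within the same second.
results = [budget.try_consume(10, now=0.5) for _ in range(41)]
print(results.count(True))                # 40
print(budget.try_consume(10, now=1.2))    # True: a new second, a fresh budget
```

Forty reads at 10 RUs each use up the 400 RU budget exactly, so the forty-first is rejected until the next second begins.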

The core problem RUs solve is providing a predictable and scalable performance metric for a distributed, multi-tenant database service. Without them, users would need to understand the underlying hardware, network latency, and specific implementation details of every operation, which is impractical for a cloud service. RUs abstract this complexity, allowing you to focus on throughput rather than resource allocation.

Understanding RUs means understanding how they are consumed and how capacity is planned.

RU Consumption:

Read Operations: A point read, which fetches a single item by its ID and partition key, is the cheapest operation: reading a 1 KB item costs 1 RU. The cost grows with item size, up to the 2 MB per-item limit. Projections apply to queries, not to point reads.
Write Operations: Creates, updates, and deletes are more expensive than reads. They involve not only writing data but also updating indexes and ensuring consistency across the distributed system.
Queries: This is where RU consumption can vary wildly. A query that scans an entire partition will consume significantly more RUs than a query that can use an index to pinpoint a specific item. The number of documents processed, the complexity of the query (joins, aggregations, UDFs), and the efficiency of the indexes all play a role.
Indexing: By default, Cosmos DB indexes every property of every item. While this is a huge benefit for query performance, maintaining these indexes adds to the RU cost of every write.
Consistency Levels: The stronger consistency levels (Strong and Bounded Staleness) read from two replicas instead of one, so read operations cost roughly twice as many RUs as they do under the weaker levels (Session, Consistent Prefix, and Eventual).
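The factors above can be combined into a rough first estimate of the throughput a workload needs. The sketch below is a back-of-the-envelope calculation in Python: the `Workload` fields, the function name, and every per-operation cost are hypothetical placeholders to be replaced with charges you measure yourself, and the flat 2x multiplier for reads and queries under stronger consistency is an approximation.

```python
from dataclasses import dataclass

# Assumption: reads and queries under Strong or Bounded Staleness cost about
# twice what they cost under Session, Consistent Prefix, or Eventual.
STRONG_READ_MULTIPLIER = 2.0


@dataclass
class Workload:
    """Per-second operation rates and per-operation RU charges (all measured
    or estimated by you; the numbers used below are made up)."""
    reads_per_sec: float
    ru_per_read: float
    writes_per_sec: float
    ru_per_write: float
    queries_per_sec: float
    ru_per_query: float


def required_ru_per_sec(w: Workload, strong_reads: bool = False) -> float:
    """Sum the RU charges of everything the workload does in one second."""
    read_factor = STRONG_READ_MULTIPLIER if strong_reads else 1.0
    return (
        w.reads_per_sec * w.ru_per_read * read_factor
        + w.writes_per_sec * w.ru_per_write
        + w.queries_per_sec * w.ru_per_query * read_factor
    )


# Hypothetical workload: 500 point reads, 100 writes, and 20 queries per second.
example = Workload(
    reads_per_sec=500, ru_per_read=1,
    writes_per_sec=100, ru_per_write=5,
    queries_per_sec=20, ru_per_query=10,
)
print(required_ru_per_sec(example))                     # 1200.0
print(required_ru_per_sec(example, strong_reads=True))  # 1900.0
```

An estimate like this is only a starting point; the measured request charge of real operations should replace the placeholder costs as soon as it is available.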

Capacity Planning:

You have two primary models for managing RU capacity in Cosmos DB:

Manual Throughput: You provision a fixed number of RUs per second for a container or a database. This is ideal for predictable workloads.
- Example: To provision 1000 RUs/sec for a container using az cli:
```
az cosmosdb sql container create \
    --account-name mycosmosdbaccount \
    --resource-group myresourcegroup \
    --database-name mydatabase \
    --name mycontainer \
    --partition-key-path "/mypartitionkey" \
    --throughput 1000
```
- Why it works: This dedicates 1000 RUs/sec to your container. Operations are not throttled as long as their total RU consumption in any second stays at or below this limit; requests beyond it are rejected with HTTP 429 (request rate too large) and must be retried. Because provisioned throughput is divided across physical partitions, a single hot partition can still be throttled while the container as a whole is under its limit.
Autoscale Throughput: You set a maximum RU/sec for your container or database, and Cosmos DB automatically scales the provisioned throughput between 10% of that maximum and the maximum itself. This is excellent for spiky or unpredictable workloads.
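- Example: To set an autoscale maximum of 4000 RUs/sec (which scales between 400 and 4000 RUs/sec) on an existing autoscale container using az cli:
```
# Assumes the container already uses autoscale. To convert a manual-throughput
# container first, run: az cosmosdb sql container throughput migrate ... --throughput-type autoscale
az cosmosdb sql container throughput update \
    --account-name mycosmosdbaccount \
    --resource-group myresourcegroup \
    --database-name mydatabase \
    --name mycontainer \
    --max-throughput 4000
```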
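- Why it works: Cosmos DB monitors your actual RU consumption and adjusts the provisioned throughput within the range: it scales up towards the maximum (4000 RUs/sec) as demand grows and back down towards the minimum (400 RUs/sec) when demand drops. You are billed each hour for the highest RU/sec the service scaled to during that hour, so quiet hours cost less.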
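The autoscale arithmetic can be sketched in a few lines of Python. Both function names are hypothetical, and the billing function is a simplified model: it assumes each hour is charged for the highest RU/sec the service scaled to, approximated here by clamping peak demand into the autoscale range.

```python
def autoscale_range(max_ru_per_sec: int) -> tuple:
    """Return (minimum, maximum) RU/s for a given autoscale maximum.

    The minimum is 10% of the maximum. Autoscale maximums start at 1000 RU/s.
    """
    if max_ru_per_sec < 1000:
        raise ValueError("autoscale maximum must be at least 1000 RU/s")
    return max_ru_per_sec // 10, max_ru_per_sec


def billed_ru_per_sec(max_ru_per_sec: int, peak_demand_ru_per_sec: float) -> float:
    """Simplified hourly billing model: peak demand clamped to the range.

    Assumption: the service scales to roughly the peak demand seen in the
    hour, never below the minimum and never above the maximum.
    """
    low, high = autoscale_range(max_ru_per_sec)
    return min(max(peak_demand_ru_per_sec, low), high)


print(autoscale_range(4000))           # (400, 4000)
print(billed_ru_per_sec(4000, 150))    # 400: a quiet hour still pays the minimum
print(billed_ru_per_sec(4000, 2500))   # 2500
print(billed_ru_per_sec(4000, 9000))   # 4000: demand above the cap is throttled
```

The last case is the one to watch: demand above the configured maximum is not served by scaling further, so a cap that is too low still produces throttling.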

The key to cost optimization and performance is understanding your actual RU consumption. You can monitor this via Azure Monitor metrics in the Azure portal (e.g., "Total Request Units" and "Normalized RU Consumption") or by examining the x-ms-request-charge header in your API responses. This header tells you exactly how many RUs a single operation consumed, and most SDKs also surface it as a request charge value on each response.
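As a small illustration of working with that header, the Python sketch below pulls the charge out of a response's headers and totals it across several operations. The helper names and the sample header values are made up for the example; only the header name x-ms-request-charge comes from the service.

```python
REQUEST_CHARGE_HEADER = "x-ms-request-charge"


def request_charge(headers: dict) -> float:
    """Return the RU charge recorded in one response's headers.

    Header names are matched case-insensitively. Returns 0.0 when the header
    is absent, which is a choice made for this sketch.
    """
    for name, value in headers.items():
        if name.lower() == REQUEST_CHARGE_HEADER:
            return float(value)
    return 0.0


def total_charge(responses: list) -> float:
    """Sum the RU charges over a list of response-header dictionaries."""
    return sum(request_charge(h) for h in responses)


# Made-up headers from three operations: a point read, a write, and a query.
sample = [
    {"x-ms-request-charge": "1.0"},
    {"X-Ms-Request-Charge": "5.5"},
    {"x-ms-request-charge": "2.5", "content-type": "application/json"},
]
print(total_charge(sample))   # 9.0
```

Logging this total per endpoint or per query shape is a cheap way to find which operations dominate the bill before changing any provisioned throughput.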

The most surprising thing about RU planning is how often people over-provision, especially for read-heavy workloads on indexed data, because they conflate "query complexity" with "high RU cost." A query that looks complex, involving multiple filters and sorting, might actually be very cheap if it can efficiently use a composite index to pinpoint just a few documents. Conversely, a seemingly simple query that scans a large partition, even if it returns few results, can be extremely expensive.

The next concept you’ll likely encounter is how to optimize queries and indexing strategies to minimize RU consumption for a given workload.