Monitor Cosmos DB RU Consumption, Latency, and Errors (2026)

Cosmos DB’s Request Units (RUs) are more than a line on the bill: they are a normalized measure of the resources (CPU, memory, and I/O) each operation consumes, and you can easily exceed your provisioned throughput if you’re not watching them.

Let’s see what a typical RU consumption pattern looks like.

Imagine a simple API endpoint that reads a single document from Cosmos DB.

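// Express.js route handler
app.get('/items/:id', async (req, res) => {
  const { id } = req.params;
  try {
    // container.item(itemId, partitionKeyValue): here the id doubles as the partition key
    const item = await container.item(id, id).read();
    console.log(`Read item ${id}. RU consumed: ${item.requestCharge}`);
    // Recent SDK versions may return an undefined resource for a missing item instead of throwing
    if (!item.resource) {
      return res.status(404).send('Item not found');
    }
    res.json(item.resource);
  } catch (error) {
    console.error(`Error reading item ${id}: ${error.code} - ${error.message}`);
    // error.code is not always a valid HTTP status (network errors carry string codes)
    const isHttpStatus = Number.isInteger(error.code) && error.code >= 400 && error.code <= 599;
    res.status(isHttpStatus ? error.code : 500).send(error.message);
  }
});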

When this endpoint is called, container.item(id, id).read() executes a point read. The id is used as both the item ID and the partition key value in this example. The item.requestCharge property, which is returned with the response, tells you exactly how many RUs that single operation consumed. For a point read on a 1 KB document, this is typically 1 RU.

Now, let’s say we have 100 concurrent users hitting this endpoint. If each user requests a different 1 KB item within the same second, we’d expect about 100 RUs consumed in that second. If our container is provisioned at 400 RU/s, we’re well within limits.
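The arithmetic above can be checked in a few lines. This is only a rough sketch: estimateUtilization is a hypothetical helper, not part of any SDK, and it assumes every request lands in the same second and costs the same number of RUs. The 10 RU query cost in the second call is an assumed figure for illustration.

```javascript
// Hypothetical helper: estimates how much of a provisioned RU/s budget
// a steady workload would use. Assumes every operation costs the same.
function estimateUtilization(opsPerSecond, ruPerOp, provisionedRuPerSecond) {
  const consumedRuPerSecond = opsPerSecond * ruPerOp;
  return {
    consumedRuPerSecond,
    utilization: consumedRuPerSecond / provisionedRuPerSecond,
    withinBudget: consumedRuPerSecond <= provisionedRuPerSecond,
  };
}

// 100 point reads of 1 KB items in one second against 400 RU/s.
const pointReads = estimateUtilization(100, 1, 400);
console.log(pointReads); // 100 RU/s consumed, 25% of the budget

// The same traffic if each request were a query costing 10 RUs (assumed).
const queryTraffic = estimateUtilization(100, 10, 400);
console.log(queryTraffic); // 1000 RU/s consumed, over budget
```

The second call shows how quickly the picture changes once each request costs more than a point read.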

However, what if the document size is larger, or we’re performing more complex operations like queries? A query to find all items in a specific category might consume significantly more RUs.

// Example query: Find all items in category 'electronics'
// The category is passed as a parameter rather than embedded in the query text.
const querySpec = {
  query: 'SELECT * FROM c WHERE c.category = @category',
  parameters: [{ name: '@category', value: 'electronics' }]
};

const { resources, requestCharge } = await container.items.query(querySpec).fetchAll();
console.log(`Query executed. RU consumed: ${requestCharge}`);

The requestCharge for this query will depend on several factors: the number of documents scanned, the number and size of documents returned, whether the filter can be served from the index, and whether the query has to fan out across partitions. It’s not uncommon for a single query to consume tens or even hundreds of RUs.

Understanding RU consumption is crucial for managing performance and cost. Consuming RUs faster than your provisioned rate leads to throttling (HTTP 429 errors), which increases latency and hurts your application’s responsiveness.
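To make the throttling behavior concrete, here is a minimal sketch of how an application might react to a 429 that reaches it. Note that the JavaScript SDK already retries throttled requests on its own by default, so code like this only matters once those built-in retries are exhausted. The helper names retryDelayMs and withThrottleRetry are hypothetical, and the sketch assumes the thrown error carries a numeric code and, optionally, a retryAfterInMs hint, as the SDK's error responses generally do.

```javascript
// Hypothetical helper: decides how long to wait before retrying.
// Returns null when the error is not a throttle (HTTP 429).
function retryDelayMs(error, attempt, baseMs = 100, maxMs = 5000) {
  if (!error || error.code !== 429) return null;
  // Prefer the server's hint when the error carries one (assumed field name).
  if (Number.isFinite(error.retryAfterInMs)) return error.retryAfterInMs;
  // Otherwise fall back to capped exponential backoff.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `operation` (any async function), retrying only on 429s.
// Any other error, or a 429 on the last attempt, is rethrown.
async function withThrottleRetry(operation, maxAttempts = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const delay = retryDelayMs(error, attempt);
      if (delay === null || attempt + 1 >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A call site would look like withThrottleRetry(() => container.item(id, id).read()). Retrying hides throttling from users but does not remove it, so the 429 count is still worth tracking.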

The core problem RUs solve is predictable and scalable throughput for a NoSQL database. Cosmos DB abstracts away the underlying hardware and expresses capacity in a single unit. You provision a number of RUs per second, and Cosmos DB reserves enough capacity to serve that rate. Your job is to monitor how your application uses those RUs.
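One way to do that monitoring from inside the application is to record the charge, latency, and error code of every call. The RuMeter class below is a hypothetical in-memory sketch, not an SDK feature. It assumes each successful response exposes requestCharge and each failure exposes a code, as in the route handler earlier. A real service would forward these numbers to its metrics system rather than keep them in a Map.

```javascript
// Hypothetical in-memory meter for RU consumption, latency, and errors.
class RuMeter {
  constructor() {
    this.stats = new Map();
  }

  // Records one finished operation under a caller-chosen name.
  record(name, { requestCharge = 0, latencyMs = 0, errorCode = null } = {}) {
    const s = this.stats.get(name) ??
      { count: 0, totalRu: 0, totalLatencyMs: 0, throttled: 0, errors: 0 };
    s.count += 1;
    s.totalRu += requestCharge;
    s.totalLatencyMs += latencyMs;
    if (errorCode === 429) s.throttled += 1;
    if (errorCode !== null) s.errors += 1;
    this.stats.set(name, s);
  }

  // Wraps an async call, timing it and recording its outcome.
  async track(name, operation) {
    const start = Date.now();
    try {
      const response = await operation();
      this.record(name, { requestCharge: response.requestCharge, latencyMs: Date.now() - start });
      return response;
    } catch (error) {
      this.record(name, { latencyMs: Date.now() - start, errorCode: error.code ?? 'unknown' });
      throw error;
    }
  }

  // Returns totals plus per-operation averages, or null if nothing was recorded.
  summary(name) {
    const s = this.stats.get(name);
    if (!s) return null;
    return { ...s, avgRu: s.totalRu / s.count, avgLatencyMs: s.totalLatencyMs / s.count };
  }
}

// Made-up samples: two successful point reads and one throttled request.
const meter = new RuMeter();
meter.record('readItem', { requestCharge: 1, latencyMs: 4 });
meter.record('readItem', { requestCharge: 1, latencyMs: 6 });
meter.record('readItem', { latencyMs: 2, errorCode: 429 });
console.log(meter.summary('readItem'));
```

In the route handler, the read could be wrapped as meter.track('readItem', () => container.item(id, id).read()), which leaves the handler's own logic unchanged.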

The mental model is that each operation you perform against Cosmos DB has a "cost" in RUs. This cost is determined by the type of operation (read, write, query), the size of the data involved, and the number of documents processed. Your provisioned throughput (RU/s) is your budget. Spend too much, too fast, and you get throttled.

Here are the levers you control:

Provisioned Throughput (RU/s): This is the most direct lever. You set this at the database or container level. You can manually set it or use autoscale.
Partitioning Strategy: Provisioned throughput is split evenly across physical partitions, so a good partition key spreads requests across many partition key values. A poor one creates hot partitions that exhaust their share and get throttled even if your total RU consumption is below your provisioned limit.
Indexing Policy: By default, Cosmos DB indexes all properties. This is convenient but adds RU overhead to writes. For specific workloads, you might exclude certain paths or properties from indexing to reduce write costs.
Query Optimization: Writing efficient queries that scan fewer documents and return only necessary data is critical. Using TOP and ORDER BY clauses carefully, and ensuring your query filters align with your partition key, can drastically reduce RU costs.
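Batching: For multiple small writes, the JavaScript SDK’s container.items.batch() (a transactional batch scoped to one partition key) or container.items.bulk() can be more efficient than individual upsert or create operations. Method names vary by SDK and version. The main saving is in round trips, so measure the reported request charge rather than assuming the RU cost drops.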
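The partitioning lever is the least intuitive of these, so here is a small sketch of the arithmetic behind a hot partition. findHotPartitions is a hypothetical helper and all the numbers are made up. It assumes the provisioned throughput is divided evenly among physical partitions and that per-partition consumption is available from your metrics.

```javascript
// Hypothetical helper: flags partitions whose consumption exceeds their
// even share of the provisioned throughput.
function findHotPartitions(provisionedRuPerSecond, ruPerSecondByPartition) {
  const ids = Object.keys(ruPerSecondByPartition);
  const sharePerPartition = provisionedRuPerSecond / ids.length;
  const total = ids.reduce((sum, id) => sum + ruPerSecondByPartition[id], 0);
  const hot = ids.filter((id) => ruPerSecondByPartition[id] > sharePerPartition);
  return { sharePerPartition, total, hot };
}

// Made-up numbers: 20,000 RU/s over four physical partitions (5,000 RU/s each).
const skew = findHotPartitions(20000, { p0: 7000, p1: 2000, p2: 1500, p3: 1500 });
console.log(skew);
// Total demand is 12,000 RU/s, well under 20,000, yet p0 exceeds its 5,000 share.
```

In this example the container as a whole is only 60% utilized, but requests to p0 would still be throttled.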

The one thing most people don’t realize is that the requestCharge is per operation. When you make a single API call, like container.items.query(querySpec).fetchAll(), the requestCharge you see is the total for that entire query, summed across every page of results, not a per-document cost. It aggregates the cost of scanning, filtering, and returning results, which is why you need to track the total for complex operations.
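To see where a query's total comes from, you can drain it page by page and add up the charges yourself. The sketch below is written against the hasMoreResults()/fetchNext() shape of the SDK's query iterator, but it runs here against a stand-in iterator with made-up charges so it needs no database. drainWithCharges and fakeIterator are hypothetical names.

```javascript
// Sums requestCharge page by page. `iterator` is anything with the
// hasMoreResults()/fetchNext() shape of the SDK's query iterator.
async function drainWithCharges(iterator) {
  const items = [];
  const pageCharges = [];
  while (iterator.hasMoreResults()) {
    const { resources, requestCharge } = await iterator.fetchNext();
    items.push(...(resources ?? []));
    pageCharges.push(requestCharge ?? 0);
  }
  const totalCharge = pageCharges.reduce((sum, ru) => sum + ru, 0);
  return { items, pageCharges, totalCharge };
}

// Stand-in iterator with made-up pages, so the sketch runs without a database.
function fakeIterator(pages) {
  let index = 0;
  return {
    hasMoreResults: () => index < pages.length,
    fetchNext: async () => pages[index++],
  };
}

drainWithCharges(fakeIterator([
  { resources: [{ id: 'a' }, { id: 'b' }], requestCharge: 3 },
  { resources: [{ id: 'c' }], requestCharge: 2 },
])).then(({ items, pageCharges, totalCharge }) => {
  console.log(`${items.length} items, pages: ${pageCharges.join(' + ')} = ${totalCharge} RUs`);
});
```

Against a real container, the iterator would come from container.items.query(querySpec, { maxItemCount: 100 }), and the per-page numbers show whether a few expensive pages dominate the total.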

The next concept to explore is how to proactively manage RU consumption using autoscale and custom alerts.