Optimize Cosmos DB SQL API Queries to Cut RU and Latency (2026)

Cosmos DB’s SQL API is designed for performance, but hitting those aggressive latency and RU (Request Unit) targets often feels like wrestling with a black box. The key isn’t just writing SQL; it’s understanding how Cosmos DB executes that SQL and how to nudge it in the right direction.

Let’s see it in action. Imagine a simple query to get a user by their ID:

SELECT VALUE r FROM r WHERE r.id = 'user-123' AND r.partitionKey = 'pk-abc'

When this hits Cosmos DB, it doesn’t just scan everything. It uses an internal query engine. For this specific query, if id and partitionKey are indexed, Cosmos DB can go directly to the relevant data partition and then use the index to pinpoint the exact document. This is highly efficient.

But what happens when we get more complex?

SELECT * FROM c WHERE STARTSWITH(c.name, 'Alice') AND c.age > 30

Here, Cosmos DB needs to scan documents within the relevant partition(s). If name is indexed, it can use that to find documents starting with "Alice." Then, it filters those results by age > 30. If age isn’t indexed, it has to do a full scan of the documents that matched the name filter. This is where RU costs can balloon and latency creeps up.

The core problem Cosmos DB solves is providing a globally distributed, multi-model database with predictable performance. The SQL API abstracts away much of the underlying complexity, but to optimize, you need to peek under the hood. Each query has an associated RU cost, which is a measure of the compute, memory, and I/O resources consumed. High RU consumption means higher costs and potential throttling if your provisioned throughput is exceeded. Latency is the time it takes for the query to return.

The levers you control are primarily:

Indexing Policy: What fields are indexed and how.
Partition Key Strategy: How your data is distributed.
Query Structure: How you write your SQL.
Throughput Provisioning: How many RUs you allocate.

Let’s dive deeper into query structure and indexing.

Consider a query like:

SELECT VALUE COUNT(r) FROM r WHERE r.status = 'processed'

If status is not indexed, Cosmos DB will scan every document in the query’s scope (which could be an entire partition or multiple partitions) to count the ones matching status = 'processed'. This is extremely expensive.

The fix: Ensure status is included in your indexing policy.

{
    "indexingMode": "consistent",
    "automatic": true,
    "includePaths": [
        {
            "path": "/*",
            "kind": "Include"
        }
    ],
    "excludePaths": [
        {
            "path": "/content/?",
            "kind": "Exclude"
        }
    ]
}

By default, Cosmos DB indexes all properties. If you’ve explicitly excluded status, you need to change it. If you’ve included /*, it should be indexed. The mechanical benefit here is that instead of scanning documents, Cosmos DB can read the index entries for status, which are far more compact, and count them directly.

Another common pitfall: using functions on indexed fields in the WHERE clause.

SELECT * FROM c WHERE UPPER(c.email) = 'TEST@EXAMPLE.COM'

If email is indexed, but you apply UPPER(), the index on email (which stores test@example.com) cannot be directly used to match 'TEST@EXAMPLE.COM'. Cosmos DB has to retrieve documents and then apply the function, leading to a scan.

The fix: Normalize your data or perform the transformation on the client-side before the query, or ensure your query uses the exact indexed value. For example, if your data is always lowercase, query with a lowercase string:

SELECT * FROM c WHERE c.email = 'test@example.com'

If you must use a function, and the field is indexed, Cosmos DB might still be able to optimize if it has specific internal knowledge of the function (e.g., STARTSWITH is often optimized). However, generic functions like UPPER, LOWER, SUBSTRING on the indexed field itself will usually bypass the index for the comparison.

The most surprising thing about Cosmos DB query optimization is that the SELECT * is often less efficient than SELECT VALUE r FROM r or SELECT r.id, r.name FROM r. When you use SELECT *, Cosmos DB has to retrieve the entire document payload. If your query only needs a few fields, Cosmos DB can often satisfy the request by reading only the index entries for those specific fields, especially if they are included in a covering index. This drastically reduces the amount of data read from storage, directly lowering RU consumption.

For example, if you only need the id and name of users whose status is 'active', instead of:

SELECT * FROM c WHERE c.status = 'active'

Use:

SELECT c.id, c.name FROM c WHERE c.status = 'active'

If status is indexed, and you’ve configured your indexing policy to include id and name (which they are by default unless excluded), Cosmos DB can satisfy this query entirely from the index. It doesn’t need to touch the actual document data at all. This is called a "covering query" and is the holy grail of RU optimization. The RU cost for this query would be significantly lower than the SELECT * version because the I/O is minimal.

Finally, remember that your partition key strategy is paramount. Queries that can hit a single partition (using the partition key in the WHERE clause) are orders of magnitude more efficient than those that fan out across many. If your query doesn’t specify a partition key, or if it uses a value that exists in multiple partitions, Cosmos DB has to execute the query against all relevant partitions, dramatically increasing RU cost and latency.

If you’ve optimized your indexing and query structure, but still see high RU, the next error you’ll hit is 429 Too Many Requests throttling.