Cosmos DB .NET SDK Best Practices for Production Throughput (2026)

The Cosmos DB .NET SDK can be a performance bottleneck if not configured correctly for production throughput, often leading to Request Unit (RU) throttling and poor application responsiveness.

Understanding Throughput in Cosmos DB

Cosmos DB provisions throughput in Request Units (RUs) per second. Every operation (read, write, query) consumes RUs. If your application exceeds its provisioned RU capacity, requests are throttled with a 429 Too Many Requests status code. The .NET SDK has several settings that directly influence RU consumption and how throttling is handled.

Common Throughput Bottlenecks and Solutions

1. Under-provisioned Throughput

This is the most basic cause: you simply don’t have enough RUs allocated for your workload.

Diagnosis: Monitor RU consumption in the Azure portal through Azure Monitor metrics such as Total Request Units and Normalized RU Consumption. Look for sustained usage close to 100% of your provisioned limit, together with frequent 429 responses in your application logs.
Fix: Increase the provisioned throughput (RUs/sec) for your container or database. For example, to increase a container’s throughput from 400 RUs/sec to 1000 RUs/sec, navigate to your Cosmos DB account in the Azure portal, select "Data Explorer", find your container, go to "Scale & Settings", and change the manual throughput value. The same change can be made from code with Container.ReplaceThroughputAsync.
Why it works: More RUs per second let the service process more operations each second before it starts throttling.
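
As a rough sizing aid, the required RUs/sec can be estimated from the request rate and the per-operation charge. The sketch below is illustrative arithmetic only: the class name, the request rates, and the per-operation RU costs are hypothetical placeholders, and real charges should be read from RequestCharge on your own responses.

```csharp
using System;

public static class RuEstimate
{
    // Estimated RU/s = sum over operation types of (operations per second * RU per operation).
    public static double RequiredRuPerSecond(
        double readsPerSecond, double ruPerRead,
        double writesPerSecond, double ruPerWrite)
    {
        return readsPerSecond * ruPerRead + writesPerSecond * ruPerWrite;
    }

    // True when the estimated demand fits inside the provisioned budget.
    public static bool FitsBudget(double requiredRu, int provisionedRu)
    {
        return requiredRu <= provisionedRu;
    }
}
```

For a hypothetical workload of 300 reads/sec at 1 RU each and 100 writes/sec at 6 RUs each, RuEstimate.RequiredRuPerSecond(300, 1.0, 100, 6.0) evaluates to 900, which exceeds a 400 RUs/sec container but fits within 1000 RUs/sec.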

2. Inefficient Queries

Complex or poorly written queries can consume a disproportionate amount of RUs, leading to throttling even with adequate overall provisioned throughput.

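Diagnosis: Run suspect queries in the Azure portal’s Data Explorer and check the "Query Stats" tab, which reports the request charge for each execution. If diagnostic logging is enabled for the account, the logs also show which queries consume the most RUs. In addition, log the request charge and diagnostics returned by the .NET SDK to identify slow or RU-intensive queries.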
Fix:
- Indexing: Ensure your indexing policy is optimized for your query patterns. Avoid indexing all properties if you only query a few.
- Query Structure: Rewrite queries to be more efficient. For example, avoid SELECT *, use TOP judiciously, and leverage partition key filters whenever possible.
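- Query Metrics: In the v3 SDK, every FeedResponse<T> already exposes RequestCharge and Diagnostics, so no extra flag is needed to log RU consumption. To see which indexes a query used, set PopulateIndexMetrics = true in QueryRequestOptions and read FeedResponse<T>.IndexMetrics; the flag adds overhead, so enable it only while debugging. (PopulateQueryMetrics belonged to FeedOptions in the v2 SDK.)
```
// Parameterize the query instead of embedding values in the query text.
var queryDefinition = new QueryDefinition("SELECT * FROM c WHERE c.partitionKey = @pk")
    .WithParameter("@pk", "someValue");
var queryOptions = new QueryRequestOptions { PopulateIndexMetrics = true }; // debugging only
using var iterator = container.GetItemQueryIterator<MyDocument>(queryDefinition, requestOptions: queryOptions);
var queryResult = await iterator.ReadNextAsync(); // first page only
// Log queryResult.RequestCharge (RUs consumed), queryResult.IndexMetrics, and queryResult.Diagnostics.ToString()
```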
Why it works: Optimized queries reduce the number of RUs each operation requires, allowing more operations to fit within your RU budget.

3. Suboptimal Partition Key Strategy

An uneven distribution of data and requests across logical partitions can lead to "hot partitions," where one or a few logical partitions consume all the throughput, even if the total RU provisioned is high.

Diagnosis: In the Azure portal, split the Normalized RU Consumption metric by PartitionKeyRangeId to compare physical partitions. If one partition consistently sits near 100% while the others stay low, it is serving a hot logical partition.
Fix: Re-evaluate your partition key. Choose a key with high cardinality and even data distribution. The partition key of an existing container cannot be changed in place, so if a hot partition is unavoidable with the current key, consider migrating data to a new container with a better partition key, such as a synthetic key that combines several properties or a hierarchical partition key.
Why it works: Provisioned throughput is divided evenly among physical partitions, so spreading data and requests evenly across logical partitions lets the workload use the whole RU budget instead of exhausting one partition’s share.
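
To make the effect of a hot partition concrete, the sketch below computes the even share of throughput each physical partition receives and builds a synthetic partition key by appending a hash-derived bucket suffix to a hot value. The class and method names, the bucket count, and the key format are hypothetical choices for illustration, not an SDK feature.

```csharp
using System;

public static class PartitionSketch
{
    // Provisioned RU/s are divided evenly among physical partitions.
    public static double RuPerPhysicalPartition(int provisionedRu, int physicalPartitions)
    {
        if (physicalPartitions <= 0) throw new ArgumentOutOfRangeException(nameof(physicalPartitions));
        return (double)provisionedRu / physicalPartitions;
    }

    // Spreads one hot value (for example a large tenant) over `buckets` logical partitions.
    // The bucket comes from an FNV-1a style hash over the item id's UTF-16 code units;
    // string.GetHashCode() is avoided because it is not stable across processes.
    public static string SyntheticKey(string hotValue, string itemId, int buckets)
    {
        if (buckets <= 0) throw new ArgumentOutOfRangeException(nameof(buckets));
        uint hash = 2166136261; // FNV offset basis
        foreach (char c in itemId)
        {
            hash = unchecked((hash ^ c) * 16777619); // FNV prime
        }
        return $"{hotValue}-{hash % (uint)buckets}";
    }
}
```

With 10,000 RUs/sec spread over 5 physical partitions, RuPerPhysicalPartition returns 2000, so a single hot partition throttles at a fifth of the total budget. The trade-off of the synthetic key is on the read side: a point read can recompute the key from the item id, but a query over the whole hot value has to cover every bucket.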

4. Default `MaxBufferedItemCount`

The MaxBufferedItemCount setting in FeedOptions (v2 SDK) or QueryRequestOptions (v3 SDK) caps how many items the SDK buffers client-side while it executes a query in parallel across partitions. Together with MaxConcurrency, it controls how far ahead the SDK prefetches results. With conservative values, a large cross-partition query is fetched page by page, and the application waits on a network round trip for each page.

Diagnosis: When fetching large datasets, measure how long each ReadNextAsync call takes. If most calls wait on the network instead of returning already buffered results, prefetching is likely the limiting factor.
Fix: Increase MaxBufferedItemCount to a value that suits your typical result set size, and set MaxConcurrency so that partitions are queried in parallel. For example, setting MaxBufferedItemCount to 1000 allows the SDK to hold up to 1000 prefetched items; a negative value lets the SDK decide.
```
// MaxConcurrency = -1 lets the SDK choose the degree of parallelism.
var queryOptions = new QueryRequestOptions { MaxBufferedItemCount = 1000, MaxConcurrency = -1 };
// Use queryOptions with your query iterator
```
Why it works: Prefetching more items client-side overlaps network round trips with processing, so large result sets arrive sooner. It does not reduce the RU charge of the query, and buffered items consume client memory, so size the buffer accordingly.

5. Ineffective Retry Policies

The default retry policy may retry too few or too many times for a given workload. Too few retries surface 429 errors to callers that a short wait would have absorbed; too many hide sustained throttling behind long request latencies.

Diagnosis: Monitor the frequency of 429 responses and the number of retries logged by the SDK. The SDK’s diagnostics provide retry information.

Fix: Configure the retry policy using CosmosClientOptions. In the v3 SDK the relevant properties are MaxRetryAttemptsOnRateLimitedRequests (default 9) and MaxRetryWaitTimeOnRateLimitedRequests (default 30 seconds); MaxRetryAttemptsOnThrottledRequests is the v2 name on RetryOptions. For example, to retry up to 10 times with a maximum cumulative wait of 30 seconds:

```
var cosmosClientOptions = new CosmosClientOptions
{
    MaxRetryAttemptsOnRateLimitedRequests = 10,
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
};
var cosmosClient = new CosmosClient(accountEndpoint, accountKey, cosmosClientOptions);
```

Why it works: A well-tuned retry policy ensures that transient throttling errors are handled gracefully, allowing the application to recover without failing, while also preventing excessive waiting that degrades user experience.
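
Once the SDK’s own retries are exhausted, the 429 reaches the application as an exception. One possible way to handle that case is a small wrapper that waits for a suggested delay and tries again a bounded number of times. The helper below is a hypothetical, SDK-independent sketch: ThrottleFallback, RunAsync, and the delegate shape are assumptions made for illustration, not part of the Cosmos DB SDK.

```csharp
using System;
using System.Threading.Tasks;

public static class ThrottleFallback
{
    // Runs `operation`. If it throws and `getRetryAfter` returns a delay for that exception,
    // waits for the delay and tries again, up to `extraAttempts` additional times.
    // Exceptions for which `getRetryAfter` returns null are not caught and propagate immediately.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, TimeSpan?> getRetryAfter,
        int extraAttempts = 1)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < extraAttempts && getRetryAfter(ex) is TimeSpan delay)
            {
                // Back off for the suggested interval before the next attempt.
                await Task.Delay(delay);
            }
        }
    }
}
```

With the v3 SDK, the delegate could inspect CosmosException, for example `ex => ex is CosmosException ce && (int)ce.StatusCode == 429 ? ce.RetryAfter ?? TimeSpan.FromSeconds(1) : (TimeSpan?)null`, where the one-second fallback is an arbitrary choice. Keeping extraAttempts small avoids stacking long waits on top of the SDK’s own retry budget.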

6. Using `QueryRequestOptions.MaxItemCount` Incorrectly

MaxItemCount limits the number of items returned per backend request, not the total number of items fetched. If not combined with MaxBufferedItemCount or proper iteration, it can lead to many small requests.

Diagnosis: When fetching large datasets, observe if you’re making a high number of individual ReadNextAsync calls, each returning a small number of items (e.g., 100 or fewer).

Fix: Set MaxItemCount to match how you consume results. The default page size for queries in the v3 SDK is 100 items; for bulk reads, a larger value, or -1 to let the service choose the page size dynamically, reduces the number of round trips. Always iterate through the FeedIterator until HasMoreResults is false; the SDK paginates automatically. For very large cross-partition result sets, increasing MaxBufferedItemCount becomes more impactful.

```
var queryOptions = new QueryRequestOptions { MaxItemCount = 100 }; // 100 is the default page size for queries
using var iterator = container.GetItemQueryIterator<MyDocument>(queryDefinition, requestOptions: queryOptions);
while (iterator.HasMoreResults)
{
    var response = await iterator.ReadNextAsync();
    // Process response.Resource; a page may contain fewer items than MaxItemCount
}
```

Why it works: MaxItemCount controls the page size from the server. By iterating correctly, you ensure all data is fetched, and optimizing MaxBufferedItemCount helps manage how much is held client-side between these pages.

7. High Latency Operations

High latency does not change the RU charge of an operation, but it limits how many operations each client can complete per second, and it is often mistaken for a throughput limit.

Diagnosis: Monitor the duration of your Cosmos DB operations (reads, writes, queries) in your application logs or APM tools.
Fix:
- Proximity: Ensure your application is deployed in the same Azure region as your Cosmos DB account.
- Network: Optimize network paths. For applications running in an Azure virtual network, consider private endpoints or service endpoints.
- SDK Configuration: Use the latest SDK version, which often includes performance improvements. Reuse a single CosmosClient instance for the lifetime of the application so that connections are pooled instead of being re-established.
```
// Reuse this instance throughout your application's lifetime
var cosmosClient = new CosmosClient("YOUR_COSMOS_DB_CONNECTION_STRING");
```
Why it works: Lower latency means each operation completes sooner, so the same client resources (connections and threads) can drive more operations per second.
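
One common way to guarantee a single shared client is a lazily initialized static holder. The following is a minimal sketch, assuming the v3 SDK; the class name CosmosClientHolder, the environment variable name, and the chosen region are hypothetical and should be replaced with your own configuration.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CosmosClientHolder
{
    // Lazy<T> is thread-safe by default, so the client is created once on first use.
    private static readonly Lazy<CosmosClient> LazyClient = new Lazy<CosmosClient>(() =>
        new CosmosClient(
            // Hypothetical environment variable name; use your own configuration source.
            Environment.GetEnvironmentVariable("COSMOS_CONNECTION_STRING"),
            new CosmosClientOptions
            {
                // Assumed region for illustration: set this to the region your app runs in.
                ApplicationRegion = Regions.WestEurope,
                // Direct mode is the v3 default; shown explicitly for clarity.
                ConnectionMode = ConnectionMode.Direct
            }));

    // Every caller shares the same client and therefore the same connections.
    public static CosmosClient Instance => LazyClient.Value;
}
```

Applications that use dependency injection can get the same effect by registering the client as a singleton service instead of using a static holder.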

8. Unnecessary `QueryRequestOptions.PopulateIndexMetrics`

While useful for debugging, PopulateIndexMetrics (the diagnostic flag exposed by QueryRequestOptions in the v3 SDK) adds overhead to every query that sets it.

Diagnosis: If query latency or cost is higher than expected and you are not actively reading FeedResponse<T>.IndexMetrics, check whether the flag was left enabled after a debugging session. It is usually a minor contributor.

Fix: Remove PopulateIndexMetrics = true from your QueryRequestOptions if you don’t need it.

```
var queryOptions = new QueryRequestOptions { PopulateIndexMetrics = false }; // Off unless explicitly enabled
```

Why it works: Disabling the option spares the service from computing and returning index utilization metadata with each response.

By systematically addressing these points, you can significantly improve your application’s throughput and stability in production.

The next challenge you’ll likely encounter is managing distributed transactions or ensuring consistency across multiple operations.