Detect and Fix Hot Partitions in Cosmos DB (2026)

Cosmos DB’s request units (RUs) are your currency, and a hot partition is where you’re spending way too much, too fast.

Here’s how a hot partition manifests: your application sees intermittent latency spikes, and your Cosmos DB diagnostics show 429 Too Many Requests responses (substatus 3200, "Request rate is large"), even though the container’s total RU consumption looks well below what you provisioned. The underlying issue is that a single logical partition in your container receives a disproportionate share of the traffic (reads or writes). Every logical partition lives entirely on one physical partition, and provisioned throughput is divided evenly across physical partitions. Once the physical partition hosting the hot logical partition exhausts its share of RU/s, every request routed to it is throttled, even while the other physical partitions sit idle.
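The arithmetic behind "throttled while the total looks fine" fits in a few lines. This is a minimal sketch assuming provisioned throughput is split evenly across physical partitions; the function names and demand figures are hypothetical, chosen only for illustration.

```python
def per_partition_budget(total_rus: float, physical_partitions: int) -> float:
    """RU/s each physical partition gets when throughput is split evenly."""
    if physical_partitions < 1:
        raise ValueError("need at least one physical partition")
    return total_rus / physical_partitions


def throttled_partitions(total_rus: float, demand_by_partition: dict[str, float]) -> list[str]:
    """Physical partitions whose RU/s demand exceeds their even share."""
    budget = per_partition_budget(total_rus, len(demand_by_partition))
    return [pid for pid, demand in demand_by_partition.items() if demand > budget]


if __name__ == "__main__":
    # Hypothetical demand (RU/s) on a container provisioned at 30,000 RU/s with
    # three physical partitions; partition "2" hosts the hot logical partition.
    demand = {"0": 3_000, "1": 2_500, "2": 12_000}
    print(sum(demand.values()))                  # 17500, well under 30,000
    print(throttled_partitions(30_000, demand))  # ['2']
```

Partition "2" is throttled even though the container as a whole is using just over half of its 30,000 RU/s.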

Common Causes and Fixes for Hot Partitions

Uneven Data Distribution due to Poor Partition Key Choice:
- Diagnosis: The most common culprit. If your partition key has low cardinality (few distinct values) or a few frequently accessed "hot" values, a disproportionate share of data and requests lands on the same logical partition, and therefore on the same physical partition. For example, using UserId as the partition key when a few highly active users dominate your traffic.
- Command/Check:
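  - Azure Portal: Open your account’s Insights (or Metrics) and view Normalized RU Consumption split by PartitionKeyRangeID. One range pinned near 100% while the others stay low is the signature of a hot partition. To see which partition key values are responsible, enable diagnostic settings and query the PartitionKeyRUConsumption (RU per logical partition key) and PartitionKeyStatistics (storage per key) logs.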
  - Azure CLI/PowerShell: Querying this directly is less straightforward. You’d typically rely on portal metrics or sample your data to infer distribution.
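- Fix: Re-architect your data model. Choose a partition key with high cardinality and an even spread of requests. For instance, if UserId is hot, consider a synthetic key such as UserId_Date, or hierarchical partition keys (for example UserId, then SessionId), if your query patterns allow. Hashing UserId alone does not help: Cosmos DB already hashes partition key values, and a hot user still maps to a single value. Because a container’s partition key can’t be changed in place, this means migrating data to a new container with the better key.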
- Why it works: A well-chosen partition key distributes data and requests evenly across multiple physical partitions, preventing any single one from becoming a bottleneck.
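If you log the x-ms-request-charge value of each response together with the partition key value it targeted (an assumption about your own instrumentation, not something the portal hands you as a list), a few lines of Python show how lopsided the load is. This is a minimal sketch; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def ru_share_by_key(records: Iterable[Tuple[str, float]]) -> dict[str, float]:
    """Fraction of total RU charge consumed by each partition key value.

    `records` holds (partition_key_value, request_charge) pairs, for example
    from logging x-ms-request-charge next to the key each request targeted.
    """
    totals: dict[str, float] = defaultdict(float)
    for key, charge in records:
        totals[key] += charge
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {key: charge / grand_total for key, charge in totals.items()}


def hottest_key(records: Iterable[Tuple[str, float]]) -> Optional[Tuple[str, float]]:
    """The partition key value with the largest RU share, and that share."""
    shares = ru_share_by_key(records)
    if not shares:
        return None
    key = max(shares, key=shares.get)
    return key, shares[key]
```

The result is only as good as the sample: log charges for reads, writes, and queries alike, or a write-heavy hot key can hide behind read-heavy ones.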
Sequential Partition Keys with Burst Traffic:
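- Diagnosis: Cosmos DB hashes partition key values, so distinct sequential values (unique timestamps or sequential IDs) scatter across physical partitions on their own. The trouble starts when the key is a coarse time bucket, such as the current date or hour, or any value that every new write shares: a sudden burst of writes or reads then targets one logical partition and overloads the physical partition hosting it.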
- Command/Check: As with the first cause, monitor per-partition RU consumption during peak traffic. Look for spikes that line up with the current time-bucket value of your partition key.
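- Fix: Implement a "salting" strategy for your partition key: add a suffix (or prefix) to the key value, either a random number within a fixed range or a deterministic value such as a hash of the item or user ID modulo N. For example, if you partition by EventDate and today’s date is hot, writing to EventDate-0 through EventDate-9 spreads the burst over ten logical partitions. The trade-off is that reads covering a whole day must query every suffix; a deterministic suffix at least lets point reads recompute the exact key.
- Why it works: Salting turns one hot value into N values, which Cosmos DB’s hashing spreads across physical partitions, so the burst is shared instead of landing on one.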
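A salted key can be derived deterministically from the item ID, so the same item always maps to the same suffix. This sketch assumes a hypothetical bucket count of 10 and uses SHA-256 from Python’s hashlib, because Python’s built-in hash() for strings is randomized per process and would map the same item to different suffixes across runs. The function names are hypothetical.

```python
import hashlib

SALT_BUCKETS = 10  # hypothetical; size it to your peak write rate


def salted_key(base_key: str, item_id: str, buckets: int = SALT_BUCKETS) -> str:
    """Append a stable suffix so one hot base value becomes `buckets` values."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % buckets
    return f"{base_key}-{suffix}"


def all_salted_keys(base_key: str, buckets: int = SALT_BUCKETS) -> list[str]:
    """Every key a reader must query to see all items for `base_key`."""
    return [f"{base_key}-{i}" for i in range(buckets)]
```

A writer calls salted_key(event_date, event_id); a reader that needs the whole day issues one query per key from all_salted_keys(event_date) and merges the results. Choose enough buckets that each stays well under a physical partition’s share, but no more than readers can afford to fan out over.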
Large Number of Requests to a Single Item:
- Diagnosis: A single item within a logical partition is being accessed excessively, pushing its host physical partition over its RU limit. This can happen if an item acts as a global counter or a frequently updated configuration.
- Command/Check: Use per-partition RU metrics or the PartitionKeyRUConsumption logs. If one logical partition key value shows extremely high RU consumption, investigate the items within that logical partition. You may need application-level logging to pinpoint the specific item.
- Fix:
  - Denormalization: If the item is frequently read, copy its data into the documents that are already being read alongside it, so callers no longer need a separate read of the hot item.
  - Caching: Implement client-side or external caching (e.g., Redis) for frequently accessed, less volatile data.
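  - Batching (Writes): If the item is frequently updated, batch these updates where possible. If it’s a genuine single point of contention, such as a global counter, redesign it, for example by splitting the count across several documents and summing them on read.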
- Why it works: Reduces the direct RU load on the hot item by serving it from cache or distributing its data, or by reducing the frequency of direct RU-consuming operations.
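One common way to redesign a hot counter is the sharded counter pattern: spread increments over N items and sum them on read. The sketch models the shards in memory; in Cosmos DB each shard would be its own item (ideally under its own partition key value), and each increment could be a partial document update (patch) on the chosen shard. The class and the shard naming scheme are hypothetical.

```python
import random
from typing import Optional


class ShardedCounter:
    """In-memory model of one logical counter split across N documents."""

    def __init__(self, name: str, shards: int, rng: Optional[random.Random] = None):
        if shards < 1:
            raise ValueError("shards must be >= 1")
        self.name = name
        self.shards = shards
        self._rng = rng or random.Random()
        # Hypothetical per-shard document id -> current count.
        self.docs: dict[str, int] = {f"{name}-{i}": 0 for i in range(shards)}

    def increment(self, by: int = 1) -> str:
        """Add to one randomly chosen shard and return that shard's id."""
        shard_id = f"{self.name}-{self._rng.randrange(self.shards)}"
        self.docs[shard_id] += by
        return shard_id

    def total(self) -> int:
        """Reads pay the cost instead: sum every shard."""
        return sum(self.docs.values())
```

Writes now land on eight items instead of one; the cost moves to reads, which must fetch every shard, so the pattern pays off when writes far outnumber reads.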
Inefficient Queries:
- Diagnosis: A query that seemingly targets a small set of data but is written inefficiently, consuming a high number of RUs per request. This could be a cross-partition query that was meant to stay within one partition, or a filter on a property excluded from the indexing policy (Cosmos DB indexes every property by default, so this usually follows a custom policy), which forces a scan.
- Command/Check:
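  - Azure Portal: In Data Explorer, run the query and open the Query Stats tab to see its request charge and index utilization. To find the most expensive queries over time, enable diagnostic settings and analyze the QueryRuntimeStatistics and DataPlaneRequests logs.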
  - x-ms-request-charge header: Every response reports its RU cost in the x-ms-request-charge header, which the SDKs also expose as a request charge property. A consistently high value for a particular query indicates inefficiency.
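- Fix: Optimize your queries. Make sure the WHERE clause filters on the partition key so the query stays within one partition. Tune the indexing policy, for example by adding composite indexes for queries that filter or sort on several properties. Avoid SELECT * and retrieve only the fields you need. For cross-partition queries, consider whether they can be refactored or whether a different data model is needed.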
- Why it works: Efficient queries consume fewer RUs per operation, reducing the overall load on the physical partition.
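To spot expensive queries from the application side, you can record the x-ms-request-charge header per query text and rank queries by average cost. This is a sketch; the ChargeTracker class is hypothetical, and it assumes you can obtain each response’s headers as a string dictionary (how you access them depends on your SDK).

```python
from collections import defaultdict
from statistics import mean


class ChargeTracker:
    """Collects x-ms-request-charge values per query text and ranks by average RU cost."""

    def __init__(self) -> None:
        self._charges: dict[str, list[float]] = defaultdict(list)

    def record(self, query: str, headers: dict[str, str]) -> float:
        """Store the RU charge from one response (header names matched case-insensitively)."""
        lowered = {name.lower(): value for name, value in headers.items()}
        charge = float(lowered.get("x-ms-request-charge", 0.0))
        self._charges[query].append(charge)
        return charge

    def most_expensive(self, n: int = 5) -> list[tuple[str, float]]:
        """The n queries with the highest average charge, as (query, avg_ru) pairs."""
        averages = [(query, mean(charges)) for query, charges in self._charges.items()]
        return sorted(averages, key=lambda pair: pair[1], reverse=True)[:n]
```

For queries that return several pages, each page response carries its own charge; if you record every page, the ranking is per page rather than per execution.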
Throughput Spread Thin Across Physical Partitions:
- Diagnosis: Cosmos DB divides provisioned throughput evenly across physical partitions, and a single physical partition serves at most 10,000 RU/s. A logical partition therefore can never use more than its physical partition’s share, however high the container total is. For example, 100,000 RU/s spread over 20 physical partitions (say, after earlier splits) gives each one 5,000 RU/s, so a hot logical partition is throttled at 5,000 RU/s while the container as a whole shows plenty of headroom.
- Command/Check:
  - Azure Portal: In Insights, view Normalized RU Consumption split by PartitionKeyRangeID. Each distinct PartitionKeyRangeID is a physical partition; one range pinned at 100% while the others stay low means the per-partition share, not the container total, is your limit.
  - Container-level metrics such as Total Request Units show only the aggregate and can hide a single saturated partition.
- Fix:
  - Increase Throughput: Raising manual RU/s (or the autoscale maximum) raises every partition’s share, up to the 10,000 RU/s per-partition ceiling. Past that point Cosmos DB splits partitions instead, which does not give any single logical partition more than 10,000 RU/s.
  - Avoid "scale up, then scale back down": Partitions created by a scale-up are not merged back automatically when you lower throughput, so the same RU/s ends up spread over more partitions and each share shrinks, which makes a hot partition worse.
  - Fix the Key: A single logical partition that needs more than 10,000 RU/s can only be relieved by spreading its traffic over more logical partitions, as in the first two causes. If only one or two partitions are hot and you can’t change the key yet, Cosmos DB’s option to redistribute throughput across physical partitions, where available for your API, may let you shift RU/s toward the hot ones.
- Why it works: A hot logical partition is capped by its physical partition’s share. Raising that share, or spreading the hot key’s traffic across more logical partitions, raises the ceiling it can reach.
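How partition count and per-partition share interact when throughput changes is easiest to see with numbers. The sketch below uses a deliberately simplified model, an assumption rather than Cosmos DB’s exact split algorithm: a scale-up leaves at least ceil(RU/s / 10,000) physical partitions, and nothing merges them afterwards. The function names are hypothetical.

```python
import math

# Documented per-physical-partition throughput ceiling.
MAX_RUS_PER_PHYSICAL_PARTITION = 10_000


def partitions_after_scale(current_partitions: int, new_rus: int) -> int:
    """Simplified model: a scale-up splits until no partition needs more than
    10,000 RU/s; a scale-down never merges partitions back."""
    needed = math.ceil(new_rus / MAX_RUS_PER_PHYSICAL_PARTITION)
    return max(current_partitions, needed)


def share_per_partition(total_rus: int, partitions: int) -> float:
    """RU/s each physical partition gets under an even split."""
    return total_rus / partitions


if __name__ == "__main__":
    partitions = 2
    print(share_per_partition(20_000, partitions))  # 10000.0
    partitions = partitions_after_scale(partitions, 200_000)
    print(partitions)                                # 20
    partitions = partitions_after_scale(partitions, 20_000)
    print(share_per_partition(20_000, partitions))  # 1000.0
```

In this model, the same 20,000 RU/s that gave each partition 10,000 RU/s before the round trip leaves each with 1,000 RU/s afterwards.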
Stale Partition Key Range Cache:
- Diagnosis: This one is a look-alike rather than a true hot partition. SDK clients cache the mapping from partition key ranges to physical partitions. Right after a partition split that cache is stale, and requests routed with it receive 410 Gone responses (substatus 1002), which the SDK handles by refreshing its cache and retrying. The result is a short burst of extra latency, not sustained 429s.
- Command/Check: Look in your SDK diagnostics for 410 responses or latency spikes clustered around partition splits, for example shortly after a large throughput increase or rapid data growth. If the issue resolves itself within a few minutes and none of the causes above fit, this is the likely explanation.
- Fix: There’s no direct user-facing fix. Keep your SDK up to date, reuse a single long-lived client instance so its caches stay warm, and use retry policies with exponential backoff to absorb any residual transient errors.
- Why it works: The SDK refreshes its routing cache when it detects the stale mapping, so the retried request reaches the right physical partition.
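The official SDKs already retry throttled (429) requests a configurable number of times and honor the server’s x-ms-retry-after-ms header. If you call the REST API directly or want an outer retry layer, a minimal sketch looks like this. The `operation` callable returning (status, headers) is a hypothetical stand-in for your HTTP call, and statuses such as 408 or 503 belong in `retry_on` only for operations that are safe to repeat.

```python
import random
import time
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# Hypothetical shape of one raw response: (HTTP status, response headers).
Response = Tuple[int, Dict[str, str]]


def call_with_backoff(
    operation: Callable[[], Response],
    retry_on: FrozenSet[int] = frozenset({429}),
    max_attempts: int = 6,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Response:
    """Call `operation`, retrying retryable statuses; return the last response.

    Uses the server's x-ms-retry-after-ms hint when present, otherwise
    exponential backoff with full jitter, capped at max_delay_s.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        status, headers = operation()
        if status not in retry_on or attempt == max_attempts - 1:
            return status, headers
        lowered = {name.lower(): value for name, value in headers.items()}
        hint_ms = lowered.get("x-ms-retry-after-ms")
        if hint_ms is not None:
            delay = float(hint_ms) / 1000.0
        else:
            delay = rng.uniform(0.0, base_delay_s * 2 ** attempt)
        sleep(min(delay, max_delay_s))
    raise AssertionError("unreachable: the loop always returns")
```

Passing `sleep` and `rng` as parameters keeps the policy testable without real waits; in production the defaults apply.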

After fixing hot partitions, the next errors worth watching for are 408 Request Timeout, which often traces back to the client (CPU saturation, connection or port exhaustion, or timeouts set too low), and 503 Service Unavailable, which means the client could not reach the service, whether because of a transient service-side issue or a network problem between client and service.