Cosmos DB doesn’t actually distribute data evenly across partitions if your partition key isn’t well-chosen, leading to "hot partitions" that bottleneck your throughput.

Let’s see this in action. Imagine a Cosmos DB container storing e-commerce orders, with a partition key of customerId.

{
  "id": "order123",
  "customerId": "custA",
  "orderDate": "2023-10-27T10:00:00Z",
  "items": [
    {"productId": "prodX", "quantity": 2},
    {"productId": "prodY", "quantity": 1}
  ],
  "totalAmount": 150.75
}

If custA is your most active customer, all their orders share one logical partition, and a logical partition always lives on a single physical partition. Every purchase custA makes and every query against their orders consumes Request Units (RUs) on that one physical partition, which can turn it into a hot spot.

This problem arises because Cosmos DB maps your logical partition keys to physical partitions. A good partition key distributes requests and storage evenly across these physical partitions. A bad one, like a customerId in a scenario with a few very active customers, concentrates everything onto one or a few physical partitions.

The solution is to use a hierarchical partition key. Instead of just customerId, you can combine it with another attribute that adds more granularity. For e-commerce, a common and effective choice is orderDate or a derived date component.

Let’s say we partition first by orderDate at daily granularity, and then by customerId within each day. Cosmos DB supports this through hierarchical partition keys (also called subpartitioning), which accept up to three levels.

Here’s how you’d configure it in the Azure portal or via ARM/Bicep:

{
  "partitionKey": {
    "paths": [
      "/orderDateYearMonthDay",
      "/customerId"
    ],
    "kind": "MultiHash",
    "version": 2
  }
}

You’d need to ensure your documents have a field like orderDateYearMonthDay which you populate with a value like 2023-10-27. This is often done in your application code before inserting the document.

{
  "id": "order123",
  "customerId": "custA",
  "orderDate": "2023-10-27T10:00:00Z",
  "orderDateYearMonthDay": "2023-10-27",
  "items": [
    {"productId": "prodX", "quantity": 2},
    {"productId": "prodY", "quantity": 1}
  ],
  "totalAmount": 150.75
}
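
The derived field can be filled in by a small helper just before the write. The following is a minimal sketch in Python, not SDK code: the function name with_partition_fields is hypothetical, and it assumes orderDate is always an ISO 8601 timestamp and that days should be bucketed in UTC.

```python
from datetime import datetime, timezone


def with_partition_fields(order: dict) -> dict:
    """Return a copy of the order with orderDateYearMonthDay derived from orderDate."""
    # Rewrite the trailing "Z" as an explicit offset; fromisoformat() only
    # accepts "Z" natively on Python 3.11 and later.
    raw = order["orderDate"].replace("Z", "+00:00")
    # Assumption: days are bucketed in UTC, whatever offset the timestamp carries.
    order_date = datetime.fromisoformat(raw).astimezone(timezone.utc)
    enriched = dict(order)
    enriched["orderDateYearMonthDay"] = order_date.strftime("%Y-%m-%d")
    return enriched


order = {
    "id": "order123",
    "customerId": "custA",
    "orderDate": "2023-10-27T10:00:00Z",
    "totalAmount": 150.75,
}
doc = with_partition_fields(order)
print(doc["orderDateYearMonthDay"])  # 2023-10-27
```

Deriving the value in one place keeps the partition field consistent with orderDate, which matters because a partition key value cannot be changed once the document is written.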

Now, requests for custA on 2023-10-27 go to the logical partition identified by that date and that customer. If custB also places orders on 2023-10-27, they form a separate logical partition, because the second part of the key (customerId) is different, and that partition can be placed on a different physical partition. custA’s own traffic is split as well: each day’s orders form a new logical partition, so even a very active customer’s requests can be spread across physical partitions by date.

The key here is that Cosmos DB hashes the combination of the values in the paths array to determine the physical partition. The first path (/orderDateYearMonthDay) acts as a broad distribution mechanism, and the second path (/customerId) adds further distribution within each date partition.

This approach effectively breaks up a single hot partition for an extremely active customer into multiple partitions, each representing a unique combination of date and customer. Your overall throughput is now distributed across more physical partitions, avoiding the RU throttling on a single point.

The kind setting in the partition key definition tells Cosmos DB to hash the values of your partition key fields to decide which physical partition a document belongs to. Hashing spreads distinct key values evenly, but it only helps if the keys themselves take enough distinct values.
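
To make the effect of hashing more of the key concrete, here is a toy simulation. It is only an illustration under loud assumptions: SHA-256 stands in for Cosmos DB’s internal hash, the count of 8 physical partitions is invented, and real partition placement is managed by the service and differs from a simple modulo.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 8  # hypothetical count, chosen only for the simulation


def bucket(*key_values: str) -> int:
    """Map partition key values to a simulated physical partition."""
    digest = hashlib.sha256("|".join(key_values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


# custA places 100 orders on each of 30 days.
orders = [("custA", f"2023-10-{day:02d}") for day in range(1, 31) for _ in range(100)]

# Key on customerId alone: every order hashes to the same bucket.
by_customer = Counter(bucket(cust) for cust, _ in orders)

# Key on date + customerId: each day hashes independently.
by_date_and_customer = Counter(bucket(date, cust) for cust, date in orders)

print("customerId only:  ", dict(by_customer))
print("date + customerId:", dict(by_date_and_customer))
```

With customerId alone, all 3,000 orders fall into one bucket. With the date included, the same orders are split across several buckets, which is the behavior the hierarchical key is meant to achieve.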

A common mistake is to think that just adding a second field to the partition key is enough. The cardinality of the first field in your composite key is crucial. If you partition by /orderDateYearMonthDay and /orderDateHour, and all your traffic happens within a single hour, you’ll still have a hot partition. The goal is to have a high number of unique values for the first element in your partition key path.
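
One way to sanity-check a candidate first-level key is to measure it on a sample of real documents before committing to it. The sketch below is a hypothetical helper (first_level_skew is not part of any SDK) that reports how many distinct values a field takes and what share of the sample the busiest value holds.

```python
from collections import Counter


def first_level_skew(docs, field):
    """Return (distinct value count, share of documents held by the busiest value)."""
    if not docs:
        raise ValueError("need at least one document to measure skew")
    counts = Counter(doc[field] for doc in docs)
    busiest = max(counts.values())
    return len(counts), busiest / len(docs)


# Made-up sample: 90 orders on one day, 10 on the day before.
sample = (
    [{"orderDateYearMonthDay": "2023-10-27", "customerId": f"cust{i}"} for i in range(90)]
    + [{"orderDateYearMonthDay": "2023-10-26", "customerId": f"cust{i}"} for i in range(10)]
)

distinct, top_share = first_level_skew(sample, "orderDateYearMonthDay")
print(distinct, top_share)  # 2 0.9
```

A low distinct count or a top share close to 1.0, as in this made-up sample, suggests the field concentrates traffic and is a weak choice for the first level.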

When querying, include as many leading components of the partition key as you can in your WHERE clause. A query like SELECT * FROM c WHERE c.customerId = 'custA' without a date filter skips the first level of the key, so Cosmos DB cannot route it and has to fan it out across all physical partitions. A query that filters only on the first level, such as c.orderDateYearMonthDay = '2023-10-27', can be routed to the subset of physical partitions holding that date. A query like SELECT * FROM c WHERE c.orderDateYearMonthDay = '2023-10-27' AND c.customerId = 'custA' supplies the full key and is the most efficient, targeting a single logical partition (and thus a single physical partition).
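
With hierarchical partition keys, routing depends on how many leading levels of the key a query pins down with equality filters. The following sketch models that rule only; the function routing and its return strings are hypothetical and do not call Cosmos DB.

```python
KEY_PATHS = ["orderDateYearMonthDay", "customerId"]  # key levels, in order


def routing(filter_fields, key_paths=KEY_PATHS):
    """Classify how a query with equality filters on filter_fields would be routed."""
    # Count how many leading key levels the filter covers without a gap.
    prefix_len = 0
    for path in key_paths:
        if path not in filter_fields:
            break
        prefix_len += 1
    if prefix_len == len(key_paths):
        return "single logical partition"
    if prefix_len > 0:
        return "targeted to partitions holding the prefix"
    return "fan-out across all physical partitions"


print(routing({"orderDateYearMonthDay", "customerId"}))
print(routing({"orderDateYearMonthDay"}))
print(routing({"customerId"}))
```

The last call shows why the order of levels matters: filtering on customerId alone skips the first level, so the filter gives the service nothing to route on.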

The next challenge you’ll face is managing the RU consumption across all these new partitions, ensuring your overall provisioned throughput is adequate.
