Cosmos DB’s global distribution isn’t about replicating data; it’s about replicating consistency guarantees across regions.
Let’s see this in action. Imagine we have a Cosmos DB account with two regions: East US and West US. We’ll set up a simple ToDo application that writes to East US and reads from whichever region is closest to the client.
Here’s roughly how the relevant properties look in an ARM template (the same settings can be chosen in the Azure portal):
{
  "locations": [
    {
      "failoverPriority": 0,
      "isPrimary": true,
      "locationName": "East US"
    },
    {
      "failoverPriority": 1,
      "isPrimary": false,
      "locationName": "West US"
    }
  ],
  "consistencyPolicy": {
    "defaultConsistencyLevel": "BoundedStaleness",
    "maxIntervalInSeconds": 300,
    "maxStalenessPrefix": 100000
  }
}
When a client application writes a ToDo item, the write operation is sent to the primary region (East US in this case). Cosmos DB then asynchronously replicates this write to all other configured regions (West US). For reads, the client can be configured to read from the closest region.
The power here is in the consistency levels. You can choose how up-to-date you need your reads to be relative to your writes.
- Strong Consistency: Every read is guaranteed to see the latest committed write. This offers the highest consistency but incurs higher latency because the write has to be acknowledged by multiple regions.
- Bounded Staleness: Reads are guaranteed to be no more than a certain number of versions (maxStalenessPrefix) or a certain time interval (maxIntervalInSeconds) behind the writes. This is a good balance for many applications.
- Session Consistency: Within a single client session, all reads and writes are consistent. Across sessions or different clients, you might see slightly older data. This is the default and offers low latency.
- Consistent Prefix: Reads will return a prefix of the write history, meaning if you see write n, you’ll also see all writes 1 through n-1.
- Eventual Consistency: Reads might return stale data, but data will eventually converge across all regions. This offers the lowest latency.
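To make the Consistent Prefix guarantee concrete, here is a minimal sketch in plain Python. It does not use the Cosmos DB SDK; the function name is_consistent_prefix and the write labels are hypothetical and chosen only for illustration. It checks whether what a reader observed is a gap-free prefix of the write history.

```python
def is_consistent_prefix(observed, history):
    """Return True if `observed` is exactly the first len(observed) writes of `history`.

    Toy model only: `history` is the ordered list of committed writes, and
    `observed` is what a reader in some region has seen so far.
    """
    observed = list(observed)
    return observed == list(history[:len(observed)])


# Hypothetical write history for one item.
history = ["w1", "w2", "w3", "w4"]

# Seeing w1 and w2 is a valid prefix: the reader is behind, but has no gaps.
print(is_consistent_prefix(["w1", "w2"], history))  # True

# Seeing w3 without w2 is not allowed under Consistent Prefix.
print(is_consistent_prefix(["w1", "w3"], history))  # False
```

Under Eventual Consistency the second observation could occur temporarily; under Consistent Prefix and the stronger levels it should not.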
The locations array defines which regions your Cosmos DB account is deployed to. failoverPriority dictates the order in which regions are promoted to primary in the event of a regional outage; the region with priority 0 is the write region. In this snippet, isPrimary: true marks that same initial write region.
The consistencyPolicy is the critical part for global distribution. BoundedStaleness with maxIntervalInSeconds: 300 and maxStalenessPrefix: 100000 means reads lag the latest write by at most 300 seconds or 100,000 operations, whichever limit is reached first. This allows for low-latency reads in West US even though the write happened in East US. If replication falls further behind than the configured bound, Cosmos DB throttles writes until the lag is back within it.
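The two bounds can be read as a simple predicate: a replica is within the staleness window only while it is under both limits. The following is a toy sketch in plain Python, not Cosmos DB's internal logic; the class name BoundedStaleness and the method within_bound are hypothetical, and the defaults mirror the values in the policy above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedStaleness:
    """Toy model of a bounded-staleness policy (names are illustrative only)."""
    max_interval_seconds: int = 300
    max_staleness_prefix: int = 100_000

    def within_bound(self, lag_seconds: float, lag_operations: int) -> bool:
        # The bound is exceeded as soon as EITHER limit is passed,
        # so a replica is acceptable only while it is under both.
        return (lag_seconds <= self.max_interval_seconds
                and lag_operations <= self.max_staleness_prefix)


policy = BoundedStaleness()

print(policy.within_bound(lag_seconds=12, lag_operations=4_000))     # True
print(policy.within_bound(lag_seconds=12, lag_operations=150_000))   # False: too many operations behind
print(policy.within_bound(lag_seconds=400, lag_operations=10))       # False: too old
```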
Failover can be handled automatically, provided service-managed failover is enabled on the account. If East US becomes unavailable, Cosmos DB promotes West US (or the next region in the failoverPriority list) to be the primary write region. Your application, if configured correctly, will then start writing to the new primary region with minimal disruption.
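The promotion order can be sketched as "pick the available region with the lowest failoverPriority". This is a simplified illustration in plain Python, not the service's actual failover algorithm; the helper name write_region and the unavailable parameter are assumptions made for the example.

```python
def write_region(locations, unavailable=frozenset()):
    """Return the name of the region that would act as the write region.

    Toy rule: among regions not listed in `unavailable`, choose the one
    with the lowest failoverPriority (0 is the normal write region).
    """
    candidates = [loc for loc in locations if loc["locationName"] not in unavailable]
    if not candidates:
        raise RuntimeError("no region available to accept writes")
    return min(candidates, key=lambda loc: loc["failoverPriority"])["locationName"]


# Same regions and priorities as the account configuration in this article.
locations = [
    {"locationName": "East US", "failoverPriority": 0},
    {"locationName": "West US", "failoverPriority": 1},
]

print(write_region(locations))                           # East US
print(write_region(locations, unavailable={"East US"}))  # West US
```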
What most people don’t realize is how Cosmos DB resolves write conflicts. Conflicts only arise when the account has multi-region writes enabled, so that two clients can write the same item in different regions concurrently; with a single write region, as in the example above, every write goes through East US and there is nothing to merge. By default, Cosmos DB uses a "last writer wins" policy and keeps the version with the highest value of the _ts timestamp property. You can, however, define custom conflict-resolution policies for specific containers to implement more sophisticated merging logic, like summing values or concatenating strings, rather than just overwriting.
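The difference between the two strategies can be shown with a small plain-Python sketch. It only models the idea: in Cosmos DB itself, custom resolution is configured on the container rather than written like this. The function names last_writer_wins and merge_by_sum and the count field are hypothetical.

```python
def last_writer_wins(versions, path="_ts"):
    """Keep the conflicting version with the highest value at `path`."""
    return max(versions, key=lambda doc: doc[path])


def merge_by_sum(versions, field):
    """Toy custom policy: start from the last writer, but sum `field` across versions."""
    merged = dict(last_writer_wins(versions))
    merged[field] = sum(doc[field] for doc in versions)
    return merged


# Two hypothetical versions of the same item, written concurrently in two regions.
conflicting = [
    {"id": "1", "count": 3, "_ts": 100},  # written in one region
    {"id": "1", "count": 5, "_ts": 105},  # written in another region, slightly later
]

print(last_writer_wins(conflicting))       # {'id': '1', 'count': 5, '_ts': 105}
print(merge_by_sum(conflicting, "count"))  # {'id': '1', 'count': 8, '_ts': 105}
```

Last writer wins silently discards the earlier write, which is fine for overwrite-style data but loses information for counters; that is the case a custom policy is meant to cover.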
The next step is understanding how to optimize read performance across these distributed regions using the SDK’s preferred-region settings (for example, PreferredLocations on ConnectionPolicy in the older .NET SDK).