Cosmos DB’s global distribution is a lie if your application can’t handle replica failures gracefully.

Let’s see it in action. Imagine a Spring Boot app that needs to store and retrieve user profiles from Cosmos DB.

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosDatabase;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosItemResponse;
import com.azure.cosmos.models.PartitionKey;
import org.springframework.stereotype.Service;

import java.util.Optional;

@Service
public class UserProfileService {

    private final CosmosContainer userProfileContainer;

    public UserProfileService() {
        // Replace with your actual Cosmos DB endpoint and key
        String endpoint = "https://your-cosmosdb-account.documents.azure.com:443/";
        String key = "YOUR_PRIMARY_KEY";
        String databaseName = "UserProfileDB";
        String containerName = "Users";

        CosmosClient client = new CosmosClientBuilder()
            .endpoint(endpoint)
            .key(key)
            .buildClient();

        CosmosDatabase database = client.getDatabase(databaseName);
        this.userProfileContainer = database.getContainer(containerName);
    }

    public void saveUserProfile(UserProfile user) {
        // Assuming UserProfile has an 'id' and a 'partitionKey' field
        CosmosItemResponse<UserProfile> response = userProfileContainer.upsertItem(user);
        System.out.println("Saved user: " + user.getId() + " with status code: " + response.getStatusCode());
    }

    public Optional<UserProfile> getUserProfile(String userId) {
        try {
            // Assuming 'id' is the partition key for this example
            // In a real-world scenario, you'd use the actual partition key value
            CosmosItemResponse<UserProfile> response = userProfileContainer.readItem(userId, new PartitionKey(userId), UserProfile.class);
            return Optional.ofNullable(response.getItem());
        } catch (Exception e) {
            // Handle cases where the item is not found or other errors
            System.err.println("Error fetching user " + userId + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    // Define a simple UserProfile class
    public static class UserProfile {
        private String id;
        private String name;
        private String partitionKey; // Important for Cosmos DB

        // Constructors, getters, and setters
        public UserProfile() {}

        public UserProfile(String id, String name, String partitionKey) {
            this.id = id;
            this.name = name;
            this.partitionKey = partitionKey;
        }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getPartitionKey() { return partitionKey; }
        public void setPartitionKey(String partitionKey) { this.partitionKey = partitionKey; }
    }
}

This code establishes a connection to Cosmos DB and provides methods to saveUserProfile and getUserProfile. The saveUserProfile uses upsertItem for both creating and updating, and getUserProfile uses readItem. The UserProfile class includes id and partitionKey, which are crucial for Cosmos DB operations.

Cosmos DB is a globally distributed, multi-model database service. It offers elastic and independent scaling of storage and throughput across any number of geographic regions. The core problem it solves is providing a highly available, low-latency data store that can scale on demand, anywhere in the world. It achieves this by replicating data across multiple regions and allowing applications to connect to the nearest replica. The "multi-model" aspect means it supports various data models (document, key-value, graph, column-family) through APIs like SQL (Core), MongoDB, Cassandra, Gremlin, and Table.

Internally, Cosmos DB uses a distributed log-structured merge-tree (LSM-tree) storage engine. Writes are first appended to a distributed log, ensuring durability and atomicity. Reads can be served from local replicas. To manage its distributed nature, it employs a sophisticated consensus protocol (like Paxos or Raft, though Azure doesn’t detail its exact implementation for user-facing APIs) to ensure consistency across replicas. When you interact with Cosmos DB via its SDKs, you’re communicating with the closest available replica. The SDK then handles routing requests, managing retries, and potentially direct communication with other regions if necessary for certain operations.

The key levers you control are:

  1. Throughput (RU/s): Request Units per second. This is the fundamental currency for performance. You provision RU/s for your database or container, and all operations consume RUs. You can set this manually or enable autoscale.
  2. Consistency Levels: Cosmos DB offers five consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual. Each provides a different trade-off between consistency, availability, and latency. Session is the default and often a good balance.
  3. Partitioning Strategy: How you design your partition keys is paramount. A good partition key distributes requests evenly across physical partitions, preventing hot spots and maximizing RU utilization. The partition key value is part of the request itself.
  4. Indexing Policy: Cosmos DB automatically indexes data, but you can customize the indexing policy to include or exclude paths, change indexing modes (consistent vs. lazy), and set composite indexes to optimize query performance.
  5. Global Distribution: You configure which regions your Cosmos DB account is deployed to. The SDK can be configured for different write and read regions.

The most surprising aspect of Cosmos DB’s global distribution is that even with all regions configured, a single application instance connecting to a specific region will still experience latency and potential unavailability if that specific region’s replica becomes unhealthy or unreachable, regardless of other healthy regions. The SDK’s default behavior is to try and fail over, but the initial connection and subsequent operations are highly dependent on the health of the endpoint it’s currently talking to. You aren’t magically querying the "global" database; you’re querying a specific replica, and the system manages the complexity of distributed operations from there.

If you are using the SQL API and notice intermittent "429 Too Many Requests" errors, it’s not always about hitting your total RU/s. It can also be a symptom of the SDK attempting to reconcile its internal state with the distributed state of Cosmos DB, especially under high load or during transient network issues between replicas. The SDK might be waiting for a quorum of replicas to acknowledge an operation before it can proceed, and if that quorum is difficult to achieve due to network partitioning or replica unavailability, you’ll see throttling, even if your provisioned RU/s are not fully consumed. The SDK has internal retry policies, but understanding the underlying distributed nature helps in debugging these "phantom" throttles.

The next concept you’ll grapple with is designing your application to leverage Cosmos DB’s multi-region write capabilities effectively.

Want structured learning?

Take the full Cosmos-db course →