Build High-Throughput Cosmos DB Apps with the Java Async SDK (2026)

Cosmos DB’s Java async SDK can reduce your application’s latency under load, and its resource consumption, by letting it do more work concurrently instead of parking threads while they wait on the network.

Let’s say we’re building a simple catalog service. It needs to read item details from Cosmos DB. Here’s a synchronous approach, which is what most people start with:

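// Synchronous example (Java SDK v4; Item is your own POJO, partitionKey is the raw key value)
CosmosItemResponse<Item> response =
    container.readItem(itemId, new PartitionKey(partitionKey), Item.class);
Item item = response.getItem();
// Do something with the item...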

This looks straightforward. When readItem is called, the thread making the call blocks. It sits there, twiddling its thumbs, until Cosmos DB sends back the item. If you have many concurrent requests, you’ll quickly run out of threads, leading to thread starvation and a sluggish application.

Now, let’s look at the async SDK. The core idea is to not block. Instead of a value, you get back a Project Reactor type: a Mono for a single result, or a Flux (CosmosPagedFlux for queries) for a stream of results. Spring Data Cosmos exposes the same reactive types. The Mono represents the promise that the result will be available later. While the SDK is waiting for Cosmos DB, the thread is free to go do other work, like processing another request or performing a different database operation.

Here’s how that same read operation looks asynchronously:

// Asynchronous example
CosmosAsyncContainer asyncContainer = asyncDatabase.getContainer(containerName);
asyncContainer.readItem(itemId, new PartitionKey(partitionKey), Item.class)
    .flatMap(response -> {
        Item item = response.getItem();
        // Process the item...
        return Mono.just(item); // Or whatever your downstream operation returns
    })
    .subscribe(processedItem -> {
        // Handle the final result
        System.out.println("Item processed: " + processedItem.getId());
    }, error -> {
        // Handle errors
        System.err.println("Error processing item: " + error.getMessage());
    });

See how readItem immediately returns a Mono? Reactor types are lazy, so the request is only sent once .subscribe is called; from then on the fetch proceeds without holding your thread. The .flatMap and .subscribe parts define what happens when the item is eventually fetched and processed. Your thread isn’t tied up waiting.

This is crucial for high-throughput applications. Imagine your app needs to fetch 10 items for a dashboard.

Synchronous: 10 separate threads, each blocking for, say, 50ms. Total time: ~50ms (if you have enough threads). But you’re using 10 threads.
Asynchronous: 1 thread initiates all 10 fetches and returns immediately instead of waiting. As each response arrives, one of the SDK’s few I/O threads runs your callback for it and then moves on to the next completion. The total time is still around 50ms (network latency is the bottleneck), but you’re only using a fraction of the threads. This frees up threads for other critical tasks.
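The thread math above can be imitated with nothing but the JDK. The following is a minimal sketch, not Cosmos SDK code: fakeRead is a hypothetical stand-in for a 50ms network read, and CompletableFuture stands in for Reactor’s Mono only because Reactor is not part of the JDK. A single scheduler thread plays the role of the I/O event loop and services all ten simulated reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FanOutSketch {

    // Hypothetical stand-in for a non-blocking read: the returned future is
    // completed later by the scheduler thread; the caller is not blocked here.
    static CompletableFuture<String> fakeRead(ScheduledExecutorService scheduler,
                                              String id, long delayMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        scheduler.schedule(() -> { result.complete("item-" + id); },
                delayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    // Starts every read up front, then collects the results in request order.
    static List<String> fetchAll(List<String> ids, long delayMs) {
        // A single thread plays the role of the I/O event loop.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String id : ids) {
                pending.add(fakeRead(scheduler, id, delayMs)); // returns immediately
            }
            List<String> items = new ArrayList<>();
            for (CompletableFuture<String> read : pending) {
                items.add(read.join()); // blocking only at the edge of this demo
            }
            return items;
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(String.valueOf(i));
        }
        long start = System.nanoTime();
        List<String> items = fetchAll(ids, 50);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Elapsed time should be close to one 50 ms delay, not ten of them.
        System.out.println(items.size() + " items in about " + elapsedMs + " ms");
    }
}
```

With the real SDK, the same shape is usually written with Reactor, for example Flux.fromIterable(ids).flatMap(id -> asyncContainer.readItem(id, new PartitionKey(id), Item.class)), assuming for illustration that each item’s ID is also its partition key value.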

The CosmosAsyncClient is your entry point. You build it with CosmosClientBuilder, supplying the endpoint, credentials, preferred regions, consistency level, and optionally the connection mode and retry options.

CosmosClientBuilder clientBuilder = new CosmosClientBuilder()
    .endpoint("<your-cosmos-db-endpoint>")
    .key("<your-cosmos-db-primary-key>")
    .preferredRegions(Arrays.asList("West US", "East US"))
    .consistencyLevel(ConsistencyLevel.SESSION); // Or your desired level

CosmosAsyncClient asyncClient = clientBuilder.buildAsyncClient();

The preferredRegions setting is important for latency. The SDK tries the regions in the order you list them, so put the region closest to your application first. If that region is unavailable, it will fall back to the next one in the list.

When you create an asyncContainer object, you’re not establishing a persistent connection for that specific operation. You’re getting a handle to interact with a container. The underlying CosmosAsyncClient manages a pool of connections and uses them efficiently for all your asynchronous operations. This connection pooling is a major performance booster.

One key aspect of the async SDK is understanding how to chain operations. If you need to read an item, then update it, then read another related item, you’d chain these Mono or Flux objects together.

asyncContainer.readItem("user-123", new PartitionKey("user-123"), User.class)
    .flatMap(userResponse -> {
        User user = userResponse.getItem();
        user.setLastLogin(Instant.now());
        return asyncContainer.upsertItem(user); // Returns a Mono<CosmosItemResponse<User>>
    })
    .flatMap(upsertResponse -> {
        System.out.println("User updated.");
        // Now fetch related data from the same logical partition
        return asyncContainer.readItem("order-abc", new PartitionKey("user-123"), Order.class);
    })
    .subscribe(orderResponse -> {
        Order order = orderResponse.getItem();
        System.out.println("Related order fetched: " + order.getId());
    }, error -> {
        System.err.println("An error occurred in the chain: " + error.getMessage());
    });

This chain executes sequentially: the upsertItem starts only after readItem completes, and the second readItem starts only after upsertItem completes. However, while any of these operations are waiting for Cosmos DB, the thread is released. This is where the true concurrency gain comes from.

A common pitfall is calling .block() or .blockLast() on reactive types (or .get() and .join() on a CompletableFuture you converted them into). This defeats the purpose of async programming and brings back the blocking behavior, potentially leading to thread starvation. Avoid these in your application logic unless absolutely necessary for a very specific, isolated synchronous integration point, and even then, do so with extreme caution.

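The SDK handles retries automatically based on the CosmosClientBuilder configuration. If a request fails due to a transient network issue or a 429 Too Many Requests (rate limiting), the SDK will attempt to retry the operation. For throttled requests, you can cap the number of retry attempts and the total time spent waiting by passing a ThrottlingRetryOptions to CosmosClientBuilder.throttlingRetryOptions; the wait before each retry follows the retry-after interval the service returns. This resilience is built-in and works seamlessly with the async model.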
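To make the idea of retrying without blocking concrete, here is a JDK-only sketch. It is an illustrative model, not the SDK’s actual implementation: the class and method names are hypothetical, the operation is a fake that returns an HTTP-style status code, and a fixed retryAfterMs stands in for the retry-after interval a real service would send. The point is that the wait between attempts is scheduled rather than slept, so no thread is parked during the back-off.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

class ThrottleRetrySketch {
    static final int TOO_MANY_REQUESTS = 429;

    // Runs the operation; on a 429 it schedules another attempt after
    // retryAfterMs instead of sleeping, until retriesLeft reaches zero.
    static CompletableFuture<Integer> callWithRetry(ScheduledExecutorService scheduler,
                                                    IntSupplier operation,
                                                    int retriesLeft,
                                                    long retryAfterMs) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        attempt(scheduler, operation, retriesLeft, retryAfterMs, result);
        return result;
    }

    private static void attempt(ScheduledExecutorService scheduler,
                                IntSupplier operation,
                                int retriesLeft,
                                long retryAfterMs,
                                CompletableFuture<Integer> result) {
        try {
            int status = operation.getAsInt();
            if (status == TOO_MANY_REQUESTS && retriesLeft > 0) {
                scheduler.schedule(
                        () -> attempt(scheduler, operation, retriesLeft - 1, retryAfterMs, result),
                        retryAfterMs, TimeUnit.MILLISECONDS);
            } else {
                // Success, a non-retryable status, or retries exhausted.
                result.complete(status);
            }
        } catch (RuntimeException e) {
            result.completeExceptionally(e);
        }
    }

    // Demo helper: a hypothetical operation that is throttled `throttledCalls`
    // times before succeeding with 200. Returns the final status code.
    static int simulate(int throttledCalls, int maxRetries) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            int[] calls = {0};
            IntSupplier operation =
                    () -> calls[0]++ < throttledCalls ? TOO_MANY_REQUESTS : 200;
            // join() blocks only at the edge of this demo.
            return callWithRetry(scheduler, operation, maxRetries, 10).join();
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(2, 9)); // throttled twice, then succeeds: prints 200
        System.out.println(simulate(5, 3)); // retries run out while throttled: prints 429
    }
}
```

The second call shows why the retry cap matters: once the attempts are used up, the 429 surfaces to your own error handler, which is the signal to slow down or raise provisioned throughput.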

When you’re dealing with very high-volume reads and writes, understanding the Request Units (RUs) consumed by each operation becomes paramount. The async SDK, by freeing up threads, allows your application to issue more requests within a given time frame, which makes it easier to use all of your provisioned throughput, and also easier to exceed it. You’ll want to monitor your RU consumption in the Azure portal and adjust your provisioned throughput accordingly. Each CosmosItemResponse also reports its own cost through getRequestCharge(), so you can log RU usage per request.
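The arithmetic behind RU budgeting is simple enough to sketch. The helper below is hypothetical and JDK-only: it accumulates per-request charges (the kind of value getRequestCharge() reports on a response) and estimates how many operations per second a given RU/s budget can sustain. The charges in main are illustrative figures, not measurements.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

class RuBudgetSketch {
    // Adders are safe to update from many callback threads at once.
    private final DoubleAdder totalCharge = new DoubleAdder();
    private final LongAdder requestCount = new LongAdder();

    // Call from a response callback with the charge reported for that request.
    void addCharge(double requestCharge) {
        totalCharge.add(requestCharge);
        requestCount.increment();
    }

    // Mean RU cost per request seen so far (0.0 if nothing was recorded).
    double averageCharge() {
        long n = requestCount.sum();
        return n == 0 ? 0.0 : totalCharge.sum() / n;
    }

    // Upper bound on sustained operations per second for a given RU/s budget.
    static double maxOpsPerSecond(double provisionedRuPerSecond, double chargePerOp) {
        if (chargePerOp <= 0) {
            throw new IllegalArgumentException("chargePerOp must be positive");
        }
        return provisionedRuPerSecond / chargePerOp;
    }

    public static void main(String[] args) {
        RuBudgetSketch budget = new RuBudgetSketch();
        budget.addCharge(1.0);   // e.g. a small point read
        budget.addCharge(1.0);
        budget.addCharge(10.0);  // e.g. a heavier write (illustrative figure)
        System.out.println(budget.averageCharge());                       // prints 4.0
        System.out.println(maxOpsPerSecond(400, budget.averageCharge())); // prints 100.0
    }
}
```

If the request rate your async code can generate is well above this bound, expect 429 responses; that is the point at which to raise throughput or reduce the cost per operation.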

The CosmosAsyncClient uses Netty internally for its network I/O: Reactor Netty for HTTP in gateway mode, and a Netty-based TCP transport in direct mode. Netty is a highly performant, non-blocking I/O framework. The SDK abstracts away much of the complexity, but understanding that it’s built on a foundation of non-blocking I/O helps explain why it’s so efficient.

The next challenge you’ll face is managing optimistic concurrency control with ETags for reliable updates.