Run Cosmos DB in Serverless Mode for Unpredictable Workloads (2026)

The real trick with Cosmos DB serverless isn’t just that it scales down to zero; it’s that the pricing cuts both ways: predictable, high-throughput workloads cost more on serverless than they would on provisioned throughput.

Let’s see it in action. Imagine you have a small, intermittent API that needs to store user preferences. It might get a burst of 50 requests per second for 10 minutes, then nothing for an hour. This is exactly the kind of workload serverless shines on.

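```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.DocumentDB/databaseAccounts",
      "apiVersion": "2023-04-15",
      "name": "serverless-example-account",
      "location": "West US 2",
      "kind": "GlobalDocumentDB",
      "properties": {
        "databaseAccountOfferType": "Standard",
        "consistencyPolicy": {
          "defaultConsistencyLevel": "Session"
        },
        "locations": [
          {
            "locationName": "West US 2",
            "failoverPriority": 0,
            "isZoneRedundant": false
          }
        ],
        "capabilities": [
          {
            "name": "EnableServerless"
          }
        ]
      }
    },
    {
      "type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases",
      "apiVersion": "2023-04-15",
      "name": "serverless-example-account/serverless-example-db",
      "dependsOn": [
        "[resourceId('Microsoft.DocumentDB/databaseAccounts', 'serverless-example-account')]"
      ],
      "properties": {
        "resource": {
          "id": "serverless-example-db"
        }
      }
    }
  ]
}
```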

The key line in this ARM template is the account capability "name": "EnableServerless". That’s it. There are no RUs to provision and no scaling settings to tune: you create a database and containers without specifying any throughput, and Cosmos DB handles the rest. When requests come in, Cosmos DB serves them on demand. When no requests come in, it scales down to zero, and you pay nothing for throughput.

The problem serverless solves is the "over-provisioning tax." With provisioned throughput (the default), you must set a capacity in Request Units per second (RU/s) on your database or container. If your workload is spiky, you have to provision for the peak to avoid throttling, which means you pay for that peak capacity even while it sits idle. If you provision too low, requests above the limit are rate-limited with HTTP 429 responses, and your application slows down or fails. Serverless eliminates this guesswork and the associated cost by adapting automatically.
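To put numbers on the over-provisioning tax, here is a back-of-the-envelope sketch for the user-preferences API described earlier (50 requests per second for 10 minutes, then an hour of silence). The 5 RU per request figure is a hypothetical average, not a measured charge; real charges depend on item size, indexing policy, and operation type.

```python
# Back-of-the-envelope sizing for a bursty workload under provisioned throughput.
# RU_PER_REQUEST is a hypothetical average; measure real charges for your items.
RU_PER_REQUEST = 5.0   # hypothetical RU charge per request
PEAK_RPS = 50          # requests per second during a burst
BURST_MINUTES = 10     # length of each burst
IDLE_MINUTES = 60      # quiet period between bursts


def peak_ru_per_second(peak_rps: float, ru_per_request: float) -> float:
    """RU/s you would have to provision to serve the burst without throttling."""
    return peak_rps * ru_per_request


def average_ru_per_second(peak_rps: float, ru_per_request: float,
                          burst_min: float, idle_min: float) -> float:
    """Average RU/s actually consumed over one burst-plus-idle cycle."""
    duty_cycle = burst_min / (burst_min + idle_min)
    return peak_ru_per_second(peak_rps, ru_per_request) * duty_cycle


peak = peak_ru_per_second(PEAK_RPS, RU_PER_REQUEST)
avg = average_ru_per_second(PEAK_RPS, RU_PER_REQUEST, BURST_MINUTES, IDLE_MINUTES)
utilization = avg / peak  # fraction of the paid-for capacity actually used

print(f"provision for peak: {peak:.0f} RU/s")
print(f"average consumed:   {avg:.1f} RU/s")
print(f"utilization:        {utilization:.1%}")
```

Under these assumptions you would provision 250 RU/s to absorb the burst but consume only about 35.7 RU/s on average, so roughly 86% of the capacity you pay for sits idle.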

Internally, Cosmos DB serves serverless requests from capacity it manages on your behalf; the scheduling and resource allocation are invisible to you. What does not change is the unit of work. Serverless still meters every read, write, and query in Request Units; the difference is that you are billed for the RUs each operation actually consumes rather than for reserved RU/s. Every response reports its charge in the x-ms-request-charge header, so you can see exactly what each operation cost.
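Because every Cosmos DB response reports the RUs the operation consumed in the x-ms-request-charge header, tracking consumption is mostly bookkeeping. A minimal sketch, assuming you have already captured each response’s headers as a dictionary; in the azure-cosmos Python SDK they are commonly read from container.client_connection.last_response_headers after a call, but verify that access path against your SDK version. The operation labels and header values below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical captured responses: (operation type, response headers).
# The header values are invented for illustration, not real measurements.
SAMPLE_RESPONSES = [
    ("read", {"x-ms-request-charge": "1"}),
    ("read", {"x-ms-request-charge": "1"}),
    ("upsert", {"x-ms-request-charge": "6.29"}),
    ("query", {"x-ms-request-charge": "2.83"}),
]


def ru_by_operation(responses):
    """Sum the RU charge reported in each response, grouped by operation type."""
    totals = defaultdict(float)
    for op, headers in responses:
        # The header value is a string; a missing header is counted as 0 RU.
        totals[op] += float(headers.get("x-ms-request-charge", 0))
    return dict(totals)


totals = ru_by_operation(SAMPLE_RESPONSES)
for op, ru in sorted(totals.items()):
    print(f"{op:<7} {ru:.2f} RU")
print(f"total   {sum(totals.values()):.2f} RU")
```

Grouping by operation type is the simplest useful cut; in practice you might group by query text or endpoint instead to find the expensive callers.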

There is no capacity knob to turn; the system handles the scaling. Understanding your workload’s pattern is still crucial for cost optimization, though. If your workload is consistently high, say 1000 RU/s around the clock, provisioned throughput will almost certainly be cheaper. Serverless is billed per RU consumed (quoted per million RUs) plus storage, while provisioned throughput is billed hourly for the RU/s you reserve. At sustained high volumes, serverless’s higher effective per-RU rate adds up to more than the cost of provisioning that capacity outright.
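The break-even point follows from comparing what one RU/s of sustained work costs per hour under each model. A sketch with placeholder rates: PRICE_PER_MILLION_RU and PRICE_PER_100_RUS_HOUR are illustrative values, not a quote of current Azure pricing, so substitute the rates for your region. It uses the common 730-hours-per-month convention.

```python
# Placeholder rates for illustration only; not current Azure pricing.
PRICE_PER_MILLION_RU = 0.25      # serverless: per million RUs consumed
PRICE_PER_100_RUS_HOUR = 0.008   # provisioned: per 100 RU/s per hour
HOURS_PER_MONTH = 730            # common monthly-estimate convention


def serverless_monthly_cost(avg_ru_per_s: float) -> float:
    """Serverless bills only the RUs actually consumed."""
    ru_per_month = avg_ru_per_s * 3600 * HOURS_PER_MONTH
    return ru_per_month / 1_000_000 * PRICE_PER_MILLION_RU


def provisioned_monthly_cost(provisioned_ru_per_s: float) -> float:
    """Provisioned bills the reserved RU/s every hour, used or not."""
    return provisioned_ru_per_s / 100 * PRICE_PER_100_RUS_HOUR * HOURS_PER_MONTH


def break_even_utilization() -> float:
    """Average/provisioned RU/s ratio above which provisioned is cheaper."""
    serverless_per_rus_hour = 3600 / 1_000_000 * PRICE_PER_MILLION_RU
    provisioned_per_rus_hour = PRICE_PER_100_RUS_HOUR / 100
    return provisioned_per_rus_hour / serverless_per_rus_hour


# A sustained 1000 RU/s workload, provisioned at exactly 1000 RU/s.
print(f"serverless:  ${serverless_monthly_cost(1000):,.2f}/month")
print(f"provisioned: ${provisioned_monthly_cost(1000):,.2f}/month")
print(f"break-even utilization: {break_even_utilization():.1%}")
```

With these placeholder rates the sustained workload costs roughly eleven times more on serverless. The break-even ratio is the more portable number: when your average consumption falls below that fraction of the RU/s you would otherwise provision, serverless comes out ahead. The sketch ignores provisioned-throughput minimums, autoscale, and reserved-capacity discounts.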

A common misconception is that "serverless" means "free when idle." The throughput cost does go to zero, but you still pay for data storage. And the per-RU charges, while low individually, can accumulate rapidly. A simple SELECT * FROM c query that scans a large container can consume hundreds or thousands of RUs per execution, costing far more than you might expect if you aren’t mindful of query efficiency.
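A rough monthly estimate makes both points visible: storage is billed even with zero traffic, and a scan-heavy query pattern can dwarf everything else. The prices and the 1,500 RU scan charge are placeholders; the 1 RU figure reflects the documented charge for a point read of a 1 KB item by ID and partition key.

```python
# Placeholder rates for illustration only; check current pricing for your region.
PRICE_PER_MILLION_RU = 0.25   # serverless RU charge
PRICE_PER_GB_MONTH = 0.25     # transactional storage charge


def monthly_bill(storage_gb: float, workload: list[tuple[str, float, int]]) -> float:
    """Storage plus RU charges; workload items are (label, RU per call, calls/month)."""
    total_ru = sum(ru_per_call * calls for _, ru_per_call, calls in workload)
    return storage_gb * PRICE_PER_GB_MONTH + total_ru / 1_000_000 * PRICE_PER_MILLION_RU


idle = monthly_bill(10, [])  # no traffic at all: storage is still billed
point_reads = monthly_bill(10, [("point read, 1 KB item", 1.0, 1_000_000)])
# 1,500 RU per scan is a hypothetical charge for a large container.
scans = monthly_bill(10, [("SELECT * FROM c scan", 1_500.0, 1_000_000)])

print(f"idle:        ${idle:,.2f}")
print(f"point reads: ${point_reads:,.2f}")
print(f"scans:       ${scans:,.2f}")
```

Same storage, same call count: the RU portion of the scan bill is 1,500 times larger than the point-read bill, because each call burns 1,500 RU instead of 1.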

The next hurdle for serverless users is understanding the actual cost drivers of their specific operations: moving from thinking in provisioned RU/s to tracking how many RUs each read, write, and query consumes.