Your serverless functions are sitting idle, and when the first request comes in, there’s a noticeable delay. That’s because the underlying infrastructure needs to spin up a new environment, load your code, and initialize it. This "cold start" is the price of ephemeral compute.

Here’s a function running its first request after a period of inactivity:

{
  "FunctionName": "my-lambda-function",
  "Duration": 1523.50,
  "BillableDuration": 1524,
  "MaxMemoryUsed": 128,
  "LogResult": "START RequestId: a1b2c3d4-e5f6-7890-1234-abcdef123456 Version: $LATEST\nEND RequestId: a1b2c3d4-e5f6-7890-1234-abcdef123456\nREPORT RequestId: a1b2c3d4-e5f6-7890-1234-abcdef123456 Duration: 1523.50 ms Billed Duration: 1524 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 1200.75 ms\n",
  "Status": "Success"
}

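Notice the Init Duration of 1200.75 ms. That’s the cold start: the time spent loading the function and running initialization code outside the handler. Lambda reports it separately from Duration (1523.50 ms here), which covers only the handler’s run, so the caller waits for roughly the sum of the two. Subsequent requests to the same warm instance skip the init phase and will be much faster.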
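To spot cold starts in bulk, you can pull the Init Duration field out of each REPORT line. The following is a minimal sketch rather than a full log parser: the helper name init_duration_ms is hypothetical, the cold line is the REPORT line from the output above, and the warm line is a made-up example of the same shape without the Init Duration field.

```shell
# Print the Init Duration (in ms) from a Lambda REPORT log line.
# Prints nothing when the field is absent, as on a warm invocation.
init_duration_ms() {
    printf '%s\n' "$1" | sed -n 's/.*Init Duration: \([0-9.]*\) ms.*/\1/p'
}

# REPORT line copied from the log output above.
cold='REPORT RequestId: a1b2c3d4-e5f6-7890-1234-abcdef123456 Duration: 1523.50 ms Billed Duration: 1524 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 1200.75 ms'

# Hypothetical warm invocation: same shape, no Init Duration field.
warm='REPORT RequestId: 00000000-0000-0000-0000-000000000000 Duration: 12.30 ms Billed Duration: 13 ms Memory Size: 128 MB Max Memory Used: 128 MB'

for line in "$cold" "$warm"; do
    init=$(init_duration_ms "$line")
    if [ -n "$init" ]; then
        echo "cold start, init took ${init} ms"
    else
        echo "warm invocation"
    fi
done
```

Run against the two sample lines, this reports a cold start of 1200.75 ms for the first and a warm invocation for the second.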

The core problem is that serverless platforms, by design, deprovision resources when they aren’t actively serving requests to save costs. This is great for sporadic workloads, but for applications where latency is critical, it means a built-in performance penalty for the first hit.

The solution is to keep at least one instance of your function "warm" and ready to go. This is achieved by configuring provisioned concurrency. Instead of waiting for a request to trigger an instance spin-up, you tell the platform to maintain a specified number of ready-to-serve instances at all times.

Let’s say you have a critical API endpoint that absolutely cannot tolerate a cold start. You’d configure provisioned concurrency for that specific Lambda function.

Here’s how you’d set it up using the AWS CLI:

# Provisioned concurrency needs a published version or an alias; $LATEST is not supported.
aws lambda put-provisioned-concurrency-config \
    --function-name my-lambda-function \
    --qualifier 1 \
    --provisioned-concurrent-executions 5

In this command:

  • --function-name my-lambda-function: Specifies the target function.
  • --qualifier 1: Applies the configuration to published version 1 of the function (an example value). You can also pass an alias name. The unpublished $LATEST version cannot have provisioned concurrency, so publish a version first.
  • --provisioned-concurrent-executions 5: This is the key. It instructs AWS to keep 5 instances of my-lambda-function initialized and ready to process requests immediately.
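Allocation is not instant: after the command returns, the configuration stays in progress until the environments are initialized. Below is a minimal sketch of polling for readiness, assuming the AWS CLI is installed and configured. The helper name wait_for_provisioned_concurrency and the POLL_SECONDS variable are hypothetical; get-provisioned-concurrency-config and its Status values (IN_PROGRESS, READY, FAILED) come from the Lambda API.

```shell
# Poll until the provisioned concurrency configuration reports READY.
# Arguments: function name, qualifier (published version or alias),
# optional maximum number of attempts (default 30).
wait_for_provisioned_concurrency() {
    fn="$1"
    qualifier="$2"
    attempts="${3:-30}"
    while [ "$attempts" -gt 0 ]; do
        status=$(aws lambda get-provisioned-concurrency-config \
            --function-name "$fn" \
            --qualifier "$qualifier" \
            --query 'Status' --output text)
        case "$status" in
            READY)  echo "ready"; return 0 ;;
            FAILED) echo "failed" >&2; return 1 ;;
        esac
        attempts=$((attempts - 1))
        sleep "${POLL_SECONDS:-10}"
    done
    echo "timed out" >&2
    return 1
}

# Example call (the qualifier must be a published version or an alias):
#   wait_for_provisioned_concurrency my-lambda-function 1
```

Waiting for READY before sending traffic avoids measuring latency while some environments are still initializing.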

Once this configuration is active, requests that target the configured version are routed to one of these pre-warmed instances. Init Duration no longer adds to their latency, as the initialization has already occurred and is covered by your provisioned concurrency allocation, not the individual request. Requests beyond the provisioned count spill over to on-demand instances, which can still cold start.

The mechanism is straightforward: the provisioned concurrency setting tells the Lambda service to pre-allocate and keep a specified number of execution environments running. These environments are kept warm and ready for incoming requests. When a request arrives, it’s immediately dispatched to an available provisioned instance, bypassing the initialization phase. This ensures consistent, low-latency performance for your critical workloads.

You can also configure provisioned concurrency through the AWS Management Console. Navigate to your Lambda function, open the "Configuration" tab, and find the provisioned concurrency settings. There you choose the published version or alias to configure and set the number of concurrent executions to keep provisioned.

The cost implication is that you pay for the provisioned concurrency, regardless of whether it’s actively processing requests. This is a trade-off for guaranteed low latency. You’re essentially reserving compute capacity.
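The reserved capacity is metered in GB-seconds: the number of provisioned instances, times the function’s memory size, times how long the configuration stays enabled. Here is a rough sketch of that arithmetic, using the 128 MB memory size from the log above and an assumed 30-day month. The helper names are hypothetical, and the price per GB-second is deliberately left as an input because it varies by region and architecture; invocation and request charges come on top.

```shell
# GB-seconds reserved = instances * (memory in GB) * seconds kept provisioned.
# Arguments: instance count, memory size in MB, duration in seconds.
provisioned_gb_seconds() {
    awk -v n="$1" -v mb="$2" -v s="$3" \
        'BEGIN { printf "%.0f\n", n * (mb / 1024) * s }'
}

# Cost of the reservation = GB-seconds * price per GB-second.
# The price is an argument you look up yourself, not a rate quoted here.
provisioned_cost() {
    awk -v gbs="$1" -v price="$2" \
        'BEGIN { printf "%.2f\n", gbs * price }'
}

# 5 instances at 128 MB for a 30-day month (2,592,000 seconds).
provisioned_gb_seconds 5 128 2592000   # prints 1620000
```

Multiplying the result by your region’s provisioned concurrency rate with provisioned_cost gives the fixed monthly charge you pay even if the function receives no traffic.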

The next challenge you’ll face is managing the cost of provisioned concurrency. While it eliminates cold starts, it can become expensive if over-provisioned. You’ll need to monitor your actual concurrency needs and adjust the provisioned levels accordingly, perhaps using scheduled scaling or dynamic configuration based on traffic patterns.
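Scheduled scaling can be sketched with Application Auto Scaling, which treats Lambda provisioned concurrency as a scalable dimension. Treat the following as an illustration under assumptions: the helper name, the scheduled action names, the cron times (evaluated in UTC by default), and the alias name prod in the example call are all hypothetical. It assumes the AWS CLI is configured and the qualifier is a published version or alias.

```shell
# Register a function qualifier as a scalable target, then schedule a
# weekday scale-up and scale-down of its provisioned concurrency.
# Arguments: function name, qualifier, peak capacity, off-peak capacity.
schedule_provisioned_concurrency() {
    fn="$1"
    qualifier="$2"
    peak="$3"
    off_peak="$4"
    resource_id="function:${fn}:${qualifier}"
    dimension="lambda:function:ProvisionedConcurrency"

    aws application-autoscaling register-scalable-target \
        --service-namespace lambda \
        --resource-id "$resource_id" \
        --scalable-dimension "$dimension" \
        --min-capacity "$off_peak" \
        --max-capacity "$peak" || return 1

    # Scale up at 08:00 UTC on weekdays (example schedule).
    aws application-autoscaling put-scheduled-action \
        --service-namespace lambda \
        --resource-id "$resource_id" \
        --scalable-dimension "$dimension" \
        --scheduled-action-name scale-up-business-hours \
        --schedule "cron(0 8 ? * MON-FRI *)" \
        --scalable-target-action "MinCapacity=${peak},MaxCapacity=${peak}" || return 1

    # Scale back down at 20:00 UTC on weekdays (example schedule).
    aws application-autoscaling put-scheduled-action \
        --service-namespace lambda \
        --resource-id "$resource_id" \
        --scalable-dimension "$dimension" \
        --scheduled-action-name scale-down-evening \
        --schedule "cron(0 20 ? * MON-FRI *)" \
        --scalable-target-action "MinCapacity=${off_peak},MaxCapacity=${off_peak}"
}

# Example call, keeping 5 instances warm in business hours and 1 otherwise:
#   schedule_provisioned_concurrency my-lambda-function prod 5 1
```

Pinning MinCapacity and MaxCapacity to the same value in each action fixes the provisioned level for that window; a target tracking policy is the alternative when traffic does not follow a predictable schedule.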
