Azure Functions Premium plan scales out by adding more instances of your function app.
Let’s look at how this actually happens with a real-world example. Imagine you have a function that processes image uploads. When traffic spikes, say during a holiday sale, the Premium plan automatically spins up more compute instances to handle the increased load.
{
  "functionApp": {
    "name": "image-processor-app",
    "planType": "Premium",
    "region": "eastus",
    "runtime": "node",
    "version": "16",
    "scaling": {
      "maxInstances": 20,
      "minInstances": 2,
      "defaultScaleRule": {
        "metricName": "QueueLength",
        "direction": "Increase",
        "scaleAction": {
          "cooldown": "00:05:00",
          "minChange": 1,
          "maxChange": 10
        },
        "target": 10
      }
    },
    "triggers": [
      {
        "type": "BlobTrigger",
        "path": "uploads/{name}",
        "storageAccount": "mystorageaccount"
      }
    ],
    "configuration": {
      "appSettings": {
        "WEBSITE_NODE_DEFAULT_VERSION": "16",
        "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;..."
      }
    }
  }
}
This JSON is a simplified model of a Premium plan function app, image-processor-app, configured to scale; it illustrates the concepts and is not an exact Azure Resource Manager schema. minInstances is set to 2, so at least two instances are always running and ready to handle requests. maxInstances is 20, which caps the instance count to control cost and resource usage.
The defaultScaleRule dictates when scaling occurs. Here, it monitors QueueLength, the backlog of work waiting behind the blob trigger. When the backlog exceeds the target of 10 messages, the rule’s direction, Increase, triggers a scale-out action. The scaleAction bounds each action: at least 1 instance is added (minChange: 1), at most 10 instances are added per scaling event (maxChange: 10), and the condition must hold through the 5-minute cooldown period before the action fires.
So, if your image upload queue suddenly holds 15 messages, the system notices. Once the queue length has stayed above 10 for 5 minutes, it adds an instance. If the backlog keeps climbing, say to 25 messages, the next evaluation can add more instances, never more than 10 per event and never beyond the maxInstances cap of 20. All of this happens automatically.
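The walkthrough above can be sketched as a single scaling decision. This is a minimal Python sketch of the illustrative rule, not the logic Azure actually runs: the scale_step function and its sizing formula (one extra instance per target-sized chunk of excess backlog) are assumptions made for this example, and only the numbers come from the configuration shown earlier.

```python
import math

# Values taken from the illustrative configuration above.
TARGET = 10         # queue length the rule tries to hold
MIN_CHANGE = 1      # smallest step per scaling event
MAX_CHANGE = 10     # largest step per scaling event
MAX_INSTANCES = 20  # hard ceiling on the instance count


def scale_step(queue_length: int, current_instances: int) -> int:
    """Return how many instances to add for one scaling event.

    Hypothetical sizing rule: one extra instance per TARGET messages of
    excess backlog, clamped to [MIN_CHANGE, MAX_CHANGE] and to the
    headroom remaining below MAX_INSTANCES.
    """
    if queue_length <= TARGET:
        return 0  # threshold not exceeded, nothing to do
    excess = queue_length - TARGET
    wanted = math.ceil(excess / TARGET)
    step = max(MIN_CHANGE, min(wanted, MAX_CHANGE))
    headroom = MAX_INSTANCES - current_instances
    return max(0, min(step, headroom))


print(scale_step(15, 2))    # 1
print(scale_step(25, 3))    # 2
print(scale_step(1000, 5))  # 10
```

Under this assumed formula, a backlog of 15 adds one instance, a backlog of 25 adds two, and a very large backlog is limited to maxChange.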
The key benefit here is that the Premium plan keeps instances warm. On the Consumption plan, a new instance has to start the runtime and load your code before it can serve its first event, which takes time. On Premium, the always-ready instances set by minInstances stay running, and pre-warmed instances stand by as a buffer for scale-out. Because the runtime is already initialized and your function code is already loaded, your function can respond with little or no cold-start latency, even while scaling out.
The actual scaling is managed by Azure’s scale controller, which observes these signals and adjusts the instance count within the limits you configure. You don’t manually provision or deprovision servers; the plan allocates and deallocates compute resources for you.
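QueueLength is just one example of a scaling signal. The signal depends on the trigger: for a Storage Queue trigger, for instance, the scale controller looks at the queue length and the age of the oldest message, while for HTTP triggers it reacts to request load. Rule-based autoscale on metrics such as CPU utilization or custom application metrics is a feature of Dedicated (App Service) plans rather than something you define on a Premium plan. The minInstances setting is crucial for high availability and immediate responsiveness in critical functions, because it prevents cold starts during periods of low but consistent traffic.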
The cooldown period is a safeguard. It prevents rapid scaling up and down if a metric briefly crosses the threshold and then returns. For example, if the queue length spikes to 11 and then drops back to 5 within the 5-minute cooldown, no scaling action will be taken. This helps maintain stability and avoid unnecessary resource fluctuations.
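The cooldown check described here can be sketched in a few lines of Python. This is a hedged illustration of the idea only: the breach_sustained function, the one-minute sampling interval, and the sample data are all hypothetical, and the sketch assumes the rule fires only when every sample in the last 5 minutes is above the target.

```python
from datetime import datetime, timedelta

TARGET = 10                      # threshold from the example configuration
COOLDOWN = timedelta(minutes=5)  # "cooldown": "00:05:00"


def breach_sustained(samples: list[tuple[datetime, int]], now: datetime) -> bool:
    """Return True only if the queue length stayed above TARGET for the
    whole cooldown window ending at `now`.

    `samples` holds (timestamp, queue_length) pairs, oldest first.
    """
    window_start = now - COOLDOWN
    # Without history reaching back to the start of the window we cannot
    # say the breach was sustained, so no action is taken.
    if not samples or samples[0][0] > window_start:
        return False
    recent = [length for ts, length in samples if ts >= window_start]
    return bool(recent) and all(length > TARGET for length in recent)


# Hypothetical one-minute samples starting at an arbitrary time.
t0 = datetime(2024, 1, 1, 12, 0)
minute = timedelta(minutes=1)
spike = [(t0 + i * minute, q) for i, q in enumerate([11, 11, 5, 5, 5, 5])]
steady = [(t0 + i * minute, q) for i, q in enumerate([15, 15, 16, 18, 20, 25])]
now = t0 + 5 * minute

print(breach_sustained(spike, now))   # False: the spike fell back to 5
print(breach_sustained(steady, now))  # True: above 10 for the full window
```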
The maxChange parameter in scaleAction controls the rate of scaling. If your queue length jumps from 10 to 1000 instantly, the system won’t try to add hundreds of instances at once. It adds up to maxChange instances per scaling event, re-evaluates the current metric values after the cooldown, and stops at maxInstances. This prevents overwhelming the system or incurring large, unexpected costs.
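To make the rate limit concrete, here is a small Python simulation of a sudden burst. It is a sketch under stated assumptions, not a model of Azure’s real behavior: the simulate function is hypothetical, each list entry stands for the queue length seen at one post-cooldown evaluation, the step size is assumed to be one instance per 10 messages of excess backlog, and scale-in is not modeled.

```python
import math

# Numbers from the example configuration.
TARGET, MIN_CHANGE, MAX_CHANGE = 10, 1, 10
MIN_INSTANCES, MAX_INSTANCES = 2, 20


def simulate(queue_lengths: list[int], start: int = MIN_INSTANCES) -> list[int]:
    """Return the instance count after each evaluation.

    Each element of `queue_lengths` is the backlog observed at one
    evaluation, with evaluations spaced one cooldown apart.
    """
    instances = start
    history = []
    for q in queue_lengths:
        if q > TARGET:
            # Assumed sizing rule: one instance per TARGET messages of excess.
            wanted = math.ceil((q - TARGET) / TARGET)
            step = max(MIN_CHANGE, min(wanted, MAX_CHANGE))
            instances = min(instances + step, MAX_INSTANCES)
        history.append(instances)
    return history


# A backlog of 1000 messages that persists across three evaluations.
print(simulate([1000, 1000, 1000]))  # [12, 20, 20]
```

Starting from the 2 always-ready instances, the first event adds the maximum of 10, the second is cut short by the cap of 20, and the third changes nothing.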
One subtle point often missed is that the Premium plan’s scaling is primarily driven by event-driven triggers. While it can react to HTTP requests, its most powerful scaling mechanisms are tied to the backlog of work arriving through services like Azure Storage Queues, Service Bus, or Event Hubs. The system is designed to ensure that messages are processed promptly by having enough instances ready to consume them.
When you’re looking at your function app’s performance in the Azure portal, you’ll see the "Scale out" and "Scale in" events logged. These are direct results of the scaling rules you’ve configured. You can also monitor the "Instance count" metric to see how your app is behaving under load.
The next thing you’ll likely encounter is optimizing the cost of scaling on the Premium plan: understanding the difference between provisioned instances and actual usage, and tuning the minInstances setting to balance readiness against expenditure.