Azure Functions scale automatically based on incoming events, but you can exert significant control over this process using trigger-specific configurations.

Let’s see this in action with a common scenario: scaling an Azure Function triggered by a Service Bus queue.

{
    "scriptFile": "../run.py",
    "entryPoint": "main",
    "bindings": [
        {
            "name": "msg",
            "type": "serviceBusTrigger",
            "direction": "in",
            "queueName": "myqueue",
            "connection": "ServiceBusConnectionAppSetting",
            "isSessionsEnabled": false,
            "autoDeleteOnIdle": "00:05:00",
            "maxConcurrentCalls": 1
        },
        {
            "name": "outputQueue",
            "type": "serviceBus",
            "direction": "out",
            "queueName": "processedqueue",
            "connection": "ServiceBusConnectionAppSetting"
        }
    ]
}

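This function.json defines a function that reads from myqueue, processes each message, and writes the result to processedqueue. The settings that matter for scaling revolve around the serviceBusTrigger binding. Not all of them are read from function.json, though: concurrency limits are honored in host.json, and entity lifecycle properties belong to the queue itself.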

The primary driver of scaling for event-driven triggers like Service Bus is the event source itself. Azure Functions’ Consumption plan scales by creating more instances of your function app when there’s a backlog of events. For Service Bus, this means if messages pile up in myqueue, the platform will spin up more instances of your function app to process them concurrently. However, the rate at which individual instances process messages and the maximum concurrency are things you can directly influence.

The maxConcurrentCalls setting is crucial. The Service Bus extension reads this value from host.json, not from the binding in function.json, so that is where a change takes effect: in version 5.x of the extension it sits directly under extensions.serviceBus, and in earlier versions it is nested under messageHandlerOptions. With a value of 1, as in the example, each instance of your function app processes only one message from the queue at a time. If you increase this to, say, 10, a single instance can process up to 10 messages concurrently. This can significantly boost throughput if your function’s processing is I/O-bound and can run in parallel without contention. Be cautious: setting this too high can exhaust memory, connections, or CPU on the instance, or overwhelm downstream services.
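As a rough illustration of where the setting lives, the sketch below embeds a hypothetical host.json and reads the effective maxConcurrentCalls from it. It assumes the layout used by version 5.x of the Service Bus extension (directly under extensions.serviceBus) and falls back to the older messageHandlerOptions nesting. The helper name and the fallback default of 16 are assumptions made for this example, so check the documentation for the extension version you actually run.

```python
import json

# Hypothetical host.json content (5.x layout assumed for this sketch).
HOST_JSON = """
{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "maxConcurrentCalls": 10
    }
  }
}
"""


def effective_max_concurrent_calls(host_json_text: str, default: int = 16) -> int:
    """Return the per-instance message concurrency configured in host.json.

    `default` is an assumed fallback for this sketch, not a value read
    from the runtime.
    """
    config = json.loads(host_json_text)
    service_bus = config.get("extensions", {}).get("serviceBus", {})

    # Newer layout: the setting sits directly under extensions.serviceBus.
    if "maxConcurrentCalls" in service_bus:
        return service_bus["maxConcurrentCalls"]

    # Older layout: the setting is nested under messageHandlerOptions.
    legacy = service_bus.get("messageHandlerOptions", {})
    return legacy.get("maxConcurrentCalls", default)


print(effective_max_concurrent_calls(HOST_JSON))
```

A check like this can run in a deployment pipeline to catch a host.json that silently falls back to the default.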

autoDeleteOnIdle is another interesting setting, though it is about resource cleanup rather than scaling. It is a property of the Service Bus entity itself, set when the queue or topic subscription is created or updated; the trigger binding does not manage it. It defines how long an entity can be idle before Service Bus deletes it automatically. With a value of 00:05:00, a queue that sees no activity for 5 minutes is deleted outright, so it suits temporary entities rather than long-lived production queues. This isn’t a scaling control, but it affects the lifecycle of the entities your function depends on.

The isSessionsEnabled property is vital if your Service Bus queue uses sessions, and it must be true for a session-enabled queue. Each session is then locked to one receiver at a time, which preserves message order within that session. An instance can still work on several sessions at once, capped by a maxConcurrentSessions setting in host.json, but parallelism within a single session is limited. If false (as in the example), the trigger treats the queue as a regular, non-session queue, and per-instance concurrency is governed by maxConcurrentCalls. Session-aware processing therefore trades some parallelism for ordering.
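To make the ordering guarantee concrete, here is a toy simulation in plain Python. It does not use the Service Bus SDK. Each session is drained by a single worker, so order within a session is preserved, while separate sessions can proceed in parallel. The function name, the max_concurrent_sessions parameter, and the uppercase "processing" step are all invented for this sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_by_session(messages, max_concurrent_sessions=2):
    """Simulate session-aware processing.

    `messages` is an iterable of (session_id, body) pairs in arrival order.
    Returns a dict mapping each session_id to its processed bodies, in order.
    """
    # Group messages by session while keeping arrival order inside each group.
    by_session = defaultdict(list)
    for session_id, body in messages:
        by_session[session_id].append(body)

    def drain(session_id):
        # One worker owns the whole session, so its messages are handled
        # strictly one after another.
        return session_id, [body.upper() for body in by_session[session_id]]

    # Different sessions may run in parallel, up to the configured cap.
    with ThreadPoolExecutor(max_workers=max_concurrent_sessions) as pool:
        return dict(pool.map(drain, list(by_session)))


messages = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("B", "b2"), ("A", "a3")]
result = process_by_session(messages)
print(result)
```

Raising max_concurrent_sessions lets more sessions progress at once, but it never speeds up a single busy session, which is the trade-off sessions impose.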

Beyond the trigger configuration, the host configuration in host.json plays a role, but be careful which setting you reach for. maxConcurrentRequests lives under extensions.http and caps only the HTTP-triggered executions running in parallel on an instance; it is not a global cap across triggers. A Service Bus trigger is throttled per instance by maxConcurrentCalls (and by the session settings when sessions are enabled). For example, if maxConcurrentCalls is 10 and maxConcurrentRequests is 5, the instance can still process up to 10 Service Bus messages at once, while its HTTP functions are limited to 5 concurrent requests.

Consider the type of trigger. While Service Bus, Event Hubs, and Queue Storage triggers all aim to scale based on backlog, their internal scaling mechanisms and properties differ. Event Hubs, for instance, uses consumer groups and partitions: within a consumer group, each partition is read by one instance at a time, so the partition count caps how many instances can do useful work. Scaling therefore comes down to how partitions are distributed across the instances your function app has. For Blob Storage triggers, scaling is more about the rate at which new blobs are detected and processed, with less direct configuration for concurrency at the trigger level compared to Service Bus.
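The partition limit for Event Hubs can be pictured with a simplified model. The sketch below assigns partitions to instances round-robin, assuming each partition has exactly one reader within a consumer group. The real lease balancing in the Event Hubs client is more involved, and both function names here are hypothetical.

```python
def assign_partitions(partition_count: int, instance_count: int) -> dict:
    """Round-robin model: map each instance index to the partitions it reads."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    assignment = {instance: [] for instance in range(instance_count)}
    for partition in range(partition_count):
        assignment[partition % instance_count].append(partition)
    return assignment


def active_instances(partition_count: int, instance_count: int) -> int:
    """Count the instances that own at least one partition."""
    assignment = assign_partitions(partition_count, instance_count)
    return sum(1 for partitions in assignment.values() if partitions)


print(assign_partitions(4, 2))
print(active_instances(4, 6))
```

With 4 partitions and 6 instances, only 4 instances own a partition in this model; the other 2 have nothing to read.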

The most overlooked aspect is often the downstream dependencies. Even if you configure your function to process 100 messages concurrently, if your database can only handle 10 writes per second, your scaling will be bottlenecked there. Azure Functions’ scaling is reactive; it adds instances when there’s work. If those instances are blocked by an external dependency, the platform might continue to scale up, leading to a large number of idle or waiting instances, which is inefficient. Monitoring your function’s execution time and identifying bottlenecks in your code or external services is as important as configuring the trigger.
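A back-of-the-envelope calculation shows how quickly a downstream limit dominates. The sketch below is a simplified model, not a measurement: it assumes every concurrent slot stays busy and that the dependency enforces a hard rate limit. The function and parameter names are made up for illustration.

```python
def effective_throughput(instances, concurrency_per_instance,
                         avg_seconds_per_message, downstream_per_second):
    """Messages per second the whole pipeline can sustain in this model."""
    # What the function app could do if nothing downstream slowed it down.
    function_capacity = instances * concurrency_per_instance / avg_seconds_per_message
    # The slower of the two stages sets the pace.
    return min(function_capacity, downstream_per_second)


def drain_time_seconds(backlog, throughput_per_second):
    """Seconds needed to clear a backlog at a steady throughput."""
    return backlog / throughput_per_second


# 10 instances x 10 concurrent calls = 100 messages in flight, 0.5 s each,
# but the database accepts only 10 writes per second.
rate = effective_throughput(10, 10, 0.5, 10)
print(rate)                            # limited by the database, not the functions
print(drain_time_seconds(6000, rate))  # time to clear a 6000-message backlog
```

In this example the function app could handle 200 messages per second on its own, yet the pipeline sustains only 10, so adding instances would not shorten the drain time.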

Finally, remember that even with all these configurations, the Consumption plan’s underlying infrastructure has limits. While it scales automatically, there’s a maximum number of instances a function app can scale to, and cold starts can introduce latency when new instances are provisioned. For predictable, high-throughput scenarios, considering Premium or Dedicated plans might be necessary, where you have more explicit control over instance count and pre-warmed instances.

The next thing you’ll likely grapple with is how to manage the ordering of messages when scaling out, especially with distributed systems.
