Azure Functions can process Event Hubs streams, but a function doesn’t simply "listen" for pushed messages. The Functions host pulls events from each Event Hubs partition in batches and invokes your code with them, which makes the trigger behave more like a smart, event-driven micro-batcher than a true push-based stream processor.
Let’s see it in action. Imagine you have an Event Hub named my-event-hub in a namespace my-eventhub-ns. Your Azure Function is triggered by this hub through the following function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "event",
      "type": "eventHubTrigger",
      "direction": "in",
      "eventHubName": "my-event-hub",
      "consumerGroup": "$Default",
      "cardinality": "many",
      "connection": "EventHubConnectionAppSetting"
    }
  ]
}
And your Python function might look like this:
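import json
import logging
from typing import List

import azure.functions as func


def main(event: List[func.EventHubEvent]) -> None:
    # The trigger passes a list when the binding sets "cardinality": "many".
    logging.info(f"Processing {len(event)} events.")
    for single_event in event:
        try:
            # get_body() returns bytes; decode before parsing as JSON.
            message_body = single_event.get_body().decode('utf-8')
            event_data = json.loads(message_body)
            logging.info(f"Received event: {event_data}")
            # Your processing logic here
        except Exception as e:
            # Caught here, the failure never reaches the host, so the
            # invocation still counts as successful.
            logging.error(f"Error processing event: {e}")
            # Consider sending to a dead-letter queue or retrying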
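When this function runs with "cardinality": "many" in the binding, the event parameter isn’t a single message but a list of messages. (With the cardinality omitted or set to "one", the function receives one event per invocation.) The Azure Functions host, not your code, tracks the read position in each partition for the consumer group. It fetches a batch of messages from a partition, passes them to your function, and then writes a checkpoint for that partition and consumer group. The checkpoint advances once the invocation finishes, whether it returned normally or raised an exception. Only when the host cannot complete the invocation at all (for example, the process crashes) does the checkpoint stay where it was, in which case the same messages are delivered again. If you want failed batches to be retried, configure a retry policy on the function, which delays the checkpoint until the retries are exhausted.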
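Because the handler only relies on each event exposing get_body(), the batch-handling logic can be exercised locally without a Function App. The following is a minimal sketch under that assumption: FakeEvent and decode_batch are hypothetical names introduced here for illustration, not part of the azure.functions package.

```python
import json
from typing import Any, Dict, List


class FakeEvent:
    """Hypothetical stand-in that exposes only get_body(), as the handler expects."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self._body = json.dumps(payload).encode("utf-8")

    def get_body(self) -> bytes:
        return self._body


def decode_batch(events: List[Any]) -> List[Dict[str, Any]]:
    """Decode every event body in the batch as UTF-8 JSON."""
    return [json.loads(e.get_body().decode("utf-8")) for e in events]


# Simulate one invocation that receives a batch of two events.
batch = [FakeEvent({"id": 1}), FakeEvent({"id": 2})]
decoded = decode_batch(batch)
print(decoded)
```

Keeping the decoding step in a plain function like this makes it easy to unit-test separately from the trigger itself.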
The core problem this solves is enabling serverless, event-driven processing of high-throughput data streams. Instead of managing dedicated stream processing infrastructure, you can deploy code that scales automatically based on the incoming event volume. The "trigger" abstracts away the complexities of connecting to Event Hubs, managing consumer groups, and handling partition ownership.
Internally, the trigger runs in the Functions host rather than in your Python worker. The host is a .NET process that uses the .NET Event Hubs client library, and it hands events to your Python code over gRPC. When the host starts, it opens connections to Event Hubs for each configured trigger and claims ownership of partitions for the given consumer group, sharing them with any other running instances. For each partition it owns, it reads the last checkpoint and resumes from there, receiving up to a configured number of events or waiting up to a maximum time. Once a batch is available (or the wait expires), the batch is passed to your function. When the invocation finishes, the host writes a new checkpoint for that partition. Checkpoints are stored in the Azure Storage account configured for the Function App (the AzureWebJobsStorage setting), not in Event Hubs itself.
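The receive, invoke, checkpoint cycle described above can be modeled in a few lines of plain Python. This is a toy simulation, not the host’s actual implementation: pump_partition, the checkpoints dict, and the advance_on_failure flag are all hypothetical names. The flag exists so the sketch can show both outcomes after a failed invocation (checkpoint advanced, or batch left for redelivery) rather than asserting one of them.

```python
from typing import Callable, Dict, List


def pump_partition(
    events: List[str],
    checkpoints: Dict[str, int],
    partition_id: str,
    handler: Callable[[List[str]], None],
    max_batch_size: int = 3,
    advance_on_failure: bool = True,
) -> int:
    """Deliver events in batches from the stored checkpoint; return the invocation count."""
    invocations = 0
    while checkpoints.get(partition_id, 0) < len(events):
        start = checkpoints.get(partition_id, 0)
        batch = events[start:start + max_batch_size]
        invocations += 1
        try:
            handler(batch)
            succeeded = True
        except Exception:
            succeeded = False
        if succeeded or advance_on_failure:
            # Record the position just past the batch that was delivered.
            checkpoints[partition_id] = start + len(batch)
        else:
            # Leave the checkpoint alone; the same batch would be read again later.
            break
    return invocations


seen: List[str] = []
checkpoints: Dict[str, int] = {}
count = pump_partition([f"e{i}" for i in range(7)], checkpoints, "0", seen.extend)
print(count, checkpoints)
```

With seven events and a batch size of three, the handler is invoked three times and the last batch is a partial one, which mirrors the point that the configured size is an upper bound.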
The key levers you control are:
- maxEventBatchSize (in host.json): The maximum number of events passed to your function in a single invocation. A larger value can improve throughput by reducing per-invocation overhead, but it may increase processing latency for individual messages. It is an upper bound, so the actual batch may be smaller. The default is 100 in recent versions of the Event Hubs extension; older versions call this setting maxBatchSize and use a smaller default, so check the host.json reference for the version you run.
- maxWaitTime (in host.json): How long the trigger waits to fill a batch before invoking the function with whatever has arrived. It only applies when minEventBatchSize is greater than 1; with the default minimum of 1, the function is invoked as soon as events are available. Tuning the two together keeps messages on low-volume streams from waiting too long for a full batch. The default is 1 minute.
- consumerGroup: Using a dedicated consumer group for your function, rather than sharing $Default, isolates its checkpoints and partition ownership from other applications or functions reading from the same Event Hub. This is essential for reliable, independent processing.
- connection: The name of an app setting (EventHubConnectionAppSetting in the example) in your Function App’s configuration. That setting holds the connection string for your Event Hubs namespace.
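To make the batching settings concrete, here is a small sketch that builds a host.json fragment as a Python dict and prints it as JSON. It assumes the key names used by recent versions of the Event Hubs extension (maxEventBatchSize, minEventBatchSize, maxWaitTime under extensions.eventHubs); older versions use different names, so treat these as assumptions and check the reference for your version. build_host_json is a hypothetical helper, and the values are illustrative only.

```python
import json
from typing import Any, Dict


def build_host_json(
    max_batch: int = 100,
    min_batch: int = 1,
    max_wait: str = "00:01:00",
) -> Dict[str, Any]:
    """Build a host.json dict with Event Hubs batching settings (assumed key names)."""
    if not 1 <= min_batch <= max_batch:
        raise ValueError("min_batch must be between 1 and max_batch")
    return {
        "version": "2.0",
        "extensions": {
            "eventHubs": {
                "maxEventBatchSize": max_batch,  # upper bound per invocation
                "minEventBatchSize": min_batch,  # wait for at least this many...
                "maxWaitTime": max_wait,         # ...but no longer than this (hh:mm:ss)
            }
        },
    }


config = build_host_json(max_batch=64, min_batch=8, max_wait="00:00:30")
print(json.dumps(config, indent=2))
```

The printed JSON would be saved as host.json at the root of the Function App project. The consumer group and connection are not set here because they belong to the binding in function.json.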
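A common misconception is that the event parameter is always a single message. For an Event Hubs trigger configured with "cardinality": "many", it is a list, and the host handles batching; you must iterate through the list to process each individual event. A second misconception is that a failed invocation is retried automatically. By default the checkpoint advances even if your function throws an unhandled exception, so the failed batch is not redelivered and its events can be lost. To get retries, configure a retry policy for the function, or catch failures per event and route them somewhere durable, such as a dead-letter queue of your own. Delivery is at-least-once, so a batch can also arrive more than once after a host crash or a partition rebalance, and your processing should be idempotent.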
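One way to keep a single malformed event from deciding the fate of the whole batch is to isolate failures per event and collect them for separate handling. The sketch below illustrates that pattern on raw event bodies. process_batch and its return shape are hypothetical, and where the failed items go afterwards (a queue, a blob, a table) is left open.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def process_batch(
    bodies: List[bytes],
    handle: Callable[[Dict[str, Any]], None],
) -> Tuple[int, List[Tuple[bytes, str]]]:
    """Process each body independently; return (processed count, failed items)."""
    processed = 0
    failed: List[Tuple[bytes, str]] = []
    for body in bodies:
        try:
            handle(json.loads(body.decode("utf-8")))
            processed += 1
        except Exception as exc:
            # Keep the raw body and the reason so it can be dead-lettered later.
            failed.append((body, repr(exc)))
    return processed, failed


handled: List[Dict[str, Any]] = []
ok_count, dead_letters = process_batch(
    [b'{"id": 1}', b"not json", b'{"id": 2}'],
    handled.append,
)
print(ok_count, len(dead_letters))
```

In a real function, the bodies would come from calling get_body() on each event in the batch, and the failed list would be written to whatever dead-letter store you choose before the function returns.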
The next concept you’ll likely encounter is managing state across function invocations, especially for ordered processing or aggregations over time, which often leads to Durable Functions or an external state store.