Azure Functions can retry failed executions for you, but it isn't magic. Retries come from a policy you configure, and they often trip people up because the defaults aren't always what you want: a timer-triggered function, for example, isn't retried at all unless you add a policy.
Let’s see what this looks like in practice. Imagine a function that calls an external API. If that API is temporarily unavailable, we don’t want our function to just fail immediately. We want it to try again.
Here’s a simple function that calls an external service:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RetryExample
{
    // Reuse one HttpClient across invocations; creating one per call can exhaust sockets under load.
    private static readonly HttpClient client = new HttpClient();

    [FunctionName("RetryExample")]
    public static async Task Run([TimerTrigger("0 * * * * *")] TimerInfo myTimer, ILogger log) // fires at second 0 of every minute
    {
        log.LogInformation("C# Timer trigger function executed at: {Time}", DateTime.Now);
        try
        {
            var response = await client.GetAsync("https://this-api-might-fail.azurewebsites.net/api/status");
            response.EnsureSuccessStatusCode(); // Throws HttpRequestException if the status code is not 2xx.
            log.LogInformation("API call successful: {StatusCode}", response.StatusCode);
        }
        catch (HttpRequestException ex)
        {
            log.LogError(ex, "API call failed: {Message}", ex.Message);
            // We want this function to retry if the API is down.
            throw; // Re-throwing lets the runtime see the failure and apply a retry policy.
        }
    }
}
The throw; in the catch block is crucial. If the function completes without an exception, the Functions runtime records it as a success. If an exception escapes, the runtime may retry, depending on the retry policy we've configured.
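To make that rule concrete, here's a toy host loop in plain C#. It is not the Functions runtime, and ToyHost and Invoke are made-up names; it only models the behavior just described: a normal return counts as success, and only a thrown exception earns another attempt, up to a retry limit.

```csharp
using System;

// Toy model of a host applying a retry policy. ToyHost is hypothetical, not part of
// Azure Functions; it only mirrors the rule that success means "returned without throwing".
public static class ToyHost
{
    // Runs the function once, then up to maxRetryCount more times while it keeps throwing.
    // A negative maxRetryCount retries indefinitely, mirroring the -1 convention.
    public static (bool Succeeded, int Attempts) Invoke(Action function, int maxRetryCount)
    {
        int attempts = 0;
        while (true)
        {
            attempts++;
            try
            {
                function();
                return (true, attempts); // Returned normally: recorded as a success, no retry.
            }
            catch (Exception)
            {
                // attempts - 1 retries have been used so far.
                if (maxRetryCount >= 0 && attempts - 1 >= maxRetryCount)
                {
                    return (false, attempts); // Retries exhausted: the invocation is a failure.
                }
                // A real host would wait for the policy's delay here before trying again.
            }
        }
    }
}
```

A lambda that re-throws on every call reports six failed attempts with maxRetryCount set to 5, while the same failure wrapped in a try/catch that only logs reports a single "successful" attempt.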
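Retry behavior is set by a retry policy. For in-process C# functions the policy is an attribute on the function method; for languages that use function.json (JavaScript, Python, PowerShell and so on) it goes in function.json, and an app-wide default can go in host.json. Durable Functions handles retries differently, as we'll see below. The two attributes, ExponentialBackoffRetry and FixedDelayRetry, live in the Microsoft.Azure.WebJobs namespace that the in-process SDK already references, so no timer-specific package is needed. Note that retry policies are honored only for some triggers; the timer trigger is one of them.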
Here's the same function with an exponential backoff retry policy. The policy is a separate attribute on the method, not a property of the TimerTrigger attribute:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RetryExample
{
    private static readonly HttpClient client = new HttpClient();

    [FunctionName("RetryExample")]
    // maxRetryCount is the number of *additional* attempts after the first failure, so if every
    // attempt fails the function runs 1 (initial) + 5 (retries) = 6 times. -1 retries indefinitely; be careful with that.
    // The first retry waits minimumInterval (5 s); later delays grow exponentially, capped at maximumInterval (1 min):
    // roughly 5 s, 10 s, 20 s, 40 s, 1 min. The runtime adds a little random jitter, so these are approximate.
    // Delays are measured from the previous failure. From an initial failure at T=0 (ignoring execution time),
    // retries start near T=5s, 15s, 35s, 1m15s and 2m15s, which runs past the next scheduled occurrence at T=1m.
    // The other built-in strategy is [FixedDelayRetry(maxRetryCount, delayInterval)], which waits the same interval every time.
    [ExponentialBackoffRetry(
        maxRetryCount: 5,
        minimumInterval: "00:00:05",
        maximumInterval: "00:01:00")]
    public static async Task Run([TimerTrigger("0 * * * * *")] TimerInfo myTimer, ILogger log)
    {
        log.LogInformation("C# Timer trigger function executed at: {Time}", DateTime.Now);
        try
        {
            // Simulate a transient failure about a third of the time, so the retry policy
            // can be observed without relying on a flaky external service.
            var random = new Random();
            if (random.Next(0, 3) == 0)
            {
                log.LogWarning("Simulating transient API failure.");
                throw new HttpRequestException("Simulated 503 Service Unavailable");
            }

            var response = await client.GetAsync("https://example.com/api/status"); // Replace with a real URL if testing
            response.EnsureSuccessStatusCode();
            log.LogInformation("API call successful: {StatusCode}", response.StatusCode);
        }
        catch (HttpRequestException ex)
        {
            log.LogError(ex, "API call failed: {Message}", ex.Message);
            throw; // Re-throwing is essential for the retry mechanism.
        }
    }
}
The ExponentialBackoffRetry attribute is your friend here. maxRetryCount sets how many times to try again after the first failure, and minimumInterval and maximumInterval (both required) bound the wait between attempts. The "exponential" part means the first retry waits minimumInterval and each subsequent delay grows exponentially until it reaches maximumInterval. The runtime also adds a small random jitter, so many failing instances don't retry in lockstep.
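If you want to see what a given minimumInterval/maximumInterval pair implies before deploying, a few lines of C# will do. This is an idealized model under stated assumptions: the first retry waits minimumInterval, each later delay doubles, delays are capped at maximumInterval, attempts fail instantly, and the runtime's jitter is ignored. BackoffSchedule is a hypothetical helper, not part of the Functions SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: idealized exponential backoff (doubling, capped, no jitter).
public static class BackoffSchedule
{
    // Delay before each retry, in order.
    public static List<TimeSpan> Delays(int maxRetryCount, TimeSpan minimumInterval, TimeSpan maximumInterval)
    {
        var delays = new List<TimeSpan>();
        TimeSpan delay = minimumInterval;
        for (int retry = 0; retry < maxRetryCount; retry++)
        {
            delays.Add(delay < maximumInterval ? delay : maximumInterval);
            // Stop doubling once the cap is reached, so large retry counts can't overflow TimeSpan.
            if (delay < maximumInterval)
            {
                delay += delay;
            }
        }
        return delays;
    }

    // When each retry starts, measured from the initial failure, assuming attempts fail instantly.
    public static List<TimeSpan> StartOffsets(IEnumerable<TimeSpan> delays)
    {
        var offsets = new List<TimeSpan>();
        TimeSpan elapsed = TimeSpan.Zero;
        foreach (TimeSpan d in delays)
        {
            elapsed += d;
            offsets.Add(elapsed);
        }
        return offsets;
    }
}
```

For the attribute in the example above (5 retries, 5 s minimum, 1 min maximum), Delays returns 5 s, 10 s, 20 s, 40 s and 1 min, and StartOffsets returns 5 s, 15 s, 35 s, 1 min 15 s and 2 min 15 s, the same figures as the comments in that code.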
This strategy is generally preferred because it avoids hammering a service that might be overloaded. If the API is having temporary issues, a delay that starts short and grows quickly gives it room to recover, whereas a constant, aggressive retry keeps piling on load.
It's important to understand what a retry means here. The timer fires once; if that invocation throws, the runtime runs the function again for the same occurrence after the backoff delay, rather than waiting for the next scheduled tick. Only when every retry has failed is the invocation considered failed. Retries also don't run alongside the schedule: the timer won't start a second, overlapping invocation while one is still retrying, so with a timer that fires every minute and a retry sequence that lasts longer than that, the next scheduled occurrence might run late or be skipped. TimerInfo.IsPastDue tells you when an invocation is running later than scheduled.
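A quick way to check whether a retry policy can outlast the schedule is to add up the worst case. The sketch below assumes every attempt fails, each attempt takes a fixed attemptDuration, and the waits are exactly the given delays (no jitter); TimerOverlap is a hypothetical helper. It returns the scheduled occurrences, measured from the first attempt, that would arrive while the invocation is still busy.

```csharp
using System;
using System.Collections.Generic;

// Worst-case arithmetic for a timer-triggered function with a retry policy.
// Hypothetical helper, not an SDK API. Assumes every attempt fails, each attempt
// takes attemptDuration, and the waits are exactly retryDelays (no jitter).
public static class TimerOverlap
{
    // Time from the start of the first attempt until the last retry finishes.
    public static TimeSpan WorstCaseDuration(IReadOnlyList<TimeSpan> retryDelays, TimeSpan attemptDuration)
    {
        TimeSpan total = attemptDuration; // the initial attempt
        foreach (TimeSpan delay in retryDelays)
        {
            total += delay + attemptDuration; // wait, then run the retry
        }
        return total;
    }

    // Scheduled occurrences (offsets from the first attempt) that arrive while
    // the invocation is still failing and retrying.
    public static List<TimeSpan> OccurrencesOverlapped(TimeSpan period, IReadOnlyList<TimeSpan> retryDelays, TimeSpan attemptDuration)
    {
        if (period <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(period), "The timer period must be positive.");
        }

        TimeSpan busyUntil = WorstCaseDuration(retryDelays, attemptDuration);
        var overlapped = new List<TimeSpan>();
        for (TimeSpan next = period; next < busyUntil; next += period)
        {
            overlapped.Add(next);
        }
        return overlapped;
    }
}
```

With the delays from the earlier example (5 s, 10 s, 20 s, 40 s, 1 min) and instant failures, a fully failing run stays busy for 2 min 15 s, so the occurrences at 1 min and 2 min both land mid-retry.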
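Retries become more powerful with Durable Functions. In an orchestrator, CallActivityWithRetryAsync wraps an activity call in a retry policy described by a RetryOptions object: FirstRetryInterval, MaxNumberOfAttempts, BackoffCoefficient and MaxRetryInterval, plus RetryTimeout to bound the total time spent retrying and a Handle callback for custom logic that decides which exceptions are worth retrying. Because the orchestrator waits between attempts using durable timers, the retry schedule survives a host restart. Note that MaxNumberOfAttempts counts the first call, so it corresponds to maxRetryCount + 1 in the ExponentialBackoffRetry attribute.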
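To see how those RetryOptions values interact, here's a small model of the delays they imply. It assumes the n-th retry (counting from 0) waits FirstRetryInterval * BackoffCoefficient^n, capped at MaxRetryInterval, and that MaxNumberOfAttempts allows MaxNumberOfAttempts - 1 retries; RetryTimeout and Handle are ignored. DurableRetryModel is a hypothetical helper, and the commented orchestrator lines use "CallStatusApi" as a made-up activity name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the waits implied by Durable Functions RetryOptions.
// In an in-process orchestrator, the same settings would look roughly like:
//   var options = new RetryOptions(TimeSpan.FromSeconds(5), maxNumberOfAttempts: 4)
//   { BackoffCoefficient = 2.0, MaxRetryInterval = TimeSpan.FromSeconds(30) };
//   await context.CallActivityWithRetryAsync<string>("CallStatusApi", options, null);
public static class DurableRetryModel
{
    public static List<TimeSpan> Delays(TimeSpan firstRetryInterval, int maxNumberOfAttempts,
                                        double backoffCoefficient, TimeSpan maxRetryInterval)
    {
        var delays = new List<TimeSpan>();
        // MaxNumberOfAttempts includes the first call, so there are one fewer retries.
        for (int retry = 0; retry < maxNumberOfAttempts - 1; retry++)
        {
            double ms = firstRetryInterval.TotalMilliseconds * Math.Pow(backoffCoefficient, retry);
            delays.Add(TimeSpan.FromMilliseconds(Math.Min(ms, maxRetryInterval.TotalMilliseconds)));
        }
        return delays;
    }
}
```

With a 5 s first interval, a coefficient of 2 and a 30 s cap, four attempts give waits of 5 s, 10 s and 20 s; six attempts add two more waits that are both capped at 30 s.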
A common pitfall is swallowing the exception. If you catch an HttpRequestException and log it without re-throwing it (throw;), the Azure Functions runtime sees a successfully completed invocation and never starts a retry. Always re-throw when you want the retry policy to engage.
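Re-throwing everything isn't always right either: retrying a 400 or a 404 just spends attempts on a request that can't succeed. One option is to decide per exception whether to re-throw. RetryDecision below is a hypothetical helper; treating 408, 429 and 5xx as transient is a common convention, not an Azure Functions rule, so adjust the list for the API you call. It relies on HttpRequestException.StatusCode, which EnsureSuccessStatusCode populates on .NET 5 and later.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper: decides whether a failure should be re-thrown so the retry
// policy can engage, or logged and swallowed because retrying can't help.
public static class RetryDecision
{
    // Assumed convention: timeouts, throttling and server errors are worth retrying.
    public static bool IsTransient(HttpStatusCode status) =>
        status == HttpStatusCode.RequestTimeout   // 408
        || (int)status == 429                     // Too Many Requests
        || (int)status >= 500;                    // 5xx server errors

    public static bool ShouldRethrow(Exception ex) => ex switch
    {
        // A status code is available: retry only the transient ones.
        HttpRequestException http when http.StatusCode is HttpStatusCode code => IsTransient(code),
        // No status code (DNS failure, refused connection, ...) or any other exception:
        // let the runtime see the failure.
        _ => true,
    };
}
```

In the function, an exception filter leaves the existing catch block intact: a clause such as catch (HttpRequestException ex) when (!RetryDecision.ShouldRethrow(ex)) placed ahead of it can log the permanent failure and return, while every other HttpRequestException still reaches throw;.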
The next thing you’ll likely grapple with is managing state across retries, especially in Durable Functions, where you might want to update external systems differently on each retry or track progress.