Cloud Functions Gen2 lets you control how many requests a single function instance handles simultaneously. Most people think concurrency is about maximizing throughput by throwing more workers at the problem, but it is really about how much work each worker takes on.

Let’s see it in action. Imagine a function that processes image uploads.

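// Package imageprocessor holds the function. A deployed function lives in a
// regular package, not in package main.
package imageprocessor

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/GoogleCloudPlatform/functions-framework-go/functions"
	"github.com/cloudevents/sdk-go/v2/event"
)

func init() {
	// Register the handler under the name passed to --entry-point.
	functions.CloudEvent("ProcessImage", ProcessImage)
}

// CloudStorageEvent holds the Cloud Storage object fields this function reads.
type CloudStorageEvent struct {
	Name           string `json:"name"`
	Bucket         string `json:"bucket"`
	Metageneration string `json:"metageneration"`
	TimeCreated    string `json:"timeCreated"`
	Updated        string `json:"updated"`
}

// ProcessImage is triggered by a Cloud Storage event.
func ProcessImage(ctx context.Context, e event.Event) error {
	// CloudEvent handlers receive an event.Event; decode its payload first.
	var data CloudStorageEvent
	if err := e.DataAs(&data); err != nil {
		return fmt.Errorf("event.DataAs: %w", err)
	}
	log.Printf("Processing file: %s", data.Name)

	// Simulate image processing time
	time.Sleep(5 * time.Second)

	log.Printf("Finished processing file: %s", data.Name)
	return nil
}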

Suppose we deploy this function with a concurrency of 80 (Gen2 functions default to 1 unless you set it), and 100 image uploads arrive at roughly the same time. A single instance can start processing 80 of them immediately. The remaining 20 are routed to a second instance if the instance limit allows one, or wait until one of the initial 80 finishes. This is the basic idea: one instance of your function code works on several requests at once, rather than every request needing an instance of its own.

The core problem Cloud Functions Gen2 concurrency solves is managing resource utilization and cost while meeting performance demands. If you have a function that takes a long time to process each request, like our image processor, you need enough concurrency to avoid a massive backlog. If you set concurrency too low, you spin up far more instances than you need, leading to unnecessary costs. If you set it too high, requests start competing with each other, because each concurrent request consumes CPU, memory, and potentially network bandwidth within a single instance.
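To put rough numbers on that trade-off, here is a minimal sketch that estimates how many instances a steady workload needs at different concurrency settings. It assumes Little's law (requests in flight = arrival rate × time per request) is a fair approximation; the function name instancesNeeded and the traffic figures are made up for illustration and are not measurements.

```go
package main

import (
	"fmt"
	"math"
)

// instancesNeeded estimates the instance count for a steady workload.
// requestsPerSecond and secondsPerRequest are illustrative inputs;
// concurrency is the per-instance limit.
func instancesNeeded(requestsPerSecond, secondsPerRequest float64, concurrency int) int {
	// Little's law: average requests in flight = arrival rate * duration.
	inFlight := requestsPerSecond * secondsPerRequest
	return int(math.Ceil(inFlight / float64(concurrency)))
}

func main() {
	// Assumed load: 20 uploads per second, 5 seconds each, so about 100 in flight.
	for _, c := range []int{1, 10, 80} {
		fmt.Printf("concurrency=%d -> %d instances\n", c, instancesNeeded(20, 5, c))
	}
}
```

With these assumed inputs the estimate drops from 100 instances at concurrency 1 to 10 at concurrency 10 and 2 at concurrency 80. The sketch says nothing about whether one instance can actually cope with 80 simultaneous image jobs; that depends on the CPU and memory each request needs.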

When you deploy a Cloud Function Gen2, you can specify the maximum number of concurrent requests an instance can handle using the --concurrency flag. Values above 1 require at least 1 vCPU per instance, which you can request with --cpu.

gcloud functions deploy process-image \
  --gen2 \
  --runtime go122 \
  --region us-central1 \
  --source . \
  --entry-point ProcessImage \
  --trigger-bucket my-image-bucket \
  --max-instances 10 \
  --cpu 1 \
  --concurrency 10

In this example, --concurrency 10 means that a single instance of process-image can handle up to 10 requests simultaneously. If 100 requests arrive, and we have --max-instances 10 set, Cloud Functions can scale up to 10 instances, each handling up to 10 requests concurrently. In theory, then, the function can have up to 100 requests in flight at any given moment (10 instances × 10 concurrent requests per instance).

The actual number of instances that spin up is driven by the incoming traffic rate and capped by --max-instances. When every running instance has reached its concurrency limit and more requests come in, Cloud Functions scales out by creating more instances, up to the --max-instances limit. Only once that ceiling is reached do additional requests have to wait or get rejected.

The key insight often missed is that concurrency is per-instance. When you set --concurrency to N, you’re telling Cloud Functions that each instance it creates is capable of handling N requests in parallel. This is different from setting a global limit on concurrent requests across all instances. If your function is highly I/O bound (waiting on external services, databases, or storage), you can often set a higher concurrency value without significantly impacting performance because the CPU isn’t saturated. However, for CPU-bound tasks, setting concurrency higher than the number of available CPU cores per instance can lead to diminishing returns or even performance degradation due to context switching overhead.
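For the CPU-bound case, one possible mitigation is to cap the CPU-heavy section inside the instance, so that extra in-flight requests wait for a slot instead of all fighting over the same cores. The sketch below uses a buffered channel as a counting semaphore sized with runtime.GOMAXPROCS. This is an illustration of the idea rather than anything Cloud Functions does for you, and cpuSlots, withCPUSlot, and simulate are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// cpuSlots is a counting semaphore with one slot per usable CPU.
var cpuSlots = make(chan struct{}, runtime.GOMAXPROCS(0))

// withCPUSlot runs work while holding a slot, so at most cap(cpuSlots)
// CPU-heavy sections run at once, however many requests are in flight.
func withCPUSlot(work func()) {
	cpuSlots <- struct{}{}        // acquire a slot (blocks when all are taken)
	defer func() { <-cpuSlots }() // release it when work returns
	work()
}

// simulate starts the given number of fake requests and returns the highest
// number of CPU-heavy sections that were observed running together.
func simulate(requests int) int64 {
	var running, peak int64
	var wg sync.WaitGroup
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withCPUSlot(func() {
				n := atomic.AddInt64(&running, 1)
				for { // record a new peak if this is one
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				sum := 0 // stand-in for real CPU-bound work
				for j := 0; j < 1000000; j++ {
					sum += j
				}
				_ = sum
				atomic.AddInt64(&running, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Printf("peak CPU-heavy sections: %d (limit %d)\n", simulate(50), cap(cpuSlots))
}
```

In a real handler, only the CPU-heavy step (for example, the image resize) would go inside withCPUSlot, while I/O such as downloading and uploading the file stays outside it so waiting requests can still overlap.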

Understanding the relationship between concurrency, memory, and CPU is crucial. A higher concurrency setting means each instance might need more memory to hold the state for multiple in-flight requests and more CPU to manage them. You might need to increase the --memory allocation for your function if you increase concurrency significantly.
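As a rough way to reason about that, the sketch below sizes memory as a fixed baseline plus a per-request working set multiplied by concurrency, with some headroom on top. The linear model and all the numbers (64 MiB baseline, 40 MiB per request, 25% headroom) are assumptions for illustration; measure your own function before picking a --memory value. estimateMemoryMiB is a hypothetical helper.

```go
package main

import "fmt"

// estimateMemoryMiB returns a rough per-instance memory target in MiB:
// a fixed baseline plus one working set per concurrent request, padded
// by headroomPercent. Integer arithmetic keeps the result predictable.
func estimateMemoryMiB(baselineMiB, perRequestMiB, concurrency, headroomPercent int) int {
	working := baselineMiB + perRequestMiB*concurrency
	return working * (100 + headroomPercent) / 100
}

func main() {
	// Assumed figures: 64 MiB baseline, 40 MiB per in-flight image, 25% headroom.
	for _, c := range []int{1, 10, 80} {
		fmt.Printf("concurrency=%d -> about %d MiB\n", c, estimateMemoryMiB(64, 40, c, 25))
	}
}
```

With these made-up inputs the estimate grows from about 130 MiB at concurrency 1 to 580 MiB at 10 and 4080 MiB at 80, which shows why a large concurrency bump usually goes together with a larger memory allocation.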

The next step after optimizing concurrency is to consider cold starts and how they impact your perceived latency, especially with infrequent but bursty traffic.
