Cloud Functions Gen2 are actually just Cloud Run services under the hood, with a few extra bits to make them feel like Functions.
Let’s see this in action. Imagine you’ve got a Function my-function in us-central1 that’s supposed to respond to HTTP requests. You want to add a health check.
Here’s how you’d do it using gcloud:
gcloud run services update my-function \
  --region=us-central1 \
  --add-health-check="path=/healthz,port=8080,startup_timeout=60s,timeout=10s,interval=30s,failure_threshold=3,success_threshold=1"
The my-function service now has a health check configured. If you describe the service (for example, with gcloud run services describe my-function --region=us-central1 --format=yaml), you’d see something like this in the output:
# ... other service details ...
template:
  metadata:
    annotations:
      autoscaling.knative.dev/maxScale: "100"
      run.googleapis.com/health-checks: '[{"path":"/healthz","port":8080,"startup_timeout":"60s","timeout":"10s","interval":"30s","failure_threshold":3,"success_threshold":1}]'
  spec:
    containers:
    - image: gcr.io/cloud-functions/function-framework
      ports:
      - containerPort: 8080
      resources:
        limits:
          cpu: 1000m
          memory: 512Mi
# ... rest of the service details ...
This tells you that the health check hits the /healthz path on port 8080 (the default for Cloud Run/Functions Gen2). Each check waits up to 10 seconds for a response (timeout=10s), and checks run every 30 seconds (interval=30s). One successful check is enough for the instance to be considered healthy (success_threshold=1), and it is marked unhealthy after three consecutive failures (failure_threshold=3). The startup_timeout=60s matters for new instances: they have one minute to start up and respond to the health check before being considered unhealthy.
The core problem this solves is ensuring that only healthy instances of your function are receiving traffic. Without health checks, a function instance that has crashed, is stuck in a loop, or is otherwise unresponsive might still be part of the load balancing pool, leading to requests failing for your users. Cloud Functions Gen2, being built on Cloud Run, leverages Cloud Run’s robust health checking capabilities. When a new instance starts, it needs to pass its startup health check within the startup_timeout. Once running, it continuously undergoes periodic health checks. If an instance fails these checks, Cloud Run (and thus your Function) will stop sending traffic to it and will eventually terminate and replace it.
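To make the threshold semantics concrete, here is a minimal Python sketch of how consecutive successes and failures could translate into a healthy or unhealthy state. It is a simplified model for illustration, not Cloud Run’s actual implementation. The ProbeState class and the worst_case_detection_seconds helper are hypothetical names, and the timing estimate assumes checks fire at a fixed interval.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Simplified model of threshold-based health tracking (illustrative only)."""

    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = False
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if probe_ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if self.consecutive_successes >= self.success_threshold:
                self.healthy = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.consecutive_failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def worst_case_detection_seconds(interval_s: int, timeout_s: int, failure_threshold: int) -> int:
    """Assumed upper bound: the instance breaks right after a passing check,
    then every later check runs into its full timeout."""
    return interval_s * failure_threshold + timeout_s


state = ProbeState()
results = [True, False, False, True, False, False, False]
history = [state.record(ok) for ok in results]
print(history)  # [True, True, True, True, True, True, False]
print(worst_case_detection_seconds(30, 10, 3))  # 100
```

Note that a single success resets the failure count, so only three failures in a row flip the state. Under the assumptions above, the settings from the example could leave a broken instance in rotation for up to about 100 seconds.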
To make this work, your function code needs to expose an endpoint that signals its readiness. The simplest way is to have a dedicated HTTP endpoint, often named /healthz or /ready, that returns a 200 OK status code immediately if the function is ready to serve requests. If there’s some initialization or a critical background process that needs to complete before the function is ready, this endpoint should not return 200 OK until that condition is met.
For example, a Python function using the Functions Framework might look like this:
import functions_framework
from flask import Response


@functions_framework.http
def my_function(request):
    """Responds to any HTTP request.

    Args:
        request (flask.Request): The request object.
        <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>.
    """
    # A deployed function has a single entry point that receives every path,
    # so the health check path has to be routed explicitly.
    if request.path == "/healthz":
        return healthz(request)
    # Your function logic here...
    return "Hello, World!"


def healthz(request):
    """Health check endpoint."""
    # Add checks here if your function has critical dependencies
    # For example, checking a database connection or an external service
    # For a simple function, just returning OK is sufficient.
    return Response(status=200)
You’d then configure the health check to point to this /healthz endpoint. Cloud Functions Gen2 listens on port 8080 by default, so you usually don’t need to specify the port. The path parameter is crucial: it must match the endpoint you’ve exposed exactly.
The key insight is that the health check doesn’t inspect your application’s logic in terms of correctness, but rather its liveness and readiness to accept connections. A 200 OK means "I can accept traffic right now." If your function is busy processing a long-running request and cannot accept new ones, its health check endpoint should not return 200 OK. This is a common point of confusion: people sometimes think the health check should verify the result of an operation, rather than the ability to perform an operation.
After adding the health check, the next thing you’ll likely encounter is ensuring your function’s dependencies are correctly initialized before the health check passes, especially if you’re using external services.
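One way to hold back the 200 OK until initialization has finished is to gate the health endpoint on a flag that the startup code sets. The following is a minimal sketch under stated assumptions, not a definitive implementation: initialize_dependencies, health_response, and handle are hypothetical names, the 503 status for "not ready yet" is a choice made here, and the handlers return (body, status) tuples, which Flask accepts as a return value.

```python
import threading

# Set once all critical dependencies are ready (hypothetical readiness flag).
_ready = threading.Event()


def initialize_dependencies() -> None:
    """Hypothetical placeholder for slow startup work."""
    # For example: open a database connection pool or warm a cache here.
    _ready.set()


def health_response() -> tuple:
    """Return 200 only once initialization has completed, 503 otherwise."""
    if _ready.is_set():
        return ("ok", 200)
    return ("initializing", 503)


def handle(request) -> tuple:
    """Single entry point: route the health check path, else serve normally."""
    if request.path == "/healthz":
        return health_response()
    return ("Hello, World!", 200)


# At module import time you might start initialization in the background:
# threading.Thread(target=initialize_dependencies, daemon=True).start()
```

Until initialize_dependencies has run, a check against /healthz gets a 503, so the instance does not count as healthy before its dependencies are in place.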