A new Cloud Run container instance doesn’t start receiving traffic until it has signaled that it’s ready to serve it.

Let’s see it in action. Imagine we have a simple Python Flask app:

from flask import Flask
import time
import os

app = Flask(__name__)

# Simulate a long startup time
startup_delay = int(os.environ.get("STARTUP_DELAY", "10"))
print(f"Simulating startup delay of {startup_delay} seconds...")
for i in range(startup_delay):
    time.sleep(1)
    print(f"  ...{i+1}s")
print("Startup complete.")

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    # No debug reloader: it would re-run the startup delay in a child process.
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

When you deploy this to Cloud Run, it won’t serve requests right away: the container spends about 10 seconds in its startup loop before Flask even begins listening. Cloud Run has a built-in mechanism that waits until your application signals that it’s ready. This is where health check probes (startup and liveness) come in, though Cloud Run’s default behavior is often sufficient for simple apps.

The Problem Cloud Run Solves:

Cloud Run is designed for stateless, ephemeral containers. When a new instance of your container starts, it needs time to:

  1. Initialize: Load code, establish database connections, fetch configuration.
  2. Become Responsive: Start its web server and be able to handle incoming HTTP requests.

Without a way for Cloud Run to know when your application is actually ready, it might send traffic to an instance that’s still booting up, leading to 5xx errors and a poor user experience.

How Cloud Run’s Default Readiness Works:

By default, Cloud Run uses a TCP startup probe: it considers a container instance started and ready to receive traffic as soon as the container is listening on its configured port (the value of the PORT environment variable, 8080 unless you change it). For most web servers like Flask, Gunicorn, or Node.js Express, this happens as soon as the server binds its listening socket.
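Cloud Run's default check is a TCP startup probe, which succeeds once something accepts connections on the container port. The sketch below only models that idea in Python: it repeatedly attempts a TCP connection until one succeeds or the allowed failures run out. The function names and parameters are hypothetical, not Cloud Run's implementation.

```python
import socket
import time


def tcp_probe_once(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: this probe fails.
        return False


def tcp_probe_until_ready(host: str, port: int, period: float,
                          timeout: float, failure_threshold: int) -> bool:
    """Probe every `period` seconds; give up after `failure_threshold` failures."""
    for attempt in range(failure_threshold):
        if tcp_probe_once(host, port, timeout):
            return True  # Something is listening: the instance counts as started.
        if attempt < failure_threshold - 1:
            time.sleep(period)
    return False  # Never started listening: the instance would be shut down.
```

Note that this check says nothing about whether the application behind the socket can actually serve a request; it only proves the port is open.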

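However, this default can be too simplistic. Your web server might already be listening while your application logic is still initializing, for example when it loads data or opens connections in a background thread after binding its port. This is where explicit HTTP probes become crucial.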
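Here is a minimal sketch of that situation and the usual remedy. It uses only the Python standard library instead of Flask so it runs anywhere: the server binds and listens immediately, initialization runs in a background thread, and a `/healthz` endpoint returns 503 until a readiness flag is set. The `ready` flag, `initialize`, and `serve` names are hypothetical. A TCP check would pass the moment the server binds; only an HTTP check on `/healthz` sees the difference.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical readiness flag, set once background initialization finishes.
ready = threading.Event()


def initialize(delay_seconds: float) -> None:
    """Stand-in for real startup work such as loading config or opening connections."""
    time.sleep(delay_seconds)
    ready.set()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not ready.is_set():
            # Listening but not ready: a TCP check passes, an HTTP check sees 503.
            status, body = 503, b"initializing"
        elif self.path == "/healthz":
            status, body = 200, b"healthy"
        else:
            status, body = 200, b"Hello, World!"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging for the example.


def serve(port: int, init_delay: float) -> ThreadingHTTPServer:
    """Bind and serve immediately; finish initialization in the background."""
    server = ThreadingHTTPServer(("0.0.0.0", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    threading.Thread(target=initialize, args=(init_delay,), daemon=True).start()
    return server
```

Calling `serve(int(os.environ.get("PORT", "8080")), init_delay=20)` and keeping the main thread alive would mimic the Flask example, except that the port opens before startup work is done.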

Readiness (Startup) vs. Liveness:

  • Startup Probes (Cloud Run’s readiness gate): These tell Cloud Run when a newly started instance is ready to accept traffic. Until the probe succeeds, the instance receives no requests; if it fails a configured number of consecutive times (the failure threshold), Cloud Run shuts the instance down. This is your primary tool for managing startup and preventing traffic to instances that aren’t ready.
  • Liveness Probes: These tell Cloud Run whether a running instance is still alive and functioning correctly. If a liveness probe fails repeatedly, Cloud Run restarts the container. This is for detecting deadlocks, hangs, or situations where the process is still running but can no longer respond.

Configuring Probes in Cloud Run:

You configure these probes when deploying or updating your Cloud Run service. You can do this via the Google Cloud Console or the gcloud CLI.

Let’s configure explicit probes for our Flask app. We’ll add a /healthz endpoint to our application:

from flask import Flask, jsonify
import time
import os

app = Flask(__name__)

# Simulate a long startup time
startup_delay = int(os.environ.get("STARTUP_DELAY", "10"))
print(f"Simulating startup delay of {startup_delay} seconds...")
# In a real app, this would be actual initialization work
for i in range(startup_delay):
    time.sleep(1)
    print(f"  ...{i+1}s")
print("Startup complete.")

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/healthz')
def healthz():
    # In a real app, check DB connections, external services, etc.
    # For this example, we assume if we reach here, we're healthy.
    return jsonify({"status": "healthy"}), 200

if __name__ == '__main__':
    # No debug reloader: it would re-run the startup delay in a child process.
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
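One way to fill in the "check DB connections, external services" comment in `healthz` is to run a set of named dependency checks and report 503 if any of them fails. The sketch below is framework-agnostic so its result can be returned from the Flask view as `jsonify(body), status`; the `run_health_checks` helper and any check names are hypothetical.

```python
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[dict, int]:
    """Run each named check and return (JSON-serializable body, HTTP status)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:
            # A check that raises counts as a failure rather than crashing the probe.
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    return body, 200 if healthy else 503
```

A common design choice is to run dependency checks like these only in the startup probe and keep the liveness endpoint cheap and dependency-free: if liveness depends on a shared database, a database outage makes every instance fail liveness and restart at once, which fixes nothing.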

Now, let’s deploy this with gcloud.

Readiness (Startup) Probe Configuration:

We want Cloud Run to wait until /healthz returns a 200 OK before sending traffic.

gcloud run deploy YOUR_SERVICE_NAME \
  --image gcr.io/YOUR_PROJECT/YOUR_IMAGE_NAME:latest \
  --platform managed \
  --region us-central1 \
  --port 8080 \
  --min-instances 0 \
  --max-instances 10 \
  --no-allow-unauthenticated \
  --update-env-vars STARTUP_DELAY=20 \
  --add-cloudsql-instances YOUR_CLOUDSQL_CONNECTION \
  --startup-probe httpGet.path=/healthz,httpGet.port=8080,initialDelaySeconds=10,periodSeconds=5,timeoutSeconds=2,failureThreshold=6

  • httpGet.path=/healthz: The HTTP path to probe.
  • httpGet.port=8080: The port your application is listening on; it should match --port.
  • initialDelaySeconds=10: Wait 10 seconds after the container starts before the first probe. This matters for applications with long startup times. Our STARTUP_DELAY is 20, so 10 seconds is a reasonable start.
  • periodSeconds=5: Probe every 5 seconds.
  • timeoutSeconds=2: Wait a maximum of 2 seconds for a response.
  • failureThreshold=6: Allow up to 6 consecutive failures before Cloud Run gives up and shuts the instance down. The last probe then fires at 10 + 5 × 5 = 35 seconds, comfortably after the app’s roughly 20-second startup.

With these settings, Cloud Run starts the container, waits 10 seconds (the initial delay), and then probes /healthz every 5 seconds, allowing up to 2 seconds per response. As soon as /healthz returns 200 OK, the instance is marked ready and begins receiving traffic. Each failed probe counts toward the failure threshold; once that many consecutive probes have failed, Cloud Run shuts the instance down instead of sending it traffic.
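The timing can be checked with a little arithmetic. The sketch below simulates a startup probe: probes fire at the initial delay and then every period; a probe succeeds only if the app has finished starting; the instance is given up on after `failure_threshold` consecutive failures. It assumes instant probe responses and an exact ready time, both simplifications, and `first_ready_probe` is a hypothetical helper.

```python
from typing import Optional


def first_ready_probe(ready_at: float, initial_delay: float, period: float,
                      failure_threshold: int) -> Optional[float]:
    """Return the time of the first successful probe, or None if the
    failure threshold is reached before the app is ready."""
    failures = 0
    t = initial_delay
    while failures < failure_threshold:
        if t >= ready_at:
            return t      # Probe succeeds: the instance starts receiving traffic.
        failures += 1     # App still booting: this probe fails.
        t += period
    return None           # Threshold reached: the instance is shut down.
```

With a 10-second initial delay and a 5-second period, an app that needs just over 20 seconds (say 20.5, once interpreter startup is added to a 20-second delay) fails the probes at 10, 15, and 20 seconds. A failure threshold of 3 would therefore shut it down before it ever serves traffic, while a threshold of 6 lets the probe at 25 seconds succeed.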

Liveness Probe Configuration:

Liveness probes are generally configured similarly but are used to detect ongoing issues.

gcloud run deploy YOUR_SERVICE_NAME \
  --image gcr.io/YOUR_PROJECT/YOUR_IMAGE_NAME:latest \
  --platform managed \
  --region us-central1 \
  --port 8080 \
  --min-instances 0 \
  --max-instances 10 \
  --no-allow-unauthenticated \
  --update-env-vars STARTUP_DELAY=20 \
  --add-cloudsql-instances YOUR_CLOUDSQL_CONNECTION \
  --liveness-probe httpGet.path=/healthz,httpGet.port=8080,initialDelaySeconds=60,periodSeconds=10,timeoutSeconds=5,failureThreshold=3

  • httpGet.path=/healthz: The same /healthz endpoint used above.
  • initialDelaySeconds=60: Wait 60 seconds before the first liveness check. You typically want a longer initial delay for liveness probes than for the startup (readiness) probe, because you don’t want to restart a container that’s still starting up.
  • periodSeconds=10: Check every 10 seconds.
  • timeoutSeconds=5: Allow 5 seconds for a response.
  • failureThreshold=3: The probe must fail 3 consecutive times before Cloud Run restarts the container.
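These settings bound how long a hung instance keeps its place. A rough sketch, assuming the app stops responding at time `hang_at` (seconds after container start), that probes fire at the initial delay and every period after that, and that each failed probe is only declared failed when its full timeout expires; `liveness_restart_time` is a hypothetical helper, and real scheduling may differ.

```python
import math


def liveness_restart_time(hang_at: float, initial_delay: float, period: float,
                          timeout: float, failure_threshold: int) -> float:
    """Time at which the failure threshold is reached for an app that hangs at hang_at."""
    # Index of the first probe fired at or after the moment the app hangs.
    first_bad = max(0, math.ceil((hang_at - initial_delay) / period))
    # The threshold is reached on the failure_threshold-th consecutive bad probe.
    last_bad_probe = initial_delay + (first_bad + failure_threshold - 1) * period
    # That last probe is only declared failed once its timeout expires.
    return last_bad_probe + timeout
```

With a 60-second initial delay, a 10-second period, a 5-second timeout, and a threshold of 3, a hang at 100 seconds is acted on at 125 seconds. In the worst case, a hang just after a successful probe, the gap is failure_threshold × period + timeout = 35 seconds.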

The Mental Model:

Think of readiness as your "doorbell." Cloud Run rings the doorbell (/healthz) periodically. Only when your application answers with a friendly "I’m ready!" (200 OK) does Cloud Run let customers through the door. If the doorbell goes unanswered or gets a "go away" (an error status), nobody is let in, and if that keeps happening past the failure threshold, Cloud Run gives up on the instance and shuts it down. Liveness is more like checking whether the lights are still on inside and someone’s home; if the house seems dark and empty after multiple checks, Cloud Run assumes it’s time for a renovation (restart).

One mechanical detail often surprises people: Cloud Run sets the PORT environment variable to the service’s container port (the --port value, 8080 by default), and PORT is reserved, so you can’t set it yourself. Your app should listen on $PORT, and your probes must target that same port. If you deploy with a different --port, update the probe port as well instead of leaving 8080 hardcoded; a probe aimed at a port nothing listens on will keep failing, and the instance will be shut down.

Understanding the interplay between initial delay, period, timeout, and failure threshold is key to tuning probes without causing unnecessary restarts or keeping unhealthy instances alive too long.
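A small pre-deploy sanity check can catch the usual mistakes: a probe port that differs from the container port, a timeout longer than the period, or a startup window that closes before the app is expected to be ready. The sketch below treats the window as initial delay + (failure threshold − 1) × period, the time of the last probe before the instance is given up on. The function, its inputs, and the rules it encodes are illustrative assumptions rather than Cloud Run's own validation.

```python
from typing import List


def check_probe_config(container_port: int, probe_port: int, initial_delay: int,
                       period: int, timeout: int, failure_threshold: int,
                       expected_startup: float) -> List[str]:
    """Return human-readable problems with a startup-probe configuration."""
    problems = []
    if probe_port != container_port:
        problems.append(f"probe port {probe_port} != container port {container_port}")
    if timeout > period:
        problems.append(f"timeout {timeout}s exceeds period {period}s")
    # Time of the last probe before the instance is given up on.
    window = initial_delay + (failure_threshold - 1) * period
    if expected_startup > window:
        problems.append(
            f"startup needs ~{expected_startup}s but the last probe fires at {window}s")
    return problems
```

Inside the app, reading `int(os.environ.get('PORT', 8080))`, as both Flask examples do, keeps the listening port tied to whatever Cloud Run configures, so only the probe side needs checking.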

The next thing to understand is how to implement more sophisticated health checks that go beyond a simple HTTP 200.
