Cloud Run is fundamentally a managed container platform with Kubernetes-style (Knative) semantics, not just a serverless function runner.

Let’s watch Cloud Run and Cloud Functions handle a simple HTTP request.

Cloud Functions (1st gen):

Imagine a request hits a Cloud Function. The function is cold. GCP spins up a minimal container, loads your code, and then executes it. If another request comes in immediately, the container might still be warm and can handle it. After a long enough idle gap, the instance is reclaimed and the next request is cold again. This rapid spin-up and tear-down is what makes it "serverless" but also introduces latency on cold starts.

Here’s a Python 3.9 function that just echoes back a POST request body:

import functions_framework

@functions_framework.http
def echo_post_body(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): The request object.
        <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
    Returns:
        The response from the request.
    """
    if request.method == 'POST':
        return request.get_data(as_text=True), 200, {'Content-Type': 'text/plain'}
    else:
        return 'Please use POST', 405

Deploying this with gcloud functions deploy echo_post_body --runtime python39 --trigger-http --allow-unauthenticated --region us-central1 creates an endpoint. When you curl -X POST -d "hello world" <your-function-url>, the latency is noticeable on the first call, but subsequent calls are faster if the container stays warm.
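
To see the cold/warm difference as numbers rather than by feel, a small timing helper is enough. The sketch below is illustrative only: time_calls and make_fake_function are hypothetical names, and the fake function simulates a slow first call with sleep so the script runs without a deployed endpoint. The 50 ms and 1 ms delays are made-up values, not measurements.

```python
import time
from typing import Callable, List


def time_calls(send: Callable[[], object], n: int = 5) -> List[float]:
    """Call `send` n times back to back and return each latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send()
        latencies.append(time.perf_counter() - start)
    return latencies


def make_fake_function(cold_start_s: float = 0.05,
                       warm_s: float = 0.001) -> Callable[[], str]:
    """Stand-in for an HTTP call: slow on the first invocation, fast afterwards.

    The delays are invented for illustration, not real platform numbers.
    """
    state = {"warm": False}

    def send() -> str:
        time.sleep(warm_s if state["warm"] else cold_start_s)
        state["warm"] = True
        return "hello world"

    return send


if __name__ == "__main__":
    for i, latency in enumerate(time_calls(make_fake_function()), start=1):
        print(f"call {i}: {latency * 1000:.1f} ms")
```

To time a real endpoint, pass time_calls a function that sends the POST itself, for example one built on urllib.request from the standard library.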

Cloud Run:

Now, consider Cloud Run. When you deploy a container image (say, one built with Docker), Cloud Run manages a fleet of container instances started from it. Two settings bound that fleet: "min instances" (which you can set to 0 or higher) and "max instances." If min instances is 0, it behaves similarly to Cloud Functions in that it can scale down to zero, but the underlying mechanism is different. Instead of spinning up just your code in a minimal environment, it’s orchestrating a full container.

This container runs your application, listening on the port Cloud Run passes in the PORT environment variable (8080 by default). Cloud Run handles all the ingress routing, scaling, and health checks. You can even configure CPU to be always allocated or allocated only during request processing.
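
The contract between Cloud Run and your container is small: listen for HTTP on 0.0.0.0 at the port named by the PORT environment variable. As a rough sketch of that contract, here is the echo service written with only the Python standard library instead of Flask and gunicorn. EchoHandler and make_server are hypothetical names chosen for this example, and a production service would normally sit behind a real WSGI server.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        # Echo the request body back unchanged.
        length = int(self.headers.get("Content-Length", 0))
        self._reply(200, self.rfile.read(length))

    def do_GET(self) -> None:
        self._reply(405, b"Please use POST")

    def log_message(self, format, *args) -> None:
        # Keep the example quiet; real services should log requests.
        pass


def make_server(port=None) -> ThreadingHTTPServer:
    """Bind to the port from PORT, falling back to 8080 when it is unset."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), EchoHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Reading the port from the environment rather than hard-coding it keeps the same image runnable locally and on the platform.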

Here’s a simple Dockerfile for the same echo functionality:

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# The command to run your app. Cloud Run tells the container which port to
# listen on through the PORT environment variable (8080 by default).
CMD exec gunicorn --bind :${PORT:-8080} --workers 1 --threads 8 --timeout 0 main:app

And a main.py for gunicorn:

from flask import Flask, request

app = Flask(__name__)

# Only POST is routed here, so Flask itself answers any other method with 405.
@app.route('/', methods=['POST'])
def echo_post_body():
    # Echo the raw request body back as plain text.
    return request.get_data(as_text=True), 200, {'Content-Type': 'text/plain'}

You’d build this into an image (docker build -t gcr.io/your-project-id/echo-app .) and push it (docker push gcr.io/your-project-id/echo-app). Then deploy to Cloud Run: gcloud run deploy echo-service --image gcr.io/your-project-id/echo-app --platform managed --region us-central1 --allow-unauthenticated --min-instances 0 --max-instances 100.

When you curl -X POST -d "hello world from run" <your-run-url>, the first request to a scaled-down service still incurs a cold start; how long it takes depends mostly on the size of your image and how quickly your application starts. Crucially, Cloud Run offers more control: you can keep a minimum number of instances warm (--min-instances 1), so requests that fit within that warm capacity skip the cold start, at the cost of paying for instances while they sit idle.
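
The effect of keeping one instance warm can be illustrated with a toy model. This is a sketch under loud assumptions, not a description of the real autoscaler: it models a single instance, ignores request duration and scale-out, and the 900-second idle timeout is a hypothetical value picked for the example. count_cold_starts is a made-up name.

```python
from typing import Iterable


def count_cold_starts(arrivals_s: Iterable[float],
                      idle_timeout_s: float,
                      min_instances: int = 0) -> int:
    """Count cold starts in a single-instance toy model.

    An instance is started on demand and discarded once it has been idle
    longer than idle_timeout_s. With min_instances >= 1 an instance is
    always warm, so this model reports no cold starts at all.
    """
    if min_instances >= 1:
        return 0
    cold_starts = 0
    last_request = None
    for t in sorted(arrivals_s):
        if last_request is None or t - last_request > idle_timeout_s:
            cold_starts += 1
        last_request = t
    return cold_starts


if __name__ == "__main__":
    arrivals = [0, 1, 2, 1000, 1001, 5000]  # request times in seconds
    print(count_cold_starts(arrivals, idle_timeout_s=900))                   # 3
    print(count_cold_starts(arrivals, idle_timeout_s=900, min_instances=1))  # 0
```

Bursty traffic separated by long gaps is where a warm minimum pays off most; steady traffic rarely lets the instance go cold in the first place.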

Key Differences in Action:

  • Execution Environment: Cloud Functions is a specialized, managed runtime for your code. Cloud Run runs any containerized application. This means you can use any language, any library, and any framework with Cloud Run, as long as it can be containerized and listens on 0.0.0.0 at the port given by the PORT environment variable (8080 by default).
  • Scaling: Both scale to zero. Cloud Functions scales by creating/destroying minimal execution environments. Cloud Run scales by managing container instances, offering finer control with min-instances and max-instances directly.
  • Concurrency: A single Cloud Function instance typically handles one request at a time. A single Cloud Run instance can handle multiple requests concurrently (configured via --concurrency flag during deployment, default is 80). This means one Cloud Run instance can often do the work of many Cloud Functions.
  • Resource Allocation: With Cloud Functions (1st gen), you select a memory size, and the CPU allotment follows from that memory tier. With Cloud Run, you specify memory, CPU, and importantly, CPU allocation (always allocated or only during request processing), which impacts performance and cost.
  • Networking: Both can reach private resources through Serverless VPC Access connectors. Cloud Run adds further options on top, such as custom domain mapping.
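
The concurrency difference is easiest to appreciate with a back-of-the-envelope estimate. The sketch below uses Little's law (requests in flight = arrival rate × average latency) and divides by the per-instance concurrency. It is a simplification: the real autoscaler also reacts to other signals such as CPU utilization, and instances_needed is a hypothetical helper, not a platform API.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_latency_s: float,
                     concurrency: int,
                     max_instances: int = 100) -> int:
    """Estimate how many instances a steady load requires.

    in-flight requests = arrival rate * average latency (Little's law);
    each instance absorbs up to `concurrency` of them at once.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    in_flight = requests_per_second * avg_latency_s
    needed = math.ceil(in_flight / concurrency)
    return min(needed, max_instances)


if __name__ == "__main__":
    # 320 requests/s at 250 ms each keeps 80 requests in flight.
    print(instances_needed(320, 0.25, concurrency=1))   # 80: one request per instance
    print(instances_needed(320, 0.25, concurrency=80))  # 1: Cloud Run's default concurrency
```

The estimate only holds if the application can really serve that many requests at once; a CPU-bound handler may need a lower concurrency setting.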

The Mental Model:

Think of Cloud Functions as a specialized tool for event-driven, short-lived tasks. It’s ideal for simple webhook handlers, data transformations triggered by Pub/Sub, or background jobs. Its simplicity and tight integration with other GCP services are its strengths.

Cloud Run, on the other hand, is a general-purpose serverless compute platform. It’s your go-to when you need to run web applications, APIs, microservices, or any stateless HTTP-based service without managing servers. Its flexibility in supporting any language/framework and its higher concurrency capabilities make it suitable for more complex or high-throughput workloads. The ability to bring your own container is the ultimate freedom.

What most people don’t realize is that Cloud Run implements the Knative Serving API, the same API you can install on a Kubernetes cluster. The fully managed service runs on Google’s own infrastructure rather than on a GKE cluster you can see, but you still get a Kubernetes-style experience tailored for serverless, complete with request-driven autoscaling, revision-based rollouts, and traffic splitting, abstracted away from the complexity of managing a cluster directly. You can even run Knative on GKE yourself and deploy the same container there if you need that level of control.

The next step after mastering the differences is understanding how to manage state and background tasks effectively across both platforms, particularly when dealing with long-running operations or complex workflows.
