Cloud Run’s traffic splitting is a simple mechanism, yet it provides the fine-grained control needed for blue-green deployments.
Let’s see it in action. Imagine we have a production service running on Cloud Run, and we want to deploy a new version without downtime and with as little risk as possible.
Here’s a simplified look at a Cloud Run service definition. Notice the traffic block. This is where the magic happens:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - image: gcr.io/my-project/my-app:v1.0.0
  traffic:
    - tag: current  # This is the tag for the currently live version
      revisionName: my-app-00001
      percent: 100  # All traffic goes to this revision
To perform a blue-green deployment, we first deploy our new version, let’s call it my-app-00002. Initially, we want no traffic to go to it.
gcloud run deploy my-app \
--image=gcr.io/my-project/my-app:v1.0.1 \
--platform=managed \
--region=us-central1 \
--no-traffic \
--project=my-project
The --no-traffic flag is key here. It deploys the new revision but doesn’t direct any traffic to it. Now our Service object looks like this:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - image: gcr.io/my-project/my-app:v1.0.1
  traffic:
    - tag: current
      revisionName: my-app-00001
      percent: 100
    - revisionName: my-app-00002  # The new, un-trafficked revision
      percent: 0
At this point, my-app-00001 is still handling 100% of the traffic. my-app-00002 is deployed and ready, but invisible to users. This is our "green" environment, ready to be switched to.
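Now, we test the new version thoroughly. A revision that receives 0% of traffic cannot be reached through its revision name alone; it needs a traffic tag. After tagging it, for example with gcloud run services update-traffic my-app --update-tags green=my-app-00002, Cloud Run serves it at a dedicated URL of the form https://green---my-app-xyz.a.run.app, where green is a tag name of our choosing and the part after --- is the service’s normal hostname. Once we’re confident, we can start shifting traffic.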
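Revisions that receive no traffic are typically exercised through a traffic tag. Here is a minimal sketch of such a smoke test. It assumes a tag named green, a /healthz path that your application may not actually expose, and that a tagged URL is the service URL with TAG--- prefixed to the hostname. The helper names tag_url and smoke_test_green are made up for illustration.

```shell
#!/bin/sh
# Build the URL of a tagged revision from the service's base URL.
# Assumption: tagged URLs have the form https://TAG---<service hostname>.
tag_url() {
  # $1 = tag name, $2 = service URL (e.g. https://my-app-xyz.a.run.app)
  printf 'https://%s---%s\n' "$1" "${2#https://}"
}

# Tag the new revision and send it one request.
# Defined only; call it yourself once gcloud is authenticated.
smoke_test_green() {
  gcloud run services update-traffic my-app \
    --update-tags=green=my-app-00002 \
    --region=us-central1 --project=my-project || return 1
  base_url=$(gcloud run services describe my-app \
    --region=us-central1 --project=my-project \
    --format='value(status.url)') || return 1
  # /healthz is a hypothetical endpoint; substitute a path your app serves.
  curl --fail --silent --show-error "$(tag_url green "$base_url")/healthz"
}
```

Because the tag adds a URL without changing any percentages, this test does not affect production users.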
The "blue-green" switch is made by updating the traffic block. The update-traffic command describes the target distribution rather than a delta, so each call lists the percentage every revision should end up with. A strict blue-green cutover would move 100% in a single step; here we shift gradually and begin by sending, say, 10% of traffic to the new revision:
gcloud run services update-traffic my-app \
--to-revisions=my-app-00001=90,my-app-00002=10 \
--region=us-central1 \
--platform=managed \
--project=my-project
Now, 90% of incoming requests go to the old version (my-app-00001), and 10% go to the new version (my-app-00002). We monitor logs and metrics closely. If we see any issues with my-app-00002, we can immediately revert by sending 100% of traffic back to my-app-00001:
gcloud run services update-traffic my-app \
--to-revisions=my-app-00001=100 \
--region=us-central1 \
--platform=managed \
--project=my-project
If everything looks good, we continue to increase the traffic to my-app-00002 in stages: 50%, then 90%, and finally 100%.
# Shift to 50%
gcloud run services update-traffic my-app \
--to-revisions=my-app-00001=50,my-app-00002=50 \
--region=us-central1 \
--platform=managed \
--project=my-project
# Shift to 90%
gcloud run services update-traffic my-app \
--to-revisions=my-app-00001=10,my-app-00002=90 \
--region=us-central1 \
--platform=managed \
--project=my-project
# Final switch to 100% to the new revision
gcloud run services update-traffic my-app \
--to-revisions=my-app-00002=100 \
--region=us-central1 \
--platform=managed \
--project=my-project
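The staged shift can also be scripted. The sketch below uses the --to-revisions flag of update-traffic. The helper names split_arg and staged_rollout, the stage list, and the five-minute pause are illustrative assumptions, not a recommended policy. In practice you would gate each step on your own metrics rather than on a timer.

```shell
#!/bin/sh
# Build the value for --to-revisions from the share the new revision should get.
split_arg() {
  # $1 = old revision, $2 = new revision, $3 = percent for the new revision (0-100)
  printf '%s=%d,%s=%d\n' "$1" "$((100 - $3))" "$2" "$3"
}

# Walk through the stages one by one.
# Defined only; call it yourself once gcloud is authenticated.
staged_rollout() {
  for pct in 10 50 90 100; do
    gcloud run services update-traffic my-app \
      --to-revisions="$(split_arg my-app-00001 my-app-00002 "$pct")" \
      --region=us-central1 --project=my-project || return 1
    # Hypothetical soak time between stages.
    sleep 300
  done
}
```

Stopping at the first failed update (the || return 1) leaves traffic at the last stage that was applied successfully, so a manual rollback starts from a known state.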
Once my-app-00002 is receiving 100% of traffic, the old revision (my-app-00001) is no longer receiving any requests but remains deployed and available. This is the "blue" environment, now on standby. If a critical issue arises with the new version, we can instantly roll back by directing 100% of traffic back to my-app-00001.
The real power here is that Cloud Run handles the traffic routing at the edge. When you update the traffic split, it’s not a slow DNS propagation; it’s an almost instantaneous re-configuration of the load balancer directing traffic to the appropriate revision. You can even assign tags to revisions, like production or staging, and update these tags to point to different revisions, allowing for more complex workflows.
The most surprising mechanical detail is that a Cloud Run traffic split is not a quota in the sense of "exactly 10 of every 100 requests go to X." It is a probability distribution: each request is assigned to a revision at random, weighted by the configured percentages. So even with a 90/10 split it’s possible, though unlikely, for a short burst of consecutive requests to all land on the same revision, and the observed ratio only approaches the configured one over many requests. For gradual rollouts this is usually fine, and as long as the older revision keeps receiving some traffic it stays warm and available for rollback.
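To get a feel for this behavior, here is a small local simulation. It does not talk to Cloud Run; it only models the assumption described above, namely that every request is routed independently with the configured probability. The function name simulate_split and the seed parameter are invented for this sketch.

```shell
#!/bin/sh
# Simulate N requests, each sent to the new revision with probability PCT/100.
# Prints "<old count> <new count>".
simulate_split() {
  # $1 = number of requests, $2 = percent for the new revision, $3 = RNG seed
  awk -v n="$1" -v pct="$2" -v seed="$3" 'BEGIN {
    srand(seed)
    n_old = 0; n_new = 0
    for (i = 0; i < n; i++) {
      # rand() is uniform in [0, 1), so this branch is taken with probability pct/100
      if (rand() * 100 < pct) n_new++; else n_old++
    }
    print n_old, n_new
  }'
}

simulate_split 20 10 1      # a small sample can deviate noticeably from 18/2
simulate_split 100000 10 1  # a large sample should land near 90000/10000
```

The exact counts depend on the awk implementation and the seed, which is the point: the split is a tendency over many requests, not a guarantee for any particular handful.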
After you’ve successfully deployed and validated your new version, the next step is to clean up old, un-trafficked revisions to save on costs and reduce clutter.
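As a rough sketch of what that cleanup could look like, the script below lists the service’s revisions and deletes every one that is not on a keep list. The helper names revisions_to_delete and cleanup_revisions are hypothetical, and keeping both my-app-00001 (the rollback target) and my-app-00002 (the live revision) is an assumption you should adapt.

```shell
#!/bin/sh
# Read revision names on stdin and print those that are not in the keep list.
revisions_to_delete() {
  # "$@" = revisions to keep (e.g. the live one and the rollback target)
  while IFS= read -r rev; do
    keep=no
    for k in "$@"; do
      [ "$rev" = "$k" ] && keep=yes
    done
    [ "$keep" = no ] && [ -n "$rev" ] && printf '%s\n' "$rev"
  done
  return 0
}

# Delete every revision of my-app that is not kept.
# Defined only; call it yourself once gcloud is authenticated.
cleanup_revisions() {
  gcloud run revisions list --service=my-app \
    --region=us-central1 --project=my-project \
    --format='value(metadata.name)' |
  revisions_to_delete my-app-00001 my-app-00002 |
  while IFS= read -r rev; do
    # </dev/null keeps gcloud from consuming the list we are iterating over.
    gcloud run revisions delete "$rev" \
      --region=us-central1 --project=my-project --quiet </dev/null
  done
}
```

Try the filter on its own first, for example printf 'a\nb\nc\n' | revisions_to_delete a c, which prints only b.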