Cloud Functions can spin down to zero instances when not in use, which is great for saving money but means the first request after a period of inactivity will be slow while a new instance boots up.
Let’s see what that looks like with a simple HTTP function that returns a greeting.
import functions_framework
@functions_framework.http
def hello_http(request):
    """HTTP Cloud Function.
    Args:
        request (flask.Request): The request object.
        <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>.
    """
    # Prefer a "name" field in a JSON body, then a ?name= query parameter.
    request_json = request.get_json(silent=True)
    request_args = request.args
    if request_json and 'name' in request_json:
        name = request_json['name']
    elif request_args and 'name' in request_args:
        name = request_args['name']
    else:
        name = 'World'
    return f"Hello {name}!"
If this function hasn’t been called for a while, the first request can take several seconds. In your logs, it might look like this:
2023-10-27T10:00:00.123Z INFO Function execution started
2023-10-27T10:00:08.456Z INFO Function execution took 8233ms
That 8-second delay is the "cold start." The underlying infrastructure had to provision a new container, load your function code, start the runtime, and run any global-scope initialization such as imports. The second request, a few seconds later, reuses the now-warm instance and is fast:
2023-10-27T10:00:15.789Z INFO Function execution started
2023-10-27T10:00:16.012Z INFO Function execution took 223ms
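To observe the same effect from the caller’s side, a small client script can time consecutive requests. This is a minimal sketch using only the standard library; the function URL in the usage note is a hypothetical placeholder, and because client-side timing includes the network round trip, the numbers won’t match the "Function execution took" log lines exactly.

```python
import time
import urllib.request


def time_requests(url: str, count: int = 3, pause_s: float = 0.0) -> list:
    """Send `count` sequential GET requests to `url`; return each latency in milliseconds."""
    latencies = []
    for i in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()  # read the whole body so the timing covers the full response
        latencies.append((time.perf_counter() - start) * 1000)
        if pause_s and i < count - 1:
            time.sleep(pause_s)  # optional gap between requests
    return latencies
```

After the function has been idle for a while, calling `time_requests("https://REGION-PROJECT_ID.cloudfunctions.net/YOUR_FUNCTION_NAME")` should show a noticeably larger first value than the ones that follow.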
This is fine for many use cases, but if your function needs to respond quickly, for example as the backend behind an API gateway or in a real-time application, a multi-second cold start can be unacceptable.
The solution is to tell Cloud Functions to keep a minimum number of instances warm and ready to go. This is done using the min-instances flag when deploying or updating your function.
Let’s say you want to ensure at least 2 instances are always running. You’d deploy like this:
gcloud functions deploy YOUR_FUNCTION_NAME \
--runtime YOUR_RUNTIME \
--trigger-http \
--min-instances 2 \
--allow-unauthenticated
Replace YOUR_FUNCTION_NAME and YOUR_RUNTIME with your function’s name and runtime (e.g., python310, nodejs18). The --allow-unauthenticated flag lets anyone invoke the HTTP function, which is convenient for testing; adjust it to your security requirements.
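Once deployed with min-instances set to 2, Cloud Functions keeps at least two instances of your function initialized and ready. A request routed to one of these warm instances skips the startup work, so its latency is close to that of the warm request above rather than the cold one, even if the function hasn’t been invoked for hours. Warm instances only help up to the number of requests they can serve at once, though: if each instance handles one request at a time (as 1st gen functions do), a third simultaneous request still lands on a new instance and pays the cold start.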
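Warm instances absorb only as many simultaneous requests as they can serve; in 1st gen Cloud Functions each instance handles one request at a time. The sketch below is a deliberately simplified model, assuming every request in a burst arrives at once and each instance serves at most `per_instance_concurrency` requests simultaneously. It estimates how many new instances (and therefore cold starts) the burst needs beyond the warm pool. The function name is hypothetical, and real autoscaling is more nuanced than this.

```python
import math


def expected_cold_starts(concurrent_requests: int, warm_instances: int,
                         per_instance_concurrency: int = 1) -> int:
    """Simplified model: new instances a simultaneous burst needs beyond the warm pool."""
    # Instances required to serve every request in the burst at the same time.
    needed = math.ceil(concurrent_requests / per_instance_concurrency)
    # Anything beyond the warm pool has to be started, i.e. incurs a cold start.
    return max(0, needed - warm_instances)
```

With `min-instances` set to 2 and one request per instance, `expected_cold_starts(3, 2)` returns 1: the third simultaneous request waits for a new instance.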
The trade-off, of course, is cost. Keeping instances warm means you’re paying for them even when they aren’t processing requests. With min-instances set to 2 and a 256MB memory allocation, you pay a baseline for two instances around the clock, whether or not any traffic arrives, although idle minimum instances are billed at a lower rate than instances actively serving requests. You can estimate this baseline in the Google Cloud pricing calculator.
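The arithmetic behind that baseline is simple: instances × memory × seconds gives GB-seconds, which you multiply by the per-GB-second rate. The sketch below assumes a 30-day month and takes the rate as a parameter rather than hard-coding one; look up the current idle rate for your region and function generation on the pricing page. It covers memory only, since CPU time is billed separately.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def idle_baseline_cost(min_instances: int, memory_gb: float,
                       price_per_gb_second: float,
                       seconds: int = SECONDS_PER_MONTH) -> float:
    """Rough memory-only cost of keeping `min_instances` warm for `seconds`.

    `price_per_gb_second` is a placeholder you supply from the pricing page;
    CPU charges are not included.
    """
    gb_seconds = min_instances * memory_gb * seconds
    return gb_seconds * price_per_gb_second
```

For the example above, two 256MB (0.25 GB) instances accumulate 2 × 0.25 × 2,592,000 = 1,296,000 GB-seconds per month, so `idle_baseline_cost(2, 0.25, rate)` is 1,296,000 × rate.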
You can also set max-instances to cap how many instances the function can scale out to, which helps contain runaway costs if it suddenly gets a huge traffic spike. A max-instances value below min-instances makes no sense, so if you have specific scaling requirements, set both explicitly and keep max-instances at or above min-instances.
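If you script your deployments, it can help to validate the two bounds before calling gcloud. This is a minimal sketch with a hypothetical helper name; it only uses the --runtime, --trigger-http, --min-instances, and --max-instances flags already discussed, and it omits --allow-unauthenticated so you can add the access settings your security needs call for.

```python
import shlex
from typing import List, Optional


def build_deploy_command(name: str, runtime: str, min_instances: int,
                         max_instances: Optional[int] = None) -> List[str]:
    """Assemble a `gcloud functions deploy` argument list with explicit scaling bounds."""
    if min_instances < 0:
        raise ValueError("min_instances must be non-negative")
    if max_instances is not None and max_instances < min_instances:
        raise ValueError("max_instances must be >= min_instances")
    cmd = [
        "gcloud", "functions", "deploy", name,
        "--runtime", runtime,
        "--trigger-http",
        "--min-instances", str(min_instances),
    ]
    if max_instances is not None:
        cmd += ["--max-instances", str(max_instances)]
    return cmd


# Print the command for review; passing the list to subprocess.run(cmd, check=True) would deploy it.
print(shlex.join(build_deploy_command("YOUR_FUNCTION_NAME", "python310", 2, 10)))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if you later run it with `subprocess.run`.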
The min-instances setting applies per function. If you have multiple functions in the same region, each one with min-instances configured maintains its own pool of warm instances, so reserve min-instances for the functions that truly need low latency to balance performance against cost.
When you set min-instances, Cloud Functions begins provisioning those instances shortly after the deployment completes. You can monitor this on the function’s Metrics tab in the Cloud Functions console, where the Active instances chart shows how many instances are currently running.
One common misconception is that min-instances guarantees zero latency. It eliminates the cold start for requests served by a warm instance, but other factors still contribute to latency, such as network hops, the complexity of your function’s logic, and calls to external APIs. The min-instances setting addresses only the time it takes to get an execution environment ready.
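To find out where the remaining latency on a warm instance goes, you can time each stage of a request separately. The sketch below uses a small context manager from the standard library; the stage names and the `time.sleep` stand-in for a downstream API call are hypothetical, and in a real function you would log the `timings` dict rather than return it.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of the enclosed block, in ms, under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000


def handle(name: str):
    """Toy request handler that reports a per-stage timing breakdown."""
    timings = {}
    with timed("external_call", timings):
        time.sleep(0.05)  # stand-in for a hypothetical downstream API call
    with timed("business_logic", timings):
        greeting = f"Hello {name}!"
    return greeting, timings
```

If most of the time shows up under an external call, adding more warm instances won’t help; the fix lies in that dependency or in how your function calls it.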
The next challenge you’ll face is managing the cost implications of maintaining warm instances, especially as your min-instances count grows across multiple functions.