Cut Cloud Functions Costs with Min Instances and Right-Sizing (2026)

Cloud Functions, when left unchecked, can become a hidden cost sinkhole, especially with their default scaling behavior.

Let’s see how a simple "Hello World" function, deployed with default settings, can rack up costs and how we can tame it.

Here’s a basic Node.js function:

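// HTTP-triggered function: echoes the "message" query or body field, or a default greeting.
exports.helloWorld = (req, res) => {
  // Guard against a missing body (for example, a GET request with no payload).
  const message = req.query.message || (req.body && req.body.message) || 'Hello World!';
  res.status(200).send(message);
};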

Deployed with gcloud functions deploy helloWorld --runtime nodejs18 --trigger-http --allow-unauthenticated, this function scales to zero when idle, and Cloud Functions starts a new instance whenever a request arrives and no idle instance is ready. By default each instance handles one request at a time, so a burst of ten simultaneous requests can mean ten instances. This "scale-to-zero" model is great for infrequent workloads, but for even moderate, bursty traffic, it means a lot of cold starts and a lot of overhead.

The problem we’re solving is that for predictable, consistent workloads, the default behavior is inefficient. Instances are reused while traffic keeps flowing, but once the function goes idle they are reclaimed, and the next request or burst has to wait for provisioning again. That means cold start latency for your users and startup work repeated far more often than a steady workload needs.

The core mechanism at play here is the Cloud Functions runtime environment. When a request comes in, the system checks if an idle instance is available. If not, it provisions a new one, loads your code, and then runs it. This provisioning step is what we call a "cold start" and it adds latency. For functions that are called frequently but in bursts, even if the actual execution time is milliseconds, repeated provisioning can dominate the latency your users see and can add startup overhead to the bill.
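One way to see cold starts in your own logs is to record whether an invocation is the first one its instance has served. The sketch below is a minimal illustration, assuming that module-scope code runs once per instance when Node.js loads the file. The names instanceStartedAt, invocationCount, and describeInvocation are made up for this example and are not part of any Cloud Functions API.

```javascript
// Module scope runs once, when a new instance loads this file (the cold start).
const instanceStartedAt = Date.now();
let invocationCount = 0;

// Hypothetical helper: reports whether this is the first invocation on this instance.
function describeInvocation(now = Date.now()) {
  invocationCount += 1;
  return {
    coldStart: invocationCount === 1,
    instanceAgeMs: now - instanceStartedAt,
    invocationCount,
  };
}

const helloWorld = (req, res) => {
  // One JSON line per request; filter on "coldStart" in your logs.
  console.log(JSON.stringify(describeInvocation()));
  const message = req.query.message || (req.body && req.body.message) || 'Hello World!';
  res.status(200).send(message);
};

exports.helloWorld = helloWorld;
```

If most log lines show coldStart as true, instances are rarely being reused and the function is a candidate for the settings discussed next.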

The two primary levers we have to control this are minimum instances and right-sizing.

Minimum Instances

This setting tells Cloud Functions to keep a specified number of instances warm and ready to serve requests at all times. This eliminates cold starts for that baseline traffic.

Diagnosis: To understand your current baseline, monitor your function’s invocation count and latency. Look for patterns of consistent, albeit low-level, traffic. Check the "Instance count" metric in Cloud Monitoring. If it frequently drops to zero and then spikes, you’re experiencing cold starts.
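If you export the instance count as a series of samples, the "drops to zero and then spikes" pattern can be counted mechanically. This is a rough sketch, assuming you already have the samples as a plain array of numbers; countScaleFromZeroEvents and the sample data are hypothetical, and coarse sampling intervals will undercount short idle gaps.

```javascript
// Counts how often the instance count goes from zero to non-zero between samples.
// Each such transition means a request arrived while no warm instance existed.
function countScaleFromZeroEvents(instanceCounts) {
  let events = 0;
  for (let i = 1; i < instanceCounts.length; i += 1) {
    if (instanceCounts[i - 1] === 0 && instanceCounts[i] > 0) {
      events += 1;
    }
  }
  return events;
}

// Hypothetical per-minute samples, not real measurements.
const sampleCounts = [0, 2, 1, 0, 0, 3, 1, 1, 0, 1];
console.log(countScaleFromZeroEvents(sampleCounts));
```

For the made-up series above the function reports three scale-from-zero events in ten samples, which is the kind of pattern a minimum instance setting is meant to smooth out.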

Fix: Deploy your function with the --min-instances flag.

gcloud functions deploy helloWorld \
  --runtime nodejs18 \
  --trigger-http \
  --allow-unauthenticated \
  --min-instances 1

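This tells Cloud Functions to always keep at least one instance running. Size the minimum from how many instances are busy at once, not from request rate alone: a rough estimate is requests per second multiplied by average request duration in seconds, divided by the number of requests an instance handles at a time. At 5 requests per second and 200 ms per request, one warm instance covers the average load; --min-instances 5 only makes sense if each request takes about a second. Warm instances are billed even while idle, so this is a fixed cost you pay in exchange for skipping cold starts. For example, keeping one 128MB instance alive 24/7 might cost around $5-$10/month (check current pricing for your region and function generation), a bargain if that instance spares your users hundreds of cold starts per day.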
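A starting value for --min-instances can be estimated from traffic numbers rather than guessed. The sketch below applies Little's law (average busy instances equal arrival rate times time per request, divided by per-instance concurrency). The function name estimateMinInstances and its parameters are assumptions for illustration, and the result covers average load only, not peaks.

```javascript
// Rough estimate of how many instances are busy on average (Little's law):
// busy instances = arrival rate x time per request / requests handled at once per instance.
function estimateMinInstances({ requestsPerSecond, avgDurationMs, concurrency = 1 }) {
  if (requestsPerSecond < 0 || avgDurationMs < 0 || concurrency < 1) {
    throw new RangeError('inputs must be non-negative and concurrency at least 1');
  }
  const busyInstances = (requestsPerSecond * avgDurationMs) / 1000 / concurrency;
  // Round up: a fraction of an instance still needs a whole instance.
  return Math.ceil(busyInstances);
}
```

With 5 requests per second at 200 ms each and one request per instance, this gives 1. The same rate with 1000 ms requests gives 5. Treat the number as a floor to validate against your monitoring data, not as a guarantee.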

Right-Sizing

This is about ensuring your function has just enough CPU and memory allocated to it to run efficiently without being over-provisioned.

Diagnosis: Again, Cloud Monitoring is your friend. Look at the "Memory utilization" and "CPU utilization" metrics for your function. If memory utilization is consistently below 50% and CPU is similarly low, you’re likely over-provisioned. Conversely, if memory is hitting 100% or CPU is pegged at 100% for extended periods, you’re under-provisioned and may be seeing slow responses, timeouts, or instances killed for running out of memory.
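The rule of thumb above can be written down as a small classifier over exported utilization samples. This is only a sketch: classifySizing is a hypothetical helper, samples are assumed to be fractions between 0 and 1, and the 0.95 saturation threshold and 20% "sustained" share are arbitrary choices for illustration (only the 50% figure comes from the text).

```javascript
// Classifies sizing from utilization samples shaped like { memory: 0.42, cpu: 0.10 }.
function classifySizing(samples, { low = 0.5, high = 0.95, sustainedShare = 0.2 } = {}) {
  if (samples.length === 0) return 'unknown';
  // Count samples where either resource is effectively maxed out.
  const saturated = samples.filter((s) => s.memory >= high || s.cpu >= high).length;
  if (saturated / samples.length >= sustainedShare) return 'under-provisioned';
  // Over-provisioned only if both resources stay below the low threshold throughout.
  const allLow = samples.every((s) => s.memory < low && s.cpu < low);
  return allLow ? 'over-provisioned' : 'about right';
}
```

Feeding it a day of samples gives a first-pass label; borderline results are a cue to look at the charts rather than to redeploy automatically.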

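Fix: Deploy your function with the --memory and --cpu flags. Note that --cpu applies to 2nd gen functions only (pass --gen2 if your gcloud version does not already default to it); in 1st gen, CPU is tied to the memory tier you choose.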

gcloud functions deploy helloWorld \
  --runtime nodejs18 \
  --trigger-http \
  --allow-unauthenticated \
  --min-instances 1 \
  --memory 256MB \
  --cpu 1

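Here, we’ve set memory to 256MB and CPU to 1. If you omit these flags, memory defaults to 256MB and the function gets a fractional vCPU tied to its memory tier. Increasing memory and CPU directly increases the cost per second for each instance, but if your function needs it, it can dramatically reduce execution time (and thus billable execution time) and prevent performance issues. For many simple functions a full CPU is overkill, so treat 256MB and 1 CPU as a starting point to measure against, not a universal sweet spot. Be careful with the arithmetic: for a function that was previously struggling with 128MB, bumping to 256MB might reduce its execution time by 30%, but doubling memory roughly doubles the per-second rate, so the duration cost becomes about 2 × 0.7 = 1.4 times what it was. The larger size only pays for itself when execution time drops by more than half, or when the extra headroom prevents timeouts, retries, and out-of-memory crashes.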
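The trade-off between a higher per-second rate and a shorter run is easy to check numerically. The sketch below uses a deliberately simplified model in which duration cost is proportional to memory size times execution time. It ignores CPU pricing, per-invocation fees, and the free tier, and both function names are hypothetical.

```javascript
// Simplified model: duration cost is proportional to memory size x execution time.
// Returns the new cost as a multiple of the old cost (1.0 means unchanged).
function relativeDurationCost({ memoryBeforeMb, memoryAfterMb, durationBeforeMs, durationAfterMs }) {
  return (memoryAfterMb / memoryBeforeMb) * (durationAfterMs / durationBeforeMs);
}

// Execution time the new size must beat for the change to cost less than before.
function breakEvenDurationMs({ memoryBeforeMb, memoryAfterMb, durationBeforeMs }) {
  return durationBeforeMs * (memoryBeforeMb / memoryAfterMb);
}
```

Going from 128MB to 256MB while execution time falls from 1000 ms to 700 ms gives a ratio of 1.4, so the bill for duration goes up by 40% under this model. The break-even point for that change is 500 ms: only below that does the bigger instance become cheaper per invocation.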

The real magic happens when you combine these. For a function with predictable, moderate traffic, setting --min-instances 2 and --memory 512MB --cpu 1 ensures fast, consistent performance and predictable costs. The cost of keeping those two instances warm is a fixed baseline, and the right-sizing ensures they operate efficiently.

The most surprising thing about minimum instances is that they don’t guarantee your function will always be ready to serve requests within a certain latency. While they eliminate the cold start provisioning time, the instance itself might still be busy processing a previous request. If all minimum instances are occupied, a new request will still have to wait for an instance to become free, or trigger the scaling up of additional instances beyond the minimum.
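To make the limit concrete, the warm pool can be treated as a fixed number of request slots. This is a back-of-the-envelope sketch, not a model of the real scheduler; warmPoolOverflow and its parameters are hypothetical names, and concurrency is assumed to be the number of requests one instance serves at a time.

```javascript
// How many simultaneous requests the warm pool can absorb, and how many spill over.
function warmPoolOverflow({ minInstances, concurrency = 1, inFlightRequests }) {
  const warmCapacity = minInstances * concurrency;
  // Requests beyond warm capacity must wait or trigger new (cold) instances.
  const overflow = Math.max(0, inFlightRequests - warmCapacity);
  return { warmCapacity, overflow };
}
```

With two minimum instances, one request per instance, and five requests in flight, the warm capacity is 2 and the overflow is 3: those three requests either queue or cause additional instances to start, cold.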

The next hurdle is understanding how concurrency settings interact with minimum instances to manage throughput.