Set Cloud Run Memory and CPU Limits to Avoid OOM Kills (2026)

Cloud Run terminates a container instance as soon as it exceeds its configured memory limit. If your service keeps crashing with Out of Memory (OOM) errors, the allocated memory isn’t sufficient for the application’s runtime demands.

Common Causes and Fixes for OOM Kills

Under-provisioned Memory: The most frequent culprit is simply not giving your container enough RAM. The default is 512 MiB, which is often too little for anything beyond a basic "hello world."
- Diagnosis: Check your Cloud Run service configuration for the memory setting. If it’s not explicitly set or is at the default 512Mi, this is your prime suspect. You can also look in Cloud Logging for an error along the lines of "Memory limit of 512 MiB exceeded"; kernel-level messages such as Killed or oom-killer are not reliably present in container logs.
- Fix: Increase the memory allocated to your service. For example, to set it to 2Gi (2 gibibytes):
```
gcloud run services update SERVICE_NAME --memory 2Gi --region REGION
```
- Why it works: This raises the hard memory limit that Cloud Run enforces on each container instance. An instance is terminated only when its usage exceeds that limit, so 2Gi gives the process four times the default headroom.
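
To choose a new limit from measurements rather than by guessing, you can record the process's peak memory during a local run or load test and add a margin. The following is a minimal sketch: the helper names, the 30% headroom, and the 256 MiB rounding step are illustrative assumptions, not Cloud Run recommendations. Peak RSS can also understate what Cloud Run counts, since files written to the local filesystem use memory too.

```python
import math
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def suggested_limit_mib(peak_mib: float, headroom: float = 0.3, step: int = 256) -> int:
    """Peak usage plus a safety margin, rounded up to a multiple of `step` MiB.

    `headroom` and `step` are hypothetical defaults chosen for illustration.
    """
    target = peak_mib * (1 + headroom)
    return max(step, math.ceil(target / step) * step)


if __name__ == "__main__":
    peak = peak_rss_mib()
    print(f"peak RSS: {peak:.0f} MiB, suggested --memory: {suggested_limit_mib(peak)}Mi")
```

Under these assumptions, a process that peaks at 700 MiB would be given a 1024 MiB limit, which can be passed to --memory as 1Gi.
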
Memory Leaks in Application Code: Applications that continuously allocate memory without releasing it will eventually exhaust even generous limits.
- Diagnosis: Profile your application’s memory usage. For Node.js, use heapdump or Chrome DevTools. For Python, memory_profiler or objgraph. Look for steadily increasing memory over time that never returns to a baseline.
- Fix: Identify and fix the leak in your application code. This often involves closing resources such as file handles and database connections, bounding or evicting in-process caches, and dropping references to large objects so the garbage collector can reclaim them.
- Why it works: By releasing memory that is no longer in use, your application’s footprint stays within the allocated limit, preventing the OOM killer.
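Large Dependency/Package Downloads at Startup: If your application downloads large files, unpacks big archives, or installs many dependencies during its initialization phase, it can spike memory usage significantly. On Cloud Run the effect is amplified because files written to the container’s local filesystem are stored in memory and count against the instance’s limit.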
- Diagnosis: Observe the startup logs of your container. If you see extensive download or extraction activity, this is a strong indicator.
- Fix: Pre-package these assets into your container image. Instead of downloading them at runtime, bake them into the image with COPY in your Dockerfile, or install them during the image build.
- Why it works: Moving these memory-intensive operations from runtime to build time means the memory is consumed by the build environment, not your live Cloud Run instance.
Insufficient CPU Leading to Increased Memory Consumption: A lack of CPU can indirectly cause memory issues. When requests take longer to complete, more of them are in flight at once, and each holds its memory for longer. Some servers also spin up extra worker processes or threads under load, each consuming its own memory, and long-running CPU-bound tasks or delayed garbage collection can hold onto memory longer than expected.
- Diagnosis: Check the cpu limit for your Cloud Run service. 1 vCPU (1000m) is often a good starting point but can be insufficient for heavy workloads. If the limit is lower than that, or if the service’s CPU utilization metric shows sustained saturation, this could be a factor.
- Fix: Increase the CPU allocated to your service. For example, to set it to 2000m (2 vCPUs):
```
gcloud run services update SERVICE_NAME --cpu 2000m --region REGION
```
- Why it works: Providing adequate CPU allows your application to process tasks efficiently, reducing the need for workarounds that consume excessive memory and ensuring that tasks complete promptly, releasing resources.
Unbounded Concurrency: If your application is configured to handle a very high number of concurrent requests, and each request requires a significant amount of memory, the total memory usage can exceed the limit.
- Diagnosis: Review your application’s concurrency settings within Cloud Run (--concurrency flag) and any internal thread/process pool configurations. If your application can handle many requests simultaneously, and each has a non-trivial memory footprint, this is a risk.
- Fix: Reduce the --concurrency setting for your Cloud Run service. For instance, if your service is at the default of 80 concurrent requests per instance, try reducing it to 10 or 20:
```
gcloud run services update SERVICE_NAME --concurrency 10 --region REGION
```
- Why it works: By limiting the number of requests each instance processes concurrently, you cap its peak memory usage at any given moment, keeping it within the allocated memory limit. The trade-off is that Cloud Run starts more instances to serve the same traffic.
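
The relationship between the memory limit and a safe concurrency value is simple arithmetic: subtract the idle footprint from the usable memory, then divide by the worst-case memory per request. The sketch below works through it. The function name, the 80% safety factor, and the example figures are assumptions for illustration, so measure your own baseline and per-request usage before relying on the result.

```python
def max_safe_concurrency(
    limit_mib: int,
    baseline_mib: float,
    per_request_mib: float,
    safety: float = 0.8,
) -> int:
    """Largest concurrency whose worst-case memory fits within `safety` * limit.

    limit_mib:       the service's --memory value in MiB
    baseline_mib:    memory used by an idle instance (runtime, loaded code)
    per_request_mib: worst-case extra memory held by one in-flight request
    safety:          fraction of the limit you are willing to fill (assumed 0.8)
    """
    if per_request_mib <= 0:
        raise ValueError("per_request_mib must be positive")
    usable = limit_mib * safety - baseline_mib
    # Never return less than 1: Cloud Run needs at least one request per instance.
    return max(1, int(usable // per_request_mib))


if __name__ == "__main__":
    # Example figures (hypothetical): 150 MiB idle, 20 MiB per request.
    print(max_safe_concurrency(512, 150, 20))   # default 512Mi limit
    print(max_safe_concurrency(2048, 150, 20))  # after raising to 2Gi
```

With these example figures, the default 512Mi limit supports roughly 12 concurrent requests and 2Gi supports roughly 74, which shows why the default concurrency of 80 can overrun a small instance.
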
Large Request/Response Payloads: If your application frequently handles very large HTTP request bodies or generates very large response bodies, these payloads need to be buffered in memory.
- Diagnosis: Examine your application logs for evidence of large data processing. If your API endpoints typically involve uploading or downloading multi-megabyte files, this is a likely cause.
- Fix: Optimize your application to stream large payloads instead of buffering them entirely in memory. For responses, consider using streaming APIs. For requests, if possible, process uploads in chunks. Ensure your application’s internal buffers are not excessively large.
- Why it works: Streaming avoids loading the entire payload into RAM at once, drastically reducing the memory required to handle large data transfers.
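
The difference between buffering and streaming can be sketched with plain file-like objects. The stream_copy helper below is a hypothetical example, not part of any framework. It reads a source in fixed-size chunks, forwards each chunk to a destination, and computes a checksum along the way, so memory use stays near the chunk size regardless of payload size. Most web frameworks expose request and response bodies as similar streams, but the exact API depends on the framework.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an illustrative value


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
    """Copy src to dst chunk by chunk; return (bytes copied, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
        dst.write(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


if __name__ == "__main__":
    # In-memory streams stand in for an upload and its destination.
    source = io.BytesIO(b"x" * 1_000_000)
    sink = io.BytesIO()
    copied, checksum = stream_copy(source, sink)
    print(copied, checksum[:12])
```

A handler that calls src.read() with no argument would instead hold the whole payload in memory at once, and that cost is multiplied by the number of concurrent requests.
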

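After resolving OOM issues, the next error you’ll likely encounter is a container startup failure, which Cloud Run reports as the container failing to start and listen on the port given by the PORT environment variable. If you’ve increased limits but traffic still outgrows the service, you may instead see requests rejected because no instance is available or a quota has been reached.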