Setting CPU and memory limits on your Kubernetes workflow pods is crucial for ensuring stability and efficient resource utilization, but it’s often misunderstood.

Here’s a workflow pod spec that declares both resource requests and limits:

apiVersion: v1
kind: Pod
metadata:
  name: example-workflow-pod
spec:
  containers:
  - name: main-container
    image: ubuntu:latest
    command: ["sleep", "3600"]
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

This pod requests 64Mi of memory and 250 millicores (0.25 of a CPU core). The scheduler only places it on a node that has at least that much unreserved capacity, so these resources are effectively guaranteed. Crucially, the container is capped at 128Mi of memory and 500 millicores.
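To make the units concrete, here is a minimal Python sketch that converts the quantity strings from the spec above into plain numbers. The helper names (`parse_cpu`, `parse_memory`) are hypothetical, and the parser handles only the suffixes used in this article plus two binary siblings, not the full Kubernetes quantity grammar.

```python
# Illustrative helpers (hypothetical names); not the Kubernetes quantity parser.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_cpu(quantity: str) -> float:
    """Return CPU cores for strings such as '250m' or '2'."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

def parse_memory(quantity: str) -> int:
    """Return bytes for strings such as '64Mi'; plain integers are bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)

print(parse_cpu("250m"))      # 0.25
print(parse_memory("64Mi"))   # 67108864
print(parse_memory("128Mi"))  # 134217728
```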

The core problem requests and limits solve is unpredictable resource consumption. Without limits, a runaway process in one pod could consume all available CPU or memory on a node, impacting other pods and potentially destabilizing the node itself. Kubernetes uses requests to schedule pods onto nodes and limits to enforce resource boundaries at runtime.

When you set requests, the Kubernetes scheduler uses them to decide which nodes can host the pod. It looks for nodes with enough unreserved capacity (the node’s allocatable resources minus what other pods have already requested) to satisfy the pod’s requests. This is a hard constraint: if a node can’t meet the requests, the pod won’t be scheduled there.
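The fit check described above can be sketched in a few lines of Python. This is a deliberately simplified model: the real scheduler applies many more filters and scoring steps, and the function name, dictionary keys, and node numbers here are all hypothetical.

```python
# Simplified sketch of a requests-based fit check; names and numbers are made up.

def fits(allocatable: dict, requested: list, pod: dict) -> bool:
    """True if the pod's requests fit in what other pods have not yet requested."""
    for resource, capacity in allocatable.items():
        already_requested = sum(p.get(resource, 0) for p in requested)
        if pod.get(resource, 0) > capacity - already_requested:
            return False
    return True

# Node with 2000m CPU and 4096 MiB allocatable.
node = {"cpu_m": 2000, "memory_mi": 4096}
running = [{"cpu_m": 1000, "memory_mi": 1024}, {"cpu_m": 800, "memory_mi": 512}]
example_pod = {"cpu_m": 250, "memory_mi": 64}

print(fits(node, running, example_pod))      # False: only 200m CPU is unrequested
print(fits(node, running[:1], example_pod))  # True: 1000m CPU is still unrequested
```

Note that the check looks only at requests. Actual usage on the node and the limits of other pods play no part in it.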

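limits, on the other hand, are enforced at runtime on the node: the kubelet passes them to the container runtime, and the Linux kernel enforces them through cgroups. If a container tries to use more CPU than its limit, it is throttled. If it tries to use more memory than its limit, it is OOMKilled (Out Of Memory Killed) and then restarted according to the pod’s restartPolicy.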
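On Linux nodes, CPU throttling is typically implemented as a CFS bandwidth quota: the container gets a fixed budget of CPU time in every scheduling period. The sketch below shows the arithmetic, assuming the common 100 ms period; the function name and the 1 ms floor are assumptions made for illustration, and clusters can be configured with a different period.

```python
# Sketch of how a CPU limit maps to a CFS quota, assuming a 100 ms
# (100000 microsecond) period. Names and the floor value are assumptions.
CFS_PERIOD_US = 100_000
MIN_QUOTA_US = 1_000  # assumed floor, so tiny limits do not round to zero

def cpu_limit_to_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Microseconds of CPU time the container may use per period."""
    quota = limit_millicores * period_us // 1000
    return max(quota, MIN_QUOTA_US)

quota = cpu_limit_to_quota_us(500)
print(quota)  # 50000
```

With the 500m limit from the example pod, the container may consume 50 ms of CPU time in each 100 ms window, summed across all of its threads. Once the budget is spent, it waits for the next period.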

The relationship between requests and limits is key to understanding resource management. A common mistake is to set limits far higher than requests. While this might seem like providing "room to grow," it leads to overcommitment. If many pods request small amounts of resources but have very high limits, the scheduler packs them onto a node because their requests fit, and when several of them approach their limits at the same time the node runs short of CPU or memory, causing contention and instability. Conversely, setting limits too low leads to unnecessary OOMKills or CPU throttling, which slows or interrupts legitimate work.

The requests value determines how much of each resource is reserved for the pod. This reservation counts against the node’s allocatable resources whether or not the pod actually uses it. The limits value is the maximum the container is allowed to consume: exceeding the CPU limit means throttling, and exceeding the memory limit means termination.

Consider a scenario where you have a batch job that occasionally spikes in memory usage but generally uses very little. You might set a low requests.memory to allow more pods to be scheduled on a node, but a higher limits.memory to accommodate those infrequent spikes without causing an OOMKill. The tradeoff is that if many such pods spike simultaneously, they can collectively exceed the node’s physical memory. Kubernetes nodes commonly run with swap disabled, so the result is memory pressure: the kubelet starts evicting pods, or the kernel OOM-kills containers, typically starting with those whose usage most exceeds their requests, even if they never reached their own limits.
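A back-of-the-envelope calculation shows how quickly this tradeoff adds up. The sketch below reuses the 64Mi request and 128Mi limit from the example pod; the 4096 MiB node size and the function names are made up for illustration.

```python
# Back-of-the-envelope overcommit check with made-up node numbers.

def max_pods_by_request(node_allocatable_mi: int, request_mi: int) -> int:
    """How many pods the scheduler can pack based on memory requests alone."""
    return node_allocatable_mi // request_mi

def worst_case_usage_mi(pod_count: int, limit_mi: int) -> int:
    """Memory used if every pod reaches its limit at the same time."""
    return pod_count * limit_mi

node_mi = 4096
pods = max_pods_by_request(node_mi, request_mi=64)
worst = worst_case_usage_mi(pods, limit_mi=128)
print(pods, worst, worst / node_mi)  # 64 8192 2.0
```

A ratio of 2.0 means that a simultaneous spike would demand twice the memory the node has. Whether that risk is acceptable depends on how often the spikes overlap in practice.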

The cpu.shares mechanism, which is derived from requests.cpu, is also worth noting. (On cgroup v2 nodes the equivalent setting is cpu.weight, but the proportional behavior is the same.) When a node is under CPU pressure, containers are allocated CPU time in proportion to their shares. One full core maps to 1024 shares, so a request of 250m becomes 256 shares and a request of 750m becomes 768. If you have two pods on a node, one requesting 250m and another 750m, and the node is overloaded, the second pod will receive approximately three times as much CPU time as the first. However, this only applies when the node is CPU-bound; if there’s plenty of CPU, both pods can use up to their respective limits.cpu.
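The proportional split is easy to verify with a short Python sketch. It assumes the conventional mapping of 1000 millicores to 1024 shares with a floor of 2 shares; the function names are hypothetical, and the split assumes every container is trying to use as much CPU as it can.

```python
# Sketch of the millicores-to-shares conversion (1000m maps to 1024 shares)
# and the proportional split under contention. Names are hypothetical.
SHARES_PER_CPU = 1024
MIN_SHARES = 2

def cpu_request_to_shares(request_millicores: int) -> int:
    shares = request_millicores * SHARES_PER_CPU // 1000
    return max(shares, MIN_SHARES)

def contended_split(requests_m: dict) -> dict:
    """Fraction of CPU time each container gets when all are fully busy."""
    shares = {name: cpu_request_to_shares(m) for name, m in requests_m.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}

print(cpu_request_to_shares(250))  # 256
print(cpu_request_to_shares(750))  # 768
print(contended_split({"small": 250, "large": 750}))  # {'small': 0.25, 'large': 0.75}
```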

The most nuanced aspect of resource management here is how requests influence scheduling decisions versus how limits are enforced at runtime. The scheduler uses requests to decide where a pod can run, ensuring that the reserved resources are available. The node uses limits to stop a container from consuming more than its ceiling, applying throttling or termination. Requests can never overcommit a node, but limits can. This means a pod might be scheduled onto a node because its requests fit, yet if other pods on that node have limits well above their requests and start consuming heavily, your pod can still be squeezed for anything it uses beyond its own requests.

When you set CPU and memory requests and limits, you’re telling Kubernetes two things: how much to reserve for the pod when scheduling it, and what the absolute ceiling on its usage is. This dual nature is why careful tuning is essential, balancing the need for scheduling density with the requirement for reliable workload execution.

The next logical step after setting resource limits is to monitor their effectiveness and adjust them based on observed performance and potential OOMKills.
