The Vertical Pod Autoscaler (VPA) doesn’t scale your pods in the traditional sense of adding more replicas. Instead, it adjusts the CPU and memory requests of your existing pods to match their observed resource consumption, and by default it scales their limits in proportion.
Let’s watch it in action. Imagine we have a simple Nginx deployment running on EKS.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
```
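This deployment starts with modest resource requests and limits. Next, we create a VerticalPodAutoscaler object that targets our Nginx pods. This assumes the VPA components and metrics-server are already installed in the cluster. VPA in particular is not part of a default EKS installation.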
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "1"
        memory: "1Gi"
```
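updateMode: "Auto" tells VPA to apply its recommendations automatically. It currently behaves like "Recreate": running pods are evicted and recreated with the new values. targetRef points VPA at our Nginx deployment. The resourcePolicy sets per-container bounds on the recommendations so VPA never suggests values outside a safe range; containerName: "*" applies them to every container.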
Now, let’s simulate some load on our Nginx pods. We can use kubectl exec to run curl inside the container repeatedly:
```bash
# In one terminal, run a loop to generate load
POD=$(kubectl get pods -l app=nginx -o jsonpath='{.items[0].metadata.name}')
for i in {1..1000}; do
  kubectl exec "$POD" -- curl -s http://localhost:80/ > /dev/null
done
```
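Each iteration of the kubectl exec loop pays for a full API round trip, so it produces fairly light load. Below is a minimal sketch of a heavier alternative. It assumes you have run `kubectl port-forward deployment/nginx-deployment 8080:80` in another terminal so Nginx is reachable at `http://localhost:8080/` (the local port is an arbitrary choice). It then fans requests out over a thread pool using only the Python standard library. `generate_load` and `hit` are illustrative names, not part of any tool.

```python
import concurrent.futures
import urllib.request


def hit(url: str) -> bool:
    """Send one GET request and report whether it returned HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            resp.read()  # drain the body so the server does the full work
            return resp.status == 200
    except OSError:  # URLError, HTTPError, timeouts, refused connections
        return False


def generate_load(url: str, total: int = 1000, workers: int = 8) -> int:
    """Issue `total` requests spread over `workers` threads; return how many succeeded."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(hit, [url] * total))


if __name__ == "__main__":
    # Assumes `kubectl port-forward deployment/nginx-deployment 8080:80` is running.
    ok = generate_load("http://localhost:8080/")
    print(f"{ok} successful requests")
```

Raising `workers` increases concurrency and therefore the CPU the Nginx container burns. Keep in mind that port-forwarded traffic is itself relayed through the API server and kubelet.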
While this load is running, the VPA recommender observes the Nginx container’s resource usage. In a standard installation the recommender runs as a Deployment in the kube-system namespace, alongside the updater and the admission controller. It reads current usage from metrics-server through the Metrics API. After a few minutes, check the VPA object’s status:
```bash
kubectl describe vpa nginx-vpa
```
You’ll see output similar to this, showing VPA’s recommendations:
```text
...
Status:
  Conditions:
    Last Transition Time:  2023-10-27T10:30:00Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  nginx
      Lower Bound:
        Cpu:     150m
        Memory:  180Mi
      Target:
        Cpu:     180m
        Memory:  200Mi
      Uncapped Target:
        Cpu:     180m
        Memory:  200Mi
...
```
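Notice the Target values: about 180m of CPU and 200Mi of memory. Target is what VPA will set as the container’s requests. It is not a snapshot of current usage; the recommender derives it from usage history plus a safety margin. By default, VPA also scales the limits so each container keeps its original limit-to-request ratio.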
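If you want to consume the recommendation in a script instead of reading it off `describe`, `kubectl get vpa nginx-vpa -o json` exposes the same data under `status.recommendation.containerRecommendations`. The sketch below parses a trimmed, hypothetical copy of that JSON, with numbers mirroring the output above, and converts the Target quantities to cores and bytes. The quantity parser is a simplification that handles only a few common suffixes, not the full Kubernetes quantity grammar.

```python
import json

# Hypothetical, trimmed output of `kubectl get vpa nginx-vpa -o json`.
VPA_JSON = """
{"status": {"recommendation": {"containerRecommendations": [
  {"containerName": "nginx",
   "lowerBound": {"cpu": "150m", "memory": "180Mi"},
   "target": {"cpu": "180m", "memory": "200Mi"},
   "uncappedTarget": {"cpu": "180m", "memory": "200Mi"}}
]}}}
"""

# A subset of Kubernetes quantity suffixes and their multipliers.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9}


def parse_quantity(q: str) -> float:
    """Convert a quantity such as '180m' (CPU) or '200Mi' (memory) to a plain number."""
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)


def targets(vpa: dict) -> dict:
    """Map each container name to its Target as (CPU cores, memory bytes)."""
    recs = vpa.get("status", {}).get("recommendation", {}).get("containerRecommendations", [])
    return {
        r["containerName"]: (parse_quantity(r["target"]["cpu"]), parse_quantity(r["target"]["memory"]))
        for r in recs
    }


if __name__ == "__main__":
    for name, (cores, mem) in targets(json.loads(VPA_JSON)).items():
        print(f"{name}: {cores:.3f} cores, {mem / 2**20:.0f} MiB")
```

An empty dictionary from `targets` is a quick signal that VPA has not produced a recommendation yet.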
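When updateMode is "Auto", the VPA updater evicts the pod. The Deployment’s ReplicaSet creates a replacement, and the VPA admission controller writes the recommended values into the new pod’s spec. By default, though, the updater does not evict pods of workloads running fewer than two replicas (controlled by its --min-replicas flag). With replicas: 1 you may need to do one of the following to see the change:

- raise the replica count,
- set updatePolicy.minReplicas in VPA versions that support it, or
- delete the pod yourself.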
```bash
kubectl get pods -l app=nginx -o wide
```
The evicted pod is replaced by a new pod with a different name and a RESTARTS count of 0; eviction does not restart containers in place. Inspect the new pod’s resources:
```bash
kubectl get pod <new-nginx-pod-name> -o jsonpath='{.spec.containers[0].resources}'
```
This shows the new, right-sized requests and limits.
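With the default `controlledValues: RequestsAndLimits`, VPA keeps each container’s original limit-to-request ratio when it rewrites requests. The minimal sketch below works that arithmetic for this example in millicores and MiB. It suggests the 2:1 ratios in the Deployment should turn the 180m/200Mi target into limits of 360m and 400Mi. Integer floor division is a simplification here, and VPA’s own rounding may differ.

```python
# Original Deployment values as (request, limit): CPU in millicores, memory in MiB.
ORIGINAL = {"cpu": (100, 200), "memory": (128, 256)}
# VPA Target from the status output, in the same units.
TARGET = {"cpu": 180, "memory": 200}


def scaled_limit(orig_request: int, orig_limit: int, new_request: int) -> int:
    """Scale the limit so limit/request keeps the original ratio (floor division)."""
    return new_request * orig_limit // orig_request


def new_resources(original: dict, target: dict) -> dict:
    """Return {resource: (new_request, new_limit)} under a RequestsAndLimits-style policy."""
    return {
        res: (target[res], scaled_limit(req, lim, target[res]))
        for res, (req, lim) in original.items()
    }


if __name__ == "__main__":
    for res, (req, lim) in new_resources(ORIGINAL, TARGET).items():
        print(f"{res}: request={req} limit={lim}")
```

If you would rather VPA leave limits alone, setting `controlledValues: RequestsOnly` in the container policy makes it touch only the requests.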
The core problem VPA solves is mis-sized resource requests, which lead to both "noisy neighbor" and "resource starvation" problems. Developers often set conservative, high requests to avoid OOMKills or CPU throttling, which reserves cluster capacity that is never used. Setting requests too low has the opposite effect. Pods get packed onto nodes that cannot sustain their real usage, causing CPU contention, memory pressure, OOMKills, and node-pressure evictions. VPA automates right-sizing by observing actual usage and adjusting requests to match.
Internally, VPA consists of three main components:
- Recommender: This component continuously analyzes historical and current resource usage of pods. Rather than taking a simple average, it keeps exponentially decaying histograms of usage, so recent samples carry more weight. From percentiles of those histograms, plus a safety margin, it derives a lower bound, a target, and an upper bound, widening the bounds when little history is available. It produces recommendations for CPU and memory requests.
- Updater: This component checks whether running pods have the recommended resources. If a pod’s current requests fall outside the range between the lower and upper bounds, and updateMode is Auto (or Recreate), it evicts the pod. The owning controller (here, the Deployment’s ReplicaSet) then recreates the pod.
- Admission Controller: This component is a mutating admission webhook that intercepts pod creation. If a VPA object targets the pod, it injects the recommended requests, and proportionally scaled limits, into the pod spec before the pod is created.
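To make the recommender and updater roles concrete, here is a deliberately simplified Python model. It assumes sample weights halve every 24 hours and a 15% safety margin on top of the percentiles. It takes the 50th, 90th, and 95th percentiles as lower bound, target, and upper bound. These choices are loosely modeled on the recommender’s CPU defaults. The real recommender also tracks memory peaks, uses bucketed histograms, and applies confidence multipliers, none of which is modeled here. `recommend` and `should_evict` are hypothetical helpers.

```python
from dataclasses import dataclass

HALF_LIFE_HOURS = 24.0  # assumed decay half-life for sample weights
SAFETY_MARGIN = 0.15    # assumed margin added on top of the percentiles


def weighted_percentile(samples, pct, now_hours):
    """Percentile of (timestamp_hours, value) samples; a sample's weight halves every HALF_LIFE_HOURS."""
    weighted = sorted(
        (value, 0.5 ** ((now_hours - ts) / HALF_LIFE_HOURS)) for ts, value in samples
    )
    threshold = pct * sum(w for _, w in weighted)
    running = 0.0
    for value, w in weighted:
        running += w
        if running >= threshold:
            return value
    return weighted[-1][0]  # guard against floating-point shortfall


@dataclass
class Recommendation:
    lower: float
    target: float
    upper: float


def recommend(samples, now_hours):
    """Recommender-style bounds: p50 / p90 / p95 of decayed usage, plus the safety margin."""
    scale = 1.0 + SAFETY_MARGIN
    return Recommendation(
        lower=weighted_percentile(samples, 0.50, now_hours) * scale,
        target=weighted_percentile(samples, 0.90, now_hours) * scale,
        upper=weighted_percentile(samples, 0.95, now_hours) * scale,
    )


def should_evict(current_request, rec, update_mode):
    """Updater-style check: only Auto/Recreate evict, and only when the request is out of bounds."""
    if update_mode not in ("Auto", "Recreate"):
        return False
    return current_request < rec.lower or current_request > rec.upper


if __name__ == "__main__":
    # Hypothetical CPU samples in millicores, all taken "now" (equal weights).
    samples = [(0.0, v) for v in range(10, 110, 10)]
    rec = recommend(samples, now_hours=0.0)
    print(rec, should_evict(100, rec, "Auto"))
```

The band between `lower` and `upper` is what keeps the updater from evicting pods over every small fluctuation. A request that is merely close to the target is left alone.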
The resourcePolicy in the VPA object is crucial. The minAllowed and maxAllowed fields act as guardrails. minAllowed stops VPA from recommending values so low that the application is starved or unstable. maxAllowed stops it from recommending values so high that they waste cluster resources. Both are absolute quantities, not percentages, and they are applied per container; containerName: "*" applies them to every container in the pod.
VPA does not manage the number of pod replicas; that is the job of the Horizontal Pod Autoscaler (HPA). The two still interact, because HPA computes CPU and memory utilization as usage divided by request. If VPA raises a pod’s CPU request, the same usage now counts as lower utilization. An HPA targeting CPU utilization therefore scales out later, not sooner, and the two controllers can end up working against each other. For that reason, avoid running VPA and HPA on the same CPU or memory metrics for the same workload. If you want both, you have two options:

- let HPA scale on custom or external metrics, or
- run VPA with updateMode "Off" (recommendations only) or "Initial" (resources set only when pods are created).

"Recreate", like "Auto", evicts running pods, so it does not avoid the conflict.
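The interaction is plain arithmetic. HPA’s documented rule is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), with no change when the ratio is within a tolerance (0.1 by default). The sketch below assumes a single replica using 150m of CPU (a hypothetical figure) and an HPA target of 70%. At the original 100m request HPA would ask for 3 replicas, but once VPA raises the request to 180m, the same usage only justifies 2.

```python
import math


def utilization_pct(usage_millicores: float, request_millicores: float) -> float:
    """HPA-style CPU utilization: usage as a percentage of the pod's request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current: int, current_util: float, target_util: float, tolerance: float = 0.1) -> int:
    """ceil(current * current_util / target_util), unchanged if the ratio is within tolerance."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * ratio)


if __name__ == "__main__":
    usage = 150  # hypothetical per-pod CPU usage in millicores
    for request in (100, 180):  # before and after VPA raises the request
        util = utilization_pct(usage, request)
        print(f"request={request}m util={util:.0f}% -> {desired_replicas(1, util, 70)} replicas")
```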
The Uncapped Target in the VPA status is worth watching: it is the raw recommendation before it is clamped by minAllowed and maxAllowed in the resourcePolicy. If the uncapped target falls outside those bounds, Target is set to the nearest bound, so your policy is always respected. When the two fields differ, a policy bound is actively constraining VPA.
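The clamping rule fits in a few lines. This sketch applies the nginx-vpa bounds (50m to 1 CPU and 64Mi to 1Gi, expressed here in millicores and MiB) to an uncapped target. Resources without a bound pass through unchanged. `capped_target` is an illustrative helper, not part of VPA.

```python
# nginx-vpa bounds as (minAllowed, maxAllowed): CPU in millicores, memory in MiB.
POLICY = {"cpu": (50, 1000), "memory": (64, 1024)}


def clamp(value: int, low: int, high: int) -> int:
    """Move value to the nearest bound if it falls outside [low, high]."""
    return max(low, min(value, high))


def capped_target(uncapped: dict, policy: dict) -> dict:
    """Derive Target from Uncapped Target; resources without a policy entry pass through."""
    return {
        res: clamp(v, *policy[res]) if res in policy else v
        for res, v in uncapped.items()
    }


if __name__ == "__main__":
    print(capped_target({"cpu": 180, "memory": 200}, POLICY))  # within bounds: unchanged
    print(capped_target({"cpu": 1500, "memory": 32}, POLICY))  # clamped to the bounds
```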
When you’re debugging VPA, a common oversight is not having metrics-server running in the cluster. The recommender reads current usage through the Metrics API that metrics-server provides. If metrics-server is not installed or not functioning correctly, VPA cannot produce recommendations: the Recommendation section of kubectl describe vpa stays empty, and the RecommendationProvided condition is missing or False. Running kubectl top pods is a quick way to confirm that the Metrics API is responding.