EKS clusters can be surprisingly cheap if you leverage Spot Instances, but running mission-critical workloads on them requires a careful balance with On-Demand capacity.
Here’s a basic Nginx Deployment and Service for a cluster whose nodes are split between On-Demand and Spot. The manifest itself doesn’t pin pods to either capacity type yet:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - nginx
                topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
The trick to balancing is in the nodeSelector or nodeAffinity in your Pod specifications, combined with how you configure your EKS Managed Node Groups.
Let’s say you have two Managed Node Groups: ondemand-nodes and spot-nodes. EKS applies the label eks.amazonaws.com/nodegroup=<nodegroup-name> to every node in a managed node group, so there is nothing to tag by hand. (EC2 tags are not Kubernetes node labels, and the scheduler never sees them.) You can confirm the labels with kubectl.
For ondemand-nodes:
kubectl get nodes -l eks.amazonaws.com/nodegroup=ondemand-nodes
For spot-nodes:
kubectl get nodes -l eks.amazonaws.com/nodegroup=spot-nodes
Then, in your Deployment YAML, you’d specify which node group your pods should prefer or be restricted to. For critical workloads needing guaranteed uptime, you’d use nodeSelector to target your On-Demand nodes:
nodeSelector:
  eks.amazonaws.com/nodegroup: ondemand-nodes
For less critical or batch workloads, you’d target your Spot nodes:
nodeSelector:
  eks.amazonaws.com/nodegroup: spot-nodes
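To show where these fragments go, here is a minimal sketch of the Nginx Deployment from earlier pinned to the On-Demand group. It assumes a managed node group named ondemand-nodes exists; the anti-affinity block is left out only to keep the example short.

```yaml
# Sketch: the nginx Deployment restricted to the On-Demand node group.
# Assumes a managed node group named ondemand-nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # nodeSelector sits in the pod spec, as a sibling of containers.
      nodeSelector:
        eks.amazonaws.com/nodegroup: ondemand-nodes
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```

Because nodeSelector is a hard constraint, pods stay Pending if no node in that group has room, which is usually the right behavior for workloads that must not run on Spot.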
Alternatively, nodeAffinity offers more nuanced control: requiredDuringSchedulingIgnoredDuringExecution is a hard rule the scheduler must satisfy, while preferredDuringSchedulingIgnoredDuringExecution is a weighted preference it tries to honor.
For example, to prefer Spot nodes but fall back to On-Demand if Spot isn’t available:
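affinity:
  nodeAffinity:
    # Soft rule: favor Spot nodes when they have room.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/nodegroup
              operator: In
              values:
                - spot-nodes
    # Hard rule: both groups are eligible, so On-Demand is the fallback.
    # Listing only ondemand-nodes here would make the Spot preference unreachable.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/nodegroup
              operator: In
              values:
                - spot-nodes
                - ondemand-nodes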
This setup ensures that your most important pods land on stable, On-Demand instances, while you can fill the rest of your capacity with significantly cheaper Spot instances. The podAntiAffinity in the Nginx Deployment asks the scheduler to spread replicas of the same pod across different nodes (topologyKey: kubernetes.io/hostname), even within a single node group, which improves resilience against single-node failures. Because it is a preferred rule rather than a required one, replicas can still share a node when no other node has room.
The key insight here is that EKS automatically labels nodes in managed node groups with eks.amazonaws.com/nodegroup=<nodegroup-name>. You don’t need to add these labels manually. You simply define your node groups with distinct names and EKS handles the labeling for you.
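Managed node groups also carry a second automatic label, eks.amazonaws.com/capacityType, with the value SPOT or ON_DEMAND. It is not used elsewhere in this walkthrough, but a selector on it keeps working if node groups are renamed or added. A small sketch of the pod spec fragment, assuming all nodes come from managed node groups:

```yaml
# Pod spec fragment: select by capacity type instead of node group name.
# Assumes the nodes belong to EKS managed node groups, which set this label.
nodeSelector:
  eks.amazonaws.com/capacityType: SPOT   # use ON_DEMAND for critical workloads
```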
The core problem this solves is the cost vs. reliability trade-off. On-Demand instances are reliable but expensive. Spot instances are cheap but can be interrupted with little notice. By segmenting your nodes and directing your pods appropriately, you can have your cake and eat it too: critical workloads run on predictable infrastructure, while everything else benefits from cost savings.
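A common pitfall is forgetting to set terminationGracePeriodSeconds appropriately on your pods. When a Spot Instance is reclaimed, AWS issues an interruption notice about two minutes before the instance is stopped. The kubelet does not receive this notice directly; EKS managed node groups react to it by draining the node, which evicts the pods and sends their containers SIGTERM. If your application doesn’t handle that signal and shut down within the grace period, it is killed with SIGKILL, potentially leading to data loss or incomplete operations. The default grace period is 30 seconds, which is often too short for complex applications, but raising it beyond the two-minute window doesn’t help, because the instance will be gone by then.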
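As a rough illustration, the pod template fragment below lengthens the grace period and adds a preStop hook so Nginx can finish in-flight requests. The 90-second and 10-second values are placeholders chosen for this sketch, not recommendations; tune them to your application and keep the total under the two-minute Spot interruption notice.

```yaml
# Pod template fragment (illustrative values only).
spec:
  # Total time allowed for the preStop hook plus SIGTERM handling
  # before the container is killed with SIGKILL.
  terminationGracePeriodSeconds: 90
  containers:
    - name: nginx
      image: nginx:latest
      lifecycle:
        preStop:
          exec:
            # Pause briefly so the load balancer can stop routing traffic here,
            # then ask nginx to shut down gracefully.
            command: ["/bin/sh", "-c", "sleep 10; nginx -s quit"]
```

The preStop hook runs before SIGTERM is delivered, and its runtime counts against terminationGracePeriodSeconds, so the grace period has to cover both.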
The next step is exploring how to automatically scale your On-Demand and Spot node groups based on cluster load and the availability of Spot capacity.