Fargate Spot offers significant cost savings by running your tasks on AWS’s spare compute capacity, with the caveat that those tasks can be interrupted.
Let’s see this in action. Imagine a typical web application running on Fargate. Normally, you’d provision tasks with cpu: 1024 and memory: 2048 and pay on-demand rates. This is safe but expensive.
    # Original on-demand Fargate service and task definition
    Resources:
      TargetGroup:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Properties:
          # ... other properties ...
          Port: 80
          Protocol: HTTP
          TargetType: ip # Required for tasks that use awsvpc networking
          VpcId: vpc-1234567890abcdef0
      Service:
        Type: AWS::ECS::Service
        Properties:
          Cluster: arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster
          ServiceName: my-web-app
          TaskDefinition: !Ref TaskDefinition
          DesiredCount: 3
          LaunchType: FARGATE
          NetworkConfiguration:
            AwsvpcConfiguration:
              Subnets:
                - subnet-abcdef1234567890
                - subnet-fedcba0987654321
              SecurityGroups:
                - sg-0123456789abcdef0
          LoadBalancers:
            - TargetGroupArn: !Ref TargetGroup
              ContainerName: webapp
              ContainerPort: 80
      TaskDefinition:
        Type: AWS::ECS::TaskDefinition
        Properties:
          Family: my-web-app-task
          RequiresCompatibilities:
            - FARGATE
          NetworkMode: awsvpc
          Cpu: "1024" # 1 vCPU
          Memory: "2048" # 2 GiB
          ContainerDefinitions:
            - Name: webapp
              Image: public.ecr.aws/nginx/nginx:latest
              PortMappings:
                - ContainerPort: 80
                  HostPort: 80
Now, let’s introduce cost-saving measures.
1. Fargate Spot
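The most immediate saving comes from using Fargate Spot, which runs your tasks on spare AWS capacity. Fargate Spot is not a launch type: instead of setting LaunchType: FARGATE, you omit LaunchType and give the service a CapacityProviderStrategy that names the FARGATE_SPOT capacity provider (the cluster must have that capacity provider associated with it). The key difference is that Fargate Spot tasks can be interrupted with a two-minute warning when AWS needs the capacity back, so it suits stateless applications or those that can handle interruptions gracefully. The savings can be substantial: AWS advertises discounts of up to 70% compared to on-demand.
    # Fargate Spot configuration
    Service:
      Type: AWS::ECS::Service
      Properties:
        # ... other properties ...
        # LaunchType is omitted; it cannot be combined with CapacityProviderStrategy
        CapacityProviderStrategy:
          - CapacityProvider: FARGATE_SPOT # Replaces LaunchType: FARGATE
            Weight: 1
        # ... other properties ...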
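To make "gracefully handle interruptions" a little more concrete: when a Fargate Spot task is reclaimed, the container receives a SIGTERM and is killed later if it has not exited. The Python sketch below shows one possible way for a worker loop to react. `GracefulShutdown`, `run_worker`, and the `process_item` callback are hypothetical names chosen for illustration, not part of any AWS SDK.

```python
import signal
import threading


class GracefulShutdown:
    """Records that SIGTERM arrived so work loops can stop cleanly."""

    def __init__(self):
        self._stop = threading.Event()

    def install(self):
        # Register the handler; signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        # Only set a flag here; the work loop decides when it is safe to stop.
        self._stop.set()

    @property
    def requested(self):
        return self._stop.is_set()


def run_worker(shutdown, items, process_item):
    """Process items in order, stopping before the next item once shutdown is requested."""
    completed = []
    for item in items:
        if shutdown.requested:
            break
        completed.append(process_item(item))
    return completed
```

In a real service you would call `install()` once at startup. It may also be worth raising the container's StopTimeout (Fargate allows up to 120 seconds) so in-flight work has time to finish before the task is killed.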
2. ARM (Graviton2) Processors
Fargate now supports AWS Graviton2 processors, which are ARM-based. These processors offer better price-performance than comparable x86 processors, and Fargate bills ARM tasks at a lower per-vCPU and per-GB rate. For many workloads, you can achieve similar performance with lower CPU and memory configurations, or even better performance at the same cost. To use Graviton2, you set CpuArchitecture: ARM64 inside the RuntimePlatform block of your task definition.
    # Task definition with ARM architecture
    TaskDefinition:
      Type: AWS::ECS::TaskDefinition
      Properties:
        Family: my-web-app-task-arm
        RequiresCompatibilities:
          - FARGATE
        NetworkMode: awsvpc
        RuntimePlatform:
          CpuArchitecture: ARM64 # Added for ARM support
          OperatingSystemFamily: LINUX
        Cpu: "1024"
        Memory: "2048"
        ContainerDefinitions:
          - Name: webapp
            Image: public.ecr.aws/nginx/nginx:latest # Ensure your image is multi-arch or ARM-native
            # ...
Important: Ensure your container images are built for linux/arm64 or are multi-architecture. Many official images, like the Nginx one shown, are already multi-arch and will automatically pull the correct ARM variant. If you build your own images, you’ll need to configure your build pipeline (e.g., Docker Buildx) to produce ARM64 artifacts.
3. Right-Sizing
This is arguably the most impactful, yet often overlooked, cost-saving technique. It involves accurately determining the CPU and memory resources your application actually needs, rather than over-provisioning.
Diagnosis:
- Enable CloudWatch Container Insights: Ensure Container Insights is enabled for your ECS cluster. This provides detailed metrics on CPU and memory utilization per task.
- Analyze Metrics: Go to CloudWatch -> Metrics -> Container Insights -> ECS -> Cluster Name -> Service Name -> Task Name. Look at the CPUUtilization and MemoryUtilization metrics. Pay attention to the 95th percentile over a representative period (e.g., a week or a full business cycle). This accounts for peak usage without being overly sensitive to transient spikes.
- Identify Bottlenecks (or lack thereof): If your 95th percentile CPU utilization is consistently below 70-80% and memory utilization is below 80-90%, you are likely over-provisioned. If your application is experiencing performance issues and utilization is consistently high, you might be under-provisioned.
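To illustrate why the 95th percentile is a better sizing signal than the raw maximum, here is a minimal Python sketch that uses the nearest-rank method. The sample values are made up for the example, and `percentile` is a hypothetical helper; CloudWatch can also compute percentile statistics for you.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical CPU utilization samples (percent), including one transient spike.
cpu_samples = [22, 25, 31, 28, 35, 30, 27, 33, 29, 26,
               24, 32, 34, 36, 23, 38, 40, 31, 29, 95]

p95_cpu = percentile(cpu_samples, 95)   # ignores the single spike
peak_cpu = max(cpu_samples)             # dominated by the single spike
print(f"p95={p95_cpu}% peak={peak_cpu}%")
```

Sizing to the peak here would provision for a one-off spike, while the p95 reflects what the task needs almost all of the time.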
Action:
Adjust the Cpu and Memory values in your ECS task definition.
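For example, if your analysis shows that the 95th percentile CPU is about 300 CPU units (roughly 0.3 vCPU, since 1024 units equal 1 vCPU) and memory is about 700 MiB, you could downsize from 1024/2048 to 512/1024 and still keep some headroom.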
    # Right-sized task definition
    TaskDefinition:
      Type: AWS::ECS::TaskDefinition
      Properties:
        Family: my-web-app-task-optimized
        RequiresCompatibilities:
          - FARGATE
        NetworkMode: awsvpc
        Cpu: "512" # Reduced from 1024
        Memory: "1024" # Reduced from 2048
        # ...
Why it works: You are paying for the provisioned CPU and memory, not just what you use. By matching provisioned resources to actual needs, you reduce the baseline cost per task. Fargate’s smallest task size is 256 CPU units (0.25 vCPU) with 512 MiB of memory, and only specific CPU/memory combinations are valid; 512 CPU units, for example, can be paired with 1 to 4 GiB of memory. Choosing the valid combination closest to your actual need is key.
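As a rough sketch of that selection step, the Python helper below picks the smallest Fargate size that covers a measured p95 plus a safety margin. The size table reflects the Linux Fargate combinations up to 4 vCPU as commonly documented; verify it against the current ECS documentation. `pick_task_size` and the 25% default headroom are assumptions made for illustration.

```python
# Valid Linux Fargate sizes up to 4 vCPU: CPU units -> allowed memory values (MiB).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: [1024 * g for g in range(1, 5)],     # 1-4 GiB
    1024: [1024 * g for g in range(2, 9)],    # 2-8 GiB
    2048: [1024 * g for g in range(4, 17)],   # 4-16 GiB
    4096: [1024 * g for g in range(8, 31)],   # 8-30 GiB
}


def pick_task_size(p95_cpu_units, p95_memory_mib, headroom=0.25):
    """Return the smallest (cpu, memory) pair that covers p95 usage plus headroom."""
    need_cpu = p95_cpu_units * (1 + headroom)
    need_mem = p95_memory_mib * (1 + headroom)
    for cpu in sorted(FARGATE_SIZES):
        if cpu < need_cpu:
            continue
        for mem in FARGATE_SIZES[cpu]:
            if mem >= need_mem:
                return cpu, mem
    raise ValueError("usage exceeds the sizes listed in FARGATE_SIZES")


# Hypothetical measurement: p95 of 300 CPU units and 700 MiB of memory.
print(pick_task_size(300, 700))
```

Note that memory can push you into a larger CPU tier: a task needing little CPU but about 5 GiB of memory cannot use the 512-unit tier, which tops out at 4 GiB.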
Combining Techniques
The real power comes from combining these. Running a right-sized, ARM-based task on Fargate Spot can yield dramatic cost reductions.
Consider a scenario where you’ve right-sized to cpu: 512, memory: 1024, and are using ARM.
    # Combined Fargate Spot, ARM, and Right-Sizing
    Service:
      Type: AWS::ECS::Service
      Properties:
        Cluster: arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster
        ServiceName: my-web-app-optimized
        TaskDefinition: !Ref TaskDefinitionOptimized # Reference the new optimized task definition
        DesiredCount: 3
        CapacityProviderStrategy: # Using Spot (LaunchType must be omitted)
          - CapacityProvider: FARGATE_SPOT
            Weight: 1
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - subnet-abcdef1234567890
              - subnet-fedcba0987654321
            SecurityGroups:
              - sg-0123456789abcdef0
        LoadBalancers:
          - TargetGroupArn: !Ref TargetGroupOptimized
            ContainerName: webapp
            ContainerPort: 80
    TaskDefinitionOptimized:
      Type: AWS::ECS::TaskDefinition
      Properties:
        Family: my-web-app-task-optimized
        RequiresCompatibilities:
          - FARGATE
        NetworkMode: awsvpc
        RuntimePlatform:
          CpuArchitecture: ARM64 # Using ARM
          OperatingSystemFamily: LINUX
        Cpu: "512" # Right-sized
        Memory: "1024" # Right-sized
        ContainerDefinitions:
          - Name: webapp
            Image: public.ecr.aws/nginx/nginx:latest
            PortMappings:
              - ContainerPort: 80
                HostPort: 80
To quantify the savings, look at the Fargate pricing page. For example, on-demand Fargate might be ~$0.04048 per vCPU-hour and ~$0.00441 per GB-hour. Fargate Spot can be 60-70% less. ARM tasks offer better performance-per-dollar. Right-sizing means fewer vCPU-hours and GB-hours are consumed in total.
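A back-of-the-envelope calculation shows how these effects stack. The Python sketch below uses the approximate on-demand rates quoted above; the 730-hour month, the 60% Spot discount, and the `monthly_cost` helper are assumptions for illustration. ARM pricing is left out because no ARM rate is quoted here.

```python
VCPU_HOUR_USD = 0.04048   # approximate on-demand rate quoted in the text
GB_HOUR_USD = 0.00441     # approximate on-demand rate quoted in the text
HOURS_PER_MONTH = 730     # assumption: average month
SPOT_DISCOUNT = 0.60      # assumption: within the discount range quoted in the text


def monthly_cost(tasks, vcpu, memory_gb, discount=0.0):
    """Monthly cost in USD for always-on tasks of the given size."""
    hourly = vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD
    return tasks * hourly * HOURS_PER_MONTH * (1 - discount)


baseline = monthly_cost(tasks=3, vcpu=1.0, memory_gb=2.0)
right_sized = monthly_cost(tasks=3, vcpu=0.5, memory_gb=1.0)
right_sized_spot = monthly_cost(tasks=3, vcpu=0.5, memory_gb=1.0,
                                discount=SPOT_DISCOUNT)

for label, cost in [("on-demand 1024/2048", baseline),
                    ("right-sized 512/1024", right_sized),
                    ("right-sized + Spot", right_sized_spot)]:
    print(f"{label:<22} ${cost:8.2f}/month")
```

Under these assumptions the three-task service drops from roughly $108 to $54 per month after right-sizing, and to about $22 per month on Spot.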
A common pitfall when right-sizing is tuning CPU while neglecting memory, or vice versa. Many applications are memory-bound. If your application has a small, fixed memory overhead for its runtime (e.g., JVM heap, Python interpreter) and then uses memory dynamically, provisioning memory slightly above the peak dynamic usage plus a buffer is often more stable than aggressively capping it. If you cap memory too low, the container will be killed for running out of memory and the task restarted, which is worse than paying a little more for stability.
The next challenge you’ll face is managing the interruptions inherent in Fargate Spot and ensuring your application’s resilience.