Diversifying your EC2 Spot Fleet across multiple Spot pools is one of the most effective ways to reduce the impact of interruptions and improve the availability of your workloads.

Let’s see this in action. Imagine you’re running a web service on a Spot Fleet. Without diversification, you might configure it like this (the required IamFleetRole is omitted for brevity):

{
  "Type": "maintain",
  "TargetCapacity": 10,
  "LaunchSpecifications": [
    {
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "m5.large",
      "SubnetId": "subnet-0123456789abcdef0",
      "WeightedCapacity": 1,
      "SpotPrice": "0.08",
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/sda1",
          "Ebs": {
            "VolumeSize": 30,
            "VolumeType": "gp3"
          }
        }
      ]
    }
  ]
}

This fleet will try to acquire 10 m5.large instances in a single subnet, with SpotPrice capping what it will pay at 0.08 (USD per unit hour). Every instance comes from the same Spot pool: one InstanceType in one AvailabilityZone (implied by the subnet). If EC2 reclaims capacity in that pool, or the Spot price there rises above 0.08, your entire fleet can be interrupted at once.

Now, let’s diversify. We’ll allow the fleet to choose from multiple InstanceTypes and SubnetIds across different AvailabilityZones.

{
  "Type": "maintain",
  "TargetCapacity": 10,
  "LaunchSpecifications": [
    {
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "m5.large",
      "SubnetId": "subnet-0123456789abcdef0",
      "WeightedCapacity": 1,
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/sda1",
          "Ebs": {
            "VolumeSize": 30,
            "VolumeType": "gp3"
          }
        }
      ]
    },
    {
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "m5.xlarge",
      "SubnetId": "subnet-0123456789abcdef1",
      "WeightedCapacity": 2,
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/sda1",
          "Ebs": {
            "VolumeSize": 30,
            "VolumeType": "gp3"
          }
        }
      ]
    },
    {
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "c5.large",
      "SubnetId": "subnet-0123456789abcdef2",
      "WeightedCapacity": 1,
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/sda1",
          "Ebs": {
            "VolumeSize": 30,
            "VolumeType": "gp3"
          }
        }
      ]
    }
  ]
}

Notice a few key changes:

  • Multiple LaunchSpecifications: We now have several blocks, each defining a different instance type and subnet.
  • Different InstanceTypes: m5.large, m5.xlarge, and c5.large are included. They span two families (M5 and C5) and two sizes, and each instance type draws from its own Spot capacity pool.
  • Different SubnetIds: Each subnet is in a different Availability Zone (e.g., us-east-1a, us-east-1b, us-east-1c). This is crucial because Spot capacity is highly localized to individual AZs.
  • WeightedCapacity: This is important. m5.xlarge has a WeightedCapacity of 2. If your TargetCapacity is 10, and your fleet has 3 m5.large instances (each contributing 1) and 1 m5.xlarge instance (contributing 2), you’ve only fulfilled 3*1 + 1*2 = 5 capacity units. The fleet will continue to launch instances until it reaches 10 capacity units. This allows you to use a mix of instance sizes to meet your target capacity more efficiently.
  • No SpotPrice: When you omit SpotPrice, the maximum price defaults to the On-Demand price for each instance type. This is independent of InstanceInterruptionBehavior, whose default is "terminate". You still pay only the current Spot price, so the high ceiling costs nothing extra and keeps price from being the reason an instance is interrupted. It does not guarantee capacity, though: EC2 can still reclaim instances when it needs the capacity back, which is exactly why diversification matters. You can set a lower maximum price if cost is a primary concern, but a lower cap can mean more interruptions.
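The capacity-unit arithmetic from the WeightedCapacity bullet can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not how Spot Fleet is implemented; the RunningInstance class and both helper functions are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningInstance:
    """Hypothetical record of one running instance in the fleet."""
    instance_type: str
    weight: int  # WeightedCapacity from the matching launch specification


def fulfilled_units(instances):
    """Sum the capacity units contributed by the running instances."""
    return sum(i.weight for i in instances)


def remaining_units(target_capacity, instances):
    """Capacity units still missing before TargetCapacity is reached."""
    return max(0, target_capacity - fulfilled_units(instances))


# Snapshot from the bullet above: 3 m5.large (weight 1) and 1 m5.xlarge (weight 2).
fleet = [RunningInstance("m5.large", 1)] * 3 + [RunningInstance("m5.xlarge", 2)]

print(fulfilled_units(fleet))      # 3*1 + 1*2 = 5
print(remaining_units(10, fleet))  # 10 - 5 = 5
```

With a TargetCapacity of 10, the fleet in this snapshot still has 5 units to fill, which it could do with five more m5.large, or with two m5.xlarge plus one c5.large, and so on.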

The core problem Spot Fleet solves is abstracting away the volatile nature of Spot pricing and availability. Instead of manually tracking prices and capacity across hundreds of instance type/AZ combinations, you declare which instance types, subnets, and price limits are acceptable and let Spot Fleet do the heavy lifting of finding capacity among them.

When you diversify, you’re essentially telling Spot Fleet, "I need 10 units of compute. I’m flexible on the exact flavor of compute (m5.large, m5.xlarge, c5.large) and where it is (us-east-1a, us-east-1b, us-east-1c), as long as it meets my basic specs and isn’t too expensive." This gives the Spot Fleet allocator more options. If m5.large in us-east-1a suddenly dries up, it can launch m5.xlarge in us-east-1b or c5.large in us-east-1c instead to get back to your TargetCapacity.

The LaunchSpecifications are not just about instance types and subnets; they are the fundamental definition of the Spot pools your fleet can draw from. Each unique combination of InstanceType and AvailabilityZone (determined by SubnetId) is a distinct Spot pool with its own capacity and price. By providing multiple LaunchSpecifications, you give Spot Fleet a larger, more resilient set of pools to choose from.
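One consequence worth checking: the diversified example pairs each instance type with exactly one subnet, so it covers three pools rather than all nine type/AZ combinations. The sketch below counts the pools a list of launch specifications covers and builds the full cross product. The subnet-to-AZ mapping is an assumption for illustration (the subnet IDs are the placeholders from the example), the helper names are hypothetical, and BlockDeviceMappings is left out to keep it short.

```python
from itertools import product

# Assumed subnet-to-AZ mapping; the IDs are the placeholders used in the example.
SUBNET_AZ = {
    "subnet-0123456789abcdef0": "us-east-1a",
    "subnet-0123456789abcdef1": "us-east-1b",
    "subnet-0123456789abcdef2": "us-east-1c",
}
# WeightedCapacity per instance type, as in the example.
WEIGHTS = {"m5.large": 1, "m5.xlarge": 2, "c5.large": 1}


def spot_pools(launch_specs, subnet_az=SUBNET_AZ):
    """Return the distinct (InstanceType, AZ) pools a list of specs covers."""
    return {(s["InstanceType"], subnet_az[s["SubnetId"]]) for s in launch_specs}


def cross_product_specs(image_id, weights=WEIGHTS, subnet_az=SUBNET_AZ):
    """Build one launch specification per instance type and subnet."""
    return [
        {
            "ImageId": image_id,
            "InstanceType": itype,
            "SubnetId": subnet,
            "WeightedCapacity": weights[itype],
        }
        for itype, subnet in product(weights, subnet_az)
    ]


# The example as written: each type paired with a single subnet.
as_written = [
    {"InstanceType": t, "SubnetId": s} for t, s in zip(WEIGHTS, SUBNET_AZ)
]
expanded = cross_product_specs("ami-0abcdef1234567890")

print(len(spot_pools(as_written)))  # 3
print(len(spot_pools(expanded)))    # 9
```

Whether you need all nine pools depends on your workload, but generating the specifications this way avoids hand-copying near-identical JSON blocks.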

A common misconception is that WeightedCapacity is just for cost optimization. Weights do influence cost (with weighting, prices are evaluated per capacity unit rather than per instance), but their primary function in diversification is to let you express your desired capacity in units that a mix of instance sizes can fulfill. If your application scales based on CPU, and m5.xlarge has twice the CPU of m5.large, you can assign it a WeightedCapacity of 2 so that one m5.xlarge instance counts as two m5.large instances towards your TargetCapacity. This gives the fleet more flexibility to fulfill your capacity needs with different instance sizes.
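As a rough sketch of that CPU-based weighting, the snippet below derives weights from vCPU counts, using m5.large as the unit. The function names are hypothetical, and weighting by vCPU alone is an assumption that only fits CPU-bound workloads; a memory-bound service might weight by GiB instead.

```python
import math

# vCPU counts for the three instance types used in this article.
VCPUS = {"m5.large": 2, "m5.xlarge": 4, "c5.large": 2}


def weights_from_vcpus(vcpus, base_type):
    """Express each type's capacity relative to base_type, which gets weight 1."""
    base = vcpus[base_type]
    return {itype: count / base for itype, count in vcpus.items()}


def instances_needed(target_capacity, weight):
    """Instances of a single type needed to cover target_capacity units.

    Rounds up so the fleet never lands below the target.
    """
    return math.ceil(target_capacity / weight)


weights = weights_from_vcpus(VCPUS, base_type="m5.large")
print(weights)  # {'m5.large': 1.0, 'm5.xlarge': 2.0, 'c5.large': 1.0}

for itype, weight in weights.items():
    print(itype, instances_needed(10, weight))
```

For a TargetCapacity of 10, this works out to 10 m5.large, 5 m5.xlarge, or 10 c5.large if the fleet were filled from a single pool.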

The next hurdle you’ll likely face after achieving high availability with diversified Spot Fleets is managing the lifecycle of these instances, particularly when it comes to graceful shutdown during interruptions.
