ECS can’t directly "run" GPU workloads; it orchestrates containers, and those containers need to be configured to access and utilize the underlying EC2 instance’s GPU hardware.

Here’s how you set that up, treating it like a real-world problem:

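First, you need an EC2 instance with GPUs. Not just any instance, but one from a GPU family such as g4dn, p3, or p4d. These are your GPU workhorses. (inf1 is often listed alongside them, but inf1 instances carry AWS Inferentia accelerators rather than NVIDIA GPUs, so the GPU setup described here doesn’t apply to them.) The host must have the NVIDIA driver installed. ECS doesn’t do this for you; it expects the host OS to be ready. The easiest way is usually the Amazon ECS GPU-optimized AMI, which ships with the NVIDIA driver and the NVIDIA container runtime alongside the ECS agent. The AWS Deep Learning AMI also comes pre-loaded with drivers and CUDA, but it isn’t built specifically for ECS, so expect to set up the ECS agent yourself. If you’re building your own AMI, you’ll need to install an NVIDIA driver version that supports your GPU hardware and is new enough for the CUDA version your containers use.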

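Once your GPU-enabled EC2 instance is running and has drivers, ECS needs to learn about its GPU capabilities. You don’t declare this on the cluster or in the capacity provider. The ECS container agent on the instance detects the GPUs and reports them when the instance registers with the cluster, provided GPU support is enabled in the agent configuration (ECS_ENABLE_GPU_SUPPORT=true in /etc/ecs/ecs.config; the ECS GPU-optimized AMI comes set up for this). The instance type itself (g4dn, p3, etc.) is chosen in the launch template of the Auto Scaling group behind your capacity provider, and the tasks use the EC2 launch type. ECS uses the registered GPU inventory to ensure it only schedules GPU-enabled tasks onto instances that actually have free GPUs.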
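For a custom AMI, the agent setting usually has to be written at boot. The following is a minimal sketch, assuming the standard /etc/ecs/ecs.config location and the ECS_CLUSTER and ECS_ENABLE_GPU_SUPPORT agent variables; the function name and the cluster name are hypothetical, and the output is meant to be pasted into a launch template as user data.

```python
def render_gpu_user_data(cluster_name: str) -> str:
    """Return a shell user-data script that joins an ECS cluster with GPU support on.

    Hypothetical helper: it only builds text, it does not call AWS.
    """
    if not cluster_name or any(ch.isspace() for ch in cluster_name):
        raise ValueError("cluster_name must be non-empty and contain no whitespace")
    lines = [
        "#!/bin/bash",
        "# Append agent settings; the ECS agent reads this file when it starts.",
        "cat >> /etc/ecs/ecs.config <<'EOF'",
        f"ECS_CLUSTER={cluster_name}",
        "ECS_ENABLE_GPU_SUPPORT=true",
        "EOF",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "gpu-cluster" is a placeholder cluster name.
    print(render_gpu_user_data("gpu-cluster"))
```

This only covers the agent side. The NVIDIA driver and container runtime still have to be present on the AMI for the agent to find any GPUs.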

Now, for your task definition. This is where you tell ECS that your container needs a GPU. In the task definition, under containerDefinitions, add the resourceRequirements field. Each entry has exactly two keys: type, set to GPU, and value, the number of physical GPUs to reserve for the container, written as a string. There are no count or capabilities fields here. Driver capabilities (compute, utility, display) are a setting of the NVIDIA container runtime, typically controlled through the NVIDIA_DRIVER_CAPABILITIES environment variable in the image; compute and utility are usually sufficient for training. For example, to request one GPU:

"resourceRequirements": [
  {
    "type": "GPU",
    "value": "1"
  }
]
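If you generate task definitions from code, the same request can be assembled as a plain dictionary. This is a sketch only, using the type/value shape of resourceRequirements; the helper names, the image name example/trainer:latest, and the memory figure are illustrative assumptions.

```python
import json


def gpu_container_definition(name: str, image: str, gpus: int, memory_mib: int) -> dict:
    """Build one container definition that reserves `gpus` physical GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "name": name,
        "image": image,
        "memory": memory_mib,
        "essential": True,
        # ECS expects the GPU count as a string, not a number.
        "resourceRequirements": [{"type": "GPU", "value": str(gpus)}],
    }


def gpu_task_definition(family: str, containers: list) -> dict:
    """Wrap container definitions in a task definition for the EC2 launch type."""
    return {
        "family": family,
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": containers,
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own image and family.
    container = gpu_container_definition("trainer", "example/trainer:latest", 1, 8192)
    print(json.dumps(gpu_task_definition("gpu-training", [container]), indent=2))
```

The resulting dictionary uses the same keys as the task definition JSON, so it could be written to a file for the CLI or, assuming you use boto3, passed as keyword arguments to the ECS client’s register_task_definition call.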

Your container image itself needs to be built with GPU support. This means including the CUDA runtime libraries and any other libraries your code needs (like cuDNN for deep learning). Docker images designed for GPU workloads often start from NVIDIA’s CUDA base images (e.g., nvidia/cuda:11.0-cudnn8-runtime-ubuntu20.04). The driver is not part of the image: it stays on the host, and it must be recent enough for the CUDA version in the image. If you’re training a model, your Python script or application within the container will use libraries like TensorFlow, PyTorch, or MXNet, which are compiled to leverage the CUDA API exposed by the NVIDIA driver on the host.

When ECS schedules your task, it checks the resourceRequirements against the GPUs still unreserved on the registered EC2 instances in the cluster. If an instance has enough free GPUs, ECS can launch the task there. resourceRequirements only carries a count, not a GPU model; if you need a particular model, steer the task by instance type, for example with a placement constraint. The ECS agent then pins specific physical GPUs to the container, and the Docker runtime on the EC2 host uses the NVIDIA Container Toolkit (the successor to nvidia-docker2) to map those devices into the container. This mapping is what allows your application inside the container to see and use the GPU.
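To make the bookkeeping concrete, here is a toy first-fit model of GPU placement. It is not ECS’s actual scheduler, which also weighs CPU, memory, placement strategies, and constraints; the Instance class, the instance IDs, and the GPU IDs are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Instance:
    """A registered container instance and the GPU IDs not yet reserved on it."""
    instance_id: str
    free_gpu_ids: List[str] = field(default_factory=list)


def place_task(instances: List[Instance], gpus_needed: int) -> Optional[Tuple[str, List[str]]]:
    """Reserve GPUs on the first instance with enough free ones.

    Returns (instance_id, reserved_gpu_ids), or None if no instance fits.
    Reserved GPUs are removed from the free list, so no two tasks share one.
    """
    for inst in instances:
        if len(inst.free_gpu_ids) >= gpus_needed:
            reserved = inst.free_gpu_ids[:gpus_needed]
            inst.free_gpu_ids = inst.free_gpu_ids[gpus_needed:]
            return inst.instance_id, reserved
    return None


if __name__ == "__main__":
    fleet = [
        Instance("i-small", ["GPU-0"]),
        Instance("i-large", ["GPU-0", "GPU-1", "GPU-2", "GPU-3"]),
    ]
    print(place_task(fleet, 2))  # skips i-small, which has only one GPU
    print(place_task(fleet, 8))  # no instance has eight free GPUs
```

The point of the sketch is the reservation step: once a GPU is handed to a container it leaves the free pool until the task stops.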

A common pitfall is forgetting to install the NVIDIA driver on the EC2 instance. If the driver isn’t present, the NVIDIA container runtime can’t expose the GPU, and the container either fails to start or finds no CUDA device. If the driver is present but older than the container’s CUDA runtime requires, you’ll see errors like "CUDA driver version is insufficient for CUDA runtime version." Another pitfall is misconfiguring the resourceRequirements in the task definition: requesting more GPUs than the instance has, or specifying the wrong type. A task that asks for more GPUs than any registered instance can offer won’t be placed. Always double-check that your task definition’s GPU requirements align with the instance type and the number of GPUs provisioned.
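A pre-deployment check can catch the second pitfall before ECS does. The sketch below assumes the type/value form of resourceRequirements and a small, hand-maintained table of GPU counts per instance type; the table is deliberately incomplete and the function names are hypothetical.

```python
# GPUs per instance type; extend this table for the types you actually run.
GPUS_PER_INSTANCE = {
    "g4dn.xlarge": 1,
    "g4dn.12xlarge": 4,
    "p3.2xlarge": 1,
    "p3.8xlarge": 4,
    "p3.16xlarge": 8,
    "p4d.24xlarge": 8,
}

VALID_RESOURCE_TYPES = ("GPU", "InferenceAccelerator")


def requested_gpus(task_definition: dict) -> int:
    """Sum the GPU counts requested by all containers in a task definition."""
    total = 0
    for container in task_definition.get("containerDefinitions", []):
        for req in container.get("resourceRequirements", []):
            if req.get("type") == "GPU":
                total += int(req["value"])
    return total


def find_problems(task_definition: dict, instance_type: str) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    for container in task_definition.get("containerDefinitions", []):
        for req in container.get("resourceRequirements", []):
            if req.get("type") not in VALID_RESOURCE_TYPES:
                problems.append(
                    f"{container.get('name')}: unknown resource type {req.get('type')!r}"
                )
    available = GPUS_PER_INSTANCE.get(instance_type)
    if available is None:
        problems.append(f"{instance_type}: not in the local GPU table")
    else:
        needed = requested_gpus(task_definition)
        if needed > available:
            problems.append(
                f"task requests {needed} GPU(s) but {instance_type} has {available}"
            )
    return problems
```

Running find_problems in CI against each task definition and its target instance type is one way to use it; note that a lowercase "gpu" type is reported as unknown rather than silently counted.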

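The next thing you’ll likely encounter is managing GPU memory. While you can request GPUs, you can’t directly specify how much GPU memory a task needs. ECS reserves whole GPUs and pins each one to a single container. If multiple tasks are scheduled on the same multi-GPU instance, each gets its own GPUs rather than sharing one, so a task has the full memory of every GPU it reserved and nothing more. You’ll need to monitor GPU memory usage within your tasks to avoid Out-Of-Memory (OOM) errors on the GPU, which manifest differently than CPU OOMs: they typically surface as a CUDA or framework error inside your process rather than the container being killed.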
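One way to watch GPU memory from inside a task is to poll nvidia-smi. The sketch below assumes nvidia-smi is available in the container (the NVIDIA runtime normally provides it when the utility capability is enabled) and uses its CSV query output; the function names and the 90% threshold are arbitrary choices for illustration.

```python
import subprocess

# Query per-GPU memory in MiB as bare CSV, e.g. "0, 1234, 15109".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list:
    """Parse 'index, used, total' lines into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(part.strip()) for part in line.split(","))
        rows.append({"index": index, "used_mib": used, "total_mib": total})
    return rows


def gpus_over_threshold(rows: list, threshold: float = 0.9) -> list:
    """Return indices of GPUs whose used/total memory ratio is at or above threshold."""
    return [r["index"] for r in rows if r["used_mib"] / r["total_mib"] >= threshold]


def read_gpu_memory() -> list:
    """Run nvidia-smi and parse its output; raises if the command fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    rows = read_gpu_memory()
    for index in gpus_over_threshold(rows):
        print(f"GPU {index} is above 90% memory usage")
```

Parsing is kept separate from the subprocess call so the logic can be exercised without a GPU. In practice you would emit these numbers to your metrics system rather than print them.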
