Set Up GPU Node Groups on EKS for CUDA Workloads (2026)

GPU node groups on Amazon EKS do more than attach GPUs to instances. They expose each GPU to Kubernetes as a schedulable resource, so the scheduler can place CUDA workloads only on nodes that have a free GPU.

Let’s see this in action with a simple Kubernetes Pod definition that requests a GPU.

apiVersion: v1
kind: Pod
metadata:
  name: cuda-test-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1 # Requesting one GPU
  restartPolicy: Never

When you deploy this Pod to an EKS cluster with GPU nodes, kube-scheduler sees the nvidia.com/gpu: 1 limit. Because nvidia.com/gpu is an extended resource, Kubernetes uses the limit as the request when no request is given. The scheduler then looks for a node with an unallocated nvidia.com/gpu, a resource advertised by the NVIDIA device plugin running on each GPU worker node. If it finds one, the Pod is bound to that node; if not, the Pod stays Pending. The device plugin does not queue work. When the kubelet starts the container, it asks the plugin which GPU device to attach, and the plugin returns the device so the container can use it.

The core problem EKS GPU node groups solve is making specialized, expensive hardware (NVIDIA GPUs) accessible and manageable within a dynamic, multi-tenant Kubernetes environment. Without them, you’d be manually SSHing into GPU machines, installing drivers, and trying to coordinate workloads, which is a recipe for chaos. EKS abstracts this away, allowing you to treat GPUs as just another resource type your applications can request.

Internally, the magic happens through a combination of EKS managed node groups (or self-managed ones, but managed is easier for this), the NVIDIA device plugin for Kubernetes, and the EC2 instance types you choose.

EC2 GPU Instances: You start by selecting an EC2 instance type that has NVIDIA GPUs attached (e.g., p3.2xlarge, g4dn.xlarge, g5.xlarge). These instances come with the necessary hardware.
EKS Node Group Configuration: When you create an EKS node group, you specify the EC2 instance type. EKS provisions the instances and joins them to your cluster. For GPU nodes, the Amazon Machine Image (AMI) must include the NVIDIA drivers and the NVIDIA container toolkit. AWS publishes EKS-optimized accelerated AMIs for this purpose; with managed node groups you select one through the AMI type (for example, AL2023_x86_64_NVIDIA or BOTTLEROCKET_x86_64_NVIDIA).
NVIDIA Device Plugin: This is a Kubernetes DaemonSet that runs on every GPU node. It discovers the GPUs on the node and registers them with the kubelet, which advertises them to the API server as the extended resource nvidia.com/gpu. The plugin does not install drivers; those must already be on the node, either from the AMI or from a separate tool such as the NVIDIA GPU Operator. This advertisement is what allows kube-scheduler to see and allocate GPUs.
CUDA-enabled Container Images: Your application containers need the CUDA user-space libraries, and the CUDA version in the image must be supported by the NVIDIA driver installed on the node. Using official NVIDIA CUDA base images is the standard practice.
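
As a rough illustration of the node group piece, here is a minimal eksctl configuration sketch. It assumes you use eksctl, which the steps above do not require; the cluster name, region, node group name, sizes, and label are all hypothetical placeholders. When given a GPU instance type, eksctl is expected to choose a GPU-capable EKS-optimized AMI, but verify the AMI on the resulting nodes rather than relying on that.

```yaml
# Sketch only: every name and size below is a placeholder, not a recommendation.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: gpu-demo            # hypothetical cluster name
  region: us-west-2         # hypothetical region

managedNodeGroups:
  - name: gpu-nodes         # hypothetical node group name
    instanceType: g4dn.xlarge   # one of the GPU instance types mentioned above
    desiredCapacity: 1
    minSize: 0
    maxSize: 2
    volumeSize: 100         # GiB; CUDA images are large, so leave room for pulls
    labels:
      workload: gpu         # hypothetical label, handy for nodeSelector rules
```

A file like this would typically be applied with eksctl create cluster -f <file>. Keeping the GPU nodes in their own node group lets you scale them separately from general-purpose nodes.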

When a Pod requests nvidia.com/gpu: 1, kube-scheduler filters for nodes whose allocatable nvidia.com/gpu count, minus what is already assigned to other Pods, covers the request. The device plugin is the source of that count: it reports the node's GPUs to the kubelet, which publishes them in the node's capacity and allocatable fields. Once the Pod is scheduled, the kubelet on the assigned node asks the device plugin to allocate a specific GPU, and the container runtime (containerd on current EKS AMIs), through the NVIDIA container toolkit, mounts the device and driver libraries into the container.
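
To make the advertisement concrete, this is roughly what the GPU entries look like in a Node object once the device plugin is running. The excerpt is illustrative, written for a node with a single GPU; it is not captured output, and a real node lists cpu, memory, pods, and other resources alongside it.

```yaml
# Illustrative excerpt of a Node object (kubectl get node <node-name> -o yaml).
# Placeholder values for a single-GPU node; other resources omitted.
status:
  capacity:
    nvidia.com/gpu: "1"     # GPUs the device plugin found on the node
  allocatable:
    nvidia.com/gpu: "1"     # GPUs the scheduler may assign to Pods
```

If nvidia.com/gpu is missing from both fields, the device plugin is probably not running on that node, and any Pod requesting a GPU will stay Pending.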

What surprises many people is that Kubernetes treats nvidia.com/gpu differently from CPU and memory. It is an extended resource: a whole-number count that cannot be overcommitted or requested fractionally, and if you set both a request and a limit they must be equal. A node with two GPUs can therefore satisfy two independent requests for nvidia.com/gpu: 1, and the scheduler makes no attempt to weigh how busy each GPU is. It is pure capacity-based allocation. The device plugin is not a scheduler extension; it is a kubelet-level plugin that translates physical GPU presence into a Kubernetes-native resource count.
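
The counting behavior can be sketched with a Job that runs two Pods in parallel, each asking for one GPU. The Job name and image tag are assumptions chosen for illustration; any CUDA base image supported by the node's driver should work. On a node with two free GPUs, both Pods could land on that node and each would see only its own device. With a single free GPU in the cluster, one Pod would run while the other waits in Pending.

```yaml
# Sketch: two independent single-GPU requests. Names and image tag are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-two-pods       # hypothetical name
spec:
  completions: 2            # run the Pod template twice in total
  parallelism: 2            # allow both Pods to run at the same time
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-container
        image: nvidia/cuda:11.0.3-base-ubuntu20.04   # assumed tag
        command: ["nvidia-smi", "-L"]   # list the GPUs visible inside the container
        resources:
          limits:
            nvidia.com/gpu: 1   # whole numbers only; a value like 0.5 is rejected
```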

This setup makes it straightforward to deploy deep learning models, video transcoding services, or any other CUDA-accelerated application without wrestling with infrastructure.

The next logical step is to explore how to manage multiple GPUs per pod or how to use GPU sharing mechanisms like MIG (Multi-Instance GPU) for more granular allocation.