Run GPU Workloads on AKS with CUDA (2026)

NVIDIA GPUs have become essential for accelerating machine learning, deep learning, and other computationally intensive workloads. Azure Kubernetes Service (AKS) can leverage these powerful processors to run your demanding applications.

Here’s how you can set up AKS to utilize NVIDIA GPUs:

Prerequisites

Azure Subscription: You need an active Azure subscription.
Azure CLI: Install and log in to the Azure CLI.
Kubernetes Cluster: An existing AKS cluster or the ability to create one.
NVIDIA GPU-Enabled Azure VM Series: Your AKS nodes must run on Azure VM sizes that include NVIDIA GPUs, such as the NC, ND, and NV families. GPU sizes are offered only in certain regions and count against a per-family vCPU quota, so confirm both availability and quota in your target region before you create the cluster.

Setting up the AKS Cluster

When creating your AKS cluster, or when adding a node pool, you need to specify a VM size that includes NVIDIA GPUs.

Creating a new AKS cluster with GPU nodes:

az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --enable-managed-identity \
    --generate-ssh-keys

In this command:

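--node-vm-size Standard_NC6: This specifies a VM size with an NVIDIA Tesla K80 GPU. Azure has retired the original K80-based NC sizes and the M60-based NV sizes, so treat Standard_NC6 as a placeholder and substitute a GPU size that is currently offered in your region, for example one from the ND A100 v4 family such as Standard_ND96asr_v4 (eight NVIDIA A100 GPUs), depending on your needs and quota.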
--node-count 1: Starts with a single node. You’ll likely want more for production.

Adding a GPU-enabled node pool to an existing cluster:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpupool \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --labels gpu=nvidia

Here:

--name gpupool: Assigns a name to this new node pool.
--labels gpu=nvidia: This is a crucial step. Applying a label like gpu=nvidia allows you to target this node pool with specific Kubernetes workloads.
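If node pools are created from scripts, the same flags can be assembled programmatically. The following is a minimal sketch, not an official tool: nodepool_add_args is a hypothetical helper name, and it only builds and prints the argument list for az aks nodepool add rather than running it. Besides the flags shown above, it accepts the optional --node-taints flag, which sets taints on the node pool itself.

```python
import shlex


def nodepool_add_args(resource_group, cluster, pool, vm_size,
                      node_count=1, labels=None, taints=None):
    """Build the argv for `az aks nodepool add` (hypothetical helper)."""
    args = [
        "az", "aks", "nodepool", "add",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--name", pool,
        "--node-count", str(node_count),
        "--node-vm-size", vm_size,
    ]
    if labels:
        # --labels takes space-separated key=value pairs.
        args += ["--labels"] + [f"{k}={v}" for k, v in sorted(labels.items())]
    if taints:
        # --node-taints takes a comma-separated list such as key=value:Effect.
        args += ["--node-taints", ",".join(taints)]
    return args


# Names mirror the example above; the VM size is a placeholder.
cmd = nodepool_add_args(
    "myResourceGroup", "myAKSCluster", "gpupool", "Standard_NC6",
    labels={"gpu": "nvidia"}, taints=["gpu=nvidia:NoSchedule"],
)
print(shlex.join(cmd))
```

Printing the command first makes it easy to review before passing the list to something like subprocess.run.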

Installing the NVIDIA Device Plugin

Kubernetes needs a way to discover and expose the NVIDIA GPUs to your pods. This is handled by the NVIDIA device plugin. The device plugin runs as a DaemonSet, ensuring that a pod is scheduled on each node with an NVIDIA GPU.

You can install it with Helm, a common way to package and manage Kubernetes applications.

Add the NVIDIA Helm repository:

helm repo add nvidia https://nvidia.github.io/k8s-device-plugin
helm repo update

Install the NVIDIA device plugin:

helm install \
    nvidia-device-plugin \
    nvidia/nvidia-device-plugin \
    --namespace kube-system

This command deploys the device plugin into the kube-system namespace. Once deployed, the plugin detects the NVIDIA GPUs on your nodes and registers them as allocatable resources in Kubernetes. First confirm that the plugin pods are running by listing the pods in the kube-system namespace:

kubectl get pods -n kube-system | grep device-plugin

You should see an nvidia-device-plugin pod running on each of your GPU nodes. The GPUs themselves show up as the nvidia.com/gpu resource in each node's allocatable capacity.
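To check the registration itself rather than just the plugin pods, you can inspect each node's allocatable resources. The sketch below assumes input shaped like the output of kubectl get nodes -o json; the function name and the sample node names are made up for illustration.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(node_list):
    """Map node name -> allocatable GPU count from a parsed NodeList."""
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Quantities are strings in the API; GPU counts are whole numbers.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


# Illustrative sample only. With a real cluster, load the JSON produced by
# `kubectl get nodes -o json` (for example with json.load) instead.
sample = {
    "items": [
        {"metadata": {"name": "aks-gpupool-000000"},
         "status": {"allocatable": {"cpu": "5840m", "nvidia.com/gpu": "1"}}},
        {"metadata": {"name": "aks-nodepool1-000000"},
         "status": {"allocatable": {"cpu": "1900m"}}},
    ]
}

for node, gpus in gpu_allocatable(sample).items():
    print(f"{node}: {gpus} GPU(s) allocatable")
```

A GPU node that reports zero usually means the device plugin pod on that node is not running or could not find a driver.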

Running GPU Workloads

Now that your AKS cluster is configured and the NVIDIA device plugin is installed, you can deploy applications that require GPU acceleration. Declare the GPUs each container needs under resources.limits in your workload manifest.

Here’s an example of a simple CUDA application. Because the sample exits after a single run, it is defined as a Job with restartPolicy: Never (a Deployment only accepts restartPolicy: Always):

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vector-add
spec:
  template:
    metadata:
      labels:
        app: cuda-vector-add
    spec:
      containers:
      - name: cuda-vector-add
        image: "nvcr.io/nvidia/k8s-device-plugin/cuda-vector-add:1.0"
        resources:
          limits:
            nvidia.com/gpu: 1 # Request one NVIDIA GPU
      restartPolicy: Never

Key parts of this manifest:

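image: "nvcr.io/nvidia/k8s-device-plugin/cuda-vector-add:1.0": This is presented as an NVIDIA sample image that runs a simple CUDA vector addition. Check the path and tag against NVIDIA's currently published sample images before relying on it, and replace it with your own application image for real workloads.
resources.limits.nvidia.com/gpu: 1: This is the crucial line. It tells Kubernetes that this container requires one NVIDIA GPU. The NVIDIA device plugin makes the nvidia.com/gpu resource available. GPUs are requested in whole units and are not shared between containers by default. Setting the limit is enough, because Kubernetes uses it as the request; if you also set a request, it must equal the limit. If you need multiple GPUs for a single pod, increase this number.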
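The rules Kubernetes applies to extended resources such as nvidia.com/gpu are easy to get wrong in hand-written manifests. The following sketch checks a container spec (as a plain dictionary) against them before you apply it. The function name is hypothetical, and the check covers only the GPU rules: whole-number limits, no request without a limit, and request equal to limit.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def check_gpu_resources(container):
    """Return a list of problems with a container's GPU settings (empty if none)."""
    resources = container.get("resources", {})
    limit = resources.get("limits", {}).get(GPU_RESOURCE)
    request = resources.get("requests", {}).get(GPU_RESOURCE)
    problems = []
    if limit is None:
        if request is not None:
            problems.append("GPU request set without a matching limit")
        return problems
    # Extended resources are whole numbers: "0.5" or "500m" is rejected.
    if not str(limit).isdigit():
        problems.append("GPU limit must be a whole number (no fractions)")
    if request is not None and str(request) != str(limit):
        problems.append("GPU request must equal the GPU limit")
    return problems


container = {
    "name": "cuda-vector-add",
    "resources": {"limits": {"nvidia.com/gpu": 1}},
}
print(check_gpu_resources(container) or "GPU settings look consistent")
```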

To deploy this:

kubectl apply -f cuda-vector-add.yaml

You can check the pod's status, which shows whether the scheduler found a node with a free GPU for it:

kubectl get pods -l app=cuda-vector-add

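Once the pod has started, check its logs (for example with kubectl logs -l app=cuda-vector-add) to see the output of the CUDA application. NVIDIA's vectorAdd sample prints Test PASSED when the GPU computation succeeds. A pod that stays in Pending usually means that no node has an unallocated nvidia.com/gpu resource.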

Node Affinity and Taints/Tolerations

For more complex scenarios, you might want to ensure your GPU workloads only run on GPU nodes, or you might want to reserve GPU nodes for specific high-priority jobs.

Node Affinity: You can use node affinity to schedule pods specifically onto nodes with GPUs. This is especially useful if you have a mixed cluster with both CPU and GPU nodes.

Add this to your pod’s spec:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: gpu
                operator: In
                values:
                - nvidia

This ensures the pod will only be scheduled on nodes with the label gpu=nvidia.
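To make the matching rules concrete, here is a small model of how nodeSelectorTerms are evaluated against node labels. This is an illustrative sketch rather than the scheduler's actual code: the function names are made up, and it covers only the In, NotIn, Exists, and DoesNotExist operators (Gt, Lt, and matchFields are left out).

```python
def expression_matches(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def node_matches(terms, labels):
    """Terms are ORed; expressions inside one term are ANDed."""
    for term in terms:
        exprs = term.get("matchExpressions", [])
        # An empty term matches nothing.
        if exprs and all(expression_matches(e, labels) for e in exprs):
            return True
    return False


terms = [{"matchExpressions": [
    {"key": "gpu", "operator": "In", "values": ["nvidia"]},
]}]

print(node_matches(terms, {"gpu": "nvidia", "agentpool": "gpupool"}))
print(node_matches(terms, {"agentpool": "nodepool1"}))
```

The first call models a node from the labeled GPU pool and the second a CPU-only node; the node names and the agentpool label values are placeholders.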

Taints and Tolerations: If you want to dedicate specific nodes only for GPU workloads and prevent general CPU pods from being scheduled on them, you can taint the GPU nodes.

First, taint the GPU nodes. Select them by label so that CPU nodes are left untainted:

kubectl taint nodes -l gpu=nvidia gpu=nvidia:NoSchedule

Then, add a toleration to your GPU pod spec:

      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "nvidia"
        effect: "NoSchedule"

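This allows your GPU pods to be scheduled on the tainted nodes, while pods without the toleration are kept off them. The NVIDIA device plugin DaemonSet needs a matching toleration too; otherwise its pods cannot be scheduled onto the tainted nodes and the GPUs are not advertised. On AKS, taints set with kubectl may not survive node replacement during scaling or upgrades, so for a lasting setup set the taint on the node pool itself with the --node-taints option of az aks nodepool add.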
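The interaction between a taint and a toleration can be modeled in a few lines. The sketch below is a simplified illustration of the matching rules, not the scheduler's implementation: the function names are hypothetical, and it ignores tolerationSeconds.

```python
def tolerates(toleration, taint):
    """True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    key = toleration.get("key")
    op = toleration.get("operator", "Equal")
    if not key:
        # An empty key is only valid with Exists and matches every taint.
        return op == "Exists"
    if key != taint["key"]:
        return False
    if op == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def schedulable(tolerations, taints):
    """True if every scheduling-blocking taint is tolerated."""
    blocking = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


gpu_taint = [{"key": "gpu", "value": "nvidia", "effect": "NoSchedule"}]
gpu_pod = [{"key": "gpu", "operator": "Equal",
            "value": "nvidia", "effect": "NoSchedule"}]

print(schedulable(gpu_pod, gpu_taint))  # pod with the toleration above
print(schedulable([], gpu_taint))       # pod without any toleration
```

Running the same check with the tolerations of the device plugin DaemonSet is a quick way to see whether the plugin can still land on the tainted GPU nodes.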

Monitoring GPU Usage

Monitoring GPU utilization is critical for understanding performance and identifying potential bottlenecks. You can use dcgm-exporter, which is built on NVIDIA Data Center GPU Manager (DCGM), to expose GPU metrics to Prometheus, which can then be visualized in Grafana.

Deploy dcgm-exporter: This is typically done via a Helm chart.
Configure Prometheus to scrape dcgm-exporter metrics.
Set up Grafana dashboards to visualize GPU utilization, memory usage, temperature, etc.
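Before wiring up dashboards, it can help to look at what the exporter actually emits. dcgm-exporter serves metrics in the Prometheus text format, with names such as DCGM_FI_DEV_GPU_UTIL. The sketch below parses that format with the standard library only. The sample text, label values, and function name are illustrative assumptions, not captured output.

```python
import re

SAMPLE_LINE = re.compile(r'^([A-Za-z_:][\w:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def metric_samples(text, name):
    """Return (labels, value) pairs for one metric in Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == name:
            labels = dict(LABEL.findall(match.group(2) or ""))
            samples.append((labels, float(match.group(3))))
    return samples


# Made-up exposition text in the shape dcgm-exporter produces.
EXPOSITION = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="aks-gpupool-000000"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="aks-gpupool-000000"} 12
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="aks-gpupool-000000"} 61
"""

for labels, value in metric_samples(EXPOSITION, "DCGM_FI_DEV_GPU_UTIL"):
    print(f"GPU {labels.get('gpu')}: {value:.0f}% utilization")
```

In a real setup Prometheus does this scraping for you; a script like this is mainly useful for spot checks against the exporter's metrics endpoint.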

By following these steps, you can effectively harness the power of NVIDIA GPUs within your Azure Kubernetes Service cluster, enabling you to run demanding AI and HPC workloads at scale. The next step is often to explore distributed training frameworks like Horovod or PyTorch Distributed on your GPU-enabled AKS cluster.