CircleCI’s GPU resource classes let you run machine learning workloads on machines with NVIDIA GPUs from inside your CI pipeline, so you don’t have to provision and manage your own cloud GPU instances or on-premise hardware.
Let’s see it in action. Imagine you’re training a deep learning model using PyTorch. Your config.yml might look something like this:
```yaml
version: 2.1

orbs:
  python: circleci/python@2.0.0

jobs:
  train_model:
    docker:
      - image: cimg/gpu/cuda:11.7.1-base-22.04
    resource_class: gpu.xlarge
    steps:
      - checkout
      - python/install-packages:
          app-dir: .
          pip-requirements: requirements.txt
      - run:
          name: Train Model
          command: python train.py --epochs 10 --batch-size 32 --learning-rate 0.001
      - persist_to_workspace:
          root: .
          paths:
            - model/

workflows:
  build-and-train:
    jobs:
      - train_model
```
Here, gpu.xlarge is the key. It tells CircleCI to run the job on a machine with an NVIDIA GPU. The cimg/gpu/cuda:11.7.1-base-22.04 Docker image comes with the CUDA libraries needed for GPU acceleration. When this job runs, CircleCI provisions a machine with a physical GPU, pulls the Docker image, and runs your steps inside it. Your train.py script can then use the GPU for faster computation, provided it selects the CUDA device and a GPU-enabled build of the framework is installed.
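The article does not show train.py itself, so the following is only a minimal sketch of how such a script might read the three flags from the run step and choose a device. The function names parse_args and pick_device are hypothetical, and PyTorch is imported lazily so the sketch falls back to the CPU when it is not installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags passed in the job's run step (hypothetical train.py)."""
    parser = argparse.ArgumentParser(description="Train a model (sketch).")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    return parser.parse_args(argv)


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Same flags as the "Train Model" step in the config.
args = parse_args(["--epochs", "10", "--batch-size", "32", "--learning-rate", "0.001"])
device = pick_device()
print(f"epochs={args.epochs} batch_size={args.batch_size} "
      f"lr={args.learning_rate} device={device}")
# A real script would now move the model and every batch to the device
# explicitly, e.g. model.to(device) and batch.to(device).
```

Printing the chosen device at the start of the job makes it obvious in the CI log when a run has silently fallen back to the CPU.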
The problem this solves is the bottleneck of CPU-bound training for ML models. GPUs are massively parallel processors, excelling at the matrix multiplications and tensor operations that form the core of deep learning. Training a large model on a CPU alone can take days or weeks, which makes iteration and experimentation impractical. CircleCI’s GPU resource classes make this hardware available from an ordinary pipeline config.
Internally, when you specify resource_class: gpu.xlarge, CircleCI allocates a machine from its GPU-enabled fleet. The GPU is real hardware, not an emulated device. The cimg/gpu images are built on top of Ubuntu and come packed with NVIDIA drivers, the CUDA Toolkit, and cuDNN, the software stack needed to communicate with the GPU. An ML framework built with GPU support (such as TensorFlow or PyTorch) finds these libraries at runtime and can then run computations on the GPU. How much of this is automatic depends on the framework: TensorFlow places supported operations on the GPU by default, while PyTorch expects your code to move the model and tensors to the CUDA device.
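Before any framework gets involved, it can help to confirm that the driver actually exposes a GPU to the job. The sketch below shells out to nvidia-smi, assuming the tool is on the PATH inside the job (which the article does not state explicitly). The helper names list_gpus and parse_gpu_list are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Turn 'name, memory.total' CSV rows into (name, memory) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so commas inside a GPU name stay intact.
        name, _, memory = line.rpartition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if not list_gpus():
    print("No NVIDIA GPU visible to this job")
```

Running a check like this as an early step lets the job fail fast instead of training on the CPU for hours.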
The exact levers you control are primarily the resource_class and the Docker image. For example, gpu.xlarge provides a fixed amount of GPU memory and compute power. If your model needs more VRAM than that, you can move to a larger GPU resource class (check CircleCI’s documentation for the classes currently available on your plan) or reduce your batch size so the job fits in memory. The Docker image selection is also crucial; you need an image whose CUDA version is compatible with your ML framework build.
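To get a rough sense of whether a model fits in a given amount of VRAM, a back-of-the-envelope estimate is often enough. The sketch below assumes 32-bit parameters and an Adam-style optimizer that keeps two extra values per parameter. It deliberately ignores activations, which grow with batch size and often dominate, so treat the result as a lower bound. The function name is hypothetical.

```python
def estimate_training_memory_gib(num_params, bytes_per_param=4,
                                 optimizer_states_per_param=2):
    """Lower-bound estimate of training memory in GiB.

    Counts one copy each for weights and gradients, plus the optimizer
    state. Activations and framework overhead are NOT included.
    """
    copies = 1 + 1 + optimizer_states_per_param
    return num_params * bytes_per_param * copies / 2**30


# A 100-million-parameter model needs roughly 1.49 GiB before activations.
print(f"{estimate_training_memory_gib(100_000_000):.2f} GiB")
```

If the estimate is already close to the GPU's memory, lowering the batch size alone is unlikely to help, and a larger resource class is the more realistic option.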
When you’re running GPU workloads, the CUDA runtime libraries are dynamically loaded by your ML framework. The framework queries the available CUDA devices, identifies the GPU, and then directs the computationally intensive kernels (like convolutions or matrix multiplications) to execute on the GPU’s streaming multiprocessors. This offloading is what provides the speedup. The drivers and CUDA toolkit act as the bridge, translating your framework’s requests into instructions the GPU hardware understands.
What most people don’t realize is that the cimg/gpu images are designed to be minimal, providing just the essential CUDA components. You’ll still need to install your specific Python packages (like torch, tensorflow, numpy, etc.) within the job, and the framework packages must be the GPU-enabled builds. For PyTorch, this means installing from the CUDA-specific wheel index, for example pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 (substitute the tag that matches your CUDA version). If you install the CPU-only build of your framework, it won’t be able to use the GPU even if one is present.
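A quick way to catch a CPU-only install is to ask PyTorch which CUDA version it was built against. This is a small sketch with a hypothetical function name. It relies on torch.version.cuda, which is None on CPU-only builds, and it returns a placeholder string when PyTorch is not installed at all.

```python
def describe_torch_build():
    """Report whether the installed PyTorch was built with CUDA support."""
    try:
        import torch
    except ImportError:
        return "not-installed"
    # torch.version.cuda is None for builds without CUDA support.
    if torch.version.cuda is None:
        return "cpu-only"
    return f"cuda-{torch.version.cuda}"


print(describe_torch_build())
```

Running this right after the install step shows in the job log whether the dependency resolution picked up the GPU wheel.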
The next hurdle you’ll likely encounter is managing dependencies for different CUDA versions or optimizing your model’s performance for specific GPU architectures.