Environment:SeldonIO Seldon Core GPU Inference Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU, Deep_Learning |
| Last Updated | 2026-02-13 14:00 GMT |
Overview
NVIDIA GPU environment with Triton Inference Server and NVIDIA Container Toolkit for serving GPU-accelerated models (TensorFlow, PyTorch, ONNX, TensorRT) on Seldon Core 2.
Description
This environment extends the base Seldon Core 2 deployment with NVIDIA GPU support. It uses NVIDIA Triton Inference Server (`nvcr.io/nvidia/tritonserver:23.03-py3`) as the inference backend. It also requires the NVIDIA Container Toolkit (the `nvidia-container-toolkit` package, or the legacy `nvidia-docker2` package) to pass GPUs through to containers. On Kubernetes, GPU nodes must run the NVIDIA device plugin. GPU-enabled Servers request GPUs through `nvidia.com/gpu` resource limits in their pod spec, and Models are placed on those Servers through their requirements. On Docker Compose, GPU support is enabled by the `all-gpu.yaml` overlay together with `GPU_ENABLED=1`.
Usage
Use this environment when serving GPU-accelerated models such as TensorFlow SavedModels, PyTorch TorchScript, ONNX models, or TensorRT engines. It is required for CIFAR-10 image classification, large HuggingFace models, and any model that benefits from GPU inference.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | The NVIDIA Container Toolkit targets Linux hosts; on Windows, use WSL2 |
| GPU | NVIDIA GPU | Compute capability varies by model; check Triton docs |
| Memory | 3Gi container memory request | From the `triton-gpu.yaml` Server definition (`requests.memory`). This is host RAM, not GPU VRAM; VRAM needs depend on the model |
| Driver | NVIDIA Driver 525+ | Triton 23.03 ships a CUDA 12.x stack; confirm the exact minimum driver in the Triton 23.03 release notes |
| Container Runtime | NVIDIA Container Toolkit | nvidia-docker2 or nvidia-container-toolkit package |
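The driver row can be checked from a shell before deploying. The sketch below is an illustration, not part of Seldon Core. The function name `driver_meets_minimum` is hypothetical, and the sketch assumes 525 as the minimum major version from the table. It reads the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one line per GPU, and compares only the first line.

```bash
# Requires bash. Hypothetical helper: succeeds if the first driver version
# read from stdin (e.g. "535.104.05") has a major version >= the minimum.
driver_meets_minimum() {
  local min_major="${1:-525}"   # assumption: 525 taken from the Driver row
  local version="" major
  # Accept a last line without a trailing newline; fail on empty input.
  read -r version || [[ -n "$version" ]] || return 1
  major="${version%%.*}"        # "535.104.05" -> "535"
  [[ "$major" =~ ^[0-9]+$ ]] && (( major >= min_major ))
}
```

Usage: `nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_meets_minimum 525 && echo "driver OK"`.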
Dependencies
System Packages
- `nvidia-driver` (525+ recommended)
- `nvidia-container-toolkit` (or `nvidia-docker2`)
Container Images
- `nvcr.io/nvidia/tritonserver:23.03-py3` (~11GB, full Triton with Python backend)
Kubernetes (Optional)
- NVIDIA Device Plugin DaemonSet (for GPU resource advertising)
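- Node labels for GPU node selection (optional). GPU capacity itself appears as the `nvidia.com/gpu` extended resource advertised by the device plugin, not as a label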
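To confirm that the device plugin is advertising GPUs, list the allocatable `nvidia.com/gpu` count per node with `kubectl` custom columns. The helper below is a hypothetical sketch that filters that table and prints only nodes reporting at least one GPU. Nodes without the plugin show `<none>` in the GPU column and are skipped.

```bash
# Hypothetical helper. Input: the two-column table printed by
#   kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
# Skips the header row and any node whose GPU column is not a positive integer.
gpu_nodes_from_table() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1, $2 }'
}
```

Usage: `kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' | gpu_nodes_from_table`. Empty output suggests the device plugin is not running on any node.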
Credentials
No additional credentials required beyond the base environment.
For NGC private model downloads:
- `NGC_API_KEY`: NVIDIA NGC API key (optional, for gated models)
Quick Install
```bash
# Install NVIDIA Container Toolkit (Ubuntu)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

# Start Seldon Core 2 with GPU (Docker Compose)
cd scheduler
GPU_ENABLED=1 make deploy-local-triton
```
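One way to check that the `nvidia-ctk runtime configure` step took effect is to look for an `nvidia` entry in Docker's registered runtimes. This is a minimal sketch with a hypothetical function name. It assumes its input is the JSON map printed by `docker info --format '{{json .Runtimes}}'`.

```bash
# Hypothetical helper: succeeds if the runtimes JSON read from stdin
# contains an "nvidia" key, e.g. {"nvidia":{"path":"nvidia-container-runtime"},...}
has_nvidia_runtime() {
  grep -q '"nvidia":'
}
```

Usage: `docker info --format '{{json .Runtimes}}' | has_nvidia_runtime || echo "nvidia runtime missing: rerun nvidia-ctk and restart Docker"`.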
Code Evidence
GPU Docker Compose overlay from `scheduler/all-gpu.yaml`:
```yaml
services:
  triton:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  mlserver:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Kubernetes GPU server definition from `samples/servers/triton-gpu.yaml`:
```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: triton-gpu
spec:
  serverConfig: triton
  extraCapabilities:
    - gpu
  podSpec:
    containers:
      - name: triton
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            memory: 3Gi
```
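A Model lands on this Server only if its requirements match the Server's capabilities, which here include `gpu` through `extraCapabilities`. The sketch below prints such a Model manifest. The function name and the storage URI in the usage line are hypothetical placeholders; only the `requirements: [gpu]` convention comes from this page.

```bash
# Hypothetical helper: print a Seldon Core 2 Model manifest that requires
# a GPU-capable Server. Arguments: model name, storage URI (placeholder).
gpu_model_manifest() {
  local name="$1" storage_uri="$2"
  if [[ -z "$name" || -z "$storage_uri" ]]; then
    echo "usage: gpu_model_manifest NAME STORAGE_URI" >&2
    return 1
  fi
  # The heredoc terminator must stay at column 0.
  cat <<EOF
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: ${name}
spec:
  storageUri: "${storage_uri}"
  requirements:
  - gpu
EOF
}
```

Usage: `gpu_model_manifest my-model gs://my-bucket/models/my-model | kubectl apply -f -`. The bucket is a placeholder; add `-n <namespace>` as needed.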
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `nvidia-smi: command not found` | NVIDIA driver not installed | Install NVIDIA driver: `sudo apt install nvidia-driver-525` |
| `could not select device driver "nvidia"` | NVIDIA Container Toolkit not installed | Install nvidia-container-toolkit and restart Docker |
| `CUDA out of memory` | Insufficient GPU VRAM | Use a GPU with more VRAM or reduce model size; check with `nvidia-smi` |
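| `Triton failed to load model` | Model format incompatible with Triton, or wrong repository layout | Check that the model follows Triton's repository layout: `<model-name>/config.pbtxt` plus a numeric version subdirectory (e.g. `<model-name>/1/`) holding the model files |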
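The last two rows of this table can be checked locally with a pair of hypothetical shell helpers. `check_triton_model_dir` verifies one model directory. It requires a numeric version subdirectory and only warns when `config.pbtxt` is missing, since some backends may auto-complete their configuration. `gpus_with_free_mib` reads `nvidia-smi --query-gpu=index,memory.total,memory.used --format=csv,noheader,nounits`. It prints each GPU's index and free memory in MiB, for GPUs that meet a requested minimum.

```bash
# Requires bash. Hypothetical helper: check the Triton layout of one model directory.
check_triton_model_dir() {
  local dir="${1%/}" v found=0
  if [[ ! -d "$dir" ]]; then
    echo "missing directory: $dir"
    return 1
  fi
  if [[ ! -f "$dir/config.pbtxt" ]]; then
    echo "warning: no config.pbtxt in $dir"
  fi
  # Look for at least one numeric version subdirectory such as 1/.
  for v in "$dir"/*/; do
    if [[ "$(basename "$v")" =~ ^[0-9]+$ ]]; then
      found=1
    fi
  done
  if (( found == 0 )); then
    echo "no numeric version subdirectory such as $dir/1/"
    return 1
  fi
  echo "ok: $dir"
}

# Hypothetical helper. Input lines look like "0, 16384, 1024" (index, total, used in MiB).
# Prints "index free_mib" for GPUs with at least MIN_FREE_MIB free.
gpus_with_free_mib() {
  local need_mib="$1"
  if [[ ! "$need_mib" =~ ^[0-9]+$ ]]; then
    echo "usage: gpus_with_free_mib MIN_FREE_MIB" >&2
    return 1
  fi
  awk -F', *' -v need="$need_mib" '{ free = $2 - $3; if (free >= need + 0) print $1, free }'
}
```

Usage: `check_triton_model_dir ./models/my_model` and `nvidia-smi --query-gpu=index,memory.total,memory.used --format=csv,noheader,nounits | gpus_with_free_mib 4096`. Both paths are placeholders.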
Compatibility Notes
- Triton 23.03: Supports TensorFlow 2.x, PyTorch 2.x, ONNX Runtime, TensorRT, Python backend, DALI, FIL, OpenVINO.
- MLServer GPU: MLServer 1.7.1 also supports GPU for HuggingFace transformers and PyTorch models.
- GPU scheduling: Models requiring GPU should set `requirements: [gpu]` in the Model CRD to ensure scheduling to GPU-enabled servers.
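- Multi-GPU: Increase the Server's `nvidia.com/gpu` limit for models that need more than one GPU. Triton can place model instances on several GPUs through the `instance_group` setting in `config.pbtxt`. Splitting a single model across GPUs (model parallelism) depends on the backend.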
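For the multi-GPU case, Triton's `instance_group` places `count` model instances on each GPU listed in `gpus`. The helper below is a hypothetical sketch that prints such a stanza for GPUs `0..N-1`. Note that this replicates the model across GPUs (data parallelism) and does not split one model across devices.

```bash
# Requires bash. Hypothetical helper: print an instance_group stanza for
# config.pbtxt placing COUNT instances (default 1) on each of GPUs 0..N_GPUS-1.
gpu_instance_group() {
  local n_gpus="$1" count="${2:-1}" ids="" i
  if [[ ! "$n_gpus" =~ ^[1-9][0-9]*$ || ! "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: gpu_instance_group N_GPUS [COUNT]" >&2
    return 1
  fi
  for ((i = 0; i < n_gpus; i++)); do
    ids+="${ids:+, }$i"          # builds "0, 1, ..., N-1"
  done
  cat <<EOF
instance_group [
  {
    count: ${count}
    kind: KIND_GPU
    gpus: [ ${ids} ]
  }
]
EOF
}
```

Usage: run `gpu_instance_group 2` and paste the output into the model's `config.pbtxt`, replacing any existing `instance_group`. The Server's `nvidia.com/gpu` limit must be at least the number of GPUs listed.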