Environment:SeldonIO Seldon core GPU Inference Environment

From Leeroopedia
Knowledge Sources
Domains: Infrastructure, GPU, Deep_Learning
Last Updated: 2026-02-13 14:00 GMT

Overview

NVIDIA GPU environment with Triton Inference Server and NVIDIA Container Toolkit for serving GPU-accelerated models (TensorFlow, PyTorch, ONNX, TensorRT) on Seldon Core 2.

Description

This environment extends the base Seldon Core 2 deployment with NVIDIA GPU support. It uses the NVIDIA Triton Inference Server (`nvcr.io/nvidia/tritonserver:23.03-py3`) as the inference backend and requires the NVIDIA Container Toolkit (`nvidia-container-toolkit`, or the older `nvidia-docker2` package) to expose GPUs to containers. On Kubernetes, GPU nodes must have the NVIDIA device plugin installed, and the server pod requests GPUs through `nvidia.com/gpu` resource limits. On Docker Compose, GPU support is enabled via the `all-gpu.yaml` overlay with `GPU_ENABLED=1`.

Usage

Use this environment when serving GPU-accelerated models such as TensorFlow SavedModels, PyTorch TorchScript, ONNX models, or TensorRT engines. It is required for CIFAR-10 image classification, large HuggingFace models, and any model that benefits from GPU inference.

System Requirements

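| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux | NVIDIA drivers require Linux; Windows is possible via WSL2 |
| GPU | NVIDIA GPU | Required compute capability varies by model; check the Triton docs |
| Memory | 3Gi minimum per server pod | From the `memory: 3Gi` request in `triton-gpu.yaml`. This is a container (host) memory request, not a GPU VRAM figure; VRAM needs depend on the model |
| Driver | NVIDIA Driver 525+ | Compatible with CUDA in Triton 23.03; see the Triton 23.03 release notes for the exact driver requirement on your GPU |
| Container Runtime | NVIDIA Container Toolkit | `nvidia-container-toolkit` package (or the older `nvidia-docker2`) |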
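
The driver requirement above can be checked before any container is started. The following is a minimal sketch, not part of Seldon Core 2: the function names `driver_meets_minimum` and `installed_driver_version` are hypothetical, the default threshold of 525 is taken from the table, and only the major version is compared.

```shell
# Hypothetical helper: succeed if a driver version string such as "535.104.05"
# has a major version at or above the given minimum (default 525, an
# assumption taken from the requirements table).
# The function body runs in a subshell, so its variables stay local.
driver_meets_minimum() (
  version=$1
  minimum=${2:-525}
  major=${version%%.*}            # keep the text before the first dot
  case $major in
    ''|*[!0-9]*) exit 2 ;;        # empty or non-numeric input
  esac
  [ "$major" -ge "$minimum" ]
)

# Read the installed driver version from nvidia-smi.
# nvidia-smi prints one line per GPU; the first line is enough here.
installed_driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Example use on a GPU host:
#   driver_meets_minimum "$(installed_driver_version)" && echo "driver OK"
```

An exit status of 2 means the version string could not be parsed, which usually indicates that `nvidia-smi` is missing or returned an error.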

Dependencies

System Packages

  • `nvidia-driver` (525+ recommended)
  • `nvidia-container-toolkit` (or `nvidia-docker2`)

Container Images

  • `nvcr.io/nvidia/tritonserver:23.03-py3` (~11GB, full Triton with Python backend)

Kubernetes (Optional)

  • NVIDIA Device Plugin DaemonSet (for GPU resource advertising)
  • Node labels for GPU node selection; the GPUs themselves are advertised as the `nvidia.com/gpu` resource
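
Once the device plugin is running, each GPU node should report a non-zero `nvidia.com/gpu` count among its allocatable resources. A minimal sketch of that check, assuming `kubectl` is already configured for the target cluster; the function names `list_gpu_nodes` and `cluster_has_gpu` are hypothetical:

```shell
# Print every node with its allocatable nvidia.com/gpu count.
# Nodes that do not advertise the resource show <none> in the GPU column.
list_gpu_nodes() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Succeed only if at least one node advertises one or more GPUs.
# NR > 1 skips the header row printed by kubectl.
cluster_has_gpu() {
  list_gpu_nodes |
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { found = 1 } END { exit !found }'
}

# Example use:
#   cluster_has_gpu || echo "no node advertises nvidia.com/gpu" >&2
```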

Credentials

No additional credentials required beyond the base environment.

For NGC private model downloads:

  • `NGC_API_KEY`: NVIDIA NGC API key (optional, for gated models)

Quick Install

# Install NVIDIA Container Toolkit (Ubuntu)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Start Seldon Core 2 with GPU (Docker Compose)
cd scheduler
GPU_ENABLED=1 make deploy-local-triton

Code Evidence

GPU Docker Compose overlay from `scheduler/all-gpu.yaml`:

services:
  triton:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  mlserver:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Kubernetes GPU server definition from `samples/servers/triton-gpu.yaml`:

apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: triton-gpu
spec:
  serverConfig: triton
  extraCapabilities:
    - gpu
  podSpec:
    containers:
      - name: triton
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            memory: 3Gi

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `nvidia-smi: command not found` | NVIDIA driver not installed | Install the NVIDIA driver: `sudo apt install nvidia-driver-525` |
| `could not select device driver "nvidia"` | NVIDIA Container Toolkit not installed | Install `nvidia-container-toolkit` and restart Docker |
| `CUDA out of memory` | Insufficient GPU VRAM | Use a GPU with more VRAM or reduce the model size; check usage with `nvidia-smi` |
| `Triton failed to load model` | Model format incompatible with Triton | Verify that the model directory structure follows Triton conventions (version subdirectory) |
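
For the last error, Triton expects each model directory to contain at least one numerically named version subdirectory holding the model files, for example `my-model/1/model.onnx`, with an optional `config.pbtxt` beside it. The sketch below checks only that layout rule. The function name `check_triton_model_dir` is hypothetical, and the check does not validate the model file itself or the contents of `config.pbtxt`.

```shell
# Hypothetical helper: succeed if a model directory has at least one numeric
# version subdirectory (for example 1/) that is not empty.
# The function body runs in a subshell, so its variables stay local.
check_triton_model_dir() (
  model_dir=$1
  [ -d "$model_dir" ] || { echo "not a directory: $model_dir" >&2; exit 2; }

  for entry in "$model_dir"/*/; do
    [ -d "$entry" ] || continue          # the glob matched nothing
    name=$(basename "$entry")
    case $name in
      ''|*[!0-9]*) continue ;;           # skip non-numeric directory names
    esac
    # A version directory must hold the model file(s), e.g. model.onnx.
    if [ -n "$(ls -A "$entry")" ]; then
      exit 0
    fi
  done

  echo "no non-empty numeric version directory under $model_dir" >&2
  exit 1
)

# Example use:
#   check_triton_model_dir ./models/my-model && echo "layout looks valid"
```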

Compatibility Notes

  • Triton 23.03: Supports TensorFlow 2.x, PyTorch 2.x, ONNX Runtime, TensorRT, Python backend, DALI, FIL, OpenVINO.
  • MLServer GPU: MLServer 1.7.1 also supports GPU for HuggingFace transformers and PyTorch models.
  • GPU scheduling: Models requiring GPU should set `requirements: [gpu]` in the Model CRD to ensure scheduling to GPU-enabled servers.
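  • Multi-GPU: Increase the `nvidia.com/gpu` limit on the Server to give Triton more than one GPU; Triton can then place model instances across the visible GPUs, configured per model through `instance_group` in the model configuration.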
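
The GPU scheduling note can be illustrated with a Model manifest whose `requirements` entry matches the `gpu` capability declared by the `triton-gpu` Server. This is a minimal sketch under stated assumptions: the function name `render_gpu_model` is hypothetical, and the model name and storage URI are placeholders supplied by the caller. A real model would likely also list its framework among the requirements.

```shell
# Hypothetical helper: print a Model manifest that asks the scheduler for a
# server with the gpu capability.
#   $1 = model name (placeholder)
#   $2 = storage URI of the model artifacts (placeholder)
render_gpu_model() {
  cat <<EOF
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: $1
spec:
  storageUri: "$2"
  requirements:
    - gpu
EOF
}

# Example use (the bucket path is illustrative only):
#   render_gpu_model my-model "gs://my-bucket/my-model" | kubectl apply -f -
```

Generating the manifest from a function keeps the `gpu` requirement in one place when several models need it.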
