Environment: BentoML NVIDIA GPU Resource
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-13 16:00 GMT |
Overview
NVIDIA GPU runtime environment using `pynvml` (nvidia-ml-py) for GPU detection, memory querying, and CUDA device assignment in BentoML runners and services.
Description
BentoML supports NVIDIA GPU acceleration for model serving through its resource management system. GPU detection is performed via the `pynvml` library (Python bindings for NVML), which queries the NVIDIA driver for device count and memory information. The `NvidiaGpuResource` class handles GPU device enumeration, validation, and assignment to workers via `CUDA_VISIBLE_DEVICES`. The resource system integrates with the runner strategy to automatically determine worker counts and GPU assignments. Supported GPU types for BentoCloud deployments include NVIDIA B200, GB200, H200, H100, A100, A10G, L4, T4, V100, P100, K80, P4, and RTX Pro 6000, as well as AMD MI300X, MI325X, MI355X GPUs.
Usage
Use this environment when running GPU-accelerated model inference through BentoML. Required for any service or runner that declares `"nvidia.com/gpu"` in its `SUPPORTED_RESOURCES` tuple, including frameworks like PyTorch, TensorFlow, ONNX Runtime, Transformers, Diffusers, CatBoost, XGBoost, Detectron2, EasyOCR, and Keras.
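The class attributes involved can be sketched as follows. This is a minimal illustration: a real runnable subclasses `bentoml.Runnable`, which is omitted here so the sketch stays self-contained.

```python
class MyGpuRunnable:
    # Resources this runnable can be scheduled on; "nvidia.com/gpu" marks GPU support.
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    # Whether a single worker may use multiple CPU threads when running on CPU.
    SUPPORTS_CPU_MULTI_THREADING = True
```

Declaring `"nvidia.com/gpu"` in `SUPPORTED_RESOURCES` is what allows the runner strategy to schedule this runnable onto detected GPUs; `"cpu"` keeps the CPU-fallback path open.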
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or Windows | GPU detection via NVML; not available on macOS |
| Hardware | NVIDIA GPU | Any GPU supported by the installed NVIDIA driver |
| Driver | NVIDIA GPU Driver | Must be loaded for pynvml to initialize |
| Library | nvidia-ml-py (pynvml) | Included in BentoML core dependencies |
Dependencies
System Packages
- NVIDIA GPU Driver (installed at OS level)
- CUDA Toolkit (if required by specific ML frameworks)
Python Packages
- `nvidia-ml-py` (included in BentoML core deps, provides `pynvml`)
- Framework-specific packages: `torch`, `tensorflow`, `onnxruntime-gpu`, etc. (per model needs)
Credentials
No specific credentials required. The following environment variable is set by the runner strategy:
- `CUDA_VISIBLE_DEVICES`: Automatically set by `DefaultStrategy.get_worker_env()` to control which GPU(s) a worker can access. Set to `"-1"` to disable GPU for CPU-only workers.
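The three states of `CUDA_VISIBLE_DEVICES` (unset, `"-1"`, comma-separated indices) can be sketched with a small helper. `visible_gpu_indices` is a hypothetical name, not a BentoML API; it only illustrates how a worker process would interpret the variable.

```python
import os
from typing import Dict, List, Optional

def visible_gpu_indices(env: Optional[Dict[str, str]] = None) -> Optional[List[int]]:
    """Hypothetical helper: interpret CUDA_VISIBLE_DEVICES as a worker would.

    Returns None when the variable is unset (no restriction: all GPUs visible),
    an empty list for "-1" (GPU fully disabled), and the listed device
    indices otherwise.
    """
    env = dict(os.environ) if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # no restriction
    if raw.strip() == "-1":
        return []  # CPU-only worker: GPU disabled
    return [int(tok) for tok in raw.split(",") if tok.strip()]
```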
Quick Install
```bash
# nvidia-ml-py is already a core BentoML dependency
pip install bentoml

# Verify GPU detection
python -c "import pynvml; pynvml.nvmlInit(); print(f'GPUs: {pynvml.nvmlDeviceGetCount()}'); pynvml.nvmlShutdown()"
```
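For scripting the same check, a defensive detection helper can be sketched as below. `detect_nvidia_gpus` is an illustrative name mirroring BentoML's behavior (return device indices, or an empty list on any failure), not a BentoML function; it also tolerates `nvidia-ml-py` being absent.

```python
from typing import List

def detect_nvidia_gpus() -> List[int]:
    """Return detected GPU device indices, or [] if detection fails.

    Mirrors the fallback behavior of BentoML's GPU detection: any failure
    (missing library, missing driver) yields an empty list rather than
    an exception.
    """
    try:
        import pynvml
    except ImportError:
        return []  # nvidia-ml-py not installed
    try:
        pynvml.nvmlInit()
        try:
            return list(range(pynvml.nvmlDeviceGetCount()))
        finally:
            pynvml.nvmlShutdown()
    except Exception:  # NVMLError subclasses, OSError, etc.
        return []
```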
Code Evidence
GPU detection from `resource.py:240-257`:
```python
class NvidiaGpuResource(Resource[t.List[int]], resource_id="nvidia.com/gpu"):
    @classmethod
    @functools.lru_cache(maxsize=1)
    def from_system(cls) -> list[int]:
        import pynvml

        try:
            pynvml.nvmlInit()
            device_count = pynvml.nvmlDeviceGetCount()
            return list(range(device_count))
        except (
            pynvml.NVMLError_LibraryNotFound,
            pynvml.NVMLError_DriverNotLoaded,
            OSError,
        ):
            logger.debug("GPU not detected. Unable to initialize pynvml lib.")
            return []
```
GPU memory query from `resource.py:274-306`:
```python
def get_gpu_memory(dev: int) -> t.Tuple[float, float]:
    """Return total memory and free memory for the given GPU device, in MiB."""
    import pynvml.nvml
    from pynvml.smi import nvidia_smi

    try:
        inst = nvidia_smi.getInstance()
        query = inst.DeviceQuery(dev)
    except (pynvml.nvml.NVMLError, OSError):
        return 0.0, 0.0
    # (remaining memory extraction omitted in this excerpt)
```
CUDA_VISIBLE_DEVICES assignment from `strategy.py:150-162`:
```python
assigned_gpu = nvidia_gpus[
    assigned_resource_per_worker * worker_index :
    assigned_resource_per_worker * (worker_index + 1)
]
dev = ",".join(map(str, assigned_gpu))
environ["CUDA_VISIBLE_DEVICES"] = dev
```
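The slicing above can be sketched end to end. `assign_workers` is an illustrative function, not the BentoML API; it assumes the default strategy's semantics where `workers_per_resource` at or below 1.0 gives each worker one or more whole GPUs.

```python
import math
from typing import Dict, List

def assign_workers(nvidia_gpus: List[int], workers_per_resource: float) -> Dict[int, str]:
    """Sketch of per-worker GPU assignment (names are illustrative).

    workers_per_resource = 1.0 -> one worker per GPU;
    workers_per_resource = 0.5 -> each worker receives two GPUs.
    """
    worker_count = math.ceil(len(nvidia_gpus) * workers_per_resource)
    # Assumes workers_per_resource <= 1.0, per the default strategy.
    per_worker = int(1 / workers_per_resource)
    env: Dict[int, str] = {}
    for worker_index in range(worker_count):
        assigned = nvidia_gpus[per_worker * worker_index : per_worker * (worker_index + 1)]
        if len(assigned) < per_worker:
            raise IndexError("There aren't enough assigned GPU(s) for given worker id")
        env[worker_index] = ",".join(map(str, assigned))
    return env
```

With four GPUs and `workers_per_resource=0.5`, this yields two workers with `CUDA_VISIBLE_DEVICES` values `"0,1"` and `"2,3"`.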
Supported GPU types for cloud deployment from `service/config.py:20-38`:
```python
GpuLiteralType = Literal[
    "nvidia-b200", "nvidia-gb200", "nvidia-rtx-pro-6000",
    "nvidia-h200-141gb", "nvidia-tesla-h100", "nvidia-tesla-t4",
    "nvidia-tesla-a100", "nvidia-a100-80gb", "nvidia-h100-80gb",
    "nvidia-a10g", "nvidia-l4", "nvidia-tesla-v100",
    "nvidia-tesla-p100", "nvidia-tesla-k80", "nvidia-tesla-p4",
    "amd-mi300x", "amd-mi325x", "amd-mi355x",
]
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `NVMLError_LibraryNotFound` | NVIDIA driver not installed or pynvml cannot find NVML | Install NVIDIA GPU driver; ensure `libnvidia-ml.so` is on the library path |
| `NVMLError_DriverNotLoaded` | NVIDIA driver installed but not loaded | Restart the machine or run `sudo modprobe nvidia` |
| `BentoMLConfigException: GPU device index in X is greater than system available` | Requested GPU index exceeds physical GPU count | Reduce GPU count in resource config or check `nvidia-smi` output |
| `IndexError: There aren't enough assigned GPU(s) for given worker id` | `workers_per_resource` < 1.0 requires more GPUs than available | Reduce `workers_per_resource` or add more GPUs |
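The `BentoMLConfigException` row reflects a bounds check on requested device indices. A sketch of that kind of validation, using hypothetical names and a plain `ValueError` in place of BentoML's exception type:

```python
from typing import List

def validate_gpu_request(requested: List[int], available: List[int]) -> List[int]:
    """Hypothetical check: every requested device index must exist on the system."""
    for idx in requested:
        if idx not in available:
            raise ValueError(
                f"GPU device index in {requested} is greater than system available"
            )
    return requested
```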
Compatibility Notes
- macOS: NVIDIA GPUs are not supported on macOS. GPU detection will return an empty list.
- CPU fallback: If no GPU is detected and the runnable supports CPU, BentoML falls back to CPU with a warning: "No known supported resource available for X, falling back to using CPU."
- Fractional GPU: Setting `workers_per_resource` to a float below 1.0 assigns multiple GPUs to each worker (e.g. 0.5 gives each worker two GPUs). Values > 1 are not supported in the default strategy.
- CUDA_VISIBLE_DEVICES: Set to `"-1"` for CPU-only workers to ensure GPU is fully disabled.