
Environment: BentoML NVIDIA GPU Resource

From Leeroopedia
Knowledge Sources
Domains Infrastructure, GPU_Computing
Last Updated 2026-02-13 16:00 GMT

Overview

NVIDIA GPU runtime environment using `pynvml` (nvidia-ml-py) for GPU detection, memory querying, and CUDA device assignment in BentoML runners and services.

Description

BentoML supports NVIDIA GPU acceleration for model serving through its resource management system. GPU detection is performed via the `pynvml` library (Python bindings for NVML), which queries the NVIDIA driver for device count and memory information. The `NvidiaGpuResource` class handles GPU device enumeration, validation, and assignment to workers via `CUDA_VISIBLE_DEVICES`. The resource system integrates with the runner strategy to automatically determine worker counts and GPU assignments. Supported GPU types for BentoCloud deployments include NVIDIA B200, GB200, H200, H100, A100, A10G, L4, T4, V100, P100, K80, P4, and RTX Pro 6000, as well as AMD MI300X, MI325X, MI355X GPUs.
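
For BentoCloud deployments, the GPU type is requested through the service's resources configuration. A minimal sketch, assuming BentoML's deployment resources schema (the `gpu_type` value must be one of the `GpuLiteralType` literals shown under Code Evidence below):

```yaml
# Sketch: request one T4 GPU for a BentoCloud deployment.
# Field names assume BentoML's resources schema; adjust per service.
resources:
  gpu: 1
  gpu_type: "nvidia-tesla-t4"
```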

Usage

Use this environment when running GPU-accelerated model inference through BentoML. Required for any service or runner that declares `"nvidia.com/gpu"` in its `SUPPORTED_RESOURCES` tuple, including frameworks like PyTorch, TensorFlow, ONNX Runtime, Transformers, Diffusers, CatBoost, XGBoost, Detectron2, EasyOCR, and Keras.
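
A runnable advertises GPU support by declaring `"nvidia.com/gpu"` in its `SUPPORTED_RESOURCES` tuple. A self-contained sketch (`RunnableStub` and `supports_gpu` are illustrative stand-ins, not BentoML API; a real runnable would subclass `bentoml.Runnable`):

```python
class RunnableStub:
    """Stand-in for bentoml.Runnable so the example is self-contained."""


class MyGPURunnable(RunnableStub):
    # The runner strategy inspects this tuple to decide whether GPU
    # workers may be scheduled for this runnable.
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True


def supports_gpu(runnable_class: type) -> bool:
    # Mirrors the membership test the strategy performs.
    return "nvidia.com/gpu" in getattr(runnable_class, "SUPPORTED_RESOURCES", ())
```

If the tuple omits `"nvidia.com/gpu"`, the strategy schedules the runnable on CPU only.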

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux or Windows | GPU detection via NVML; not available on macOS |
| Hardware | NVIDIA GPU | Any GPU supported by the installed NVIDIA driver |
| Driver | NVIDIA GPU Driver | Must be loaded for `pynvml` to initialize |
| Library | `nvidia-ml-py` (`pynvml`) | Included in BentoML core dependencies |

Dependencies

System Packages

  • NVIDIA GPU Driver (installed at OS level)
  • CUDA Toolkit (if required by specific ML frameworks)

Python Packages

  • `nvidia-ml-py` (included in BentoML core deps, provides `pynvml`)
  • Framework-specific packages: `torch`, `tensorflow`, `onnxruntime-gpu`, etc. (per model needs)

Credentials

No specific credentials required. The following environment variable is set by the runner strategy:

  • `CUDA_VISIBLE_DEVICES`: Automatically set by `DefaultStrategy.get_worker_env()` to control which GPU(s) a worker can access. Set to `"-1"` to disable GPU for CPU-only workers.
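
The behavior can be sketched with a small helper (`worker_env` is a hypothetical name mirroring the behavior described above, not BentoML's actual function):

```python
import os
from typing import Dict, List, Optional


def worker_env(assigned_gpus: Optional[List[int]]) -> Dict[str, str]:
    """Build a worker environment with CUDA_VISIBLE_DEVICES set.

    An empty list or None means a CPU-only worker, which gets "-1" so
    CUDA frameworks see no devices at all.
    """
    env = dict(os.environ)
    if assigned_gpus:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, assigned_gpus))
    else:
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env
```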

Quick Install

# nvidia-ml-py is already a core BentoML dependency
pip install bentoml

# Verify GPU detection
python -c "import pynvml; pynvml.nvmlInit(); print(f'GPUs: {pynvml.nvmlDeviceGetCount()}'); pynvml.nvmlShutdown()"

Code Evidence

GPU detection from `resource.py:240-257`:

class NvidiaGpuResource(Resource[t.List[int]], resource_id="nvidia.com/gpu"):
    @classmethod
    @functools.lru_cache(maxsize=1)
    def from_system(cls) -> list[int]:
        import pynvml
        try:
            pynvml.nvmlInit()
            device_count = pynvml.nvmlDeviceGetCount()
            return list(range(device_count))
        except (
            pynvml.NVMLError_LibraryNotFound,
            pynvml.NVMLError_DriverNotLoaded,
            OSError,
        ):
            logger.debug("GPU not detected. Unable to initialize pynvml lib.")
            return []

GPU memory query from `resource.py:274-306`:

def get_gpu_memory(dev: int) -> t.Tuple[float, float]:
    """Return Total Memory and Free Memory in given GPU device. in MiB"""
    import pynvml.nvml
    from pynvml.smi import nvidia_smi
    try:
        inst = nvidia_smi.getInstance()
        query = inst.DeviceQuery(dev)
    except (pynvml.nvml.NVMLError, OSError):
        return 0.0, 0.0
    # ... (remainder of the cited range elided: the success path parses
    # total/free memory from `query`)
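
The excerpt above routes through `pynvml.smi`; an equivalent query can be sketched with the raw NVML bindings (`gpu_memory_mib` is an illustrative name, not BentoML API; it falls back to `(0.0, 0.0)` when no driver or GPU is present, mirroring the excerpt):

```python
from typing import Tuple


def gpu_memory_mib(dev: int) -> Tuple[float, float]:
    """Return (total_mib, free_mib) for GPU `dev`, or (0.0, 0.0) on failure."""
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(dev)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values in bytes
            return info.total / 1024**2, info.free / 1024**2
        finally:
            pynvml.nvmlShutdown()
    except Exception:  # ImportError, NVMLError, OSError, ...
        return 0.0, 0.0
```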

CUDA_VISIBLE_DEVICES assignment from `strategy.py:150-162`:

assigned_gpu = nvidia_gpus[
    assigned_resource_per_worker * worker_index :
    assigned_resource_per_worker * (worker_index + 1)
]
dev = ",".join(map(str, assigned_gpu))
environ["CUDA_VISIBLE_DEVICES"] = dev
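
The slicing above can be reproduced standalone: with `workers_per_resource = 0.5`, each worker receives `1 / 0.5 = 2` GPUs (`assign_gpus` is an illustrative helper, not BentoML's function):

```python
from typing import List


def assign_gpus(
    nvidia_gpus: List[int], workers_per_resource: float, worker_index: int
) -> List[int]:
    # A fractional workers_per_resource (< 1.0) means each worker
    # is assigned multiple GPUs, as in the default strategy.
    assigned_resource_per_worker = int(1 / workers_per_resource)
    return nvidia_gpus[
        assigned_resource_per_worker * worker_index :
        assigned_resource_per_worker * (worker_index + 1)
    ]
```

For example, `assign_gpus([0, 1, 2, 3], 0.5, 1)` returns `[2, 3]`: worker 1 gets the second slice of two GPUs.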

Supported GPU types for cloud deployment from `service/config.py:20-38`:

GpuLiteralType = Literal[
    "nvidia-b200", "nvidia-gb200", "nvidia-rtx-pro-6000",
    "nvidia-h200-141gb", "nvidia-tesla-h100", "nvidia-tesla-t4",
    "nvidia-tesla-a100", "nvidia-a100-80gb", "nvidia-h100-80gb",
    "nvidia-a10g", "nvidia-l4", "nvidia-tesla-v100",
    "nvidia-tesla-p100", "nvidia-tesla-k80", "nvidia-tesla-p4",
    "amd-mi300x", "amd-mi325x", "amd-mi355x",
]

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `NVMLError_LibraryNotFound` | NVIDIA driver not installed or `pynvml` cannot find NVML | Install the NVIDIA GPU driver; ensure `libnvidia-ml.so` is on the library path |
| `NVMLError_DriverNotLoaded` | NVIDIA driver installed but not loaded | Restart the machine or run `sudo modprobe nvidia` |
| `BentoMLConfigException: GPU device index in X is greater than system available` | Requested GPU index exceeds the physical GPU count | Reduce the GPU count in the resource config or check `nvidia-smi` output |
| `IndexError: There aren't enough assigned GPU(s) for given worker id` | `workers_per_resource` < 1.0 requires more GPUs than are available | Reduce `workers_per_resource` or add more GPUs |

Compatibility Notes

  • macOS: NVIDIA GPUs are not supported on macOS. GPU detection will return an empty list.
  • CPU fallback: If no GPU is detected and the runnable supports CPU, BentoML falls back to CPU with a warning: "No known supported resource available for X, falling back to using CPU."
  • Fractional GPU: Setting `workers_per_resource` to a float like 0.5 assigns multiple GPUs per worker. Values > 1 are not supported in the default strategy.
  • CUDA_VISIBLE_DEVICES: Set to `"-1"` for CPU-only workers to ensure GPU is fully disabled.
