Environment:Volcengine Verl CUDA GPU Environment

Metadata

Field	Value
Sources	verl | https://github.com/volcengine/verl
Domains	Infrastructure, Deep_Learning
Last Updated	2026-02-07 17:00 GMT

Overview

Linux environment with NVIDIA CUDA GPU or Huawei Ascend NPU for RL training of LLMs.

Description

verl supports NVIDIA CUDA GPUs and Huawei Ascend NPUs. Device detection is automatic: CUDA availability is checked with torch.cuda.is_available(), and Ascend NPU availability with torch.npu.is_available(), which verl/utils/device.py exposes as is_torch_npu_available(). On CUDA, compute capability is read with torch.cuda.get_device_capability(). On Ascend NPUs, IPC support requires Ascend software >= 25.3.rc1 and CANN >= 8.3.rc1.
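
The detection order described above can be sketched as follows. This is a minimal illustration, not verl's actual code: the function names detect_device_name and cuda_capability are hypothetical, and the guarded imports are an assumption added so the sketch also runs on machines without PyTorch or torch_npu.

```python
# Minimal sketch of the detection order described above; not verl's code.
try:
    import torch
except ImportError:  # assumption: keep the sketch runnable without PyTorch
    torch = None


def detect_device_name() -> str:
    """Return "cuda", "npu", or "cpu" depending on what PyTorch can see."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.npu is only registered once the torch_npu plugin is imported.
    try:
        import torch_npu  # noqa: F401
    except ImportError:
        return "cpu"
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return "npu"
    return "cpu"


def cuda_capability():
    """Return (major, minor) for the current CUDA device, or None without CUDA."""
    if torch is not None and torch.cuda.is_available():
        return torch.cuda.get_device_capability()
    return None
```

Calling detect_device_name() on a machine with neither accelerator returns "cpu", which matches the fallback noted under Usage.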

Usage

Required for all training workflows (GRPO, PPO, SFT, multi-turn). CPU fallback exists but is not practical for LLM training.

System Requirements

Component	Requirement
OS	Linux (Ubuntu recommended)
Hardware	NVIDIA GPU (A100/H100 preferred, min 16GB VRAM) or Huawei Ascend NPU
Disk	50GB+ SSD for model checkpoints

Dependencies

torch (with CUDA)
torch_npu (for Ascend)
packaging

Credentials

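CUDA_VISIBLE_DEVICES: environment variable that selects which CUDA GPUs are visible to the process
ASCEND_RT_VISIBLE_DEVICES: environment variable that selects which Ascend NPUs are visible to the process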
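
As an illustration of how these variables are typically used, the sketch below sets one and parses it back. The helper visible_device_ids is hypothetical and not part of verl. Entries are kept as strings because CUDA_VISIBLE_DEVICES may also contain GPU UUIDs rather than integer indices.

```python
import os


def visible_device_ids(env_var: str = "CUDA_VISIBLE_DEVICES"):
    """Parse a comma-separated device list; None means the variable is unset."""
    raw = os.environ.get(env_var)
    if raw is None:
        return None  # unset: the runtime typically exposes every device
    return [item.strip() for item in raw.split(",") if item.strip()]


# Set the variable before the framework initializes its device runtime,
# otherwise the selection may be ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(visible_device_ids())  # ['0', '1']
```

The same helper works for Ascend by passing "ASCEND_RT_VISIBLE_DEVICES" as env_var.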

Quick Install

pip install torch

Code Evidence

From verl/utils/device.py:

is_cuda_available = torch.cuda.is_available()
is_npu_available = is_torch_npu_available()

And the device name function:

def get_device_name() -> str:
    if is_cuda_available:
        device = "cuda"
    elif is_npu_available:
        device = "npu"
    else:
        device = "cpu"
    return device

And the IPC version check, from verl/utils/device.py:187-296:

if version.parse(software_base) >= version.parse("25.3.rc1"):
    if version.parse(cann_base) >= version.parse("8.3.rc1"):
        return True
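
The excerpt above omits how software_base and cann_base are obtained and what happens when a check fails. A self-contained sketch of the same comparison, assuming both version strings are already available as plain inputs, might look like this. The function name ipc_supported and the two constants are hypothetical.

```python
from packaging import version

# Minimum versions quoted in the excerpt above.
MIN_SOFTWARE = "25.3.rc1"
MIN_CANN = "8.3.rc1"


def ipc_supported(software_base: str, cann_base: str) -> bool:
    """Return True only when both versions meet their minimums."""
    return (
        version.parse(software_base) >= version.parse(MIN_SOFTWARE)
        and version.parse(cann_base) >= version.parse(MIN_CANN)
    )


print(ipc_supported("25.3.rc1", "8.3.rc1"))  # True
print(ipc_supported("25.2", "8.3.rc1"))      # False
```

Using version.parse rather than string comparison matters here: it orders a release candidate such as 25.3.rc1 before the final 25.3 release, which a plain lexical comparison would get wrong.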

Common Errors

Error	Solution
"CUDA not available"	Install CUDA toolkit + GPU driver
"IPC not supported on your devices"	Update Ascend Software >= 25.3.rc1 and CANN >= 8.3.rc1
"No GPU/NPU detected"	Check device drivers

Compatibility Notes

NVIDIA GPUs with compute capability >= 7.0 are recommended. Ascend NPUs require torch_npu. CPU mode is available but impractical for LLM training.
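
The compute capability recommendation can be checked against the (major, minor) tuple that torch.cuda.get_device_capability() returns. The sketch below is illustrative only: the helper meets_recommendation is hypothetical, and it takes the tuple as an argument so that it runs without a GPU.

```python
MIN_CAPABILITY = (7, 0)  # recommendation from the note above


def meets_recommendation(capability, minimum=MIN_CAPABILITY) -> bool:
    """Compare (major, minor) tuples; Python orders tuples element by element."""
    return tuple(capability) >= tuple(minimum)


print(meets_recommendation((8, 0)))  # True  (e.g. A100)
print(meets_recommendation((6, 1)))  # False (below the recommendation)
```

On a CUDA machine the argument would come from torch.cuda.get_device_capability(). Comparing tuples avoids the pitfalls of turning the version into a float.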
