# Environment: Volcengine verl vLLM Rollout Environment
## Metadata
| Field | Value |
|---|---|
| Sources | [verl](https://github.com/volcengine/verl) |
| Domains | Infrastructure, Inference |
| Last Updated | 2026-02-07 17:00 GMT |
## Overview
Optional vLLM environment for high-performance LLM rollout generation in verl RL training.
## Description
verl uses vLLM as one of its rollout backends to generate model responses during RL training. vLLM >= 0.8.5 and <= 0.12.0 is required; async server mode additionally requires vLLM >= 0.11.1. The `sleep_level` mechanism manages GPU memory as the process alternates between the training and inference phases.
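A minimal sketch of these version constraints, assuming only the bounds stated above (the helper name and constants are illustrative, not verl's actual API):

```python
from packaging import version

# Bounds taken from the description above; names here are hypothetical.
VLLM_MIN = version.parse("0.8.5")
VLLM_MAX = version.parse("0.12.0")
ASYNC_MIN = version.parse("0.11.1")

def check_vllm_version(ver: str, async_server: bool = False) -> bool:
    """Return True if a vLLM version string satisfies verl's stated bounds."""
    v = version.parse(ver)
    if not (VLLM_MIN <= v <= VLLM_MAX):
        return False
    if async_server and v < ASYNC_MIN:
        return False
    return True
```

Note that a version such as 0.10.0 passes the general bound but fails the async-server bound.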
## Usage
Required when using vLLM as the rollout engine (`rollout.name=vllm` in the config).
## System Requirements
- NVIDIA GPU with sufficient VRAM for KV cache
- Linux
## Dependencies
- vllm >= 0.8.5, <= 0.12.0
- tensordict >= 0.8.0, <= 0.10.0, != 0.9.0
## Credentials
None specific to vLLM
## Quick Install

```shell
pip install "verl[vllm]"
```

Or install vLLM directly:

```shell
pip install "vllm>=0.8.5,<=0.12.0"
```
## Code Evidence
From `verl/utils/import_utils.py:36-42`:

```python
@cache
def is_vllm_available():
    try:
        vllm_spec = importlib.util.find_spec("vllm")
    except ModuleNotFoundError:
        vllm_spec = None
    return vllm_spec is not None
```
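The same `find_spec` pattern can be demonstrated against stdlib modules; this generic helper is an illustration of the technique, not part of verl:

```python
import importlib.util
from functools import cache

@cache
def is_module_available(name: str) -> bool:
    """Check importability without importing; result is memoized by @cache."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        spec = None
    return spec is not None

print(is_module_available("json"))             # True: stdlib module
print(is_module_available("no_such_mod_xyz"))  # False: find_spec returns None
```

`find_spec` returns `None` for a missing top-level module rather than raising, so the `except` clause mainly guards submodule lookups under a nonexistent parent.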
And from `verl/workers/rollout/vllm_rollout/vllm_async_server.py:746`:

```python
assert _VLLM_VERSION >= version.parse("0.11.1")
```
And the sleep-level fallback from `verl/workers/rollout/vllm_rollout/vllm_rollout.py:90-94`:

```python
if config.layered_summon or (config.expert_parallel_size > 1 and not _check_vllm_version_for_sleep_level()):
    logger.warning("Setting the sleep level to 1 may cause a memory overflow.")
    self.sleep_level = 1
```
## Common Errors
| Error | Solution |
|---|---|
| `vllm not found` | `pip install "vllm>=0.8.5,<=0.12.0"` |
| `vLLM version must be >= 0.11.1` for async mode | Upgrade vLLM to >= 0.11.1 (staying <= 0.12.0) |
| `CUDA out of memory` during rollout | Reduce the `gpu_memory_utilization` config value |
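As a rough illustration of why lowering `gpu_memory_utilization` helps with rollout OOMs, here is a toy budget calculation (the function and numbers are hypothetical; vLLM's real accounting also covers activations and allocator overhead):

```python
def kv_cache_budget_gb(total_vram_gb: float,
                       gpu_memory_utilization: float,
                       model_weights_gb: float) -> float:
    """Toy estimate: VRAM vLLM reserves, minus weights, left for the KV cache."""
    reserved = total_vram_gb * gpu_memory_utilization
    return max(reserved - model_weights_gb, 0.0)

# On an 80 GB GPU with ~15 GB of weights, dropping utilization from 0.9 to
# 0.6 shrinks the KV-cache budget from roughly 57 GB to roughly 33 GB,
# leaving the remaining VRAM free for the co-located training phase.
print(kv_cache_budget_gb(80, 0.9, 15))
print(kv_cache_budget_gb(80, 0.6, 15))
```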
## Compatibility Notes
With `sleep_level=2`, vLLM frees the base model weights during sleep, so they must be resynced from the training weights before the next rollout. With expert parallelism on older vLLM versions, `sleep_level` falls back to 1, which keeps weights resident and may cause out-of-memory errors.
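The trade-off can be sketched as a toy state machine (class and method names are illustrative, not verl's or vLLM's actual API):

```python
class RolloutEngine:
    """Toy model of the sleep_level trade-off described above."""

    def __init__(self, sleep_level: int):
        self.sleep_level = sleep_level
        self.weights_loaded = True

    def sleep(self) -> None:
        # Level 2 frees the base weights; level 1 keeps them resident in VRAM.
        if self.sleep_level == 2:
            self.weights_loaded = False

    def wake(self, trainer_weights=None) -> None:
        # After a level-2 sleep, waking requires a resync from the trainer.
        if not self.weights_loaded:
            if trainer_weights is None:
                raise RuntimeError("sleep_level=2 requires a weight resync on wake")
            self.weights_loaded = True
```

Level 1 trades memory for simplicity (no resync, but weights stay in VRAM); level 2 trades a resync step for a smaller footprint during training.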