Implementation:Volcengine Verl RolloutConfig
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Inference, Configuration |
| Type | Wrapper Doc (configures external vLLM/SGLang engines) |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Configuration dataclass that controls rollout generation behavior for vLLM and SGLang inference engines within the verl training loop.
Description
The RolloutConfig dataclass defines all parameters governing how rollout sequences are generated during RLHF training. It wraps configuration for external inference engines (vLLM, SGLang, TensorRT-LLM) and controls sampling parameters (temperature, top-k, top-p), resource allocation (GPU memory utilization, tensor parallelism), sequence lengths, log-probability computation, and multi-turn conversation settings. The config also manages engine lifecycle features such as sleep mode, chunked prefill, prefix caching, and checkpoint weight loading.
Usage
This config is instantiated as part of the actor_rollout_ref.rollout section of the Hydra/OmegaConf configuration and is passed to the rollout worker, which initializes the inference engine accordingly. Only async mode is supported; the previously available sync mode has been removed.
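In practice these fields are usually set as Hydra dotted overrides on the training command line rather than edited in YAML. A sketch, assuming the standard verl.trainer.main_ppo entrypoint (treat the script name as illustrative and check your launcher):

```shell
# Hydra-style dotted overrides targeting the actor_rollout_ref.rollout section
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.temperature=0.7 \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2
```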
Code Reference
Source Location
- Repository: verl
- File: verl/workers/config/rollout.py
- Lines: 136-267
Signature
@dataclass
class RolloutConfig(BaseConfig):
_mutable_fields = {"max_model_len", "load_format"}
name: Optional[str] = MISSING
mode: str = "async"
temperature: float = 1.0
top_k: int = -1
top_p: float = 1.0
do_sample: bool = True
n: int = 1
repetition_penalty: float = 1.0
over_sample_rate: float = 0.0
prompt_length: int = 512
response_length: int = 512
dtype: str = "bfloat16"
gpu_memory_utilization: float = 0.5
ignore_eos: bool = False
enforce_eager: bool = True
free_cache_engine: bool = True
data_parallel_size: int = 1
expert_parallel_size: int = 1
tensor_model_parallel_size: int = 2
pipeline_model_parallel_size: int = 1
max_num_batched_tokens: int = 8192
val_kwargs: SamplingConfig = field(default_factory=SamplingConfig)
max_model_len: Optional[int] = None
max_num_seqs: int = 1024
log_prob_micro_batch_size: Optional[int] = None
log_prob_micro_batch_size_per_gpu: Optional[int] = None
log_prob_use_dynamic_bsz: bool = False
log_prob_max_token_len_per_gpu: int = 16384
multi_turn: MultiTurnConfig = field(default_factory=MultiTurnConfig)
server: ServerConfig = field(default_factory=ServerConfig)
checkpoint_engine: CheckpointEngineConfig = field(default_factory=CheckpointEngineConfig)
enable_chunked_prefill: bool = True
enable_prefix_caching: bool = True
enable_sleep_mode: bool = True
load_format: str = "dummy"
quantization: Optional[str] = None
# ... additional fields omitted for brevity
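The _mutable_fields set suggests that BaseConfig freezes all other fields once the config is constructed, so only max_model_len and load_format may be reassigned afterwards. A minimal stand-alone sketch of that pattern, assuming this freezing behavior (FrozenishConfig is a toy stand-in, not the real BaseConfig):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrozenishConfig:
    # Toy stand-in for verl's BaseConfig freeze pattern (illustrative only):
    # only fields named here may be reassigned after construction.
    _mutable_fields = {"max_model_len", "load_format"}
    max_model_len: Optional[int] = None
    load_format: str = "dummy"
    temperature: float = 1.0

    def __post_init__(self):
        # Mark construction as finished; later writes go through the guard.
        object.__setattr__(self, "_init_done", True)

    def __setattr__(self, name, value):
        if getattr(self, "_init_done", False) and name not in self._mutable_fields:
            raise AttributeError(f"field '{name}' is frozen after construction")
        object.__setattr__(self, name, value)

cfg = FrozenishConfig()
cfg.max_model_len = 3072      # allowed: listed in _mutable_fields
try:
    cfg.temperature = 0.7     # rejected: frozen after construction
except AttributeError as err:
    print(err)
```

This lets the rollout worker patch engine-facing fields (e.g. a derived max_model_len) late, while keeping the rest of the config immutable during training.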
Import
from verl.workers.config.rollout import RolloutConfig
I/O Contract
Inputs (Key Configuration Fields)
| Name | Type | Required | Description |
|---|---|---|---|
| name | Optional[str] | Yes | Engine name: "vllm", "sglang", or "trtllm" |
| temperature | float | No | Sampling temperature (default: 1.0) |
| n | int | No | Number of responses to generate per prompt (default: 1) |
| gpu_memory_utilization | float | No | Fraction of GPU memory for KV cache (default: 0.5) |
| tensor_model_parallel_size | int | No | Tensor parallelism degree (default: 2) |
| prompt_length | int | No | Maximum prompt length in tokens (default: 512) |
| response_length | int | No | Maximum response length in tokens (default: 512) |
| top_k | int | No | Top-k sampling parameter; -1 disables (default: -1) |
| top_p | float | No | Top-p (nucleus) sampling parameter (default: 1.0) |
| multi_turn | MultiTurnConfig | No | Multi-turn conversation configuration |
| enable_sleep_mode | bool | No | Whether engine supports sleep/wake for memory sharing (default: True) |
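The top_k and top_p fields follow the usual sampling conventions noted above: top_k = -1 disables top-k filtering and top_p = 1.0 disables nucleus filtering. An engine-independent toy sketch of how the two filters compose (filter_probs is illustrative, not a verl API):

```python
def filter_probs(probs, top_k=-1, top_p=1.0):
    """Return the (index, prob) pairs surviving top-k, then top-p, filtering."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:                 # top_k == -1 means "keep all"
        ranked = ranked[:top_k]
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:          # nucleus cutoff; top_p == 1.0 keeps everything
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]
print(filter_probs(probs))                 # defaults: every token survives
print(filter_probs(probs, top_k=2))        # only the 2 most likely tokens
print(filter_probs(probs, top_p=0.85))     # smallest prefix with mass >= 0.85
```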
Outputs
| Name | Type | Description |
|---|---|---|
| (config object) | RolloutConfig | A validated configuration dataclass instance passed to the rollout worker |
Usage Examples
from omegaconf import OmegaConf
from verl.workers.config.rollout import RolloutConfig
# Programmatic construction, validated as a structured OmegaConf config
rollout_cfg = OmegaConf.structured(RolloutConfig(
name="sglang",
temperature=0.7,
top_p=0.95,
n=4, # 4 responses per prompt for GRPO
gpu_memory_utilization=0.6,
tensor_model_parallel_size=2,
prompt_length=1024,
response_length=2048,
enable_chunked_prefill=True,
enable_prefix_caching=True,
enable_sleep_mode=True,
))
# In a Hydra config YAML file:
# actor_rollout_ref:
# rollout:
# name: sglang
# temperature: 0.7
# n: 4
# gpu_memory_utilization: 0.6
# tensor_model_parallel_size: 2
# response_length: 2048
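When max_model_len is left at None, the engine's context window is typically derived from the two length budgets, since a window must hold a full prompt plus a full response. A sketch of that derivation, assuming the common prompt_length + response_length rule (verify against the source; derived_max_model_len is illustrative):

```python
def derived_max_model_len(prompt_length, response_length, max_model_len=None):
    # Assumption: when unset, the window is sized to fit the longest
    # prompt followed by the longest response.
    if max_model_len is None:
        return prompt_length + response_length
    return max_model_len

# With the example config above (prompt_length=1024, response_length=2048):
print(derived_max_model_len(1024, 2048))   # 3072 tokens
```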
Related Pages
Implements Principle
Environment Requirements
- Environment:Volcengine_Verl_vLLM_Rollout_Environment
- Environment:Volcengine_Verl_SGLang_Rollout_Environment
- Environment:Volcengine_Verl_Megatron_Core_Environment