Implementation:Volcengine Verl RolloutConfig
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Inference, Configuration |
| Type | Wrapper Doc (configures external vLLM/SGLang engines) |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Configuration dataclass that controls rollout generation behavior for vLLM and SGLang inference engines within the verl training loop.
Description
The RolloutConfig dataclass defines all parameters governing how rollout sequences are generated during RLHF training. It wraps configuration for external inference engines (vLLM, SGLang, TensorRT-LLM) and controls sampling parameters (temperature, top-k, top-p), resource allocation (GPU memory utilization, tensor parallelism), sequence lengths, log-probability computation, and multi-turn conversation settings. The config also manages engine lifecycle features such as sleep mode, chunked prefill, prefix caching, and checkpoint weight loading.
Usage
This config is instantiated as part of the actor_rollout_ref.rollout section of the Hydra/OmegaConf configuration and is passed to the rollout worker, which initializes the inference engine accordingly. Only async mode is supported; the previously available sync mode has been removed.
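In practice these fields are usually set as Hydra dotted overrides on the training command line rather than edited in YAML. A sketch, assuming the standard verl.trainer.main_ppo entrypoint (treat the script name as illustrative and check your launcher):

```shell
# Hydra-style dotted overrides targeting the actor_rollout_ref.rollout section
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.temperature=0.7 \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2
```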
Code Reference
Source Location
- Repository: verl
- File: verl/workers/config/rollout.py
- Lines: 136-267
Signature
@dataclass
class RolloutConfig(BaseConfig):
_mutable_fields = {"max_model_len", "load_format"}
name: Optional[str] = MISSING
mode: str = "async"
temperature: float = 1.0
top_k: int = -1
top_p: float = 1.0
do_sample: bool = True
n: int = 1
repetition_penalty: float = 1.0
over_sample_rate: float = 0.0
prompt_length: int = 512
response_length: int = 512
dtype: str = "bfloat16"
gpu_memory_utilization: float = 0.5
ignore_eos: bool = False
enforce_eager: bool = True
free_cache_engine: bool = True
data_parallel_size: int = 1
expert_parallel_size: int = 1
tensor_model_parallel_size: int = 2
pipeline_model_parallel_size: int = 1
max_num_batched_tokens: int = 8192
val_kwargs: SamplingConfig = field(default_factory=SamplingConfig)
max_model_len: Optional[int] = None
max_num_seqs: int = 1024
log_prob_micro_batch_size: Optional[int] = None
log_prob_micro_batch_size_per_gpu: Optional[int] = None
log_prob_use_dynamic_bsz: bool = False
log_prob_max_token_len_per_gpu: int = 16384
multi_turn: MultiTurnConfig = field(default_factory=MultiTurnConfig)
server: ServerConfig = field(default_factory=ServerConfig)
checkpoint_engine: CheckpointEngineConfig = field(default_factory=CheckpointEngineConfig)
enable_chunked_prefill: bool = True
enable_prefix_caching: bool = True
enable_sleep_mode: bool = True
load_format: str = "dummy"
quantization: Optional[str] = None
# ... additional fields omitted for brevity
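The _mutable_fields set suggests that BaseConfig freezes all other fields once the config is constructed, so only max_model_len and load_format may be reassigned afterwards. A minimal stand-alone sketch of that pattern, assuming this freezing behavior (FrozenishConfig is a toy stand-in, not the real BaseConfig):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrozenishConfig:
    # Toy stand-in for verl's BaseConfig freeze pattern (illustrative only):
    # only fields named here may be reassigned after construction.
    _mutable_fields = {"max_model_len", "load_format"}
    max_model_len: Optional[int] = None
    load_format: str = "dummy"
    temperature: float = 1.0

    def __post_init__(self):
        # Mark construction as finished; later writes go through the guard.
        object.__setattr__(self, "_init_done", True)

    def __setattr__(self, name, value):
        if getattr(self, "_init_done", False) and name not in self._mutable_fields:
            raise AttributeError(f"field '{name}' is frozen after construction")
        object.__setattr__(self, name, value)

cfg = FrozenishConfig()
cfg.max_model_len = 3072      # allowed: listed in _mutable_fields
try:
    cfg.temperature = 0.7     # rejected: frozen after construction
except AttributeError as err:
    print(err)
```

This lets the rollout worker patch engine-facing fields (e.g. a derived max_model_len) late, while keeping the rest of the config immutable during training.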
Import
from verl.workers.config.rollout import RolloutConfig
I/O Contract
Inputs (Key Configuration Fields)
| Name | Type | Required | Description |
|---|---|---|---|
| name | Optional[str] | Yes | Engine name: "vllm", "sglang", or "trtllm" |
| temperature | float | No | Sampling temperature (default: 1.0) |
| n | int | No | Number of responses to generate per prompt (default: 1) |
| gpu_memory_utilization | float | No | Fraction of GPU memory for KV cache (default: 0.5) |
| tensor_model_parallel_size | int | No | Tensor parallelism degree (default: 2) |
| prompt_length | int | No | Maximum prompt length in tokens (default: 512) |
| response_length | int | No | Maximum response length in tokens (default: 512) |
| top_k | int | No | Top-k sampling parameter; -1 disables (default: -1) |
| top_p | float | No | Top-p (nucleus) sampling parameter (default: 1.0) |
| multi_turn | MultiTurnConfig | No | Multi-turn conversation configuration |
| enable_sleep_mode | bool | No | Whether engine supports sleep/wake for memory sharing (default: True) |
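The top_k and top_p fields follow the usual sampling conventions noted above: top_k = -1 disables top-k filtering and top_p = 1.0 disables nucleus filtering. An engine-independent toy sketch of how the two filters compose (filter_probs is illustrative, not a verl API):

```python
def filter_probs(probs, top_k=-1, top_p=1.0):
    """Return the (index, prob) pairs surviving top-k, then top-p, filtering."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:                 # top_k == -1 means "keep all"
        ranked = ranked[:top_k]
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:          # nucleus cutoff; top_p == 1.0 keeps everything
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]
print(filter_probs(probs))                 # defaults: every token survives
print(filter_probs(probs, top_k=2))        # only the 2 most likely tokens
print(filter_probs(probs, top_p=0.85))     # smallest prefix with mass >= 0.85
```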
Outputs
| Name | Type | Description |
|---|---|---|
| (config object) | RolloutConfig | A validated configuration dataclass instance passed to the rollout worker |
Usage Examples
from omegaconf import OmegaConf
from verl.workers.config.rollout import RolloutConfig
# Programmatic construction, validated as a structured OmegaConf config
rollout_cfg = OmegaConf.structured(RolloutConfig(
name="sglang",
temperature=0.7,
top_p=0.95,
n=4, # 4 responses per prompt for GRPO
gpu_memory_utilization=0.6,
tensor_model_parallel_size=2,
prompt_length=1024,
response_length=2048,
enable_chunked_prefill=True,
enable_prefix_caching=True,
enable_sleep_mode=True,
))
# In a Hydra config YAML file:
# actor_rollout_ref:
# rollout:
# name: sglang
# temperature: 0.7
# n: 4
# gpu_memory_utilization: 0.6
# tensor_model_parallel_size: 2
# response_length: 2048
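When max_model_len is left at None, the engine's context window is typically derived from the two length budgets, since a window must hold a full prompt plus a full response. A sketch of that derivation, assuming the common prompt_length + response_length rule (verify against the source; derived_max_model_len is illustrative):

```python
def derived_max_model_len(prompt_length, response_length, max_model_len=None):
    # Assumption: when unset, the window is sized to fit the longest
    # prompt followed by the longest response.
    if max_model_len is None:
        return prompt_length + response_length
    return max_model_len

# With the example config above (prompt_length=1024, response_length=2048):
print(derived_max_model_len(1024, 2048))   # 3072 tokens
```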
Related Pages
Implements Principle
Environment Requirements
- Environment:Volcengine_Verl_vLLM_Rollout_Environment
- Environment:Volcengine_Verl_SGLang_Rollout_Environment
- Environment:Volcengine_Verl_Megatron_Core_Environment