Implementation:Volcengine Verl RolloutConfig

From Leeroopedia


Knowledge Sources
  • Domains: Reinforcement_Learning, Inference, Configuration
  • Type: Wrapper Doc (configures external vLLM/SGLang engines)
  • Last Updated: 2026-02-07 14:00 GMT

Overview

Configuration dataclass that controls rollout generation behavior for vLLM and SGLang inference engines within the verl training loop.

Description

The RolloutConfig dataclass defines all parameters governing how rollout sequences are generated during RLHF training. It wraps configuration for external inference engines (vLLM, SGLang, TensorRT-LLM) and controls sampling parameters (temperature, top-k, top-p), resource allocation (GPU memory utilization, tensor parallelism), sequence lengths, log-probability computation, and multi-turn conversation settings. The config also manages engine lifecycle features such as sleep mode, chunked prefill, prefix caching, and checkpoint weight loading.
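The interaction of the sampling fields (temperature, top_k, top_p) can be illustrated with a small self-contained sketch. This is pure Python for illustration only, not verl or engine code; the actual filtering happens inside vLLM/SGLang:

```python
import math

def filter_logits(logits, temperature=1.0, top_k=-1, top_p=1.0):
    """Illustrative sampling filter: temperature scaling, then top-k
    truncation, then top-p (nucleus) truncation. Returns the renormalized
    probabilities of the surviving tokens, keyed by token index."""
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by probability, most likely first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # top_k = -1 disables top-k filtering
        ranked = ranked[:top_k]
    # Keep the smallest prefix whose cumulative mass reaches top_p
    kept, cum = [], 0.0
    for i in ranked:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# The defaults (top_k=-1, top_p=1.0) keep the full distribution
full = filter_logits([2.0, 1.0, 0.5, 0.1])
# top_k=2 restricts sampling to the two most likely tokens
pair = filter_logits([2.0, 1.0, 0.5, 0.1], top_k=2)
```

With the defaults in this config (temperature=1.0, top_k=-1, top_p=1.0), the engine samples from the unmodified model distribution; do_sample=False bypasses sampling entirely in favor of greedy decoding.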

Usage

This config is instantiated as part of the actor_rollout_ref.rollout section of the Hydra/OmegaConf configuration and passed to the rollout worker, which initializes the inference engine accordingly. Only async mode is supported; the previously available sync mode has been removed.
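The nesting and the async-only constraint can be sketched with a plain dict standing in for the real Hydra/OmegaConf tree (the key names come from this page; the validate_rollout helper is hypothetical, not verl's API):

```python
# Hypothetical stand-in for the Hydra/OmegaConf config tree; only the
# actor_rollout_ref.rollout nesting and the async-only rule come from
# the documentation above.
config = {
    "actor_rollout_ref": {
        "rollout": {
            "name": "vllm",   # engine backend: "vllm", "sglang", or "trtllm"
            "mode": "async",  # the only supported mode
            "temperature": 1.0,
            "n": 1,
        }
    }
}

def validate_rollout(cfg):
    """Fetch the rollout block and reject the removed sync mode."""
    rollout = cfg["actor_rollout_ref"]["rollout"]
    if rollout.get("mode", "async") != "async":
        raise ValueError("sync rollout mode has been removed; use 'async'")
    return rollout

rollout = validate_rollout(config)
```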

Code Reference

Source Location

  • Repository: verl
  • File: verl/workers/config/rollout.py
  • Lines: 136-267

Signature

@dataclass
class RolloutConfig(BaseConfig):
    _mutable_fields = {"max_model_len", "load_format"}

    name: Optional[str] = MISSING
    mode: str = "async"

    temperature: float = 1.0
    top_k: int = -1
    top_p: float = 1.0
    do_sample: bool = True
    n: int = 1
    repetition_penalty: float = 1.0

    over_sample_rate: float = 0.0

    prompt_length: int = 512
    response_length: int = 512

    dtype: str = "bfloat16"
    gpu_memory_utilization: float = 0.5
    ignore_eos: bool = False
    enforce_eager: bool = True
    free_cache_engine: bool = True
    data_parallel_size: int = 1
    expert_parallel_size: int = 1
    tensor_model_parallel_size: int = 2
    pipeline_model_parallel_size: int = 1
    max_num_batched_tokens: int = 8192

    val_kwargs: SamplingConfig = field(default_factory=SamplingConfig)
    max_model_len: Optional[int] = None
    max_num_seqs: int = 1024

    log_prob_micro_batch_size: Optional[int] = None
    log_prob_micro_batch_size_per_gpu: Optional[int] = None
    log_prob_use_dynamic_bsz: bool = False
    log_prob_max_token_len_per_gpu: int = 16384

    multi_turn: MultiTurnConfig = field(default_factory=MultiTurnConfig)
    server: ServerConfig = field(default_factory=ServerConfig)
    checkpoint_engine: CheckpointEngineConfig = field(default_factory=CheckpointEngineConfig)

    enable_chunked_prefill: bool = True
    enable_prefix_caching: bool = True
    enable_sleep_mode: bool = True
    load_format: str = "dummy"
    quantization: Optional[str] = None

    # ... additional fields omitted for brevity

Import

from verl.workers.config.rollout import RolloutConfig

I/O Contract

Inputs (Key Configuration Fields)

Name | Type | Required | Description
name | Optional[str] | Yes | Engine name: "vllm", "sglang", or "trtllm"
temperature | float | No | Sampling temperature (default: 1.0)
n | int | No | Number of responses to generate per prompt (default: 1)
gpu_memory_utilization | float | No | Fraction of GPU memory for KV cache (default: 0.5)
tensor_model_parallel_size | int | No | Tensor parallelism degree (default: 2)
prompt_length | int | No | Maximum prompt length in tokens (default: 512)
response_length | int | No | Maximum response length in tokens (default: 512)
top_k | int | No | Top-k sampling parameter; -1 disables (default: -1)
top_p | float | No | Top-p (nucleus) sampling parameter (default: 1.0)
multi_turn | MultiTurnConfig | No | Multi-turn conversation configuration
enable_sleep_mode | bool | No | Whether engine supports sleep/wake for memory sharing (default: True)
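The context-length fields are linked: the engine's context window must cover at least prompt_length + response_length tokens. The sketch below assumes a defaulting rule for the unset case (max_model_len: Optional[int] = None in the signature); verl's exact post-init logic is not reproduced here:

```python
def effective_max_model_len(prompt_length, response_length, max_model_len=None):
    """Illustrative defaulting rule (an assumption, not verl's verified code):
    when max_model_len is unset, use one full prompt plus one full response;
    when set explicitly, it must still cover that sum."""
    required = prompt_length + response_length
    if max_model_len is None:
        return required
    if max_model_len < required:
        raise ValueError(
            f"max_model_len={max_model_len} cannot fit "
            f"prompt_length + response_length = {required}"
        )
    return max_model_len

# With the table's defaults (512 + 512), the engine needs a 1024-token window
assert effective_max_model_len(512, 512) == 1024
```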

Outputs

Name | Type | Description
(config object) | RolloutConfig | A validated configuration dataclass instance passed to the rollout worker

Usage Examples

from omegaconf import OmegaConf
from verl.workers.config.rollout import RolloutConfig

# Build the rollout config programmatically; the equivalent YAML form is shown below
rollout_cfg = OmegaConf.structured(RolloutConfig(
    name="sglang",
    temperature=0.7,
    top_p=0.95,
    n=4,                           # 4 responses per prompt for GRPO
    gpu_memory_utilization=0.6,
    tensor_model_parallel_size=2,
    prompt_length=1024,
    response_length=2048,
    enable_chunked_prefill=True,
    enable_prefix_caching=True,
    enable_sleep_mode=True,
))

# In a Hydra config YAML file:
# actor_rollout_ref:
#   rollout:
#     name: sglang
#     temperature: 0.7
#     n: 4
#     gpu_memory_utilization: 0.6
#     tensor_model_parallel_size: 2
#     response_length: 2048
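The same fields are commonly set from the command line as Hydra dotted overrides (e.g. actor_rollout_ref.rollout.n=4). A stdlib-only sketch of how such dotted keys map onto the nested config; the apply_override helper is hypothetical, not part of verl or Hydra:

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply one Hydra-style dotted override, e.g.
    'actor_rollout_ref.rollout.n=4', to a nested dict in place."""
    dotted, raw = override.split("=", 1)
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    # Minimal literal coercion: try int, then float, else keep the string
    for cast in (int, float):
        try:
            node[leaf] = cast(raw)
            return cfg
        except ValueError:
            pass
    node[leaf] = raw
    return cfg

cfg = {}
for ov in [
    "actor_rollout_ref.rollout.name=sglang",
    "actor_rollout_ref.rollout.n=4",
    "actor_rollout_ref.rollout.gpu_memory_utilization=0.6",
]:
    apply_override(cfg, ov)
```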

Related Pages

Implements Principle

Environment Requirements

Heuristics Used
