Implementation:Alibaba ROLL AgenticConfig
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Configuration, Agentic_AI |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete configuration dataclass for agentic RL training pipelines provided by the Alibaba ROLL library.
Description
The AgenticConfig class extends PPOConfig with environment-specific settings for multi-turn RL training. It manages environment-manager configuration (trajectory-level vs. step-level), multi-level reward weighting (episode vs. step), PPO ratio-type selection (token vs. segment), and batch-adjustment modes. Post-init validation ensures that rollout batch sizes are compatible with group sizes and that generation arguments are consistent.
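The multi-level reward weighting can be pictured with a small sketch. The parameter names below mirror the config fields documented in the signature (episode_reward_weight, step_reward_weight, step_reward_gamma), but the mixing scheme itself, and the function name mix_rewards, are assumptions for illustration, not ROLL's actual reward code.

```python
from typing import List

def mix_rewards(episode_reward: float,
                step_rewards: List[float],
                episode_reward_weight: float = 0.5,
                step_reward_weight: float = 0.5,
                step_reward_gamma: float = 0.95) -> List[float]:
    """Hypothetical blend of a trajectory-level reward with step-level returns."""
    # Discounted return-to-go over the step-level rewards.
    returns: List[float] = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + step_reward_gamma * running
        returns[t] = running
    # Each step's training signal mixes the episode outcome with its step return.
    return [episode_reward_weight * episode_reward + step_reward_weight * g
            for g in returns]
```

With episode_reward_weight = step_reward_weight = 0 or 1 this degenerates to pure step-level or pure episode-level credit assignment, which is the trade-off the two weights control.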
Usage
Import and instantiate this class when configuring an agentic RL pipeline; it is typically loaded from a Hydra-managed YAML file rather than constructed by hand.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/agentic/agentic_config.py
- Lines: L186-320
Signature
@dataclass
class AgenticConfig(PPOConfig):
"""
Configuration for agentic RL training pipeline.
Key Attributes:
train_env_manager: EnvManagerConfig - training environment config
val_env_manager: EnvManagerConfig - validation environment config
episode_reward_weight: float - weight for episode-level rewards
step_reward_weight: float - weight for step-level rewards
step_reward_gamma: float - discount factor for step rewards
ratio_type: Literal["token", "segment"] - PPO ratio computation type
adv_estimator: str - advantage estimator (grpo/gigpo/gae/step_reinforce/agentic_reinforce)
batch_adjust_mode: Literal["copy", "delete", "auto", "random_sample"]
custom_envs: Dict[str, Any] - environment configurations
"""
def __post_init__(self):
"""Validates environment configs, batch sizes, and worker assignments."""
Import
from roll.pipeline.agentic.agentic_config import AgenticConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| YAML config file | str (path) | Yes | Hydra-managed YAML with environment definitions |
| Environment YAMLs | str (paths) | Yes | traj_envs.yaml or step_envs.yaml defining environments |
Outputs
| Name | Type | Description |
|---|---|---|
| AgenticConfig instance | AgenticConfig | Validated config with environment, reward, and training settings |
| Worker configs | WorkerConfig | Nested configs for actor_train, actor_infer, reference, critic, reward workers |
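To illustrate the input side of this contract, a config YAML might set the fields named in the signature above. The top-level keys are taken from the documented attributes; the nesting under train_env_manager and custom_envs, and all values, are illustrative assumptions, not a verbatim ROLL example.

```yaml
# Illustrative fragment only; keys mirror the AgenticConfig fields above,
# values and nested structure are assumptions.
adv_estimator: gigpo
ratio_type: segment
episode_reward_weight: 0.5
step_reward_weight: 0.5
step_reward_gamma: 0.95
batch_adjust_mode: copy
train_env_manager:      # EnvManagerConfig; field names assumed
  group_size: 8
custom_envs:            # Dict[str, Any] of environment definitions
  FrozenLake:
    env_type: frozen_lake
    max_steps: 20
```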
Usage Examples
Loading Agentic Configuration
from hydra import compose, initialize
from omegaconf import OmegaConf
import dacite
from roll.pipeline.agentic.agentic_config import AgenticConfig
# Load the Hydra-managed YAML configuration
with initialize(config_path="examples/qwen2.5-0.5B-agentic"):
    cfg = compose(config_name="agent_val_frozen_lake")
# Convert the OmegaConf tree to a validated AgenticConfig instance
config_dict = OmegaConf.to_container(cfg, resolve=True)
agentic_config = dacite.from_dict(data_class=AgenticConfig, data=config_dict)
# Access key parameters
print(agentic_config.adv_estimator)          # "gigpo"
print(agentic_config.episode_reward_weight)  # 0.5
print(agentic_config.step_reward_weight)     # 0.5
print(agentic_config.ratio_type)             # "segment"
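The post-init validation described above can be pictured as a divisibility check between the rollout batch and the environment group size. The sketch below is an assumed simplification of that check, not ROLL's actual __post_init__; the function name check_rollout_batch is hypothetical.

```python
def check_rollout_batch(rollout_batch_size: int, group_size: int) -> None:
    """Assumed sketch: the rollout batch must split evenly into env groups."""
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")
    if rollout_batch_size % group_size != 0:
        raise ValueError(
            f"rollout_batch_size ({rollout_batch_size}) is not divisible "
            f"by group_size ({group_size})"
        )

check_rollout_batch(64, 8)  # compatible sizes pass silently
```

Running such a check at dataclass construction time surfaces mismatched YAML settings before any rollout workers are launched, which is the point of doing it in __post_init__.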
Related Pages
Implements Principle
Requires Environment