

Implementation:Alibaba ROLL AgenticConfig

From Leeroopedia


Knowledge Sources

  • Domains: Reinforcement_Learning, Configuration, Agentic_AI
  • Last Updated: 2026-02-07 20:00 GMT

Overview

A concrete configuration dataclass for agentic RL training pipelines, provided by the Alibaba ROLL library.

Description

The AgenticConfig class extends PPOConfig with environment-specific settings for multi-turn RL training. It manages environment manager configurations (trajectory-level vs. step-level), multi-level reward weighting (episode vs. step), ratio type selection (token vs. segment), and batch adjustment modes. Post-init validation ensures that rollout batch sizes are compatible with group sizes and that generation arguments are consistent.
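The multi-level reward weighting described above can be illustrated with a small sketch. Note that `combined_reward` is a hypothetical helper written for this page, not part of ROLL's API; it simply shows how an episode-level reward and discounted step-level rewards might be blended under the `episode_reward_weight`, `step_reward_weight`, and `step_reward_gamma` settings.

```python
# Hypothetical illustration of multi-level reward weighting; not ROLL's actual code.
def combined_reward(episode_reward, step_rewards,
                    episode_weight=0.5, step_weight=0.5, gamma=0.95):
    """Blend one episode-level reward with discounted step-level rewards."""
    discounted_steps = sum(gamma ** t * r for t, r in enumerate(step_rewards))
    return episode_weight * episode_reward + step_weight * discounted_steps

# With step_weight=0, only the episode-level reward contributes.
print(combined_reward(1.0, [0.1, 0.2], episode_weight=1.0, step_weight=0.0))  # 1.0
```

Setting one weight to zero recovers pure episode-level or pure step-level training, which is why the two weights are exposed as separate fields.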

Usage

Import and instantiate this class when configuring an agentic RL pipeline. It is typically loaded from a Hydra-managed YAML file.
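A minimal YAML fragment of the kind Hydra would load might look as follows. The field names are taken from the class docstring below; the values and flat layout are illustrative only, so consult ROLL's example configs for the exact structure.

```yaml
# Illustrative fragment; not a verbatim ROLL config.
adv_estimator: gigpo
ratio_type: segment          # "token" or "segment"
episode_reward_weight: 0.5
step_reward_weight: 0.5
step_reward_gamma: 0.95
batch_adjust_mode: auto      # "copy", "delete", "auto", or "random_sample"
```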

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/agentic/agentic_config.py
  • Lines: L186-320

Signature

@dataclass
class AgenticConfig(PPOConfig):
    """
    Configuration for agentic RL training pipeline.

    Key Attributes:
        train_env_manager: EnvManagerConfig - training environment config
        val_env_manager: EnvManagerConfig - validation environment config
        episode_reward_weight: float - weight for episode-level rewards
        step_reward_weight: float - weight for step-level rewards
        step_reward_gamma: float - discount factor for step rewards
        ratio_type: Literal["token", "segment"] - PPO ratio computation type
        adv_estimator: str - advantage estimator (grpo/gigpo/gae/step_reinforce/agentic_reinforce)
        batch_adjust_mode: Literal["copy", "delete", "auto", "random_sample"]
        custom_envs: Dict[str, Any] - environment configurations
    """
    def __post_init__(self):
        """Validates environment configs, batch sizes, and worker assignments."""

Import

from roll.pipeline.agentic.agentic_config import AgenticConfig

I/O Contract

Inputs

  • YAML config file (str, path; required): Hydra-managed YAML with environment definitions
  • Environment YAMLs (str, paths; required): traj_envs.yaml or step_envs.yaml defining environments

Outputs

  • AgenticConfig instance (AgenticConfig): validated config with environment, reward, and training settings
  • Worker configs (WorkerConfig): nested configs for the actor_train, actor_infer, reference, critic, and reward workers

Usage Examples

Loading Agentic Configuration

from hydra import compose, initialize
from omegaconf import OmegaConf
import dacite

from roll.pipeline.agentic.agentic_config import AgenticConfig

# Load the YAML configuration via Hydra
with initialize(config_path="examples/qwen2.5-0.5B-agentic", version_base=None):
    cfg = compose(config_name="agent_val_frozen_lake")

# Convert the OmegaConf tree to a typed AgenticConfig
config_dict = OmegaConf.to_container(cfg, resolve=True)
agentic_config = dacite.from_dict(data_class=AgenticConfig, data=config_dict)

# Access key parameters
print(agentic_config.adv_estimator)            # "gigpo"
print(agentic_config.episode_reward_weight)    # 0.5
print(agentic_config.step_reward_weight)       # 0.5
print(agentic_config.ratio_type)               # "segment"
