Implementation:Alibaba ROLL AgenticConfig
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Configuration, Agentic_AI |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete configuration dataclass for agentic RL training pipelines provided by the Alibaba ROLL library.
Description
The AgenticConfig class extends PPOConfig with environment-specific settings for multi-turn RL training. It manages environment-manager configuration (trajectory-level vs. step-level), multi-level reward weighting (episode vs. step), PPO ratio-type selection (token vs. segment), and batch-adjustment modes. Post-init validation ensures that rollout batch sizes are compatible with group sizes and that generation arguments are consistent.
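The multi-level reward weighting can be pictured with a small sketch. The parameter names below mirror the config fields documented in the signature (episode_reward_weight, step_reward_weight, step_reward_gamma), but the mixing scheme itself, and the function name mix_rewards, are assumptions for illustration, not ROLL's actual reward code.

```python
from typing import List

def mix_rewards(episode_reward: float,
                step_rewards: List[float],
                episode_reward_weight: float = 0.5,
                step_reward_weight: float = 0.5,
                step_reward_gamma: float = 0.95) -> List[float]:
    """Hypothetical blend of a trajectory-level reward with step-level returns."""
    # Discounted return-to-go over the step-level rewards.
    returns: List[float] = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + step_reward_gamma * running
        returns[t] = running
    # Each step's training signal mixes the episode outcome with its step return.
    return [episode_reward_weight * episode_reward + step_reward_weight * g
            for g in returns]
```

With episode_reward_weight = step_reward_weight = 0 or 1 this degenerates to pure step-level or pure episode-level credit assignment, which is the trade-off the two weights control.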
Usage
Import and instantiate this class when configuring an agentic RL pipeline; it is typically loaded from a Hydra-managed YAML file rather than constructed by hand.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/agentic/agentic_config.py
- Lines: L186-320
Signature
@dataclass
class AgenticConfig(PPOConfig):
"""
Configuration for agentic RL training pipeline.
Key Attributes:
train_env_manager: EnvManagerConfig - training environment config
val_env_manager: EnvManagerConfig - validation environment config
episode_reward_weight: float - weight for episode-level rewards
step_reward_weight: float - weight for step-level rewards
step_reward_gamma: float - discount factor for step rewards
ratio_type: Literal["token", "segment"] - PPO ratio computation type
adv_estimator: str - advantage estimator (grpo/gigpo/gae/step_reinforce/agentic_reinforce)
batch_adjust_mode: Literal["copy", "delete", "auto", "random_sample"]
custom_envs: Dict[str, Any] - environment configurations
"""
def __post_init__(self):
"""Validates environment configs, batch sizes, and worker assignments."""
Import
from roll.pipeline.agentic.agentic_config import AgenticConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| YAML config file | str (path) | Yes | Hydra-managed YAML with environment definitions |
| Environment YAMLs | str (paths) | Yes | traj_envs.yaml or step_envs.yaml defining environments |
Outputs
| Name | Type | Description |
|---|---|---|
| AgenticConfig instance | AgenticConfig | Validated config with environment, reward, and training settings |
| Worker configs | WorkerConfig | Nested configs for actor_train, actor_infer, reference, critic, reward workers |
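To illustrate the input side of this contract, a config YAML might set the fields named in the signature above. The top-level keys are taken from the documented attributes; the nesting under train_env_manager and custom_envs, and all values, are illustrative assumptions, not a verbatim ROLL example.

```yaml
# Illustrative fragment only; keys mirror the AgenticConfig fields above,
# values and nested structure are assumptions.
adv_estimator: gigpo
ratio_type: segment
episode_reward_weight: 0.5
step_reward_weight: 0.5
step_reward_gamma: 0.95
batch_adjust_mode: copy
train_env_manager:      # EnvManagerConfig; field names assumed
  group_size: 8
custom_envs:            # Dict[str, Any] of environment definitions
  FrozenLake:
    env_type: frozen_lake
    max_steps: 20
```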
Usage Examples
Loading Agentic Configuration
from hydra import compose, initialize
from omegaconf import OmegaConf
import dacite
from roll.pipeline.agentic.agentic_config import AgenticConfig
# Load the Hydra-managed YAML configuration
with initialize(config_path="examples/qwen2.5-0.5B-agentic"):
    cfg = compose(config_name="agent_val_frozen_lake")
# Convert the OmegaConf tree to a validated AgenticConfig instance
config_dict = OmegaConf.to_container(cfg, resolve=True)
agentic_config = dacite.from_dict(data_class=AgenticConfig, data=config_dict)
# Access key parameters
print(agentic_config.adv_estimator)          # "gigpo"
print(agentic_config.episode_reward_weight)  # 0.5
print(agentic_config.step_reward_weight)     # 0.5
print(agentic_config.ratio_type)             # "segment"
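The post-init validation described above can be pictured as a divisibility check between the rollout batch and the environment group size. The sketch below is an assumed simplification of that check, not ROLL's actual __post_init__; the function name check_rollout_batch is hypothetical.

```python
def check_rollout_batch(rollout_batch_size: int, group_size: int) -> None:
    """Assumed sketch: the rollout batch must split evenly into env groups."""
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")
    if rollout_batch_size % group_size != 0:
        raise ValueError(
            f"rollout_batch_size ({rollout_batch_size}) is not divisible "
            f"by group_size ({group_size})"
        )

check_rollout_batch(64, 8)  # compatible sizes pass silently
```

Running such a check at dataclass construction time surfaces mismatched YAML settings before any rollout workers are launched, which is the point of doing it in __post_init__.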
Related Pages
Implements Principle
Requires Environment