Implementation: Alibaba ROLL Agentic ActorWorker Loss Function
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Agentic_AI |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete loss function for the agentic actor worker in the Alibaba ROLL library, with support for segment-level PPO ratio computation.
Description
The ActorWorker.loss_func method in the agentic pipeline computes the PPO loss with support for both token-level and segment-level ratio computation. It handles asymmetric clipping, KL penalty with reference model, entropy regularization, optional dual clipping, and train/infer log-probability correction.
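To make the clipping behavior concrete, here is a minimal per-token sketch of a clipped PPO surrogate with an asymmetric clip range, written in plain Python for illustration. The function and parameter names (`ppo_token_loss`, `clip_low`, `clip_high`) are illustrative, not ROLL's actual identifiers.

```python
import math

def ppo_token_loss(log_prob, old_log_prob, advantage,
                   clip_low=0.2, clip_high=0.3):
    """Clipped PPO surrogate for one token with an asymmetric clip range.

    Names and default clip values are illustrative, not ROLL's config keys.
    """
    ratio = math.exp(log_prob - old_log_prob)
    # Asymmetric clipping: different bounds below and above 1.0
    clipped = max(1.0 - clip_low, min(ratio, 1.0 + clip_high))
    # Pessimistic (min) over unclipped and clipped objectives, negated for minimization
    return -min(ratio * advantage, clipped * advantage)
```

In the actual loss this per-token term is aggregated under the response mask, and the KL penalty against the reference model plus entropy regularization are added on top.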
Usage
Called by the training strategy (Megatron/DeepSpeed) during each forward-backward pass of policy optimization.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/agentic/agentic_actor_worker.py
- Lines: L10-148
Signature
```python
class ActorWorker(BaseActorWorker):
    def loss_func(
        self,
        data: DataProto,
        output_tensor: torch.Tensor
    ) -> Tuple[torch.Tensor, Dict[str, float]]:
        """
        Compute PPO loss for agentic policy optimization.

        Args:
            data: DataProto with response_mask, ref_log_probs, advantages,
                input_ids, attention_mask, optionally infer_logprobs
            output_tensor: Model logits output

        Returns:
            (total_loss, metrics_dict) where metrics include:
            - actor/pg_loss, actor/kl_loss, actor/ppo_ratio_clipfrac
            - actor/ratio_mean, actor/ratio_max, actor/ratio_min
            - actor/approxkl, actor/policykl
        """
```
Import
```python
from roll.pipeline.agentic.agentic_actor_worker import ActorWorker
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | DataProto | Yes | Training batch with advantages, old_log_probs, ref_log_probs, response_mask |
| output_tensor | torch.Tensor | Yes | Model logits from forward pass |
Outputs
| Name | Type | Description |
|---|---|---|
| total_loss | torch.Tensor | Scalar loss for gradient computation |
| metrics | Dict[str, float] | Training metrics (pg_loss, kl_loss, clipfrac, ratio stats, approxkl) |
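The ratio and KL diagnostics in the metrics dict can be sketched as follows. The function name, clip defaults, and the choice of KL estimator (the simple k1 estimator here) are illustrative assumptions, not ROLL's exact implementation.

```python
import math

def ratio_metrics(log_probs, old_log_probs, clip_low=0.2, clip_high=0.3):
    """Diagnostic stats analogous to the reported ratio/KL metrics.

    Names, clip defaults, and the KL estimator are illustrative.
    """
    ratios = [math.exp(lp - olp) for lp, olp in zip(log_probs, old_log_probs)]
    # Fraction of tokens whose ratio falls outside the asymmetric clip range
    clipped = [r < 1.0 - clip_low or r > 1.0 + clip_high for r in ratios]
    log_diffs = [olp - lp for lp, olp in zip(log_probs, old_log_probs)]
    return {
        "ratio_mean": sum(ratios) / len(ratios),
        "ratio_max": max(ratios),
        "ratio_min": min(ratios),
        "ppo_ratio_clipfrac": sum(clipped) / len(ratios),
        # k1 estimator of KL(old || new); other estimators (e.g. k3) are also common
        "approxkl": sum(log_diffs) / len(log_diffs),
    }
```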
Usage Examples
```python
# Called internally by the training strategy:
loss, metrics = actor_worker.loss_func(
    data=training_batch,
    output_tensor=model_logits
)
# metrics example:
# {"actor/pg_loss@sum": 0.05, "actor/kl_loss@sum": 0.01, "actor/ppo_ratio_clipfrac@sum": 0.12}
```
Related Pages
Implements Principle
Requires Environment
Environment Dependencies
This implementation requires the following environment constraints:
Heuristics Applied
This implementation uses the following heuristics: