Implementation:Alibaba ROLL DPOPipeline Val
| Knowledge Sources | |
|---|---|
| Domains | Alignment, Evaluation |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete DPO validation method provided by the Alibaba ROLL library.
Description
The DPOPipeline.val method iterates over the validation dataloader with gradients disabled, computes the DPO loss, the preference accuracy, and the chosen and rejected rewards, and returns the averaged metrics.
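The metrics named above can be illustrated with a minimal sketch of the standard DPO formulation. It uses plain Python floats so it runs without PyTorch, and it is not taken from the ROLL source: the function name `dpo_batch_metrics`, its parameters, the default `beta`, and the unprefixed metric keys are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def dpo_batch_metrics(
    policy_chosen_logps: List[float],
    policy_rejected_logps: List[float],
    ref_chosen_logps: List[float],
    ref_rejected_logps: List[float],
    beta: float = 0.1,  # assumed default; the real value comes from the pipeline config
) -> Dict[str, float]:
    """Standard DPO metrics for one batch of preference pairs (illustrative only)."""
    if not policy_chosen_logps:
        raise ValueError("batch must contain at least one preference pair")

    losses, chosen_rewards, reject_rewards, correct = [], [], [], []
    for pc, pr, rc, rr in zip(
        policy_chosen_logps, policy_rejected_logps, ref_chosen_logps, ref_rejected_logps
    ):
        # Implicit rewards: scaled log-ratio between policy and reference model.
        chosen_reward = beta * (pc - rc)
        reject_reward = beta * (pr - rr)
        margin = chosen_reward - reject_reward
        # -log(sigmoid(margin)) == log(1 + exp(-margin)), in a numerically stable form.
        loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        losses.append(loss)
        chosen_rewards.append(chosen_reward)
        reject_rewards.append(reject_reward)
        # A pair counts as correct when the chosen response gets the higher reward.
        correct.append(1.0 if margin > 0 else 0.0)

    n = len(losses)
    return {
        "loss": sum(losses) / n,
        "acc": sum(correct) / n,
        "chosen_reward": sum(chosen_rewards) / n,
        "reject_reward": sum(reject_rewards) / n,
    }
```

In a PyTorch pipeline the same arithmetic would typically run on tensors under `torch.no_grad()`, as the method's decorator indicates.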
Usage
Called by the DPO pipeline at configured evaluation intervals.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/dpo/dpo_pipeline.py
- Lines: L245-296
Signature
class DPOPipeline(BasePipeline):
    @torch.no_grad()
    def val(self) -> Dict[str, float]:
        """
        Validation for DPO pipeline.

        Returns:
            Dict with val/actor/loss, val/actor/acc,
            val/actor/chosen_reward, val/actor/reject_reward
        """
Import
from roll.pipeline.dpo.dpo_pipeline import DPOPipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| val_dataloader | DataLoader | Yes | Validation preference data (from pipeline state) |
Outputs
| Name | Type | Description |
|---|---|---|
| metrics | Dict[str, float] | val/actor/loss, val/actor/acc, val/actor/chosen_reward, val/actor/reject_reward |
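As a rough illustration of how per-batch values could be reduced to the single dictionary described above, the sketch below takes an unweighted mean over batches and adds the `val/actor/` prefix. The helper `average_metrics` is hypothetical, and the unweighted mean is an assumption; the ROLL implementation may aggregate differently (for example, weighting by batch size).

```python
from typing import Dict, List


def average_metrics(
    batch_metrics: List[Dict[str, float]],
    prefix: str = "val/actor/",
) -> Dict[str, float]:
    """Average per-batch metric dicts and prefix the keys (illustrative only)."""
    if not batch_metrics:
        raise ValueError("no validation batches were processed")

    # Assumes every batch reports the same set of metric names.
    keys = list(batch_metrics[0].keys())
    count = len(batch_metrics)
    return {
        f"{prefix}{key}": sum(m[key] for m in batch_metrics) / count
        for key in keys
    }


# Example with made-up per-batch values.
per_batch = [
    {"loss": 0.4, "acc": 1.0},
    {"loss": 0.6, "acc": 0.5},
]
averaged = average_metrics(per_batch)
```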
Usage Examples
# step and eval_steps come from the surrounding training loop and its configuration.
if step % eval_steps == 0:
    val_metrics = pipeline.val()
    print(val_metrics)  # e.g. {"val/actor/loss": 0.45, "val/actor/acc": 0.72, ...}
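To try the interval gating without a ROLL installation, the following sketch swaps in a hypothetical `StubPipeline` whose `val()` returns fixed, made-up metrics. The class, the `run_steps` helper, and the step counts are assumptions for illustration and are not part of the library.

```python
from typing import Dict, List


class StubPipeline:
    """Hypothetical stand-in for DPOPipeline; returns fixed, made-up metrics."""

    def val(self) -> Dict[str, float]:
        return {"val/actor/loss": 0.45, "val/actor/acc": 0.72}


def run_steps(pipeline: StubPipeline, max_steps: int, eval_steps: int) -> List[int]:
    """Run a dummy loop and record the steps at which validation was triggered."""
    evaluated_at: List[int] = []
    for step in range(1, max_steps + 1):
        # A real pipeline would perform a training step here.
        if step % eval_steps == 0:
            val_metrics = pipeline.val()
            print(step, val_metrics)
            evaluated_at.append(step)
    return evaluated_at


evaluated = run_steps(StubPipeline(), max_steps=10, eval_steps=4)
```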
Related Pages
Implements Principle
Requires Environment
Environment Dependencies
This implementation requires the following environment constraints:
Heuristics Applied
No specific heuristics apply to this implementation.