Implementation:Alibaba ROLL DPOPipeline Val

From Leeroopedia


Knowledge Sources
Domains: Alignment, Evaluation
Last Updated: 2026-02-07 20:00 GMT

Overview

DPOPipeline.val is the concrete validation method for Direct Preference Optimization (DPO) provided by the Alibaba ROLL library.

Description

DPOPipeline.val iterates over the validation dataloader and, with gradients disabled via @torch.no_grad(), computes the DPO loss, the preference accuracy, and the chosen and rejected rewards. It returns the averaged metrics as a Dict[str, float].
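To make the metrics concrete, the following is a minimal sketch of the standard DPO quantities for a single preference pair, written in plain Python rather than taken from ROLL. The function name dpo_pair_metrics, its parameters, and the default beta of 0.1 are assumptions for illustration; the actual implementation operates on batched tensors and may differ in detail.

```python
import math
from typing import Dict


def dpo_pair_metrics(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, not taken from ROLL
) -> Dict[str, float]:
    """Standard DPO metrics for one (chosen, rejected) pair of sequence log-probs."""
    # Implicit rewards: scaled log-ratio of the policy to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    reject_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - reject_reward

    # DPO loss is -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed here in a numerically stable softplus form.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

    # The pair counts as correct when the chosen response gets the higher reward.
    acc = 1.0 if margin > 0 else 0.0

    return {
        "loss": loss,
        "acc": acc,
        "chosen_reward": chosen_reward,
        "reject_reward": reject_reward,
    }


# Hypothetical log-probabilities, chosen only to exercise the function.
example = dpo_pair_metrics(-10.0, -12.0, -11.0, -11.0, beta=0.1)
```

In this example the policy raises the chosen response and lowers the rejected one relative to the reference, so the margin is positive and the pair counts toward accuracy.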

Usage

The DPO pipeline calls val() at its configured evaluation interval. The method takes no arguments and reads the validation dataloader from pipeline state.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/dpo/dpo_pipeline.py
  • Lines: L245-296

Signature

class DPOPipeline(BasePipeline):
    @torch.no_grad()
    def val(self) -> Dict[str, float]:
        """
        Validation for DPO pipeline.

        Returns:
            Dict with val/actor/loss, val/actor/acc,
            val/actor/chosen_reward, val/actor/reject_reward
        """

Import

from roll.pipeline.dpo.dpo_pipeline import DPOPipeline

I/O Contract

Inputs

Name | Type | Required | Description
val_dataloader | DataLoader | Yes | Validation preference data (from pipeline state)

Outputs

Name | Type | Description
metrics | Dict[str, float] | val/actor/loss, val/actor/acc, val/actor/chosen_reward, val/actor/reject_reward
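As an illustration of how per-batch results could be reduced to the output dict described above, here is a small sketch. The helper average_val_metrics and the unweighted mean over batches are assumptions for illustration, not ROLL's confirmed reduction; only the val/actor/ key names come from the method's docstring.

```python
from collections import defaultdict
from typing import Dict, Iterable


def average_val_metrics(
    batch_metrics: Iterable[Dict[str, float]],
    prefix: str = "val/actor/",
) -> Dict[str, float]:
    """Average per-batch metric dicts and prefix the keys (hypothetical helper)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for metrics in batch_metrics:
        for name, value in metrics.items():
            totals[prefix + name] += value
            counts[prefix + name] += 1
    # Unweighted mean: every batch contributes equally regardless of its size.
    return {key: totals[key] / counts[key] for key in totals}


# Two hypothetical validation batches with made-up values.
batches = [
    {"loss": 0.4, "acc": 1.0, "chosen_reward": 0.2, "reject_reward": -0.2},
    {"loss": 0.6, "acc": 0.0, "chosen_reward": 0.0, "reject_reward": 0.0},
]
averaged = average_val_metrics(batches)
```

If the final batch is smaller than the others, a mean weighted by batch size gives a more faithful per-example average than this unweighted form.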

Usage Examples

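# `pipeline` is a constructed DPOPipeline; `step` and `eval_steps` come from the training loop.
if step % eval_steps == 0:
    val_metrics = pipeline.val()
    # Illustrative values; the dict also holds val/actor/chosen_reward and val/actor/reject_reward.
    print(val_metrics)  # e.g. {"val/actor/loss": 0.45, "val/actor/acc": 0.72, ...}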

Related Pages

Implements Principle

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Heuristics Applied

No specific heuristics apply to this implementation.
