Implementation:Alibaba ROLL DPOPipeline Val

Knowledge Sources	Alibaba ROLL
Domains	Alignment, Evaluation
Last Updated	2026-02-07 20:00 GMT

Overview

DPOPipeline.val is the concrete validation method for Direct Preference Optimization (DPO) provided by the Alibaba ROLL library.

Description

The DPOPipeline.val method runs with gradients disabled (torch.no_grad). It iterates over the validation dataloader, computes the DPO loss, preference accuracy, and chosen and rejected rewards, and returns the averaged metrics as a dictionary.
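The metrics described above can be sketched in plain Python. This is a minimal illustration, not ROLL's implementation: it assumes the standard sigmoid DPO loss, per-pair summed log-probabilities from the policy and a reference model, and a per-pair average. The names dpo_val_metrics, log_sigmoid, Pair, and the beta default are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One preference pair, as summed sequence log-probabilities:
# (policy_chosen, policy_rejected, ref_chosen, ref_rejected).
Pair = Tuple[float, float, float, float]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def dpo_val_metrics(batches: List[List[Pair]], beta: float = 0.1) -> Dict[str, float]:
    """Average sigmoid-DPO metrics over every pair in every batch (assumed scheme)."""
    totals = {"loss": 0.0, "acc": 0.0, "chosen_reward": 0.0, "reject_reward": 0.0}
    count = 0
    for batch in batches:
        for pol_c, pol_r, ref_c, ref_r in batch:
            # Implicit rewards: scaled log-ratio of policy to reference.
            chosen_reward = beta * (pol_c - ref_c)
            reject_reward = beta * (pol_r - ref_r)
            # Loss is the negative log-sigmoid of the reward margin.
            totals["loss"] += -log_sigmoid(chosen_reward - reject_reward)
            # A pair counts as correct when the chosen reward is higher.
            totals["acc"] += float(chosen_reward > reject_reward)
            totals["chosen_reward"] += chosen_reward
            totals["reject_reward"] += reject_reward
            count += 1
    if count == 0:
        raise ValueError("validation set is empty")
    return {f"val/actor/{name}": total / count for name, total in totals.items()}
```

No gradient bookkeeping appears here because the sketch uses plain floats; in a PyTorch implementation the same arithmetic would sit inside a torch.no_grad() context, as the decorator in the signature indicates.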

Usage

The DPO pipeline calls val() at the configured evaluation interval. The method takes no arguments; it reads the validation dataloader from pipeline state.

Code Reference

Source Location

Repository: Alibaba ROLL
File: roll/pipeline/dpo/dpo_pipeline.py
Lines: L245-296

Signature

class DPOPipeline(BasePipeline):
    @torch.no_grad()
    def val(self) -> Dict[str, float]:
        """
        Validation for DPO pipeline.

        Returns:
            Dict with val/actor/loss, val/actor/acc,
            val/actor/chosen_reward, val/actor/reject_reward
        """

Import

from roll.pipeline.dpo.dpo_pipeline import DPOPipeline

I/O Contract

Inputs

Name	Type	Required	Description
val_dataloader	DataLoader	Yes	Validation preference data (from pipeline state)

Outputs

Name	Type	Description
metrics	Dict[str, float]	val/actor/loss, val/actor/acc, val/actor/chosen_reward, val/actor/reject_reward
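A caller that logs or compares these metrics may want to verify the output contract first. The check below is a small sketch under that assumption; check_val_metrics and EXPECTED_VAL_KEYS are hypothetical helpers, not part of ROLL, and the key names are taken from the table above.

```python
import math
from typing import Dict

# Keys documented in the Outputs table.
EXPECTED_VAL_KEYS = (
    "val/actor/loss",
    "val/actor/acc",
    "val/actor/chosen_reward",
    "val/actor/reject_reward",
)


def check_val_metrics(metrics: Dict[str, float]) -> None:
    """Raise if a documented key is missing or its value is not a finite number."""
    missing = [key for key in EXPECTED_VAL_KEYS if key not in metrics]
    if missing:
        raise KeyError(f"missing validation metrics: {missing}")
    for key in EXPECTED_VAL_KEYS:
        value = metrics[key]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            raise ValueError(f"{key} is not a finite number: {value!r}")
```

A NaN or infinite loss fails the check, which makes a diverged run visible at validation time rather than later in a dashboard.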

Usage Examples

if step % eval_steps == 0:
    val_metrics = pipeline.val()
    # Illustrative values; the dict also holds chosen_reward and reject_reward.
    print(val_metrics)  # e.g. {"val/actor/loss": 0.45, "val/actor/acc": 0.72, ...}
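The snippet above depends on a live pipeline object. To show only the interval logic in runnable form, the sketch below swaps in a stand-in class. StubPipeline and run_with_validation are hypothetical, the returned metric values are placeholders, and counting steps from 1 is an assumption about how the training loop numbers its steps.

```python
from typing import Dict, List


class StubPipeline:
    """Hypothetical stand-in that exposes only val(), for illustration."""

    def __init__(self) -> None:
        self.val_calls = 0

    def val(self) -> Dict[str, float]:
        self.val_calls += 1
        # Placeholder values, not real validation results.
        return {"val/actor/loss": 0.0, "val/actor/acc": 0.0}


def run_with_validation(pipeline, max_steps: int, eval_steps: int) -> List[int]:
    """Run a dummy loop and return the steps at which val() was called."""
    if eval_steps <= 0:
        raise ValueError("eval_steps must be positive")
    evaluated_at = []
    for step in range(1, max_steps + 1):
        # A training step would run here.
        if step % eval_steps == 0:
            pipeline.val()
            evaluated_at.append(step)
    return evaluated_at
```

With max_steps=10 and eval_steps=4, validation runs at steps 4 and 8. If the loop counted from 0 instead, the same modulo test would also trigger a validation pass before any training step.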

Related Pages

Implements Principle

Principle:Alibaba_ROLL_DPO_Validation

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Environment:Alibaba_ROLL_Python_Runtime_Environment

Heuristics Applied

No specific heuristics apply to this implementation.
