Principle:Alibaba ROLL Preference Dataset Preparation

Knowledge Sources	DPO Alibaba ROLL
Domains	Data_Processing, Alignment
Last Updated	2026-02-07 20:00 GMT

Overview

A data preprocessing principle that tokenizes chosen/rejected response pairs and arranges them into interleaved batches for preference optimization training.

Description

Preference Dataset Preparation converts raw preference datasets (JSON records with chosen/rejected response pairs) into tokenized, interleaved batches. Each batch holds B preference pairs, where the chosen and rejected responses of a pair share the same prompt. The batch is formatted as (2*B, seq_len) tensors in which chosen and rejected sequences alternate row by row.
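The conversion described above can be sketched in plain Python. This is only an illustration of the interleaved layout, not the ROLL implementation: the function name interleave_pairs, the field names prompt_ids, chosen_ids and rejected_ids, the right-padding with pad_id, and storing one prompt length per row are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def interleave_pairs(
    examples: Sequence[Dict[str, List[int]]],
    pad_id: int = 0,  # hypothetical padding token id
) -> Dict[str, List]:
    """Build a (2*B, seq_len) batch with chosen rows at even indices."""
    rows: List[List[int]] = []
    prompt_id_lens: List[int] = []
    for ex in examples:
        prompt = ex["prompt_ids"]
        # Chosen first, rejected second: the pair occupies rows 2i and 2i+1.
        rows.append(prompt + ex["chosen_ids"])
        rows.append(prompt + ex["rejected_ids"])
        # Both rows of a pair share the same prompt length (assumed layout).
        prompt_id_lens.extend([len(prompt), len(prompt)])

    # Right-pad every row to the longest sequence in the batch.
    seq_len = max(len(r) for r in rows)
    input_ids = [r + [pad_id] * (seq_len - len(r)) for r in rows]
    attention_mask = [[1] * len(r) + [0] * (seq_len - len(r)) for r in rows]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "prompt_id_lens": prompt_id_lens,
    }


# Two toy preference pairs with already-tokenized ids.
batch = interleave_pairs([
    {"prompt_ids": [1, 2], "chosen_ids": [3, 4, 5], "rejected_ids": [6]},
    {"prompt_ids": [7], "chosen_ids": [8], "rejected_ids": [9, 10]},
])
```

With B = 2 pairs, the sketch yields four rows of equal length, and the attention mask marks which positions are padding.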

Usage

Use this principle when preparing data for Direct Preference Optimization (DPO) or similar preference-based training methods.

Theoretical Basis

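The interleaved format enables efficient loss computation: a single forward pass scores both responses of every pair, and the two groups can then be separated by row parity: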

Batch rows 0, 2, 4, ... contain chosen responses
Batch rows 1, 3, 5, ... contain rejected responses
Both rows of a pair share the same prompt and the same prompt_id_lens value, so the chosen and rejected responses are compared on a consistent basis
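A minimal sketch of how a loss might consume this layout follows, assuming per-row sequence log-probabilities are already available from the policy and a frozen reference model. It uses the standard DPO objective; the helper names split_interleaved and dpo_loss and the default beta are hypothetical, and nothing here is taken from the ROLL code.

```python
import math
from typing import List, Sequence, Tuple


def split_interleaved(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Even rows are chosen, odd rows are rejected."""
    return list(values[0::2]), list(values[1::2])


def dpo_loss(
    policy_logps: Sequence[float],  # one summed log-prob per batch row
    ref_logps: Sequence[float],     # same rows scored by the reference model
    beta: float = 0.1,              # hypothetical default temperature
) -> float:
    """Mean of -log(sigmoid(beta * (policy margin - reference margin)))."""
    pol_c, pol_r = split_interleaved(policy_logps)
    ref_c, ref_r = split_interleaved(ref_logps)
    losses = []
    for pc, pr, rc, rr in zip(pol_c, pol_r, ref_c, ref_r):
        margin = beta * ((pc - pr) - (rc - rr))
        # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
        x = -margin
        losses.append(max(x, 0.0) + math.log1p(math.exp(-abs(x))))
    return sum(losses) / len(losses)


# One pair: the policy prefers the chosen response more than the reference does.
loss = dpo_loss(policy_logps=[-1.0, -5.0], ref_logps=[-2.0, -2.0])
```

Because the split is a stride-2 slice, no index bookkeeping is needed to match each chosen row with its rejected counterpart.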

Related Pages

Implemented By

Implementation:Alibaba_ROLL_DPO_Get_Encode_Function

Related Heuristics

No specific heuristics inform this principle.
