Principle:Alibaba ROLL Preference Dataset Preparation

From Leeroopedia


Knowledge Sources
Domains: Data_Processing, Alignment
Last Updated: 2026-02-07 20:00 GMT

Overview

A data preprocessing principle for preference optimization training: chosen/rejected response pairs are tokenized and packed into batches in which the two responses of each pair occupy adjacent rows (interleaved batching).

Description

Preference Dataset Preparation converts raw preference datasets (JSON records holding a prompt with a chosen and a rejected response) into tokenized, interleaved batches. For a batch of B pairs, the result is a tensor of shape (2*B, seq_len): each pair contributes two consecutive rows, the chosen sequence followed by the rejected sequence, and both rows are built from the same prompt.
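The interleaved layout can be illustrated with a minimal sketch in plain Python. It assumes the examples are already tokenized; the field names prompt_ids, chosen_ids, and rejected_ids, the padding id 0, and the helper interleave_pairs are hypothetical and chosen for illustration, not taken from the ROLL codebase. Nested lists stand in for tensors.

```python
from typing import Dict, List, Tuple

PAD_ID = 0  # assumed padding token id (hypothetical)


def interleave_pairs(
    examples: List[Dict[str, List[int]]],
    pad_id: int = PAD_ID,
) -> Tuple[List[List[int]], List[int]]:
    """Build a (2*B, seq_len) batch: even rows chosen, odd rows rejected."""
    rows: List[List[int]] = []
    prompt_id_lens: List[int] = []
    for ex in examples:
        prompt = ex["prompt_ids"]
        # Chosen first, rejected second, so each pair fills rows 2*i and 2*i + 1.
        for response in (ex["chosen_ids"], ex["rejected_ids"]):
            rows.append(prompt + response)
            # Both rows of a pair record the same prompt length.
            prompt_id_lens.append(len(prompt))
    # Right-pad every row to the longest sequence in the batch.
    seq_len = max(len(row) for row in rows)
    input_ids = [row + [pad_id] * (seq_len - len(row)) for row in rows]
    return input_ids, prompt_id_lens


examples = [
    {"prompt_ids": [5, 6], "chosen_ids": [7, 8, 9], "rejected_ids": [10]},
    {"prompt_ids": [11], "chosen_ids": [12], "rejected_ids": [13, 14]},
]
input_ids, prompt_id_lens = interleave_pairs(examples)
```

With B = 2 pairs, input_ids has four rows of equal length, and prompt_id_lens holds one entry per row so that the two rows of a pair carry the same value. A real pipeline would typically also produce an attention mask and apply a chat template before tokenization; both are omitted here.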

Usage

Use when preparing data for DPO or similar preference-based training methods.

Theoretical Basis

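The interleaved format keeps both responses of every pair in the same batch, so per-sequence results can be separated into chosen and rejected groups by row parity alone:

  • Batch rows 0, 2, 4, ... contain chosen responses
  • Batch rows 1, 3, 5, ... contain rejected responses
  • Both rows of a pair share the same prompt_id_lens value, so the response tokens being compared start at the same position in each row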
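The row-parity convention above can be sketched as a strided split followed by a pairwise loss. The example below uses the standard DPO objective, -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))), as one possible consumer of the split; the function names, the beta value of 0.1, and the assumption that per-sequence log-probabilities have already been summed over response tokens are illustrative and not confirmed by the source.

```python
import math
from typing import List, Tuple


def split_interleaved(values: List[float]) -> Tuple[List[float], List[float]]:
    """Split per-row values into (chosen, rejected) by row parity."""
    return values[0::2], values[1::2]


def dpo_loss(
    policy_logps: List[float],
    ref_logps: List[float],
    beta: float = 0.1,  # assumed temperature, hypothetical default
) -> float:
    """Mean DPO loss over B pairs, given 2*B interleaved sequence log-probs."""
    policy_chosen, policy_rejected = split_interleaved(policy_logps)
    ref_chosen, ref_rejected = split_interleaved(ref_logps)
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        margin = beta * ((pc - rc) - (pr - rr))
        # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
        losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses)


# Rows 0 and 2 are chosen, rows 1 and 3 are rejected.
policy = [-1.0, -2.0, -3.0, -4.0]
reference = [-1.0, -2.0, -3.0, -4.0]
loss = dpo_loss(policy, reference)
```

When the policy and reference log-probabilities are identical, every margin is zero and the loss equals log 2, which is a convenient sanity check for the slicing. With tensors, the same split is usually written as logps[0::2] and logps[1::2].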

Related Pages

Implemented By

Related Heuristics

No specific heuristics inform this principle.
