Principle:Alibaba ROLL Preference Dataset Preparation
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Alignment |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A data preprocessing principle: tokenize chosen/rejected response pairs and arrange them in interleaved batches for preference optimization training.
Description
Preference Dataset Preparation converts raw preference datasets (JSON records, each holding a prompt with a chosen and a rejected response) into tokenized, interleaved batches. A batch of B pairs is formatted as a (2*B, seq_len) tensor in which each chosen sequence is immediately followed by the rejected sequence for the same prompt.
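The interleaving step can be sketched in plain Python as follows. This is a minimal illustration, not ROLL's actual implementation: the function name `interleave_pairs`, the right-padding with `pad_id = 0`, and the choice to record one `prompt_id_lens` entry per row are all assumptions made for the example, and the inputs are taken to be already tokenized.

```python
from typing import List, Tuple

# One example = (prompt token ids, chosen token ids, rejected token ids).
Triple = Tuple[List[int], List[int], List[int]]


def interleave_pairs(
    pairs: List[Triple], pad_id: int = 0
) -> Tuple[List[List[int]], List[int]]:
    """Build a (2*B, seq_len) batch from B tokenized preference pairs.

    Row 2*i holds prompt + chosen, row 2*i + 1 holds prompt + rejected.
    (Hypothetical helper; padding side and pad id are assumptions.)
    """
    rows: List[List[int]] = []
    prompt_id_lens: List[int] = []
    for prompt, chosen, rejected in pairs:
        rows.append(prompt + chosen)    # even row: chosen response
        rows.append(prompt + rejected)  # odd row: rejected response
        # Both rows of a pair share the same prompt length.
        prompt_id_lens.extend([len(prompt), len(prompt)])

    # Right-pad every row to the longest sequence in the batch.
    seq_len = max(len(row) for row in rows)
    batch = [row + [pad_id] * (seq_len - len(row)) for row in rows]
    return batch, prompt_id_lens


pairs = [
    ([1, 2], [3, 4, 5], [6]),
    ([7], [8], [9, 10]),
]
batch, prompt_id_lens = interleave_pairs(pairs)
```

With two pairs (B = 2), `batch` has four rows of equal length, and rows 0 and 1 start with the same prompt tokens, as do rows 2 and 3.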
Usage
Use when preparing data for DPO or similar preference-based training methods.
Theoretical Basis
The interleaved format lets a single forward pass over all 2*B rows score both responses of every pair; the loss then separates them by row parity:
- Batch rows 0, 2, 4, ... contain chosen responses
- Batch rows 1, 3, 5, ... contain rejected responses
- The two rows of each pair share the same prompt, and therefore the same prompt_id_lens value, so chosen and rejected responses are compared on equal terms
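To illustrate how the row layout above feeds a loss, the sketch below splits per-row sequence log-probabilities by parity and applies the standard DPO objective. It assumes the log-probabilities have already been summed over response tokens for each row; the function name `dpo_loss_interleaved` and the default `beta = 0.1` are hypothetical choices for this example, not values taken from ROLL.

```python
import math
from typing import List


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss_interleaved(
    policy_logps: List[float], ref_logps: List[float], beta: float = 0.1
) -> float:
    """Mean DPO loss over B pairs stored as 2*B interleaved rows.

    Even indices are chosen responses, odd indices are rejected ones.
    (Hypothetical helper operating on per-row summed log-probabilities.)
    """
    # Strided slices recover the two halves of every pair.
    chosen_pi, rejected_pi = policy_logps[0::2], policy_logps[1::2]
    chosen_ref, rejected_ref = ref_logps[0::2], ref_logps[1::2]

    losses = []
    for pc, pr, rc, rr in zip(chosen_pi, rejected_pi, chosen_ref, rejected_ref):
        margin = beta * ((pc - pr) - (rc - rr))
        # -log(sigmoid(margin)) == softplus(-margin)
        losses.append(_softplus(-margin))
    return sum(losses) / len(losses)


ref = [-10.0, -12.0, -8.0, -9.0]
same_as_ref = dpo_loss_interleaved(ref, ref)
prefers_chosen = dpo_loss_interleaved([-9.0, -14.0, -7.0, -11.0], ref)
```

When the policy matches the reference, every margin is zero and the loss equals log 2; a policy that raises chosen log-probabilities relative to rejected ones yields a smaller loss.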
Related Pages
Implemented By
Related Heuristics
No specific heuristics inform this principle.