
Principle:Hpcaitech ColossalAI Preference Data Preparation

From Leeroopedia


Knowledge Sources
Domains NLP, Data_Engineering
Last Updated 2026-02-09 00:00 GMT

Overview

A data engineering process that converts human preference pairs (chosen/rejected responses) into tokenized datasets suitable for preference-based alignment training.

Description

Preference Data Preparation handles the distinct data format required by Direct Preference Optimization (DPO), Kahneman-Tversky Optimization (KTO), and other preference-based alignment methods. Unlike supervised fine-tuning (SFT) data, which has a single response per prompt, preference data contains paired responses: a chosen (preferred) response and a rejected (dispreferred) response for each prompt. Both responses must be tokenized independently with their own loss masks, producing parallel sequences that the training algorithm can compare.

Usage

Use this principle when preparing data for DPO, SimPO, or ORPO alignment training. The data must contain explicit preference pairs with chosen and rejected responses for each prompt.
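One plausible raw-sample layout is a JSONL-style record per preference pair. The field names below are illustrative assumptions, not the exact schema consumed by the ColossalAI loader:

```json
{
  "prompt": "Explain what a loss mask does.",
  "chosen": "A loss mask marks which token positions contribute to the training loss, so prompt tokens can be excluded.",
  "rejected": "I don't know."
}
```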

Theoretical Basis

The preparation transforms preference pairs into parallel tokenized sequences:

  1. Parse each sample to extract prompt, chosen response, and rejected response
  2. Apply conversation template to both (prompt + chosen) and (prompt + rejected)
  3. Tokenize both sequences independently
  4. Generate separate loss masks for chosen and rejected sequences
  5. Save as Arrow dataset with fields: chosen_input_ids, chosen_loss_mask, rejected_input_ids, rejected_loss_mask
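The five steps above can be sketched in plain Python. A toy whitespace tokenizer and a stand-in chat template are used so the example is self-contained; in ColossalAI the real tokenizer and conversation template come from the model configuration, and only the output field names follow the list above:

```python
def apply_template(prompt, response):
    # Stand-in conversation template (step 2); real templates are model-specific.
    return f"<|user|> {prompt} <|assistant|> {response}"

def toy_tokenize(text):
    # Stand-in tokenizer (step 3): map each whitespace token to a fake id.
    return [hash(tok) % 1000 for tok in text.split()]

def prepare(samples):
    """Convert raw preference pairs into parallel tokenized records."""
    records = []
    for s in samples:
        # Step 1: parse prompt, chosen, and rejected from the sample.
        prompt, chosen, rejected = s["prompt"], s["chosen"], s["rejected"]
        record = {}
        for name, response in (("chosen", chosen), ("rejected", rejected)):
            full_ids = toy_tokenize(apply_template(prompt, response))
            # Step 4: loss is computed only on response tokens, so the
            # prompt positions are masked out with zeros.
            prompt_len = len(toy_tokenize(apply_template(prompt, "")))
            mask = [0] * prompt_len + [1] * (len(full_ids) - prompt_len)
            record[f"{name}_input_ids"] = full_ids
            record[f"{name}_loss_mask"] = mask
        records.append(record)
    # Step 5 (not run here): persisting as an Arrow dataset could use e.g.
    # datasets.Dataset.from_list(records).save_to_disk(path).
    return records
```

Each record carries four parallel fields (chosen_input_ids, chosen_loss_mask, rejected_input_ids, rejected_loss_mask), matching the schema listed in step 5, so the training loop can compare chosen and rejected log-probabilities position by position.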

Related Pages

Implemented By
