Principle: Hpcaitech ColossalAI Preference Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Engineering |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A data engineering process that converts human preference pairs (chosen/rejected responses) into tokenized datasets suitable for preference-based alignment training.
Description
Preference Data Preparation handles the data format required by DPO, SimPO, ORPO, and other pairwise preference-based alignment methods. Unlike SFT data, which pairs each prompt with a single response, preference data contains paired responses: a chosen (preferred) response and a rejected (dispreferred) response for each prompt. Both responses must be tokenized independently with their own loss masks, producing parallel sequences that the training algorithm can compare.
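A minimal sketch of what one pairwise preference sample looks like. The field names (`prompt`, `chosen`, `rejected`) are illustrative; real datasets may label these fields differently.

```python
# One preference sample: a single prompt with a chosen/rejected response pair.
# Field names here are illustrative, not a fixed schema.
sample = {
    "prompt": "Explain why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules, and shorter blue wavelengths scatter most.",
    "rejected": "The sky is blue because it reflects the ocean.",
}

def is_valid_pair(s: dict) -> bool:
    """Check that a sample carries a prompt and exactly one chosen/rejected pair."""
    return all(isinstance(s.get(k), str) and s[k] for k in ("prompt", "chosen", "rejected"))

assert is_valid_pair(sample)
assert not is_valid_pair({"prompt": "p", "chosen": "c"})  # missing rejected response
```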
Usage
Use this principle when preparing data for DPO, SimPO, or ORPO alignment training. The data must contain explicit preference pairs with chosen and rejected responses for each prompt.
Theoretical Basis
The preparation transforms preference pairs into parallel tokenized sequences:
- Parse each sample to extract prompt, chosen response, and rejected response
- Apply conversation template to both (prompt + chosen) and (prompt + rejected)
- Tokenize both sequences independently
- Generate separate loss masks for chosen and rejected sequences
- Save as Arrow dataset with fields: chosen_input_ids, chosen_loss_mask, rejected_input_ids, rejected_loss_mask
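The steps above can be sketched as follows. This is a hedged illustration, not ColossalAI's actual implementation: `tokenize` is a stand-in whitespace tokenizer and `apply_template` a made-up conversation template, so the example is self-contained; a real pipeline would use the model's tokenizer and chat template, and would write the resulting records out as an Arrow dataset.

```python
def tokenize(text: str) -> list[int]:
    # Stand-in tokenizer: maps each whitespace-separated token to a toy id.
    return [hash(tok) % 50000 for tok in text.split()]

def apply_template(prompt: str, response: str) -> tuple[str, str]:
    # Illustrative template; real code applies the model's conversation template.
    return f"User: {prompt}\nAssistant: ", response

def prepare_pair(sample: dict) -> dict:
    """Turn one preference pair into parallel tokenized sequences with loss masks."""
    out = {}
    for side in ("chosen", "rejected"):
        prefix, response = apply_template(sample["prompt"], sample[side])
        prompt_ids = tokenize(prefix)
        response_ids = tokenize(response)
        # Loss mask: 0 over prompt tokens, 1 over response tokens, so only
        # the response contributes to the preference loss.
        out[f"{side}_input_ids"] = prompt_ids + response_ids
        out[f"{side}_loss_mask"] = [0] * len(prompt_ids) + [1] * len(response_ids)
    return out

record = prepare_pair({
    "prompt": "Summarize photosynthesis.",
    "chosen": "Plants convert light into chemical energy.",
    "rejected": "Plants eat sunlight.",
})
assert len(record["chosen_input_ids"]) == len(record["chosen_loss_mask"])
assert sum(record["rejected_loss_mask"]) == len(tokenize("Plants eat sunlight."))
```

The chosen and rejected sequences are tokenized independently, so they may have different lengths; the training loop is responsible for padding or batching them side by side.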