Principle: Hpcaitech ColossalAI Preference Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Engineering |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A data engineering process that converts human preference pairs (chosen/rejected responses) into tokenized datasets suitable for preference-based alignment training.
Description
Preference Data Preparation handles the data format required by DPO, SimPO, ORPO, and other pairwise preference-based alignment methods. Unlike SFT data, which pairs each prompt with a single response, preference data contains paired responses: a chosen (preferred) response and a rejected (dispreferred) response for each prompt. Both responses must be tokenized independently with their own loss masks, producing parallel sequences that the training algorithm can compare.
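A minimal sketch of what one pairwise preference sample looks like. The field names (`prompt`, `chosen`, `rejected`) are illustrative; real datasets may label these fields differently.

```python
# One preference sample: a single prompt with a chosen/rejected response pair.
# Field names here are illustrative, not a fixed schema.
sample = {
    "prompt": "Explain why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules, and shorter blue wavelengths scatter most.",
    "rejected": "The sky is blue because it reflects the ocean.",
}

def is_valid_pair(s: dict) -> bool:
    """Check that a sample carries a prompt and exactly one chosen/rejected pair."""
    return all(isinstance(s.get(k), str) and s[k] for k in ("prompt", "chosen", "rejected"))

assert is_valid_pair(sample)
assert not is_valid_pair({"prompt": "p", "chosen": "c"})  # missing rejected response
```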
Usage
Use this principle when preparing data for DPO, SimPO, or ORPO alignment training. The data must contain explicit preference pairs with chosen and rejected responses for each prompt.
Theoretical Basis
The preparation transforms preference pairs into parallel tokenized sequences:
- Parse each sample to extract prompt, chosen response, and rejected response
- Apply conversation template to both (prompt + chosen) and (prompt + rejected)
- Tokenize both sequences independently
- Generate separate loss masks for chosen and rejected sequences
- Save as Arrow dataset with fields: chosen_input_ids, chosen_loss_mask, rejected_input_ids, rejected_loss_mask
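The steps above can be sketched as follows. This is a hedged illustration, not ColossalAI's actual implementation: `tokenize` is a stand-in whitespace tokenizer and `apply_template` a made-up conversation template, so the example is self-contained; a real pipeline would use the model's tokenizer and chat template, and would write the resulting records out as an Arrow dataset.

```python
def tokenize(text: str) -> list[int]:
    # Stand-in tokenizer: maps each whitespace-separated token to a toy id.
    return [hash(tok) % 50000 for tok in text.split()]

def apply_template(prompt: str, response: str) -> tuple[str, str]:
    # Illustrative template; real code applies the model's conversation template.
    return f"User: {prompt}\nAssistant: ", response

def prepare_pair(sample: dict) -> dict:
    """Turn one preference pair into parallel tokenized sequences with loss masks."""
    out = {}
    for side in ("chosen", "rejected"):
        prefix, response = apply_template(sample["prompt"], sample[side])
        prompt_ids = tokenize(prefix)
        response_ids = tokenize(response)
        # Loss mask: 0 over prompt tokens, 1 over response tokens, so only
        # the response contributes to the preference loss.
        out[f"{side}_input_ids"] = prompt_ids + response_ids
        out[f"{side}_loss_mask"] = [0] * len(prompt_ids) + [1] * len(response_ids)
    return out

record = prepare_pair({
    "prompt": "Summarize photosynthesis.",
    "chosen": "Plants convert light into chemical energy.",
    "rejected": "Plants eat sunlight.",
})
assert len(record["chosen_input_ids"]) == len(record["chosen_loss_mask"])
assert sum(record["rejected_loss_mask"]) == len(tokenize("Plants eat sunlight."))
```

The chosen and rejected sequences are tokenized independently, so they may have different lengths; the training loop is responsible for padding or batching them side by side.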