Principle: hpcaitech ColossalAI Preference Dataloader Setup
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Engineering |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A data-loading pattern that batches preference pairs with stateful distributed sampling for Direct Preference Optimization (DPO) training across multiple GPUs.
Description
Preference Dataloader Setup creates PyTorch DataLoaders that yield batches containing both chosen and rejected sequences. It uses a DataCollatorForPreferenceDataset to pad and collate preference pairs into uniform-length batches, and a StatefulDistributedSampler that records its position so iteration can resume mid-epoch after a restart.
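The collation step can be illustrated with a minimal, dependency-free sketch. This is not the ColossalAI implementation: the pad token id, the helper names, and the per-sample field names are assumptions, and plain Python lists stand in for tensors. It shows the core idea of padding chosen and rejected sequences independently to each side's maximum length and deriving the attention and loss masks.

```python
# Illustrative sketch, NOT the ColossalAI DataCollatorForPreferenceDataset:
# pad chosen and rejected token sequences to a uniform per-side length and
# build the matching attention/loss masks. Lists stand in for tensors.
PAD_ID = 0  # hypothetical pad token id


def pad_to(seq, length, pad_value):
    """Right-pad a list to `length` with `pad_value`."""
    return seq + [pad_value] * (length - len(seq))


def collate_preference_batch(samples):
    """Collate preference pairs into a uniform-length batch.

    Each sample is a dict with 'chosen_input_ids', 'chosen_loss_mask',
    'rejected_input_ids', 'rejected_loss_mask' (hypothetical field names).
    """
    batch = {}
    for side in ("chosen", "rejected"):
        ids = [s[f"{side}_input_ids"] for s in samples]
        masks = [s[f"{side}_loss_mask"] for s in samples]
        max_len = max(len(x) for x in ids)
        batch[f"{side}_input_ids"] = [pad_to(x, max_len, PAD_ID) for x in ids]
        # attention mask: 1 over real tokens, 0 over padding
        batch[f"{side}_attention_mask"] = [
            pad_to([1] * len(x), max_len, 0) for x in ids
        ]
        # loss mask: padding positions never contribute to the loss
        batch[f"{side}_loss_mask"] = [pad_to(m, max_len, 0) for m in masks]
    return batch
```

Note that the chosen and rejected sides are padded to separate lengths; a real collator may instead pad both to one shared maximum or to a fixed `max_length`.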
Usage
Use this after preference data preparation and before the DPO training loop. The stateful sampler lets training resume from the exact data position after an interruption.
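The resumption behavior can be sketched with a toy sampler. `TinyStatefulSampler` is a hypothetical stand-in for ColossalAI's StatefulDistributedSampler, written in pure Python: it shards a deterministically shuffled index list across ranks and tracks how many samples were already consumed, so a restarted run skips straight to the next unseen sample.

```python
# Toy sketch of the "stateful sampler" idea; hypothetical class, NOT the
# ColossalAI StatefulDistributedSampler. Shards indices across ranks and
# remembers how many were consumed so iteration resumes mid-epoch.
import random


class TinyStatefulSampler:
    def __init__(self, num_samples, num_replicas, rank, seed=0):
        self.num_samples = num_samples
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.start_index = 0  # resume position within this rank's shard

    def shard(self):
        # deterministic shuffle shared by all ranks, then round-robin sharding
        indices = list(range(self.num_samples))
        random.Random(self.seed).shuffle(indices)
        return indices[self.rank::self.num_replicas]

    def __iter__(self):
        # skip samples already consumed before the restart
        yield from self.shard()[self.start_index:]

    def state_dict(self, consumed):
        return {"start_index": consumed}

    def load_state_dict(self, state):
        self.start_index = state["start_index"]
```

On resume, the trainer restores the sampler's state dict from the checkpoint, and the first batch after restart picks up exactly where the interrupted epoch left off.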
Theoretical Basis
Data loading must handle two parallel sequences per sample:
- Load Arrow datasets containing both chosen and rejected tokenized sequences
- Apply DataCollatorForPreferenceDataset to pad sequences to uniform batch length
- Use StatefulDistributedSampler to shard data across GPUs with resumption capability
- Each batch contains: chosen_input_ids, chosen_attention_mask, chosen_loss_mask, rejected_input_ids, rejected_attention_mask, rejected_loss_mask
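The steps above can be tied together in a compact sketch. The helper below is hypothetical (the real pipeline composes a DataLoader, a collate_fn, and a distributed sampler); it shards a dataset across ranks and groups sharded samples into fixed-size batches carrying the six keys listed above.

```python
# Hypothetical end-to-end sketch of the loading loop: shard per rank, then
# group into fixed-size batches whose keys are the six tensors a DPO step
# consumes. The real pipeline uses DataLoader + collate_fn + sampler.
BATCH_KEYS = (
    "chosen_input_ids", "chosen_attention_mask", "chosen_loss_mask",
    "rejected_input_ids", "rejected_attention_mask", "rejected_loss_mask",
)


def iter_preference_batches(dataset, rank, world_size, batch_size):
    shard = dataset[rank::world_size]  # round-robin sharding across GPUs
    # drop_last semantics: only full batches are yielded
    for start in range(0, len(shard) - batch_size + 1, batch_size):
        samples = shard[start:start + batch_size]
        yield {key: [s[key] for s in samples] for key in BATCH_KEYS}
```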