Principle:OpenGVLab InternVL DPO Data Collation

From Leeroopedia


Knowledge Sources
Domains: Alignment, Data_Engineering, Vision_Language
Last Updated: 2026-02-07 00:00 GMT

Overview

A specialized batching strategy for preference optimization training that handles chosen and rejected response pairs alongside multimodal image data.

Description

DPO training requires paired examples: for each input, both a chosen (preferred) and a rejected (dispreferred) response must be present. The DPO data collator extends the standard multimodal collator to:

  • Pad chosen and rejected text sequences independently (they may have different lengths)
  • Concatenate pixel_values and image_flags across all samples in the batch
  • Maintain separate chosen/rejected fields for the DPO loss computation
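As a rough illustration of the concatenation step, the sketch below uses plain Python lists in place of tensors. The sample contents, the `cat` helper, and the use of a 0 flag for a placeholder tile on a text-only sample are assumptions made for illustration, not details taken from the InternVL code.

```python
# Each sample contributes a variable number of image tiles.
# Strings stand in for per-tile tensors; flags are assumed to mark real tiles (1)
# versus a placeholder tile for a text-only sample (0).
samples = [
    {"pixel_values": ["s0_tile0", "s0_tile1", "s0_tile2"], "image_flags": [1, 1, 1]},
    {"pixel_values": ["s1_placeholder"], "image_flags": [0]},
]

def cat(key):
    # Flatten the per-sample lists into one batch-level list,
    # the list analogue of concatenating tensors along dimension 0.
    return [item for s in samples for item in s[key]]

pixel_values = cat("pixel_values")
image_flags = cat("image_flags")

# The leading dimension is the total tile count, not the batch size.
tiles_per_sample = [len(s["pixel_values"]) for s in samples]
print(len(samples), len(pixel_values), tiles_per_sample)
```

Because the tiles are concatenated rather than stacked, samples with different tile counts can share a batch without padding the image data.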

Unlike the standard collator, which handles a single response per sample, the DPO collator manages four text sequences per sample: chosen_input_ids, chosen_labels, rejected_input_ids, and rejected_labels.
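A minimal sketch of the four-sequence layout, using plain Python lists as stand-ins for tensors. The `pad` helper, the token ids, and the pad id of 0 are hypothetical; only the four field names and the -100 label padding value come from this page.

```python
IGNORE_INDEX = -100  # label padding value used in the collation code on this page
PAD_ID = 0           # assumed pad token id, for illustration only

def pad(seqs, pad_value):
    # Right-pad every list to the length of the longest list in `seqs`.
    width = max(len(s) for s in seqs)
    return [s + [pad_value] * (width - len(s)) for s in seqs]

samples = [
    {"chosen_input_ids": [11, 12, 13, 14], "chosen_labels": [-100, -100, 13, 14],
     "rejected_input_ids": [11, 12, 15], "rejected_labels": [-100, -100, 15]},
    {"chosen_input_ids": [21, 22, 23], "chosen_labels": [-100, 22, 23],
     "rejected_input_ids": [21, 24, 25, 26, 27],
     "rejected_labels": [-100, 24, 25, 26, 27]},
]

batch = {}
for side in ("chosen", "rejected"):
    # Each side is padded to its own maximum length.
    batch[f"{side}_input_ids"] = pad([s[f"{side}_input_ids"] for s in samples], PAD_ID)
    batch[f"{side}_labels"] = pad([s[f"{side}_labels"] for s in samples], IGNORE_INDEX)

chosen_width = len(batch["chosen_input_ids"][0])
rejected_width = len(batch["rejected_input_ids"][0])
print(chosen_width, rejected_width)
```

Padding each side to its own width avoids stretching the shorter side to the length of the longest sequence on the other side.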

Usage

Use this collator for DPO/MPO preference optimization training. It replaces the standard concat_pad_data_collator when training with MultimodalDPOTrainer.

Theoretical Basis

# Sketch: DPO batch collation in PyTorch.
# Assumes every field is a torch tensor; pad_id=0 is an assumed default.
import torch
from torch.nn.utils.rnn import pad_sequence

def dpo_collate(samples, pad_id=0, ignore_index=-100):
    def pad(key, value):
        # Right-pad the 1-D tensors stored under `key` to the longest one in the batch
        return pad_sequence([s[key] for s in samples],
                            batch_first=True, padding_value=value)

    return {
        # Pad chosen and rejected independently
        'chosen_input_ids': pad('chosen_input_ids', pad_id),
        'chosen_labels': pad('chosen_labels', ignore_index),
        'rejected_input_ids': pad('rejected_input_ids', pad_id),
        'rejected_labels': pad('rejected_labels', ignore_index),
        # Concatenate multimodal data across the batch (dim 0)
        'pixel_values': torch.cat([s['pixel_values'] for s in samples]),
        'image_flags': torch.cat([s['image_flags'] for s in samples]),
    }
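
To show why the chosen and rejected fields stay separate, here is a minimal sketch of how they might feed the standard DPO sigmoid loss. Plain Python floats stand in for tensors, the function names and `beta=0.1` are illustrative assumptions, and the per-token log-probabilities are assumed to be already aligned with the labels. This is not the MultimodalDPOTrainer implementation.

```python
import math

IGNORE_INDEX = -100

def sequence_logp(token_logps, labels):
    # Sum log-probabilities over positions whose label is a real target;
    # positions labeled -100 (prompt or padding) contribute nothing.
    return sum(lp for lp, y in zip(token_logps, labels) if y != IGNORE_INDEX)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Margin between the policy-vs-reference log-ratios of chosen and rejected.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical per-token log-probabilities for one padded chosen/rejected pair.
chosen_logp = sequence_logp([-0.9, -0.2, -0.3, -5.0], [-100, 22, 23, -100])
rejected_logp = sequence_logp([-0.9, -1.5, -2.0], [-100, 24, 25])

loss = dpo_loss(chosen_logp, rejected_logp, ref_chosen=-1.0, ref_rejected=-3.0)
print(loss)
```

The loss needs one sequence-level log-probability per side, which is why the collator keeps chosen and rejected ids and labels in separate fields instead of merging them into one sequence.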

Related Pages

Implemented By
