
Implementation:OpenGVLab InternVL DPO Concat Pad Data Collator

From Leeroopedia


Knowledge Sources

  • Domains: Alignment, Data_Engineering
  • Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by the InternVL training framework, for batching DPO preference pairs that include multimodal data.

Description

The dpo_concat_pad_data_collator function extends the standard data collator for DPO/MPO training. It pads chosen and rejected text sequences independently and concatenates pixel_values and image_flags across the batch.
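The behavior described above can be sketched as follows. This is a minimal, self-contained approximation of the padding-and-concatenation logic, not the actual InternVL implementation: the attention-mask construction and the -100 label fill are common conventions assumed here, and the function name carries a `_sketch` suffix to mark it as illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_concat_pad_data_collator_sketch(features, pad_id=0):
    """Sketch of the collation logic: pad chosen/rejected independently,
    concatenate image tensors across the batch."""
    batch = {}
    for prefix in ("chosen", "rejected"):
        ids = [torch.as_tensor(f[f"{prefix}_input_ids"]) for f in features]
        labels = [torch.as_tensor(f[f"{prefix}_labels"]) for f in features]
        # Each group is padded to its own maximum length.
        max_len = max(t.size(0) for t in ids)
        batch[f"{prefix}_input_ids"] = torch.stack(
            [F.pad(t, (0, max_len - t.size(0)), value=pad_id) for t in ids]
        )
        # Padded label positions are filled with -100 so the loss ignores them
        # (a standard convention; assumed here, not confirmed by the source).
        batch[f"{prefix}_labels"] = torch.stack(
            [F.pad(t, (0, max_len - t.size(0)), value=-100) for t in labels]
        )
        batch[f"{prefix}_attention_mask"] = batch[f"{prefix}_input_ids"].ne(pad_id)
    # Image tensors are concatenated along the first (tile) dimension,
    # since samples may contribute different numbers of image tiles.
    batch["pixel_values"] = torch.cat([f["pixel_values"] for f in features], dim=0)
    batch["image_flags"] = torch.cat([f["image_flags"] for f in features], dim=0)
    return batch
```

Note that chosen and rejected groups are padded to different lengths: a batch whose chosen responses are short and rejected responses are long wastes no padding on the chosen side.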

Usage

Pass it as the data_collator argument to MultimodalDPOTrainer when running MPO training.

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/patch/pad_data_collator.py
  • Lines: L119-155

Signature

def dpo_concat_pad_data_collator(features, pad_id=0):
    """
    Collate function for DPO training with multimodal data.

    Pads chosen/rejected sequences independently and concatenates
    pixel_values and image_flags across the batch.

    Args:
        features: List[Dict] - List of DPO sample dicts with chosen_*/rejected_* keys
        pad_id: int - Padding token ID (default 0)

    Returns:
        Dict[str, torch.Tensor] - Batched tensors for DPO training
    """

Import

from internvl.patch.pad_data_collator import dpo_concat_pad_data_collator

I/O Contract

Inputs

  • features (List[Dict], required): DPO samples with chosen_input_ids, chosen_labels, rejected_input_ids, rejected_labels, pixel_values, image_flags
  • pad_id (int, optional): Padding token ID (default 0)

Outputs

  • batch (Dict[str, torch.Tensor]): Batched dict with padded chosen/rejected sequences and concatenated pixel_values/image_flags

Usage Examples

With MultimodalDPOTrainer

from internvl.patch.pad_data_collator import dpo_concat_pad_data_collator
from internvl.train.trainer_dpo import MultimodalDPOTrainer

trainer = MultimodalDPOTrainer(
    model=model,
    ref_model=ref_model,
    args=dpo_config,
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,
    data_collator=dpo_concat_pad_data_collator,
)
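Because the trainer invokes the collator as collator(features), passing the bare function works only with the default pad_id=0. To bind a tokenizer-specific pad token instead, the standard pattern is functools.partial. The snippet below demonstrates the pattern with a stand-in function that mimics the (features, pad_id=0) signature; it is not the real collator.

```python
from functools import partial

# Stand-in with the same (features, pad_id=0) call shape as the real
# collator; used here only to illustrate the binding pattern.
def collate(features, pad_id=0):
    return {"pad_id": pad_id, "num_samples": len(features)}

# Bind a non-default pad id (the value 2 is illustrative), then call
# the bound collator the way a trainer would: with only the features.
bound = partial(collate, pad_id=2)
batch = bound([{"chosen_input_ids": [1, 2]}])
```

In practice this would read data_collator=partial(dpo_concat_pad_data_collator, pad_id=tokenizer.pad_token_id) in the trainer construction above.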

Related Pages

Implements Principle

Requires Environment
