Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:OpenGVLab InternVL Concat Pad Data Collator

From Leeroopedia


Knowledge Sources
Domains Training, Data_Engineering
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for batching multimodal training samples with padding and concatenation provided by the InternVL training framework.

Description

The concat_pad_data_collator function pads text sequences to the maximum length in the batch and concatenates image pixel values and flags across samples. It handles the variable-length nature of both text (different conversation lengths) and images (different tile counts per sample).

Usage

Use this collator for standard supervised fine-tuning and pretraining. Pass it as the data_collator argument to the HuggingFace Trainer.

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/patch/pad_data_collator.py
  • Lines: L57-116

Signature

def concat_pad_data_collator(features, max_item_length=None, pad_id=0):
    """
    Collate function for multimodal training batches.

    Args:
        features: List[Dict] - List of sample dicts from LazySupervisedDataset
        max_item_length: Optional[int] - Max sequence length (None = use batch max)
        pad_id: int - Padding token ID (default 0)

    Returns:
        Dict[str, torch.Tensor] - Batched tensors with padding applied
    """

Import

from internvl.patch.pad_data_collator import concat_pad_data_collator

I/O Contract

Inputs

Name Type Required Description
features List[Dict[str, Tensor]] Yes List of per-sample dicts with input_ids, labels, attention_mask, pixel_values, image_flags
max_item_length int No Maximum sequence length for truncation (default None = no truncation)
pad_id int No Token ID used for padding (default 0)

Outputs

Name Type Description
batch Dict[str, torch.Tensor] Batched dict with padded input_ids [B, max_len], labels [B, max_len], attention_mask [B, max_len], concatenated pixel_values [total_tiles, 3, H, W], image_flags [total_tiles]

Usage Examples

With HuggingFace Trainer

from internvl.patch.pad_data_collator import concat_pad_data_collator
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=concat_pad_data_collator,
)

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment