Implementation:OpenGVLab InternVL Concat Pad Data Collator

Knowledge Sources	InternVL
Domains	Training, Data_Engineering
Last Updated	2026-02-07 00:00 GMT

Overview

A collate function from the InternVL training framework that batches multimodal training samples by padding text sequences and concatenating image tensors.

Description

The concat_pad_data_collator function pads text sequences to the maximum length in the batch and concatenates image pixel values and flags across samples. It handles the variable-length nature of both text (different conversation lengths) and images (different tile counts per sample).
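The pad-and-concatenate behaviour described above can be sketched without any dependencies. This is a minimal illustration using plain lists in place of tensors, not the InternVL implementation: the name pad_and_concat, the IGNORE_INDEX value of -100 for padded label positions, and the length-based attention mask are assumptions made for the example.

```python
IGNORE_INDEX = -100  # assumption: label value that the loss function skips


def pad_and_concat(features, max_item_length=None, pad_id=0):
    """Illustrative list-based collator; not the InternVL implementation."""
    # Pad to the longest sequence in the batch unless a length is given.
    target_len = max_item_length or max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "labels": [], "attention_mask": [],
             "pixel_values": [], "image_flags": []}
    for f in features:
        n_pad = target_len - len(f["input_ids"])
        if n_pad < 0:
            raise ValueError("sample is longer than the target length")
        # Text fields are right-padded so every row has the same length.
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        batch["labels"].append(f["labels"] + [IGNORE_INDEX] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # Image data is concatenated along the tile axis, not stacked per sample.
        batch["pixel_values"].extend(f["pixel_values"])
        batch["image_flags"].extend(f["image_flags"])
    return batch


# Two samples: different text lengths and different tile counts.
features = [
    {"input_ids": [5, 6, 7], "labels": [5, 6, 7],
     "pixel_values": ["tile_a0", "tile_a1"], "image_flags": [1, 1]},
    {"input_ids": [8], "labels": [8],
     "pixel_values": ["tile_b0"], "image_flags": [1]},
]
batch = pad_and_concat(features)
```

Here the text fields come out as two rows of length 3, while the image fields hold three tiles in total, which is why the image tensors cannot simply be stacked per sample.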

Usage

Use this collator for standard supervised fine-tuning and pretraining. Pass it as the data_collator argument to the HuggingFace Trainer.

Code Reference

Source Location

Repository: InternVL
File: internvl_chat/internvl/patch/pad_data_collator.py
Lines: L57-116

Signature

def concat_pad_data_collator(features, max_item_length=None, pad_id=0):
    """
    Collate function for multimodal training batches.

    Args:
        features: List[Dict] - List of sample dicts from LazySupervisedDataset
        max_item_length: Optional[int] - Max sequence length (None = use batch max)
        pad_id: int - Padding token ID (default 0)

    Returns:
        Dict[str, torch.Tensor] - Batched tensors with padding applied
    """

Import

from internvl.patch.pad_data_collator import concat_pad_data_collator

I/O Contract

Inputs

Name	Type	Required	Description
features	List[Dict[str, Tensor]]	Yes	List of per-sample dicts with input_ids, labels, attention_mask, pixel_values, image_flags
max_item_length	int	No	Length to which sequences are padded (default None = longest sequence in the batch)
pad_id	int	No	Token ID used for padding (default 0)

Outputs

Name	Type	Description
batch	Dict[str, torch.Tensor]	Batched dict with padded input_ids [B, max_len], labels [B, max_len], attention_mask [B, max_len], concatenated pixel_values [total_tiles, 3, H, W], image_flags [total_tiles]

Usage Examples

With HuggingFace Trainer

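from internvl.patch.pad_data_collator import concat_pad_data_collator
from transformers import Trainer

# model, training_args, dataset, and tokenizer are assumed to have been
# created earlier in the training script.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    # Trainer calls the collator with the list of samples for each batch.
    data_collator=concat_pad_data_collator,
)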
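Trainer invokes the collator with the list of features only, so a non-default pad_id or max_item_length has to be bound beforehand. One way to do this is functools.partial. In the sketch below, _stand_in_collator is a hypothetical placeholder that mirrors only the documented signature so that the snippet runs without InternVL installed, and pad_id=2 is an arbitrary illustrative value.

```python
from functools import partial


def _stand_in_collator(features, max_item_length=None, pad_id=0):
    """Hypothetical placeholder with the documented signature only."""
    return {"n": len(features),
            "max_item_length": max_item_length,
            "pad_id": pad_id}


# Bind keyword arguments once; the result accepts `features` alone,
# which is how a collator is called for each batch.
collate_fn = partial(_stand_in_collator, pad_id=2)

out = collate_fn([{"input_ids": [1, 2]}])
```

In a real training script the same pattern would wrap the imported concat_pad_data_collator, with pad_id taken from the tokenizer in use.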

Related Pages

Implements Principle

Principle:OpenGVLab_InternVL_Multimodal_Data_Collation

Requires Environment

Environment:OpenGVLab_InternVL_PyTorch_CUDA
