Implementation:OpenGVLab InternVL Concat Pad Data Collator
| Knowledge Sources | |
|---|---|
| Domains | Training, Data_Engineering |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool from the InternVL training framework for batching multimodal training samples via padding and concatenation.
Description
The concat_pad_data_collator function pads text sequences to the maximum length in the batch and concatenates image pixel values and flags across samples. It handles the variable-length nature of both text (different conversation lengths) and images (different tile counts per sample).
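The core pad-and-concat idea can be sketched as follows. This is a simplified illustration, not the actual InternVL implementation: the function name `pad_concat_collate` is invented here, the label padding value of -100 is the usual HuggingFace ignore index, and the real collator may handle additional keys.

```python
import torch

def pad_concat_collate(features, max_item_length=None, pad_id=0):
    # Simplified sketch of the two-step collation:
    # 1. pad every text field to a common length,
    # 2. concatenate image tiles along the first dimension.
    batch_max = max(f["input_ids"].shape[0] for f in features)
    target = max_item_length or batch_max

    input_ids, labels, masks = [], [], []
    for f in features:
        seq = f["input_ids"]
        pad_len = target - seq.shape[0]
        # Pad tokens with pad_id, labels with the ignore index (-100),
        # and mark padded positions with 0 in the attention mask.
        input_ids.append(torch.cat([seq, seq.new_full((pad_len,), pad_id)]))
        labels.append(torch.cat([f["labels"], f["labels"].new_full((pad_len,), -100)]))
        masks.append(torch.cat([torch.ones_like(seq), seq.new_zeros(pad_len)]))

    return {
        "input_ids": torch.stack(input_ids),        # [B, target]
        "labels": torch.stack(labels),              # [B, target]
        "attention_mask": torch.stack(masks),       # [B, target]
        # Tile counts differ per sample, so images are concatenated
        # along dim 0 rather than stacked into a fixed-shape tensor.
        "pixel_values": torch.cat([f["pixel_values"] for f in features]),
        "image_flags": torch.cat([f["image_flags"] for f in features]),
    }
```

Concatenating `pixel_values` (rather than stacking) is what allows samples with different dynamic-tiling counts to share a batch; the model later routes tile features back to their text positions via `image_flags`.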
Usage
Use this collator for standard supervised fine-tuning and pretraining. Pass it as the data_collator argument to the HuggingFace Trainer.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/patch/pad_data_collator.py
- Lines: L57-116
Signature
def concat_pad_data_collator(features, max_item_length=None, pad_id=0):
"""
Collate function for multimodal training batches.
Args:
features: List[Dict] - List of sample dicts from LazySupervisedDataset
max_item_length: Optional[int] - Max sequence length (None = use batch max)
pad_id: int - Padding token ID (default 0)
Returns:
Dict[str, torch.Tensor] - Batched tensors with padding applied
"""
Import
from internvl.patch.pad_data_collator import concat_pad_data_collator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| features | List[Dict[str, Tensor]] | Yes | List of per-sample dicts with input_ids, labels, attention_mask, pixel_values, image_flags |
| max_item_length | int | No | Target padded sequence length (default None = pad to the batch maximum) |
| pad_id | int | No | Token ID used for padding (default 0) |
Outputs
| Name | Type | Description |
|---|---|---|
| batch | Dict[str, torch.Tensor] | Batched dict with padded input_ids [B, max_len], labels [B, max_len], attention_mask [B, max_len], concatenated pixel_values [total_tiles, 3, H, W], image_flags [total_tiles] |
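To make the shape contract concrete, here is a worked example with assumed sample sizes (the sequence lengths, tile counts, and the 448×448 tile resolution are illustrative, not taken from the page above):

```python
# Hypothetical batch of two samples.
seq_lens = [37, 52]     # tokens per conversation
tile_counts = [7, 13]   # image tiles per sample (dynamic tiling varies)

B, max_len = len(seq_lens), max(seq_lens)
total_tiles = sum(tile_counts)

text_shape = (B, max_len)                 # input_ids / labels / attention_mask
image_shape = (total_tiles, 3, 448, 448)  # pixel_values, tiles concatenated
flags_shape = (total_tiles,)              # image_flags, one entry per tile
```

The shorter conversation is padded up to 52 tokens, while the image tensors simply grow along dim 0 with the total tile count.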
Usage Examples
With HuggingFace Trainer
from internvl.patch.pad_data_collator import concat_pad_data_collator
from transformers import Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
data_collator=concat_pad_data_collator,
)
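Because the function takes keyword arguments with defaults, it can be passed to Trainer directly as above. To override a default, one option is to bind it first with functools.partial; this sketch assumes a tokenizer whose pad token ID differs from 0 (the stub body below stands in for the real import):

```python
from functools import partial

def concat_pad_data_collator(features, max_item_length=None, pad_id=0):
    ...  # in practice, imported from internvl.patch.pad_data_collator

pad_token_id = 2  # assumed value of tokenizer.pad_token_id
collator = partial(concat_pad_data_collator, pad_id=pad_token_id)
# Then: Trainer(..., data_collator=collator)
```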
Related Pages
Implements Principle
Requires Environment