Implementation:OpenGVLab InternVL Build Transform
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Preprocessing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool, provided by the InternVL preprocessing pipeline, for building torchvision transform compositions.
Description
The build_transform function constructs a torchvision.transforms.Compose pipeline that converts PIL images into normalized tensors. It supports three normalization schemes (ImageNet, CLIP, SigLIP), optional pad-to-square preprocessing, and separate train/eval augmentation paths.
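To make the normalization step concrete, the sketch below reproduces the arithmetic that a ToTensor followed by a Normalize applies to a single pixel, using only the standard library. The names NORM_STATS and normalize_pixel are hypothetical helpers introduced for illustration. The statistics are the commonly published ImageNet, CLIP, and SigLIP values; that InternVL uses exactly these constants is an assumption, not something this page confirms.

```python
# Commonly published per-channel statistics in RGB order, as (mean, std).
# Assumption: these mirror the constants selected by normalize_type.
NORM_STATS = {
    "imagenet": ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    "clip": (
        (0.48145466, 0.4578275, 0.40821073),
        (0.26862954, 0.26130258, 0.27577711),
    ),
    "siglip": ((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
}


def normalize_pixel(rgb, normalize_type="imagenet"):
    """Map one 8-bit RGB pixel to normalized floats (hypothetical helper)."""
    if normalize_type not in NORM_STATS:
        raise ValueError(f"unknown normalize_type: {normalize_type!r}")
    mean, std = NORM_STATS[normalize_type]
    # ToTensor scales 0..255 to 0..1; Normalize then applies (x - mean) / std.
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))
```

Under the SigLIP statistics the output range is symmetric, so a pure magenta pixel (255, 0, 255) maps to (1.0, -1.0, 1.0). The ImageNet and CLIP statistics yield a different range for each channel.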
Usage
Import this function when building image preprocessing pipelines for InternVL training or inference. It is called internally by LazySupervisedDataset and is also used in evaluation scripts.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/dataset.py
- Lines: L276-310
Signature
def build_transform(is_train, input_size, pad2square=False, normalize_type='imagenet'):
    """
    Build a torchvision transform pipeline for image preprocessing.

    Args:
        is_train: bool - Whether to use training augmentations
        input_size: int - Target image size in pixels
        pad2square: bool - Pad images to square before resizing (default False)
        normalize_type: str - Normalization type: 'imagenet', 'clip', or 'siglip'

    Returns:
        torchvision.transforms.Compose - Complete transform pipeline
    """
Import
from internvl.train.dataset import build_transform
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| is_train | bool | Yes | Training mode enables random augmentations |
| input_size | int | Yes | Target pixel size (typically 448) |
| pad2square | bool | No | Pad to square before resize (default False) |
| normalize_type | str | No | Normalization statistics: 'imagenet', 'clip', 'siglip' (default 'imagenet') |
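Padding to a square before resizing preserves the aspect ratio, whereas resizing a non-square image straight to input_size x input_size stretches it. The sketch below shows only the geometry of that step. The helper name square_padding is hypothetical, and centering the image on the canvas is an assumption: this page does not say how InternVL positions the image or which fill color it uses.

```python
def square_padding(width, height):
    """Return (side, left, top) for centered pad-to-square (hypothetical helper).

    side is the edge length of the square canvas; (left, top) is the offset
    at which the original image would be pasted onto it.
    """
    side = max(width, height)
    left = (side - width) // 2
    top = (side - height) // 2
    return side, left, top
```

For a 640x480 landscape image this gives a 640-pixel canvas with the image pasted 80 pixels from the top, leaving equal bands above and below.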
Outputs
| Name | Type | Description |
|---|---|---|
| transform | torchvision.transforms.Compose | Pipeline: [optional pad] -> Resize -> ToTensor -> Normalize |
Usage Examples
Training Transform
from PIL import Image

from internvl.train.dataset import build_transform

# Training transform with ImageNet normalization
train_transform = build_transform(
    is_train=True,
    input_size=448,
    normalize_type='imagenet',
)

# Apply to a PIL image (converted to RGB so the result has three channels)
img = Image.open('photo.jpg').convert('RGB')
tensor = train_transform(img)  # shape: [3, 448, 448]
Inference Transform
from internvl.train.dataset import build_transform

# Inference transform (deterministic, no augmentation)
eval_transform = build_transform(
    is_train=False,
    input_size=448,
    normalize_type='imagenet',
)
Related Pages
Implements Principle
Requires Environment