Implementation:OpenGVLab InternVL Build Transform

From Leeroopedia


Knowledge Sources
Domains Computer_Vision, Preprocessing
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by the InternVL preprocessing pipeline, for building torchvision transform compositions.

Description

The build_transform function constructs a torchvision.transforms.Compose pipeline that converts PIL images into normalized tensors. It supports three normalization schemes (ImageNet, CLIP, SigLIP), optional pad-to-square preprocessing, and separate train/eval augmentation paths.
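To make the normalization step concrete, the following is a minimal sketch of how a scheme name could map to per-channel statistics and how one 8-bit RGB pixel would be normalized. It is not the InternVL source: NORMALIZE_STATS and normalize_pixel are hypothetical names, and the values are the commonly published ImageNet, CLIP, and SigLIP statistics, assumed here to match what the library uses.

```python
# Hypothetical lookup table: scheme name -> (mean, std) per R, G, B channel.
# Assumption: these are the commonly published values; the constants in the
# InternVL repository may differ in name or precision.
NORMALIZE_STATS = {
    "imagenet": ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    "clip": (
        (0.48145466, 0.4578275, 0.40821073),
        (0.26862954, 0.26130258, 0.27577711),
    ),
    "siglip": ((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
}


def normalize_pixel(rgb, normalize_type="imagenet"):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    if normalize_type not in NORMALIZE_STATS:
        raise ValueError(f"unknown normalize_type: {normalize_type!r}")
    mean, std = NORMALIZE_STATS[normalize_type]
    # Same arithmetic as ToTensor followed by Normalize: (x / 255 - mean) / std
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std))
```

With the SigLIP statistics, this arithmetic maps the 8-bit range [0, 255] onto [-1, 1], while the ImageNet and CLIP statistics produce channel-dependent ranges.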

Usage

Import this function when building image preprocessing pipelines for InternVL training or inference. It is called internally by LazySupervisedDataset and is also used in evaluation scripts.

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/train/dataset.py
  • Lines: L276-310

Signature

def build_transform(is_train, input_size, pad2square=False, normalize_type='imagenet'):
    """
    Build a torchvision transform pipeline for image preprocessing.

    Args:
        is_train: bool - Whether to use training augmentations
        input_size: int - Target image size in pixels
        pad2square: bool - Pad images to square before resizing (default False)
        normalize_type: str - Normalization type: 'imagenet', 'clip', or 'siglip'

    Returns:
        torchvision.transforms.Compose - Complete transform pipeline
    """

Import

from internvl.train.dataset import build_transform

I/O Contract

Inputs

Name | Type | Required | Description
is_train | bool | Yes | Training mode enables random augmentations
input_size | int | Yes | Target side length in pixels (typically 448)
pad2square | bool | No | Pad to square before resize (default False)
normalize_type | str | No | Normalization statistics: 'imagenet', 'clip', or 'siglip' (default 'imagenet')

Outputs

Name | Type | Description
transform | torchvision.transforms.Compose | Pipeline: [optional pad] -> Resize -> ToTensor -> Normalize
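The optional pad step can be pictured as pasting the image onto a square canvas before the resize, so that resizing to input_size x input_size does not distort the aspect ratio. The sketch below computes only the geometry. pad_to_square_box is a hypothetical helper, and centering the image on the canvas is an assumption rather than behavior confirmed by this page.

```python
def pad_to_square_box(width, height):
    """Return (side, left, top) for padding a width x height image to a square.

    side is the edge length of the square canvas; (left, top) is the offset
    at which the original image would be pasted so that it sits centered.
    Assumption: centered placement; the fill color is not modeled here.
    """
    side = max(width, height)
    left = (side - width) // 2
    top = (side - height) // 2
    return side, left, top


# A 640x480 landscape image needs an 80-pixel band above (and below) it.
print(pad_to_square_box(640, 480))
```

An image that is already square yields zero offsets, so under this assumption the pad step leaves it unchanged.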

Usage Examples

Training Transform

from PIL import Image

from internvl.train.dataset import build_transform

# Training transform with ImageNet normalization
train_transform = build_transform(
    is_train=True,
    input_size=448,
    normalize_type='imagenet',
)

# Apply to a PIL image ('photo.jpg' is a placeholder path).
# Converting to RGB guarantees three channels in the output tensor.
img = Image.open('photo.jpg').convert('RGB')
tensor = train_transform(img)  # shape: [3, 448, 448]

Inference Transform

from internvl.train.dataset import build_transform

# Inference transform (deterministic, no augmentation)
eval_transform = build_transform(
    is_train=False,
    input_size=448,
    normalize_type='imagenet',
)

Related Pages

Implements Principle

Requires Environment
