Implementation:OpenGVLab InternVL Dynamic Preprocess


Knowledge Sources
Domains: Computer_Vision, Preprocessing
Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by the InternVL preprocessing pipeline, for splitting images into aspect-ratio-aware tiles.

Description

The dynamic_preprocess function implements InternVL's dynamic resolution strategy. Given a PIL image, it selects the tiling layout (rows x columns) whose aspect ratio is closest to the image's, subject to the min_num and max_num limits on tile count, then resizes the image to that grid and splits it into fixed-size tiles of image_size x image_size pixels. An optional thumbnail provides global context alongside the detailed tile crops.
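
The layout selection described above can be sketched in plain Python. This is a minimal illustration, not the upstream code: the helper names candidate_grids and pick_grid are hypothetical, and ties between equally close grids are resolved here by keeping the smaller grid, which may differ from how InternVL breaks ties.

```python
def candidate_grids(min_num=1, max_num=6):
    """All (cols, rows) grids whose tile count lies in [min_num, max_num]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    }
    # Smaller grids first, so ties go to the layout with fewer tiles
    # (an assumption of this sketch, not a documented upstream rule).
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def pick_grid(width, height, min_num=1, max_num=6):
    """Return the (cols, rows) grid whose aspect ratio is closest to the image's."""
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(min_num, max_num):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:  # strict '<' keeps the first (smallest) grid on ties
            best, best_diff = (cols, rows), diff
    return best


print(pick_grid(2048, 1536, max_num=12))  # (4, 3): 4 columns x 3 rows
print(pick_grid(800, 1600, max_num=6))    # (1, 2): 1 column x 2 rows
```

Raising max_num enlarges the set of candidate grids, which is why a 4:3 image can be matched exactly by a 4 x 3 grid only when at least 12 tiles are allowed.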

Usage

Import this function when preprocessing images for InternVL inference or custom data pipelines. It is called internally by LazySupervisedDataset during training, but can also be used standalone for inference preprocessing.

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/train/dataset.py
  • Lines: L830-866

Signature

def dynamic_preprocess(
    image,
    min_num=1,
    max_num=6,
    image_size=448,
    use_thumbnail=False,
):
    """
    Dynamically split an image into aspect-ratio-aware tiles.

    Args:
        image: PIL.Image - Input image to preprocess
        min_num: int - Minimum number of tiles (default 1)
        max_num: int - Maximum number of tiles (default 6)
        image_size: int - Size of each tile in pixels (default 448)
        use_thumbnail: bool - Whether to append a global thumbnail (default False)

    Returns:
        list[PIL.Image] - List of image tiles (optionally with thumbnail appended)
    """

Import

from internvl.train.dataset import dynamic_preprocess

I/O Contract

Inputs

Name          | Type      | Required | Description
image         | PIL.Image | Yes      | Input image of arbitrary size and aspect ratio
min_num       | int       | No       | Minimum tile count (default 1)
max_num       | int       | No       | Maximum tile count (default 6, training uses up to 12)
image_size    | int       | No       | Tile pixel size (default 448)
use_thumbnail | bool      | No       | Append downscaled global thumbnail (default False)

Outputs

Name  | Type            | Description
tiles | list[PIL.Image] | List of image tiles, each of size (image_size, image_size). Length = rows*cols, plus 1 if use_thumbnail=True and the image was split into more than one tile
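
To make the output contract concrete, the following sketch computes the crop boxes for a chosen grid in row-major order. It is an illustration under stated assumptions rather than the upstream implementation: tile_boxes is a hypothetical helper, and the boxes are meant to be applied to an image already resized to (cols * image_size, rows * image_size), for example with PIL's Image.crop.

```python
def tile_boxes(cols, rows, image_size=448):
    """Crop boxes (left, upper, right, lower), row-major, for a cols x rows grid."""
    boxes = []
    for index in range(cols * rows):
        col, row = index % cols, index // cols
        boxes.append((
            col * image_size,
            row * image_size,
            (col + 1) * image_size,
            (row + 1) * image_size,
        ))
    return boxes


boxes = tile_boxes(cols=3, rows=2)
print(len(boxes))  # 6
print(boxes[0])    # (0, 0, 448, 448)
print(boxes[-1])   # (896, 448, 1344, 896)
```

Every box is image_size pixels on each side, so each cropped tile has the fixed shape the model expects regardless of the original image dimensions.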

Usage Examples

Basic Dynamic Tiling

from PIL import Image
from internvl.train.dataset import dynamic_preprocess

# Load a high-resolution image
image = Image.open('document_scan.png')  # e.g., 2048x1536
print(f'Original size: {image.size}')    # (2048, 1536)

# Split into tiles with thumbnail
tiles = dynamic_preprocess(
    image,
    min_num=1,
    max_num=12,
    image_size=448,
    use_thumbnail=True,
)

print(f'Number of tiles: {len(tiles)}')  # e.g., 13 (4x3 grid = 12 tiles + 1 thumbnail)
for i, tile in enumerate(tiles):
    print(f'Tile {i}: {tile.size}')      # Each is (448, 448)

Inference Preprocessing

import torch
from PIL import Image
from internvl.train.dataset import dynamic_preprocess, build_transform

# Build transform pipeline
transform = build_transform(is_train=False, input_size=448)

# Process image
image = Image.open('chart.png')
tiles = dynamic_preprocess(image, max_num=12, use_thumbnail=True)
pixel_values = torch.stack([transform(tile) for tile in tiles])
# pixel_values shape: [num_tiles, 3, 448, 448]
