Implementation:OpenGVLab InternVL Dynamic Preprocess
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Preprocessing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool, provided by the InternVL preprocessing pipeline, for splitting images into aspect-ratio-aware tiles.
Description
The dynamic_preprocess function implements InternVL's dynamic resolution strategy. Given a PIL image, it computes the optimal tiling layout (rows x columns) that best preserves the image's aspect ratio, then resizes and splits the image into fixed-size tiles. An optional thumbnail provides global context alongside the detailed tile crops.
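The layout-selection step described above can be illustrated with a minimal, standard-library sketch. The helper names `candidate_grids` and `closest_grid` are hypothetical and are not part of the InternVL API. The sketch assumes that candidate layouts are all (columns, rows) grids whose tile count lies between min_num and max_num, and that ties are resolved in favour of fewer tiles; the repository's own tie-breaking may differ.

```python
def candidate_grids(min_num=1, max_num=6):
    """Enumerate (cols, rows) grids whose tile count lies in [min_num, max_num]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    }
    # Sort by tile count (then by the grid itself) so the order is deterministic
    # and smaller grids come first.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def closest_grid(width, height, min_num=1, max_num=6):
    """Pick the grid whose cols/rows ratio is nearest to width/height.

    Simplifying assumption: on ties, min() keeps the first candidate,
    i.e. the grid with the fewest tiles.
    """
    aspect = width / height
    return min(
        candidate_grids(min_num, max_num),
        key=lambda g: abs(aspect - g[0] / g[1]),
    )


if __name__ == "__main__":
    print(closest_grid(2048, 1536, max_num=12))  # (4, 3)
    print(closest_grid(448, 448))                # (1, 1)
```

Under these assumptions, a 4:3 image with max_num=12 maps to a grid of 4 columns by 3 rows, because that grid reproduces the aspect ratio exactly.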
Usage
Import this function when preprocessing images for InternVL inference or custom data pipelines. It is called internally by LazySupervisedDataset during training, but can also be used standalone for inference preprocessing.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/dataset.py
- Lines: L830-866
Signature
def dynamic_preprocess(
    image,
    min_num=1,
    max_num=6,
    image_size=448,
    use_thumbnail=False,
):
    """
    Dynamically split an image into aspect-ratio-aware tiles.

    Args:
        image: PIL.Image - Input image to preprocess
        min_num: int - Minimum number of tiles (default 1)
        max_num: int - Maximum number of tiles (default 6)
        image_size: int - Size of each tile in pixels (default 448)
        use_thumbnail: bool - Whether to append a global thumbnail (default False)

    Returns:
        list[PIL.Image] - List of image tiles (optionally with thumbnail appended)
    """
Import
from internvl.train.dataset import dynamic_preprocess
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image | PIL.Image | Yes | Input image of arbitrary size and aspect ratio |
| min_num | int | No | Minimum tile count (default 1) |
| max_num | int | No | Maximum tile count (default 6, training uses up to 12) |
| image_size | int | No | Tile pixel size (default 448) |
| use_thumbnail | bool | No | Append downscaled global thumbnail (default False) |
Outputs
| Name | Type | Description |
|---|---|---|
| tiles | list[PIL.Image] | List of image tiles, each of size (image_size, image_size). Length = rows*cols, plus 1 for the thumbnail if use_thumbnail=True and the layout has more than one tile |
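To make the output contract concrete, the sketch below computes the crop boxes and the expected list length for a given layout using only plain arithmetic. The helper names `tile_boxes` and `expected_length` are hypothetical. Two assumptions are built in and worth verifying against the repository: tiles are cut in row-major order from an image resized to (cols * image_size, rows * image_size), and the thumbnail is skipped when the layout is a single tile.

```python
def tile_boxes(cols, rows, image_size=448):
    """Return (left, upper, right, lower) crop boxes in row-major order.

    The boxes assume the image was first resized to
    (cols * image_size, rows * image_size).
    """
    return [
        (c * image_size, r * image_size, (c + 1) * image_size, (r + 1) * image_size)
        for r in range(rows)
        for c in range(cols)
    ]


def expected_length(cols, rows, use_thumbnail=False):
    """Expected number of returned images for a cols x rows layout.

    Assumption: the thumbnail is appended only when there is more than one tile.
    """
    tiles = cols * rows
    return tiles + 1 if use_thumbnail and tiles > 1 else tiles


if __name__ == "__main__":
    boxes = tile_boxes(4, 3)
    print(len(boxes))                                   # 12
    print(boxes[0])                                     # (0, 0, 448, 448)
    print(expected_length(4, 3, use_thumbnail=True))    # 13
```

Each box has the tuple layout that PIL's Image.crop accepts, so the same arithmetic can be used to check tile positions when debugging a custom pipeline.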
Usage Examples
Basic Dynamic Tiling
from PIL import Image
from internvl.train.dataset import dynamic_preprocess

# Load a high-resolution image
image = Image.open('document_scan.png')  # e.g., 2048x1536
print(f'Original size: {image.size}')    # (2048, 1536)

# Split into tiles with thumbnail
tiles = dynamic_preprocess(
    image,
    min_num=1,
    max_num=12,
    image_size=448,
    use_thumbnail=True,
)

# For a 2048x1536 (4:3) input and max_num=12, the closest layout is 4x3,
# so expect 13 images (12 tiles + 1 thumbnail)
print(f'Number of tiles: {len(tiles)}')
for i, tile in enumerate(tiles):
    print(f'Tile {i}: {tile.size}')  # Each is (448, 448)
Inference Preprocessing
import torch
from PIL import Image
from internvl.train.dataset import dynamic_preprocess, build_transform

# Build transform pipeline
transform = build_transform(is_train=False, input_size=448)

# Process image
image = Image.open('chart.png')
tiles = dynamic_preprocess(image, max_num=12, use_thumbnail=True)

# Apply the transform to each tile and stack into one tensor
pixel_values = torch.stack([transform(tile) for tile in tiles])
# pixel_values shape: [len(tiles), 3, 448, 448]