Implementation:OpenGVLab InternVL Segmentation Transform Pipeline

Knowledge Sources	OpenGVLab_InternVL
Domains	Data Augmentation, Semantic Segmentation, Image Transforms
Last Updated	2026-02-07 14:00 GMT

Overview

This module implements custom data augmentation transforms for the InternVL segmentation pipeline, providing SETR-compatible multi-scale resizing and minimum-size padding.

Description

The transform.py file provides two registered MMSegmentation pipeline transforms that handle image and segmentation mask resizing/padding:

SETR_Resize is a multi-scale image resize transform with three scale-selection modes. The first is chosen by passing ratio_range; the other two are chosen through multiscale_mode:

Ratio range mode: Samples a random ratio from a specified range and multiplies it with the base image scale
Range mode: Randomly samples scale dimensions between defined lower and upper bounds
Value mode: Randomly selects from a predefined list of scale tuples
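
The three modes above can be illustrated with a small standard-library sketch. The helper name sample_scale and its rng parameter are hypothetical and are not part of transform.py; the sketch assumes that ratio range mode takes precedence when ratio_range is given and that range mode samples the long and short edges independently.

```python
import random


def sample_scale(img_scale, multiscale_mode="range", ratio_range=None, rng=random):
    """Pick a target scale tuple (hypothetical helper, not the module's API)."""
    if ratio_range is not None:
        # Ratio range mode: multiply one base scale by a random ratio.
        base = img_scale[0]
        low, high = ratio_range
        ratio = rng.uniform(low, high)
        return int(base[0] * ratio), int(base[1] * ratio)
    if len(img_scale) == 1:
        # A single scale leaves nothing to sample.
        return tuple(img_scale[0])
    if multiscale_mode == "range":
        # Range mode: sample each edge between the lower and upper bounds.
        long_edges = [max(s) for s in img_scale]
        short_edges = [min(s) for s in img_scale]
        long_edge = rng.randint(min(long_edges), max(long_edges))
        short_edge = rng.randint(min(short_edges), max(short_edges))
        return long_edge, short_edge
    if multiscale_mode == "value":
        # Value mode: pick one of the predefined scale tuples.
        return tuple(rng.choice(img_scale))
    raise ValueError(f"unknown multiscale_mode: {multiscale_mode!r}")
```

With img_scale=[(2048, 448), (2048, 896)] in range mode, the long edge in this sketch is always 2048 and the short edge falls anywhere from 448 to 896 inclusive.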

The transform supports SETR-style multi-scale handling via the setr_multi_scale flag, which ensures the short side is always at least as large as the crop_size. It resizes both images (via mmcv.imrescale/mmcv.imresize) and segmentation maps with appropriate interpolation (bilinear for images, nearest-neighbor for masks), and updates results with scale_factor, img_shape, pad_shape, and keep_ratio metadata.
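
One way to meet the short-side guarantee described above is sketched below. The function setr_target_size is a hypothetical illustration, not the code in transform.py; it assumes a square crop_size and rounds the resized long side to the nearest pixel.

```python
def setr_target_size(img_hw, scale, crop_size):
    """Return (new_h, new_w) whose short side is never below the crop size.

    Hypothetical helper; assumes crop_size is square, so crop_size[0] is used.
    """
    h, w = img_hw
    # Use the sampled short edge, but never less than the crop size.
    new_short = max(min(scale), crop_size[0])
    if h > w:
        # Portrait image: width is the short side.
        new_h, new_w = new_short * h / w, new_short
    else:
        # Landscape or square image: height is the short side.
        new_h, new_w = new_short, new_short * w / h
    return int(round(new_h)), int(round(new_w))
```

For a 1024x2048 (height x width) image, a sampled scale of (2048, 300), and crop_size=(448, 448), the sketch returns (448, 896): the sampled short edge of 300 is raised to 448 and the aspect ratio is preserved.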

PadShortSide pads images and segmentation masks so both dimensions meet a minimum size threshold. It only applies padding when the short side is smaller than the specified size, using mmcv.impad with configurable padding values (default 0 for images, 255 for segmentation maps).
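
The padding rule can be sketched without mmcv on a plain 2-D grid. The helper pad_to_min_size is hypothetical, and the choice to pad on the bottom and right edges is an assumption of this sketch rather than something stated above.

```python
def pad_to_min_size(rows, size, pad_val=0):
    """Pad a 2-D grid (list of rows) so height and width are both >= size.

    Hypothetical stand-in for the mmcv.impad call; pads bottom and right only.
    """
    h = len(rows)
    w = len(rows[0]) if rows else 0
    if min(h, w) >= size:
        # Short side already meets the threshold: return an unchanged copy.
        return [list(r) for r in rows]
    new_h, new_w = max(h, size), max(w, size)
    # Extend each existing row to the right, then append full padding rows.
    padded = [list(r) + [pad_val] * (new_w - w) for r in rows]
    padded += [[pad_val] * new_w for _ in range(new_h - h)]
    return padded
```

Calling it once with pad_val=0 for the image and once with pad_val=255 for the segmentation map mirrors the two defaults described above, so that padded mask pixels carry a value distinct from the image padding.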

Both transforms are registered via @PIPELINES.register_module() for use in MMSegmentation config files.
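
The registration pattern can be pictured with a toy registry. ToyRegistry, TOY_PIPELINES, and ToyPadShortSide below are hypothetical names for illustration only; this is not the mmcv Registry implementation, just a minimal sketch of how a decorator can map a config's type string to a class.

```python
class ToyRegistry:
    """Minimal name-to-class registry (illustrative, not mmcv's Registry)."""

    def __init__(self):
        self._classes = {}

    def register_module(self):
        def decorator(cls):
            # Store the class under its own name so configs can refer to it.
            self._classes[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        # Copy the config so the caller's dict is not modified.
        args = dict(cfg)
        cls = self._classes[args.pop("type")]
        return cls(**args)


TOY_PIPELINES = ToyRegistry()


@TOY_PIPELINES.register_module()
class ToyPadShortSide:
    """Placeholder with the same constructor arguments as PadShortSide."""

    def __init__(self, size=None, pad_val=0, seg_pad_val=255):
        self.size = size
        self.pad_val = pad_val
        self.seg_pad_val = seg_pad_val

    def __call__(self, results):
        # A real transform would pad here; the toy returns its input unchanged.
        return results


transform = TOY_PIPELINES.build(dict(type="ToyPadShortSide", size=448))
```

The dict passed to build has the same shape as a pipeline entry in a config file: the type key selects the class and the remaining keys become constructor arguments.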

Usage

Use these transforms in MMSegmentation pipeline configurations when training InternVL segmentation models. SETR_Resize provides SETR-compatible multi-scale training, and PadShortSide ensures that inputs meet a minimum size before they reach the vision backbone.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: segmentation/mmseg_custom/datasets/pipelines/transform.py
Lines: 1-313

Signature

@PIPELINES.register_module()
class SETR_Resize(object):
    def __init__(self, img_scale=None, multiscale_mode='range',
                 ratio_range=None, keep_ratio=True,
                 crop_size=None, setr_multi_scale=False): ...
    def __call__(self, results) -> dict: ...

@PIPELINES.register_module()
class PadShortSide(object):
    def __init__(self, size=None, pad_val=0, seg_pad_val=255): ...
    def __call__(self, results) -> dict: ...

Import

from mmseg_custom.datasets.pipelines.transform import SETR_Resize, PadShortSide

I/O Contract

Inputs

Name	Type	Required	Description
results	dict	Yes	Pipeline results dict containing 'img', 'seg_fields', and optionally 'scale'
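img_scale	tuple or list[tuple]	No	Target image scale(s) for resizing (SETR_Resize)
multiscale_mode	str	No	"range" or "value" for multi-scale selection (SETR_Resize; default: "range")
ratio_range	tuple[float]	No	Min/max ratio range for random scale sampling (SETR_Resize)
keep_ratio	bool	No	Preserve aspect ratio during resize (SETR_Resize; default: True)
crop_size	tuple	No	Minimum crop size for SETR multi-scale mode (SETR_Resize)
setr_multi_scale	bool	No	Keep the short side at least as large as crop_size (SETR_Resize; default: False)
size	int	No	Minimum size for short side padding (PadShortSide)
pad_val	int	No	Padding value for images (PadShortSide; default: 0)
seg_pad_val	int	No	Padding value for segmentation maps (PadShortSide; default: 255)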

Outputs

Name	Type	Description
results	dict	Updated dict with resized/padded 'img', 'gt_semantic_seg', plus 'img_shape', 'pad_shape', 'scale_factor', 'keep_ratio'
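
How the output metadata relates to the resize can be sketched for the keep_ratio case. The helper rescale_metadata is hypothetical; it assumes the scale tuple bounds the long and short edges, tracks only height and width (no channel dimension), and reports scale_factor as a (width, height) pair, which may differ from the layout the module stores.

```python
def rescale_metadata(img_hw, scale):
    """Compute resize metadata for an aspect-preserving rescale (sketch)."""
    h, w = img_hw
    max_long, max_short = max(scale), min(scale)
    # Largest factor that keeps both edges within their bounds.
    factor = min(max_long / max(h, w), max_short / min(h, w))
    new_h, new_w = int(h * factor + 0.5), int(w * factor + 0.5)
    return {
        "img_shape": (new_h, new_w),
        # No padding has been applied yet, so pad_shape matches img_shape.
        "pad_shape": (new_h, new_w),
        "scale_factor": (new_w / w, new_h / h),
        "keep_ratio": True,
    }
```

For a 1024x2048 image and scale=(2048, 512), the short-edge bound is the limiting one, so the sketch reports img_shape (512, 1024) and a scale factor of 0.5 in both directions.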

Usage Examples

Basic Usage

# In MMSegmentation config file:
# train_pipeline = [
#     dict(type='LoadImageFromFile'),
#     dict(type='LoadAnnotations'),
#     dict(type='SETR_Resize',
#          img_scale=[(2048, 448), (2048, 896)],
#          multiscale_mode='range',
#          keep_ratio=True,
#          crop_size=(448, 448),
#          setr_multi_scale=True),
#     dict(type='PadShortSide', size=448),
#     dict(type='RandomCrop', crop_size=(448, 448)),
#     dict(type='RandomFlip', prob=0.5),
#     dict(type='Normalize', ...),
# ]

Related Pages

Principle:OpenGVLab_InternVL_Image_Transform_Pipeline
