Implementation:OpenGVLab InternVL Segmentation Transform Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Data Augmentation, Semantic Segmentation, Image Transforms |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This module implements custom data augmentation transforms for the InternVL segmentation pipeline, providing SETR-compatible multi-scale resizing and minimum-size padding.
Description
The transform.py file provides two registered MMSegmentation pipeline transforms that handle image and segmentation mask resizing/padding:
SETR_Resize is a multi-scale resize transform that selects the target scale in one of three modes:
- Ratio range mode (ratio_range set): samples a random ratio from ratio_range and multiplies the base img_scale by it
- Range mode (multiscale_mode='range'): randomly samples the target scale between the lower and upper bounds given by two img_scale tuples
- Value mode (multiscale_mode='value'): randomly selects one tuple from the img_scale list
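The three sampling modes can be sketched in plain Python. The helper names are hypothetical, and the logic assumes SETR_Resize follows the conventions of MMSegmentation's stock Resize transform: in range mode the long and short edges are drawn independently from the max and min of the two bounding tuples. Treat it as an illustration, not the code in transform.py.

```python
import random
from typing import Sequence, Tuple

Scale = Tuple[int, int]  # a two-edge scale such as (2048, 448)


def sample_ratio(base: Scale, ratio_range: Tuple[float, float]) -> Scale:
    """Ratio range mode: multiply a single base scale by a random ratio."""
    min_ratio, max_ratio = ratio_range
    assert min_ratio <= max_ratio
    ratio = random.uniform(min_ratio, max_ratio)
    return int(base[0] * ratio), int(base[1] * ratio)


def sample_range(scales: Sequence[Scale]) -> Scale:
    """Range mode: draw long and short edges independently between two bounding scales."""
    assert len(scales) == 2
    longs = [max(s) for s in scales]
    shorts = [min(s) for s in scales]
    long_edge = random.randint(min(longs), max(longs))     # inclusive bounds
    short_edge = random.randint(min(shorts), max(shorts))  # inclusive bounds
    return long_edge, short_edge


def sample_value(scales: Sequence[Scale]) -> Scale:
    """Value mode: pick one of the predefined scales at random."""
    return random.choice(list(scales))


random.seed(0)
print(sample_range([(2048, 448), (2048, 896)]))  # long edge is always 2048 here
```

With the bounds used in the Basic Usage config, range mode always keeps the long edge at 2048 and only varies the short edge between 448 and 896.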
The transform supports SETR-style multi-scale handling via the setr_multi_scale flag: when it is set, the sampled short side is raised to at least the crop size, so the short side of the resized image is never smaller than crop_size. It resizes both images (via mmcv.imrescale when keep_ratio is True, mmcv.imresize otherwise) and segmentation maps with matching interpolation (bilinear for images, nearest-neighbor for masks, so that label IDs are never blended), and updates results with scale_factor, img_shape, pad_shape, and keep_ratio metadata.
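To see what setr_multi_scale changes, the sketch below compares the standard keep-ratio rule (the one mmcv.imrescale applies to a tuple scale: fit the long side within max(scale) and the short side within min(scale)) with the SETR rule described above. The function names are hypothetical; using crop_size[0] as the minimum short side and the 4-element scale_factor layout (mirroring what MMSegmentation's Resize stores) are assumptions for transform.py.

```python
from typing import Tuple

Scale = Tuple[int, int]


def keep_ratio_factor(h: int, w: int, scale: Scale) -> float:
    """Standard keep-ratio rule: long side <= max(scale) and short side <= min(scale)."""
    return min(max(scale) / max(h, w), min(scale) / min(h, w))


def setr_factor(h: int, w: int, scale: Scale, crop_size: Tuple[int, int]) -> float:
    """SETR rule: short side becomes max(min(scale), crop_size[0]); long side is not capped."""
    new_short = max(min(scale), crop_size[0])  # crop_size[0] as the minimum is an assumption
    return new_short / min(h, w)


def rescale(h: int, w: int, factor: float):
    """Return the new (h, w) and a (w_scale, h_scale, w_scale, h_scale) scale_factor."""
    new_h, new_w = int(h * factor + 0.5), int(w * factor + 0.5)  # round to nearest pixel
    w_scale, h_scale = new_w / w, new_h / h
    return (new_h, new_w), (w_scale, h_scale, w_scale, h_scale)


h, w = 500, 3000  # a wide panorama
scale, crop = (2048, 448), (448, 448)
print(rescale(h, w, keep_ratio_factor(h, w, scale))[0])  # (341, 2048)
print(rescale(h, w, setr_factor(h, w, scale, crop))[0])  # (448, 2688)
```

For this 500 x 3000 image the standard rule caps the long side at 2048 and leaves a 341-pixel short side, smaller than a 448 crop; the SETR rule lifts the short side to 448 and lets the long side grow to 2688.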
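PadShortSide pads images and segmentation masks so that both height and width are at least size. Inputs whose height and width already reach size pass through unchanged; otherwise each short dimension is padded up to size with mmcv.impad, using configurable fill values (pad_val, default 0, for images and seg_pad_val, default 255, for segmentation maps). The 255 default matches the ignore index commonly used in MMSegmentation, so padded pixels do not contribute to the loss.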
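The padding rule can be sketched with NumPy. This assumes, as mmcv.impad does when given a target shape, that padding is added on the bottom and right edges only. The function name is hypothetical, and writing pad_shape back mirrors MMSegmentation's Pad transform, which is an assumption for PadShortSide.

```python
import numpy as np


def pad_short_side(results: dict, size: int, pad_val=0, seg_pad_val=255) -> dict:
    """Pad 'img' and every key in results['seg_fields'] so that H >= size and W >= size."""
    img = results['img']
    h, w = img.shape[:2]
    if h >= size and w >= size:
        return results  # already large enough: leave untouched
    # Pad only the bottom and right edges, as mmcv.impad does for a target shape.
    hw_pad = ((0, max(size - h, 0)), (0, max(size - w, 0)))
    channel_pad = ((0, 0),) * (img.ndim - 2)  # never pad the channel axis
    results['img'] = np.pad(img, hw_pad + channel_pad,
                            mode='constant', constant_values=pad_val)
    for key in results.get('seg_fields', []):
        # Masks get seg_pad_val (255 by default) so padded pixels can be ignored by the loss.
        results[key] = np.pad(results[key], hw_pad,
                              mode='constant', constant_values=seg_pad_val)
    results['pad_shape'] = results['img'].shape  # assumed, mirroring MMSegmentation's Pad
    return results
```

A 300 x 600 image padded with size=448 becomes 448 x 600: only the height grows, and the new mask rows are filled with 255.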
Both transforms are registered via @PIPELINES.register_module() for use in MMSegmentation config files.
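Registration is what lets a config entry such as dict(type='PadShortSide', size=448) become a transform instance. The toy registry below is a hypothetical stand-in used only to illustrate that lookup-and-construct pattern; it is not mmcv's Registry, and ToyRegistry, ToyPad and run_pipeline are invented names. Like mmcv's Registry by default, it keys each class by its class name.

```python
from typing import Any, Callable, Dict, List


class ToyRegistry:
    """Hypothetical stand-in for mmcv's Registry: maps a class name to the class."""

    def __init__(self, name: str):
        self.name = name
        self._classes: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._classes[cls.__name__] = cls  # the class name becomes the config 'type'
            return cls  # the class itself is returned unchanged
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        kwargs = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._classes[kwargs.pop('type')]
        return cls(**kwargs)


TOY_PIPELINES = ToyRegistry('pipeline')


@TOY_PIPELINES.register_module()
class ToyPad:
    """Hypothetical transform with PadShortSide's constructor signature."""

    def __init__(self, size=None, pad_val=0, seg_pad_val=255):
        self.size, self.pad_val, self.seg_pad_val = size, pad_val, seg_pad_val

    def __call__(self, results: dict) -> dict:
        results.setdefault('log', []).append(f'ToyPad(size={self.size})')
        return results


def run_pipeline(cfgs: List[Dict[str, Any]], results: dict) -> dict:
    """Build each transform from its config dict and apply them in order."""
    for cfg in cfgs:
        results = TOY_PIPELINES.build(cfg)(results)
    return results


print(run_pipeline([dict(type='ToyPad', size=448)], {}))  # {'log': ['ToyPad(size=448)']}
```

The design point is that configs stay plain data: a pipeline is just an ordered list of dicts, and each transform receives and returns the same results dict.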
Usage
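Use these transforms in MMSegmentation pipeline configurations when training InternVL segmentation models. SETR_Resize provides SETR-compatible multi-scale training, and PadShortSide guarantees a minimum input size, so that a subsequent RandomCrop with a crop no larger than size always returns a full-size crop for the vision backbone.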
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: segmentation/mmseg_custom/datasets/pipelines/transform.py
- Lines: 1-313
Signature
@PIPELINES.register_module()
class SETR_Resize(object):
    def __init__(self, img_scale=None, multiscale_mode='range',
                 ratio_range=None, keep_ratio=True,
                 crop_size=None, setr_multi_scale=False): ...
    def __call__(self, results) -> dict: ...

@PIPELINES.register_module()
class PadShortSide(object):
    def __init__(self, size=None, pad_val=0, seg_pad_val=255): ...
    def __call__(self, results) -> dict: ...
Import
from mmseg_custom.datasets.pipelines.transform import SETR_Resize, PadShortSide
I/O Contract
Inputs
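| Name | Type | Required | Description |
|---|---|---|---|
| results | dict | Yes | Argument to __call__ of both transforms: the pipeline results dict containing 'img', 'seg_fields', and optionally 'scale' |
| img_scale | tuple or list[tuple] | No | SETR_Resize: target image scale(s) |
| multiscale_mode | str | No | SETR_Resize: "range" or "value" multi-scale selection (default: "range") |
| ratio_range | tuple[float] | No | SETR_Resize: (min, max) ratio used to randomly rescale img_scale |
| keep_ratio | bool | No | SETR_Resize: preserve aspect ratio during resize (default: True) |
| crop_size | tuple | No | SETR_Resize: crop size that sets the minimum short side when setr_multi_scale is True |
| setr_multi_scale | bool | No | SETR_Resize: enable the SETR short-side rule (default: False) |
| size | int | No | PadShortSide: minimum height and width after padding |
| pad_val | int | No | PadShortSide: fill value for images (default: 0) |
| seg_pad_val | int | No | PadShortSide: fill value for segmentation maps (default: 255) |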
Outputs
| Name | Type | Description |
|---|---|---|
| results | dict | Updated dict with resized/padded 'img', 'gt_semantic_seg', plus 'img_shape', 'pad_shape', 'scale_factor', 'keep_ratio' |
Usage Examples
Basic Usage
# In an MMSegmentation config file:
# train_pipeline = [
#     dict(type='LoadImageFromFile'),
#     dict(type='LoadAnnotations'),
#     dict(type='SETR_Resize',
#          img_scale=[(2048, 448), (2048, 896)],
#          multiscale_mode='range',
#          keep_ratio=True,
#          crop_size=(448, 448),
#          setr_multi_scale=True),
#     dict(type='PadShortSide', size=448),
#     dict(type='RandomCrop', crop_size=(448, 448)),
#     dict(type='RandomFlip', prob=0.5),
#     dict(type='Normalize', ...),
# ]