

Implementation:OpenGVLab InternVL Segmentation Transform Pipeline

From Leeroopedia
Revision as of 16:15, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/OpenGVLab_InternVL_Segmentation_Transform_Pipeline.md)


Knowledge Sources
Domains: Data Augmentation, Semantic Segmentation, Image Transforms
Last Updated: 2026-02-07 14:00 GMT

Overview

This module implements custom data augmentation transforms for the InternVL segmentation pipeline, providing SETR-compatible multi-scale resizing and minimum-size padding.

Description

The transform.py file provides two registered MMSegmentation pipeline transforms that handle image and segmentation mask resizing/padding:

SETR_Resize is a comprehensive multi-scale image resize transform with three modes:

  • Ratio range mode: used when ratio_range is given; samples a random ratio from that range and multiplies the base img_scale by it
  • Range mode (multiscale_mode='range'): randomly samples the scale dimensions between the lower and upper bounds defined by img_scale
  • Value mode (multiscale_mode='value'): randomly selects one entry from the predefined list of scale tuples in img_scale
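The three modes above can be sketched in plain Python. This is a minimal illustration modeled on the way MMSegmentation's stock Resize transform samples scales, not the code from transform.py; the function names and the `rng` parameter are hypothetical, and the exact rounding in SETR_Resize may differ.

```python
import random


def sample_ratio_scale(base_scale, ratio_range, rng=random):
    """Ratio range mode: multiply the base (long, short) scale by one random ratio."""
    min_ratio, max_ratio = ratio_range
    assert min_ratio <= max_ratio
    ratio = rng.uniform(min_ratio, max_ratio)
    return int(base_scale[0] * ratio), int(base_scale[1] * ratio)


def sample_range_scale(img_scales, rng=random):
    """Range mode: draw the long and short edge independently between two bounds."""
    assert len(img_scales) == 2
    long_edges = [max(s) for s in img_scales]
    short_edges = [min(s) for s in img_scales]
    long_edge = rng.randint(min(long_edges), max(long_edges))    # inclusive bounds
    short_edge = rng.randint(min(short_edges), max(short_edges))
    return long_edge, short_edge


def sample_value_scale(img_scales, rng=random):
    """Value mode: pick one of the predefined scale tuples."""
    return rng.choice(img_scales)


rng = random.Random(0)  # seeded only to make the illustration repeatable
print(sample_ratio_scale((2048, 512), (0.5, 2.0), rng))
print(sample_range_scale([(2048, 448), (2048, 896)], rng))
print(sample_value_scale([(2048, 448), (2048, 896)], rng))
```

With img_scale=[(2048, 448), (2048, 896)] in range mode, the long edge is always 2048 while the short edge varies between 448 and 896.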

The transform supports SETR-style multi-scale handling via the setr_multi_scale flag, which ensures that the short side of the resized image is never smaller than crop_size. Images are resized with mmcv.imrescale (aspect ratio preserved) or mmcv.imresize (resized to the exact target size), and segmentation maps are resized to match. Interpolation is bilinear for images and nearest-neighbor for masks, so label values are never blended. After resizing, the transform updates results with the scale_factor, img_shape, pad_shape, and keep_ratio metadata.
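As a rough sketch of the short-side rule and the metadata update described above, the following computes a target size and the accompanying fields. It is written under assumptions: the helper names are hypothetical, shapes are simplified to (height, width), the (w, h, w, h) layout of scale_factor follows the stock MMSegmentation Resize convention, and the actual rounding in transform.py may differ.

```python
def setr_target_size(img_hw, sampled_scale, crop_size):
    """Return (new_h, new_w) with the short side clamped to at least crop_size[0].

    The aspect ratio of the input image is preserved.
    """
    h, w = img_hw
    new_short = max(min(sampled_scale), crop_size[0])
    if h > w:
        # Portrait image: width is the short side.
        return round(new_short * h / w), new_short
    # Landscape or square image: height is the short side.
    return new_short, round(new_short * w / h)


def resize_metadata(old_hw, new_hw, keep_ratio=True):
    """Build the metadata fields that a resize step writes into results."""
    h, w = old_hw
    new_h, new_w = new_hw
    w_scale, h_scale = new_w / w, new_h / h
    return {
        'img_shape': (new_h, new_w),
        'pad_shape': (new_h, new_w),  # no padding has been applied at this point
        'scale_factor': (w_scale, h_scale, w_scale, h_scale),
        'keep_ratio': keep_ratio,
    }


# A sampled short side of 300 is below crop_size, so it is raised to 448.
size = setr_target_size((1024, 2048), (2048, 300), (448, 448))
print(size)                                   # (448, 896)
print(resize_metadata((1024, 2048), size))
```

Clamping the short side before cropping means a later RandomCrop with the same crop_size always has enough pixels to crop from.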

PadShortSide pads images and segmentation masks so both dimensions meet a minimum size threshold. It only applies padding when the short side is smaller than the specified size, using mmcv.impad with configurable padding values (default 0 for images, 255 for segmentation maps).
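The padding rule can be illustrated with NumPy alone. This is a sketch under assumptions rather than the module's code: np.pad stands in for mmcv.impad, the function name is hypothetical, and padding is added at the bottom and right edges, which is how mmcv.impad behaves when given a target shape.

```python
import numpy as np


def pad_short_side(img, seg, size, pad_val=0, seg_pad_val=255):
    """Pad img (H, W[, C]) and seg (H, W) so that both sides are at least `size`.

    Nothing is changed when the short side already meets the threshold.
    """
    h, w = img.shape[:2]
    if min(h, w) >= size:
        return img, seg
    pad_h = max(h, size) - h
    pad_w = max(w, size) - w
    # Pad only the spatial axes; leave any channel axis untouched.
    img_pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    img = np.pad(img, img_pad, mode='constant', constant_values=pad_val)
    seg = np.pad(seg, [(0, pad_h), (0, pad_w)], mode='constant',
                 constant_values=seg_pad_val)
    return img, seg


img = np.ones((300, 600, 3), dtype=np.uint8)
seg = np.zeros((300, 600), dtype=np.uint8)
padded_img, padded_seg = pad_short_side(img, seg, size=448)
print(padded_img.shape, padded_seg.shape)  # (448, 600, 3) (448, 600)
```

Padding the mask with 255 is useful because 255 is the conventional ignore index in MMSegmentation losses, so padded pixels do not contribute to training.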

Both transforms are registered via @PIPELINES.register_module() for use in MMSegmentation config files.
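To show why registration matters, here is a toy registry that mimics the pattern: a decorator records the class under its name, and a config dict with a type key is later turned into an instance. ToyRegistry and TOY_PIPELINES are hypothetical stand-ins written for this illustration; they are not the MMCV Registry implementation, and the PadShortSide class below is a stub that only stores its arguments.

```python
class ToyRegistry:
    """Hypothetical stand-in for a pipeline registry (illustration only)."""

    def __init__(self):
        self._classes = {}

    def register_module(self):
        def decorator(cls):
            # Record the class under its own name so configs can refer to it.
            self._classes[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        args = dict(cfg)                 # copy so the caller's config is not mutated
        cls = self._classes[args.pop('type')]
        return cls(**args)


TOY_PIPELINES = ToyRegistry()


@TOY_PIPELINES.register_module()
class PadShortSide:
    """Stub with the same constructor signature as the real transform."""

    def __init__(self, size=None, pad_val=0, seg_pad_val=255):
        self.size = size
        self.pad_val = pad_val
        self.seg_pad_val = seg_pad_val


transform = TOY_PIPELINES.build(dict(type='PadShortSide', size=448))
print(type(transform).__name__, transform.size, transform.seg_pad_val)
```

In a real project the module defining the transforms must be imported before the config is built, otherwise the type names are not yet known to the registry.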

Usage

Use these transforms in MMSegmentation pipeline configurations when training InternVL segmentation models. They are essential for SETR-compatible multi-scale training and ensuring minimum input sizes for the vision backbone.

Code Reference

Source Location

Signature

@PIPELINES.register_module()
class SETR_Resize(object):
    def __init__(self, img_scale=None, multiscale_mode='range',
                 ratio_range=None, keep_ratio=True,
                 crop_size=None, setr_multi_scale=False): ...
    def __call__(self, results) -> dict: ...

@PIPELINES.register_module()
class PadShortSide(object):
    def __init__(self, size=None, pad_val=0, seg_pad_val=255): ...
    def __call__(self, results) -> dict: ...

Import

from mmseg_custom.datasets.pipelines.transform import SETR_Resize, PadShortSide

I/O Contract

Inputs

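Name | Type | Required | Description
results | dict | Yes | Pipeline results dict passed to __call__, containing 'img', 'seg_fields', and optionally 'scale'
img_scale | tuple or list[tuple] | No | Target image scale(s) for resizing (SETR_Resize)
multiscale_mode | str | No | "range" or "value" for multi-scale selection (SETR_Resize; default: "range")
ratio_range | tuple[float] | No | Min/max ratio for random scale sampling (SETR_Resize)
keep_ratio | bool | No | Preserve aspect ratio during resize (SETR_Resize; default: True)
crop_size | tuple | No | Minimum short-side size enforced in SETR multi-scale mode (SETR_Resize)
setr_multi_scale | bool | No | Enable SETR-style multi-scale handling (SETR_Resize; default: False)
size | int | No | Minimum size for short-side padding (PadShortSide)
pad_val | int | No | Padding value for images (PadShortSide; default: 0)
seg_pad_val | int | No | Padding value for segmentation maps (PadShortSide; default: 255)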

Outputs

Name | Type | Description
results | dict | Updated dict with resized/padded 'img' and 'gt_semantic_seg', plus 'img_shape', 'pad_shape', 'scale_factor', and 'keep_ratio'

Usage Examples

Basic Usage

# In an MMSegmentation config file:
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    # Sample a scale between the two bounds; keep the short side >= crop_size
    dict(type='SETR_Resize',
         img_scale=[(2048, 448), (2048, 896)],
         multiscale_mode='range',
         keep_ratio=True,
         crop_size=(448, 448),
         setr_multi_scale=True),
    # Pad only if the short side is still smaller than 448
    dict(type='PadShortSide', size=448),
    dict(type='RandomCrop', crop_size=(448, 448)),
    dict(type='RandomFlip', prob=0.5),
    # dict(type='Normalize', ...),  # arguments elided in the source
]
