Implementation:Mlc ai Mlc llm Image Processing

From Leeroopedia


Overview

The Image Processing module implements GPU-accelerated image preprocessing operations as TVM TIR (Tensor IR) scheduled kernels. It is located at python/mlc_llm/model/vision/image_processing.py (286 lines).

The ImageProcessor class provides methods for image resizing, center cropping, rescaling, normalization, and padding, all implemented as GPU-scheduled TIR primitive functions. These operations use thread binding for parallel execution on the GPU and are designed to preprocess images before they are fed into vision encoders such as CLIP.

Source File

  • File: python/mlc_llm/model/vision/image_processing.py
  • Lines: 286
  • Module: mlc_llm.model.vision.image_processing

Dependencies

Import | Purpose
tvm.s_tir | TVM scheduled TIR for applying GPU schedules to primitive functions
tvm.tir | TVM TIR for symbolic computation (Select, div, ceil, sqrt, etc.)
tvm.relax.frontend.nn | Neural network module base class and tensor types
tvm.relax.frontend.nn.op | Operations including interpolate and tensor_ir_op
tvm.script.tir | TIR script decorator for defining primitive functions

Helper Function: _var

def _var(dtype, size=1):
    return T.alloc_buffer((size,), dtype, scope="local")

Allocates a small local buffer used as a thread-local variable in TIR kernels.

Class: ImageProcessor

All methods assume images in NCHW layout (batch, channels, height, width) with 3 color channels.

Method: apply_schedule

def apply_schedule(self, sch, block, bdx=32, tile=[32, 32]):
    loop_x, loop_y = sch.get_loops(block)[-2:]
    xo, xi = sch.split(loop_x, factors=[tile[0], None])
    yo, yi = sch.split(loop_y, factors=[tile[1], None])
    sch.reorder(xo, yo, xi, yi)
    t = sch.fuse(xo, yo)
    ty, tx = sch.split(t, factors=[None, bdx])
    sch.bind(ty, "threadIdx.y")
    sch.bind(tx, "threadIdx.x")

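Applies a standard GPU tiling schedule to a TIR block. The last two loops (typically height and width) are each split with factors=[tile[i], None], which fixes the outer extent at 32 and lets TVM infer the inner extent. The loops are then reordered so that both outer loops come first, the outer loops are fused and re-split by bdx, and the resulting pair is bound to threadIdx.y and threadIdx.x. With the defaults this yields a 32x32 thread block in which each thread iterates over its own inner (xi, yi) patch. This schedule is reused by the image processing kernels in this module.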
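
To make the index arithmetic concrete, the following is a minimal pure-Python emulation of the loop nest that such a schedule produces. It does not use TVM. It assumes the usual split semantics, where a None factor is inferred by ceiling division and out-of-range iterations are masked. The function name emulate_schedule is hypothetical and not part of the module.

```python
from math import ceil


def emulate_schedule(height, width, bdx=32, tile=(32, 32)):
    """Return {(ty, tx): [(h, w), ...]} for an emulated split/fuse/bind nest."""
    xi_extent = ceil(height / tile[0])   # inferred inner extent of loop_x
    yi_extent = ceil(width / tile[1])    # inferred inner extent of loop_y
    fused_extent = tile[0] * tile[1]     # extent of fuse(xo, yo)
    work = {}
    for ty in range(ceil(fused_extent / bdx)):      # would bind to threadIdx.y
        for tx in range(bdx):                       # would bind to threadIdx.x
            t = ty * bdx + tx
            if t >= fused_extent:
                continue  # masked when bdx does not divide the fused extent
            xo, yo = divmod(t, tile[1])             # undo the fuse
            pixels = []
            for xi in range(xi_extent):
                for yi in range(yi_extent):
                    h = xo * xi_extent + xi
                    w = yo * yi_extent + yi
                    if h < height and w < width:    # mask for ragged edges
                        pixels.append((h, w))
            work[(ty, tx)] = pixels
    return work


work = emulate_schedule(336, 336)
covered = [p for pixels in work.values() for p in pixels]
print(len(work), len(covered), len(set(covered)))
```

Under these assumptions, a 336x336 plane is handled by 1024 emulated threads, each walking a patch of at most 11x11 pixels, and every pixel is visited exactly once.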

Method: resize

def resize(self, image: Tensor, params):

Resizes an image using bilinear interpolation via op.interpolate. Supports three parameter modes:

Mode | Parameters | Behavior
Explicit size | height, width | Resizes to exact dimensions
Shortest edge | shortest_edge | Scales the image so the shorter dimension matches the target, maintaining aspect ratio
HD transform | hd_transform | Computes optimal scale factor for high-definition processing using hd_num (default 4) and pad_num (default 336) parameters

The HD transform mode computes the scaling as:

ratio = max(w, h) / min(w, h)
scale = ceil(sqrt(hd_num * ratio))
# Adjust scale if it would exceed hd_num tiles
if (scale * ceil(scale / ratio)) > hd_num:
    scale = scale - 1
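
As a rough illustration, the scale formula above can be evaluated in plain Python for a few image sizes. This sketch uses math.ceil and math.sqrt in place of the symbolic TIR operations and covers only the scale value. How the module turns the scale and pad_num into output dimensions is not shown here. The function name hd_scale is hypothetical.

```python
from math import ceil, sqrt


def hd_scale(w, h, hd_num=4):
    """Evaluate the HD-transform scale formula for one image size."""
    ratio = max(w, h) / min(w, h)
    scale = ceil(sqrt(hd_num * ratio))
    # Step back by one if the implied tile count would exceed hd_num.
    if scale * ceil(scale / ratio) > hd_num:
        scale -= 1
    return scale


for size in [(336, 336), (800, 600), (1344, 336)]:
    print(size, hd_scale(*size))
```

With hd_num = 4, a square image gives a scale of 2, an 800x600 image also gives 2 (after the adjustment step), and a 4:1 image gives 4.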

Method: crop

def crop(self, image: Tensor, crop_size):

Performs a center crop on the image. The crop boundaries are computed as:

top = (orig_height - crop_height) // 2
bottom = orig_height - top
left = (orig_width - crop_width) // 2
right = orig_width - left

The crop is implemented as a TIR primitive function. The batch and channel dimensions are bound to blockIdx.x and blockIdx.y, while the spatial dimensions are processed using the tiled schedule.
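
The boundary arithmetic above can be checked with a small CPU-side sketch that crops a single H x W plane stored as a list of rows. This is a plain-Python reference for the formulas only, not the TIR kernel. The names center_crop_bounds and center_crop are hypothetical.

```python
def center_crop_bounds(orig_height, orig_width, crop_height, crop_width):
    """Compute (top, bottom, left, right) using the formulas shown above."""
    top = (orig_height - crop_height) // 2
    bottom = orig_height - top
    left = (orig_width - crop_width) // 2
    right = orig_width - left
    return top, bottom, left, right


def center_crop(plane, crop_height, crop_width):
    """Crop one H x W plane given as a list of equally long rows."""
    top, bottom, left, right = center_crop_bounds(
        len(plane), len(plane[0]), crop_height, crop_width
    )
    return [row[left:right] for row in plane[top:bottom]]


# 8x8 plane whose value encodes its position: value = row * 8 + col.
plane = [[r * 8 + c for c in range(8)] for r in range(8)]
cropped = center_crop(plane, 4, 4)
print(len(cropped), len(cropped[0]), cropped[0][0], cropped[-1][-1])
```

For an 8x8 plane and a 4x4 crop, the window spans rows and columns 2 through 5. Note that when the difference between the original and crop size is odd, bottom - top as computed by these formulas is one larger than the requested crop size. How the kernel itself treats that case is not covered by this sketch.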

Method: rescale

def rescale(self, image: Tensor, rescale_factor=1/255.0, o_dtype="float32"):

Multiplies each pixel value by a rescale factor (default 1/255.0 to convert from [0, 255] to [0, 1]) and casts to the output dtype. Implemented as a GPU TIR kernel:

out_buf[n_idx, c_idx, h_idx, w_idx] = (
    T.cast(image_buf[n_idx, c_idx, h_idx, w_idx], o_dtype)
    * rescale_factor
)

Method: normalize

def normalize(self, image: Tensor, o_dtype="float32"):

Applies per-channel mean subtraction and standard deviation normalization using the OpenAI CLIP normalization constants (these differ from the standard ImageNet mean and standard deviation):

Channel | Mean | Std Dev
R (channel 0) | 0.48145466 | 0.26862954
G (channel 1) | 0.4578275 | 0.26130258
B (channel 2) | 0.40821073 | 0.27577711

The formula applied per pixel is:

output[n, c, h, w] = (input[n, c, h, w] - mean[c]) / stddev[c]

The mean and standard deviation values are initialized inside the TIR kernel using a T.init() block and stored in thread-local buffers.
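
For reference, the rescale and normalize formulas can be chained on a single RGB pixel in plain Python. This is a sketch for checking values by hand, using the constants from the table above. It is not how the GPU kernel is written, and the names CLIP_MEAN, CLIP_STD, and rescale_and_normalize are hypothetical.

```python
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def rescale_and_normalize(pixel_rgb, rescale_factor=1 / 255.0):
    """Map one uint8 RGB pixel to normalized floats, channel by channel."""
    out = []
    for c, value in enumerate(pixel_rgb):
        scaled = float(value) * rescale_factor        # [0, 255] -> [0, 1]
        out.append((scaled - CLIP_MEAN[c]) / CLIP_STD[c])
    return out


print(rescale_and_normalize((0, 0, 0)))
print(rescale_and_normalize((255, 255, 255)))
```

A black pixel maps to -mean[c] / stddev[c] in each channel and a white pixel maps to (1 - mean[c]) / stddev[c], so the normalized range is roughly symmetric around zero but differs per channel.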

Method: pad

def pad(self, image: Tensor, dtype="uint8"):

Pads the image vertically (top and bottom) so that the height becomes a multiple of 336 pixels. The padding value is 255 (white). Horizontal padding is set to zero (no padding).

h = image.shape[2]
tar = tir.truncdiv(h + 335, 336) * 336
t = tir.div(tar - h, 2)        # top padding
b = tar - h - t                  # bottom padding

The padding is split between top and bottom; when the total padding is odd, the extra row goes to the bottom. Padded regions are filled with the value 255, while non-padded regions copy from the original image.
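
The padding arithmetic can be reproduced with integer division in plain Python. The sketch below is a CPU-side reference for one H x W plane, not the TIR kernel. The multiple parameter is a hypothetical generalization added for testing with small inputs (the module uses 336), and the names vertical_padding and pad_plane are also hypothetical.

```python
def vertical_padding(h, multiple=336):
    """Return (target_height, top, bottom) following the formulas above."""
    tar = (h + multiple - 1) // multiple * multiple  # round up to a multiple
    t = (tar - h) // 2                               # top padding
    b = tar - h - t                                  # bottom padding
    return tar, t, b


def pad_plane(plane, multiple=336, fill=255):
    """Pad one H x W plane (list of rows) vertically with the fill value."""
    _, t, b = vertical_padding(len(plane), multiple)
    width = len(plane[0])
    top_rows = [[fill] * width for _ in range(t)]
    bottom_rows = [[fill] * width for _ in range(b)]
    return top_rows + [list(row) for row in plane] + bottom_rows


for h in (336, 337, 500):
    print(h, vertical_padding(h))
```

For h = 500 the target height is 672 with 86 rows added on each side, while h = 337 gives 167 rows on top and 168 on the bottom. A height that is already a multiple of 336 receives no padding.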

Method: preprocess

def preprocess(self, pixel_values):
    return pixel_values

A pass-through method that returns the input unchanged. This serves as a hook for subclasses to override with model-specific preprocessing pipelines.

GPU Kernel Design Pattern

All TIR kernels in this module share a common structure:

  1. Thread binding: Batch dimension bound to blockIdx.x, channel dimension bound to blockIdx.y.
  2. Spatial processing: Height and width dimensions processed in a grid loop within a schedulable block (T.sblock).
  3. Tiled scheduling: The apply_schedule method tiles the spatial dimensions and binds them to thread indices.
  4. Attributes: All kernels are marked with tir.is_scheduled = 1 (pre-scheduled, no further auto-scheduling) and tir.noalias = True.

Categories

  • Vision Preprocessing
  • GPU Kernels
  • TVM TIR
  • Image Processing
  • CLIP Pipeline
