Principle:Mit han lab Llm awq Dynamic Image Video Preprocessing

From Leeroopedia
Knowledge Sources
Domains Vision, Preprocessing
Last Updated 2026-02-15 00:00 GMT

Overview

Principle of dynamically tiling images into aspect-ratio-aware patches and uniformly sampling video frames for vision transformer processing.

Description

Dynamic image preprocessing adapts to varying image aspect ratios by selecting, from a set of allowed configurations, the tile grid (e.g., 2x3, 1x4) whose aspect ratio is closest to that of the image, then resizing the image to fit that grid and splitting it into equal-sized patches. This preserves spatial information better than naive center-cropping, which discards content outside the crop, or stretching to a fixed square, which distorts proportions. For videos, uniform temporal sampling extracts representative frames, each of which is preprocessed individually. All patches undergo ImageNet normalization.
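The video and normalization steps can be illustrated with a minimal sketch. It is not the repository's implementation: the function names `sample_frame_indices` and `normalize_pixel` are hypothetical, picking the midpoint of each temporal segment is one assumed way to sample uniformly, and the mean and standard deviation are the conventional ImageNet RGB statistics, which the page itself does not list.

```python
# Conventional ImageNet channel statistics (RGB). Assumed here; the page
# only says "ImageNet normalization" without giving the values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick up to num_samples frame indices spread evenly over the video."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, num_frames)
    # Split the video into equal-length segments and take each midpoint
    # (an assumed convention; other uniform schemes are possible).
    segment = num_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


def normalize_pixel(rgb: tuple[int, int, int]) -> tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

For a 100-frame clip, `sample_frame_indices(100, 4)` selects frames 12, 37, 62, and 87. Each selected frame would then go through the same tiling and normalization as a still image.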

Usage

Apply this principle when preparing media inputs for vision transformers that process fixed-size patch sequences, especially when input images have diverse aspect ratios.

Theoretical Basis

Given an image of width w and height h with aspect ratio r = w/h, find the grid of m columns and n rows that minimizes |m/n - r| subject to m*n <= max_patches. Resize the image to width m * patch_size and height n * patch_size, then crop it into m*n non-overlapping patches of size (patch_size, patch_size).
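The grid search and cropping step can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the repository's code: `choose_grid` and `tile_boxes` are hypothetical names, every grid with m*n <= max_patches is treated as allowed, and ties between grids with the same aspect-ratio error are broken in favor of fewer tiles, a rule the page does not specify.

```python
def choose_grid(width: int, height: int, max_patches: int) -> tuple[int, int]:
    """Return (m, n): columns and rows minimizing |m/n - w/h|."""
    if width <= 0 or height <= 0 or max_patches < 1:
        raise ValueError("width, height and max_patches must be positive")
    r = width / height
    # Enumerate every grid whose tile count stays within the budget.
    candidates = [
        (m, n)
        for m in range(1, max_patches + 1)
        for n in range(1, max_patches // m + 1)
    ]
    # Primary key: aspect-ratio error. Secondary key: tile count
    # (assumed tie-break, preferring the cheaper grid).
    return min(candidates, key=lambda g: (abs(g[0] / g[1] - r), g[0] * g[1]))


def tile_boxes(m: int, n: int, patch_size: int) -> list[tuple[int, int, int, int]]:
    """Crop boxes (left, upper, right, lower) for an m x n grid, row by row.

    Assumes the image was already resized to
    (m * patch_size, n * patch_size).
    """
    return [
        (col * patch_size, row * patch_size,
         (col + 1) * patch_size, (row + 1) * patch_size)
        for row in range(n)
        for col in range(m)
    ]
```

For an 800x600 image with max_patches = 12, `choose_grid(800, 600, 12)` returns `(4, 3)`, since 4/3 matches the aspect ratio exactly. The boxes use the (left, upper, right, lower) convention, so with Pillow they could be passed to `Image.crop` after an `Image.resize` to the grid size.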
