Principle:Ggml org Ggml Image Preprocessing

Image Preprocessing

Preparing raw images for neural network input through resize, normalization, and format conversion.

Vision models expect fixed-size, normalized float tensors as input. Raw images arrive in arbitrary resolutions and integer pixel formats, so they must be transformed into the exact layout and value range each model was trained on. This step matters because a wrong normalization or resize strategy raises no error; it silently degrades model accuracy.

Key Operations

The canonical preprocessing pipeline follows this sequence:

1. Decode the image (JPEG/PNG) into a raw pixel buffer
2. Resize with aspect-ratio preservation to the model's expected spatial dimensions
3. Normalize pixel values to the model-specific floating-point range
4. Convert to the required float tensor layout (e.g., planar CHW vs. interleaved HWC)

Each step must match the exact conventions used during model training; deviations cause distribution shift at inference time.
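As an illustration of the layout step, the sketch below converts an interleaved HWC byte buffer into a planar CHW float buffer. The helper name `hwc_to_chw` is hypothetical and is not part of ggml, and the [0, 1] scaling it applies is only one possible normalization (the one listed for YOLO below).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved HWC uint8 image (RGBRGB...) into a planar CHW
// float buffer (RRR...GGG...BBB...), scaling values to [0, 1].
// hwc_to_chw is a hypothetical helper name, not a ggml function.
std::vector<float> hwc_to_chw(const std::vector<uint8_t>& hwc, int w, int h, int c) {
    assert(w > 0 && h > 0 && c > 0);
    assert(hwc.size() == static_cast<size_t>(w) * h * c);

    std::vector<float> chw(hwc.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            for (int k = 0; k < c; ++k) {
                // HWC: channels are adjacent; CHW: each channel is its own plane.
                const size_t src = (static_cast<size_t>(y) * w + x) * c + k;
                const size_t dst = (static_cast<size_t>(k) * h + y) * w + x;
                chw[dst] = hwc[src] / 255.0f;
            }
        }
    }
    return chw;
}
```

The two index expressions are the whole difference between the layouts, which is why a mismatch produces plausible-looking but scrambled input rather than a crash.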

Model-Specific Pipelines

SAM (Segment Anything Model)

Resize: Bilinear interpolation with aspect-ratio preservation to fit a 1024x1024 input (the longer side is scaled to 1024 and the shorter side is scaled proportionally)
Normalization: ImageNet statistics on the 0-255 pixel scale, applied per RGB channel as (pixel - mean) / std
- mean = [123.675, 116.28, 103.53]
- std = [58.395, 57.12, 57.375]
Padding: Zero-padding applied to fill the remaining area after aspect-preserving resize
Layout: HWC interleaved float32
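
The SAM steps above can be sketched as follows. This is a minimal illustration, not the ggml implementation: `FloatImageHWC`, `source_coord`, and `sam_preprocess` are hypothetical names, the pixel-center sampling convention is an assumption, and the resized image is assumed to sit in the top-left corner so that the zero padding fills the bottom and right.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch; not a ggml type.
struct FloatImageHWC {
    int w = 0;
    int h = 0;
    std::vector<float> data;  // interleaved RGB, 3 * w * h floats
};

// Map an output coordinate back to a clamped source coordinate
// (pixel-center alignment; an assumed convention).
static float source_coord(int dst, float scale, int src_len) {
    const float s = (dst + 0.5f) / scale - 0.5f;
    return std::min(std::max(s, 0.0f), static_cast<float>(src_len - 1));
}

// Scale the longer side to `target`, normalize with the mean/std listed
// above, and leave the rest of the target x target buffer at zero.
FloatImageHWC sam_preprocess(const std::vector<uint8_t>& rgb, int w, int h, int target = 1024) {
    assert(w > 0 && h > 0 && target > 0);
    assert(rgb.size() == static_cast<size_t>(3) * w * h);

    const float mean[3] = {123.675f, 116.28f, 103.53f};
    const float stdv[3] = {58.395f, 57.12f, 57.375f};

    const float scale = static_cast<float>(target) / std::max(w, h);
    const int rw = std::min(target, std::max(1, static_cast<int>(std::lround(w * scale))));
    const int rh = std::min(target, std::max(1, static_cast<int>(std::lround(h * scale))));

    FloatImageHWC out;
    out.w = target;
    out.h = target;
    out.data.assign(static_cast<size_t>(3) * target * target, 0.0f);  // zero padding

    for (int y = 0; y < rh; ++y) {
        const float sy = source_coord(y, scale, h);
        const int y0 = static_cast<int>(sy);
        const int y1 = std::min(y0 + 1, h - 1);
        const float fy = sy - y0;
        for (int x = 0; x < rw; ++x) {
            const float sx = source_coord(x, scale, w);
            const int x0 = static_cast<int>(sx);
            const int x1 = std::min(x0 + 1, w - 1);
            const float fx = sx - x0;
            for (int c = 0; c < 3; ++c) {
                auto px = [&](int yy, int xx) {
                    return static_cast<float>(rgb[(static_cast<size_t>(yy) * w + xx) * 3 + c]);
                };
                // Bilinear blend of the four neighbouring source pixels.
                const float top = px(y0, x0) + (px(y0, x1) - px(y0, x0)) * fx;
                const float bot = px(y1, x0) + (px(y1, x1) - px(y1, x0)) * fx;
                const float v = top + (bot - top) * fy;
                out.data[(static_cast<size_t>(y) * target + x) * 3 + c] = (v - mean[c]) / stdv[c];
            }
        }
    }
    return out;
}
```

Because normalization happens before padding in this sketch, the padded region is zero in normalized space, not zero in raw pixel space.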

YOLO

Resize: Letterbox resize to 416x416, preserving aspect ratio and filling the unused area with gray (0.5 on the normalized [0, 1] scale)
Normalization: Pixel values scaled to the [0, 1] range
Layout: CHW planar float32
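
A letterbox of this kind can be sketched as below. The function name `yolo_letterbox` is hypothetical, nearest-neighbor sampling is used only to keep the example short, and centering the image inside the padded square is an assumption; none of these are confirmed details of the ggml example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a YOLO-style letterbox: returns a planar CHW float buffer of
// 3 * size * size values in [0, 1], with 0.5 gray in the unused area.
// Nearest-neighbor sampling and centering are simplifying assumptions.
std::vector<float> yolo_letterbox(const std::vector<uint8_t>& rgb, int w, int h, int size = 416) {
    assert(w > 0 && h > 0 && size > 0);
    assert(rgb.size() == static_cast<size_t>(3) * w * h);

    // Scale so the whole image fits inside size x size.
    const float scale = std::min(static_cast<float>(size) / w, static_cast<float>(size) / h);
    const int nw = std::max(1, std::min(size, static_cast<int>(w * scale + 0.5f)));
    const int nh = std::max(1, std::min(size, static_cast<int>(h * scale + 0.5f)));
    const int ox = (size - nw) / 2;  // horizontal offset of the image
    const int oy = (size - nh) / 2;  // vertical offset of the image

    const size_t plane = static_cast<size_t>(size) * size;
    std::vector<float> chw(3 * plane, 0.5f);  // gray padding everywhere first

    for (int y = 0; y < nh; ++y) {
        const int sy = std::min(h - 1, static_cast<int>((y + 0.5f) / scale));
        for (int x = 0; x < nw; ++x) {
            const int sx = std::min(w - 1, static_cast<int>((x + 0.5f) / scale));
            for (int c = 0; c < 3; ++c) {
                const uint8_t v = rgb[(static_cast<size_t>(sy) * w + sx) * 3 + c];
                chw[c * plane + static_cast<size_t>(y + oy) * size + (x + ox)] = v / 255.0f;
            }
        }
    }
    return chw;
}
```

Filling the buffer with 0.5 before copying pixels means the padding value is expressed directly in the normalized range, so no separate padding pass is needed.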

General Observation

There is no universal preprocessing step: each model family defines its own input contract (target size, resize and padding strategy, normalization constants, and tensor layout). Applying one model's pipeline to another is a common source of wrong inference results, because the model still runs but on out-of-distribution input.
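
One way to keep these contracts explicit is to store them as data. The sketch below is hypothetical: `PreprocessSpec`, `Layout`, `sam_spec`, `yolo_spec`, and `normalize` are illustrative names rather than ggml types, and the values are the ones listed in this page. The [0, 1] scaling for YOLO is expressed as mean 0 and std 255.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout { HWC, CHW };

// Hypothetical description of a model's input contract.
struct PreprocessSpec {
    int    size;        // square input side in pixels
    float  mean[3];     // per-channel mean on the 0-255 scale
    float  stdv[3];     // per-channel divisor on the 0-255 scale
    float  pad_value;   // fill value for the unused area, after normalization
    Layout layout;      // memory layout of the float output
};

inline PreprocessSpec sam_spec() {
    return {1024, {123.675f, 116.28f, 103.53f}, {58.395f, 57.12f, 57.375f}, 0.0f, Layout::HWC};
}

inline PreprocessSpec yolo_spec() {
    return {416, {0.0f, 0.0f, 0.0f}, {255.0f, 255.0f, 255.0f}, 0.5f, Layout::CHW};
}

// Normalize one 8-bit channel value according to a spec.
inline float normalize(const PreprocessSpec& s, uint8_t v, int c) {
    assert(c >= 0 && c < 3);
    return (static_cast<float>(v) - s.mean[c]) / s.stdv[c];
}
```

Keeping the constants in one place per model makes it harder to pair, say, SAM's mean/std with YOLO's layout by accident.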
