Implementation:Alibaba MNN PyMNN CV Preprocessing
| Field | Value |
|---|---|
| implementation_name | PyMNN_CV_Preprocessing |
| schema_version | 0.1.0 |
| workflow | Python_Model_Inference |
| implementation_type | API_Doc |
| domain | Deep_Learning_Inference |
| scope | Image loading, resizing, normalization, and data format conversion for model input |
| source_file | docs/pymnn/cv.md (Python API reference) |
| related_patterns | OpenCV_Compatibility, Fused_Preprocessing, Tensor_Format_Conversion |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This page documents the MNN Python (PyMNN) APIs that turn raw image data into model-ready tensors. The primary functions are cv.imread for loading an image from disk (cv.imdecode is its in-memory counterpart), cv.resize for resizing with optional fused color conversion and normalization, and expr.convert for data format conversion. The cv functions follow OpenCV's interface conventions.
API Signatures
cv.imread
cv.imread(filename, flag=cv.IMREAD_COLOR) -> Var
Reads an image file from disk and returns it as a Var.
cv.imdecode
cv.imdecode(buf, flag=cv.IMREAD_COLOR) -> Var
Decodes an image from an in-memory buffer (ndarray, list, tuple, or bytes).
cv.resize
cv.resize(src, dsize, fx=0, fy=0, interpolation=cv.INTER_LINEAR, code=-1, mean=[], norm=[]) -> Var
Resizes an image. Optionally performs fused color conversion, mean subtraction, and normalization.
expr.convert
expr.convert(x, format) -> Var
Converts the data_format of a Var between NCHW, NHWC, and NC4HW4.
Parameters
cv.imread Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| filename | str | (required) | Path to the image file (JPEG, PNG, BMP, etc.) |
| flag | int | cv.IMREAD_COLOR | Read mode: cv.IMREAD_GRAYSCALE (grayscale uint8), cv.IMREAD_COLOR (BGR uint8), cv.IMREAD_ANYDEPTH (BGR float32) |
cv.resize Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| src | Var | (required) | Input image Var |
| dsize | tuple | (required) | Target size as (width, height) |
| fx | float | 0 | Horizontal scale factor; if 0, computed from dsize |
| fy | float | 0 | Vertical scale factor; if 0, computed from dsize |
| interpolation | int | cv.INTER_LINEAR | Interpolation method: INTER_NEAREST, INTER_LINEAR, INTER_CUBIC, INTER_AREA, INTER_LANCZOS4 |
| code | int | -1 | Color conversion code (e.g., cv.COLOR_BGR2RGB); -1 means no conversion |
| mean | [float] | [] | Per-channel mean subtraction values; also triggers cast to float32 |
| norm | [float] | [] | Per-channel normalization scale factors (multiplied after mean subtraction) |
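The mean and norm parameters describe simple per-channel arithmetic. The following is a minimal pure-Python sketch of that arithmetic only, not MNN's implementation; `normalize_pixel` is a hypothetical helper name, and the constants are the ones used in the code example on this page.

```python
def normalize_pixel(pixel, mean, norm):
    """Apply (value - mean) * norm to each channel of one pixel.

    Hypothetical helper illustrating the arithmetic that the mean and norm
    parameters describe; it is not part of the MNN API.
    """
    if not (len(pixel) == len(mean) == len(norm)):
        raise ValueError("pixel, mean, and norm must have one entry per channel")
    return [(float(p) - m) * s for p, m, s in zip(pixel, mean, norm)]


# Values in BGR order, as used elsewhere on this page.
MEAN = [103.94, 116.78, 123.68]
NORM = [0.017, 0.017, 0.017]

white = normalize_pixel([255, 255, 255], MEAN, NORM)
black = normalize_pixel([0, 0, 0], MEAN, NORM)
print(white)
print(black)
```

With these constants a white pixel maps to roughly 2.2 to 2.6 per channel and a black pixel to roughly -2.1 to -1.8, so the normalized data is approximately zero-centered.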
expr.convert Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | Var | (required) | Input variable to convert |
| format | data_format | (required) | Target format: expr.NCHW, expr.NHWC, or expr.NC4HW4 |
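To make the three formats concrete, the sketch below computes where a logical element (n, c, h, w) lands in a flat buffer under each layout. It assumes the commonly described NC4HW4 packing, in which channels are grouped into padded blocks of 4 and the block index takes the place of the channel axis. The function names are hypothetical, and this is not how expr.convert is implemented.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in an NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in an NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def nc4hw4_offset(n, c, h, w, C, H, W):
    """Flat offset in an NC4HW4 buffer, assumed to be laid out as
    [N, ceil(C / 4), H, W, 4] with unused channel slots left as padding."""
    blocks = (C + 3) // 4
    return (((n * blocks + c // 4) * H + h) * W + w) * 4 + c % 4


def nc4hw4_size(N, C, H, W):
    """Number of elements the padded NC4HW4 buffer occupies."""
    return N * ((C + 3) // 4) * H * W * 4


# A 3-channel 224x224 image: the channels of one pixel are adjacent in
# NHWC and NC4HW4, but a full plane (H * W elements) apart in NCHW.
C, H, W = 3, 224, 224
for name, fn in [("NCHW", nchw_offset), ("NHWC", nhwc_offset), ("NC4HW4", nc4hw4_offset)]:
    print(name, [fn(0, c, 0, 1, C, H, W) for c in range(C)])
print("padded size:", nc4hw4_size(1, C, H, W))
```

Under this assumption a 3-channel image occupies the space of 4 channels in NC4HW4, so a layout change involves padding as well as reordering.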
Inputs
- Raw image file (JPEG, PNG, BMP) on disk, or image bytes in memory
- Alternatively, a numpy array or MNN Var containing raw pixel data
Outputs
- Preprocessed Var tensor in the target data format (typically NC4HW4), with:
  - Shape: [1, C, H, W] (batch dimension added)
  - dtype: float32
  - Pixel values normalized according to the specified mean and norm parameters
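Because dsize is given as (width, height) while shapes list height before width, the two are easy to swap by accident. The sketch below is plain-Python bookkeeping under that convention; `expected_shapes` is a hypothetical helper, not an MNN function.

```python
def expected_shapes(dsize, channels=3):
    """Return the shapes a frame passes through for a given resize target.

    dsize follows the (width, height) order used by cv.resize, whereas the
    shapes list height before width. Hypothetical helper for illustration.
    """
    width, height = dsize
    return {
        "after_resize": [height, width, channels],          # HWC image
        "after_expand_dims": [1, height, width, channels],  # NHWC batch of 1
        "model_input": [1, channels, height, width],        # logical [1, C, H, W]
    }


shapes = expected_shapes((320, 240))
for stage, shape in shapes.items():
    print(stage, shape)
```

A non-square target such as (320, 240) makes a swapped width and height visible immediately, which a square 224x224 target would hide.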
Code Example
import MNN.cv as cv
import MNN.numpy as np
import MNN.expr as expr
# Step 1: Load image (returns Var with shape [H, W, 3], dtype uint8, NHWC)
image = cv.imread('cat.jpg')
# Step 2: Resize to 224x224 with fused normalization
# - Resizes to 224x224
# - Subtracts mean [103.94, 116.78, 123.68] per channel (BGR)
# - Multiplies by norm [0.017, 0.017, 0.017] per channel
# - Automatically casts to float32
image = cv.resize(image, (224, 224),
                  mean=[103.94, 116.78, 123.68],
                  norm=[0.017, 0.017, 0.017])
# Step 3: Add batch dimension: [224, 224, 3] -> [1, 224, 224, 3]
input_var = np.expand_dims(image, 0)
# Step 4: Convert from NHWC to NC4HW4 for the inference engine
input_var = expr.convert(input_var, expr.NC4HW4)
# input_var is now ready for model.forward(input_var)
Alternative: Manual Step-by-Step Preprocessing
import MNN.cv as cv
import MNN.numpy as np
import MNN.expr as expr
# Load image
img = cv.imread('cat.jpg')
# Resize without normalization
img = cv.resize(img, (224, 224))
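# Manual type conversion and normalization
# (mean is listed in BGR order, matching the channel order returned by cv.imread)
imgf = img.astype(np.float32)
mean = np.array([103.94, 116.78, 123.68])
norm = np.array([0.017, 0.017, 0.017])
imgf = (imgf - mean) * norm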
# Add batch dimension and convert format
input_var = np.expand_dims(imgf, 0)
input_var = expr.convert(input_var, expr.NC4HW4)
Edge Cases and Limitations
- cv.imread is not available on mobile by default: enable the PYMNN_IMGCODECS build flag to include image codec support on mobile platforms.
- Do not use transpose for layout changes: a numpy-style transpose from HWC to CHW produces incorrect results; always use expr.convert, which handles the internal memory layout correctly.
- Color order: cv.imread returns BGR images by default (matching the OpenCV convention); if the model expects RGB input, convert with the code parameter of cv.resize or with cv.cvtColor.
- Empty mean/norm: when mean and norm are both empty (the default), cv.resize does not cast to float32 and the output remains uint8.
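The color-order caveat extends to the normalization constants: a mean listed in BGR order has to be reversed if the pixels are in RGB order at the point of subtraction. The pure-Python sketch below illustrates only that bookkeeping. `bgr_to_rgb` and `normalize` are hypothetical helpers, and the sketch makes no claim about the order in which cv.resize applies color conversion and mean subtraction internally.

```python
def bgr_to_rgb(values):
    """Reverse a 3-element channel sequence (BGR <-> RGB)."""
    b, g, r = values
    return [r, g, b]


def normalize(pixel, mean, norm):
    """Per-channel (value - mean) * norm; hypothetical helper."""
    return [(p - m) * s for p, m, s in zip(pixel, mean, norm)]


MEAN_BGR = [103.94, 116.78, 123.68]  # order used in the examples on this page
NORM = [0.017, 0.017, 0.017]
pixel_bgr = [10.0, 128.0, 250.0]     # a strongly red pixel in BGR order

# Consistent: RGB pixel with the mean reordered to RGB.
good = normalize(bgr_to_rgb(pixel_bgr), bgr_to_rgb(MEAN_BGR), NORM)
# Inconsistent: RGB pixel with the mean left in BGR order.
bad = normalize(bgr_to_rgb(pixel_bgr), MEAN_BGR, NORM)

# Reference: normalize in BGR, then reorder the result to RGB.
reference = bgr_to_rgb(normalize(pixel_bgr, MEAN_BGR, NORM))
print(good == reference)
print(bad == reference)
```

The norm values here are identical across channels, so only the mean needs reordering; per-channel scale factors that differ would need the same treatment.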