Implementation: Alibaba MNN PyMNN Output Processing
| Field | Value |
|---|---|
| implementation_name | PyMNN_Output_Processing |
| schema_version | 0.1.0 |
| workflow | Python_Model_Inference |
| implementation_type | API_Doc |
| domain | Deep_Learning_Inference |
| scope | Converting raw inference outputs to interpretable results via format conversion, reduction, and data extraction |
| source_file | express/MathOp.cpp:L559-829 |
| related_patterns | Reduction_Operations, Softmax_Activation, Data_Format_Conversion |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation documents the MNN Python APIs used for postprocessing raw inference output tensors. The primary operations are expr.convert for data format conversion from NC4HW4 to NHWC/NCHW, np.argmax and expr.softmax for classification result extraction, and Var.read() for converting MNN Var objects to numpy arrays. The underlying C++ math operations are implemented in express/MathOp.cpp (reduction operations at lines 829-917, argmax at line 1062) and express/NeuralNetWorkOp.cpp (softmax at line 479).
API Signatures
expr.convert
expr.convert(x, format) -> Var
Converts the data_format of variable x to the specified format.
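The layout change can be pictured with plain numpy transposes. This is an illustration only, not the MNN implementation: expr.convert also handles MNN's channel-packed NC4HW4 layout, which has no direct numpy equivalent.

```python
import numpy as np

# Hypothetical NHWC tensor: batch 1, 2x2 spatial, 3 channels
x_nhwc = np.arange(12).reshape(1, 2, 2, 3)

# NHWC -> NCHW is a pure axis permutation
x_nchw = x_nhwc.transpose(0, 3, 1, 2)

assert x_nchw.shape == (1, 3, 2, 2)
# Element identity is preserved: x_nhwc[n, h, w, c] == x_nchw[n, c, h, w]
assert x_nhwc[0, 1, 0, 2] == x_nchw[0, 2, 1, 0]
```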
np.argmax (Var.argmax)
np.argmax(var) -> int
var.argmax(axis=[-1]) -> int
Returns the index of the maximum value. On Var objects, accepts an axis parameter.
expr.softmax
expr.softmax(x, axis=-1) -> Var
Applies softmax activation: exp(x) / sum(exp(x), axis).
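The formula can be checked in plain numpy. This is a sketch, not the MNN implementation, though it uses the same max-subtraction trick for numerical stability mentioned under Edge Cases:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiation
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits, axis=1)
assert abs(float(probs.sum()) - 1.0) < 1e-6  # probabilities sum to 1
assert int(probs.argmax()) == 0              # largest logit wins
```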
Var.read
var.read() -> numpy.ndarray
Reads the Var data and returns it as a numpy ndarray. Requires PYMNN_NUMPY_USABLE.
Var.read_as_tuple
var.read_as_tuple() -> tuple
Reads the Var data and returns it as a flat Python tuple. Always available.
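Because read_as_tuple() flattens the tensor, a multi-dimensional index must be converted to a flat offset by hand. A pure-Python sketch, assuming the usual row-major (C-order) layout (the shapes below are illustrative):

```python
def flat_index(indices, shape):
    """Row-major (C-order) flat offset for a multi-dimensional index."""
    idx = 0
    for i, dim in zip(indices, shape):
        idx = idx * dim + i
    return idx

# e.g. element [0, 2] of a (1, 1001) classification output
assert flat_index((0, 2), (1, 1001)) == 2
assert flat_index((1, 2, 3), (2, 3, 4)) == 1 * 12 + 2 * 4 + 3  # == 23
```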
Parameters
expr.convert Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | Var | (required) | Input variable to convert (typically the raw inference output in NC4HW4) |
| format | data_format | (required) | Target format: expr.NCHW, expr.NHWC, or expr.NC4HW4 |
expr.softmax Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | Var | (required) | Input variable containing raw logits |
| axis | int | -1 | Axis along which to compute softmax (typically the class dimension) |
Var.argmax Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| axis | [int] | [-1] | Axis along which to find the maximum value index |
Inputs
- Raw output Var from the model's forward pass, typically in NC4HW4 format with model-specific shape
Outputs
- Post-processed results, which may be:
- Class index (int): The predicted class from argmax
- Probability distribution (Var/ndarray): Softmax output with per-class probabilities
- numpy.ndarray: Raw output data in a standard Python-accessible format
Code Example
Classification Postprocessing (Complete Pipeline)
```python
import MNN.nn as nn
import MNN.cv as cv
import MNN.numpy as np
import MNN.expr as expr

# Load model and preprocess (see earlier pipeline steps)
net = nn.load_module_from_file('mobilenet_v1.mnn', ['data'], ['prob'])
image = cv.imread('cat.jpg')
image = cv.resize(image, (224, 224),
                  mean=[103.94, 116.78, 123.68],
                  norm=[0.017, 0.017, 0.017])
input_var = np.expand_dims(image, 0)
input_var = expr.convert(input_var, expr.NC4HW4)

# Execute inference
output_var = net.forward(input_var)

# Step 1: Convert from NC4HW4 to NHWC for postprocessing
output_var = expr.convert(output_var, expr.NHWC)

# Step 2: Get predicted class index
class_index = np.argmax(output_var)
print("Predicted class: {}".format(class_index))  # e.g., 282 (tabby cat)
```
Getting a Probability Distribution
```python
# If the model outputs logits (not probabilities), apply softmax first
output_var = expr.convert(output_var, expr.NHWC)
probabilities = expr.softmax(output_var, 1)

# Get top-1 class and confidence
class_index = np.argmax(probabilities)
confidence = probabilities.read_as_tuple()[class_index]
print("Class: {}, Confidence: {:.4f}".format(class_index, confidence))
```
Extracting to Numpy
```python
# Convert output to numpy for further processing
output_var = expr.convert(output_var, expr.NHWC)
numpy_output = output_var.read()  # Returns numpy.ndarray
print("Output shape:", numpy_output.shape)
print("Output dtype:", numpy_output.dtype)

# Use numpy operations for custom postprocessing
top5_indices = numpy_output.argsort()[0][-5:][::-1]
print("Top-5 classes:", top5_indices)
```
Using read_as_tuple (Portable)
```python
# For environments without numpy support
output_var = expr.convert(output_var, expr.NHWC)
output_tuple = output_var.read_as_tuple()  # Flat tuple of float values
max_idx = output_tuple.index(max(output_tuple))
print("Predicted class: {}".format(max_idx))
```
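Top-k extraction is also possible without numpy, using only the flat tuple. A pure-Python sketch for numpy-free environments:

```python
def top_k(values, k=5):
    # Sort indices by value, descending, and keep the first k
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

scores = (0.1, 0.7, 0.05, 0.9, 0.3)
assert top_k(scores, 3) == [3, 1, 4]
```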
C++ Implementation Details
The reduction and math operations used in postprocessing are implemented in express/MathOp.cpp:
_ReduceSum (Line 829)
```cpp
// express/MathOp.cpp:L829-831
VARP _ReduceSum(VARP input_variable, INTS axis, bool keepdims) {
    return _Reduce(input_variable, axis, ReductionType_SUM, keepdims);
}
```
_ReduceMean (Line 853)
```cpp
// express/MathOp.cpp:L853-855
VARP _ReduceMean(VARP input_variable, INTS axis, bool keepdims) {
    return _Reduce(input_variable, axis, ReductionType_MEAN, keepdims);
}
```
_ReduceMax (Line 892)
```cpp
// express/MathOp.cpp:L892-894
VARP _ReduceMax(VARP input_variable, INTS axis, bool keepdims) {
    return _Reduce(input_variable, axis, ReductionType_MAXIMUM, keepdims);
}
```
_ArgMax (Line 1062)
```cpp
// express/MathOp.cpp:L1062-1070
VARP _ArgMax(VARP input, int axis) {
    input = _checkNC4HW4(input);
    std::unique_ptr<OpT> op(new OpT);
    op->main.type = OpParameter_ArgMax;
    op->type = OpType_ArgMax;
    op->main.value = new ArgMaxT;
    op->main.AsArgMax()->axis = axis;
    op->main.AsArgMax()->outMaxVal = 0;
    // ...
}
```
Note that _ArgMax internally calls _checkNC4HW4 to handle format conversion before applying the operation, ensuring correct results even if the input is still in NC4HW4 format.
_Softmax (NeuralNetWorkOp.cpp Line 479)
```cpp
// express/NeuralNetWorkOp.cpp:L479-484
VARP _Softmax(VARP logits, int axis) {
    std::unique_ptr<OpT> softmax(new OpT);
    softmax->type = OpType_Softmax;
    softmax->main.type = OpParameter_Axis;
    softmax->main.value = new AxisT;
    softmax->main.AsAxis()->axis = axis;
    // ...
}
```
Edge Cases and Limitations
- Format conversion is mandatory before read(): Calling read() on a NC4HW4 Var will return data in the internal padded layout, which does not match the logical tensor shape. Always convert to NHWC or NCHW first.
- read() requires PYMNN_NUMPY_USABLE: On mobile platforms where numpy is not available, use read_as_tuple() instead. The returned tuple is a flat sequence regardless of the tensor shape.
- argmax on multi-batch output: When processing a batch of inputs, argmax reduces along the specified axis, producing one index per slice. Ensure the axis parameter matches the class dimension of the batched output shape.
- _checkNC4HW4 in reduction ops: Some C++ reduction operations internally convert NC4HW4 to NCHW before computing. While this makes them "safe" to call on NC4HW4 data, explicitly converting beforehand avoids unnecessary internal conversions.
- Softmax numerical stability: MNN's softmax implementation handles numerical stability internally by subtracting the maximum value before exponentiation.
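The first point above, why read() on NC4HW4 data is misleading, can be pictured with plain numpy. NC4HW4 pads channels up to a multiple of 4 and stores them in packs of 4; the shapes and padding below illustrate the layout idea and are not MNN internals:

```python
import numpy as np

# Logical NCHW tensor: 1 batch, 3 channels, 2x2 spatial
logical = np.arange(12, dtype=np.float32).reshape(1, 3, 2, 2)

# NC4HW4 pads channels to a multiple of 4 (3 -> 4) and packs them:
# shape becomes (N, ceil(C/4), H, W, 4)
padded = np.zeros((1, 1, 2, 2, 4), dtype=np.float32)
padded[0, 0, :, :, :3] = logical[0].transpose(1, 2, 0)  # move C into the pack axis

# The flat buffer now has 16 elements, not 12, and the element order differs,
# so reading it as if it were the logical tensor gives wrong results.
assert padded.size == 16 and logical.size == 12
# Element [c=2, h=1, w=0] lives at pack position [0, 0, 1, 0, 2]
assert padded[0, 0, 1, 0, 2] == logical[0, 2, 1, 0]
```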