Implementation: Alibaba MNN PyMNN Output Processing
| Field | Value |
|---|---|
| implementation_name | PyMNN_Output_Processing |
| schema_version | 0.1.0 |
| workflow | Python_Model_Inference |
| implementation_type | API_Doc |
| domain | Deep_Learning_Inference |
| scope | Converting raw inference outputs to interpretable results via format conversion, reduction, and data extraction |
| source_file | express/MathOp.cpp:L559-829 |
| related_patterns | Reduction_Operations, Softmax_Activation, Data_Format_Conversion |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation documents the MNN Python APIs used for postprocessing raw inference output tensors. The primary operations are expr.convert for data format conversion from NC4HW4 to NHWC/NCHW, np.argmax and expr.softmax for classification result extraction, and Var.read() for converting MNN Var objects to numpy arrays. The underlying C++ math operations are implemented in express/MathOp.cpp (reduction operations at lines 829-917, argmax at line 1062) and express/NeuralNetWorkOp.cpp (softmax at line 479).
API Signatures
expr.convert
expr.convert(x, format) -> Var
Converts the data_format of variable x to the specified format.
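The layout change can be pictured with plain numpy transposes. This is an illustration only, not the MNN implementation: expr.convert also handles MNN's channel-packed NC4HW4 layout, which has no direct numpy equivalent.

```python
import numpy as np

# Hypothetical NHWC tensor: batch 1, 2x2 spatial, 3 channels
x_nhwc = np.arange(12).reshape(1, 2, 2, 3)

# NHWC -> NCHW is a pure axis permutation
x_nchw = x_nhwc.transpose(0, 3, 1, 2)

assert x_nchw.shape == (1, 3, 2, 2)
# Element identity is preserved: x_nhwc[n, h, w, c] == x_nchw[n, c, h, w]
assert x_nhwc[0, 1, 0, 2] == x_nchw[0, 2, 1, 0]
```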
np.argmax (Var.argmax)
np.argmax(var) -> int
var.argmax(axis=[-1]) -> int
Returns the index of the maximum value. On Var objects, accepts an axis parameter.
expr.softmax
expr.softmax(x, axis=-1) -> Var
Applies softmax activation: exp(x) / sum(exp(x), axis).
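The formula can be checked in plain numpy. This is a sketch, not the MNN implementation, though it uses the same max-subtraction trick for numerical stability mentioned under Edge Cases:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiation
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits, axis=1)
assert abs(float(probs.sum()) - 1.0) < 1e-6  # probabilities sum to 1
assert int(probs.argmax()) == 0              # largest logit wins
```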
Var.read
var.read() -> numpy.ndarray
Reads the Var data and returns it as a numpy ndarray. Requires PYMNN_NUMPY_USABLE.
Var.read_as_tuple
var.read_as_tuple() -> tuple
Reads the Var data and returns it as a flat Python tuple. Always available.
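Because read_as_tuple() flattens the tensor, a multi-dimensional index must be converted to a flat offset by hand. A pure-Python sketch, assuming the usual row-major (C-order) layout (the shapes below are illustrative):

```python
def flat_index(indices, shape):
    """Row-major (C-order) flat offset for a multi-dimensional index."""
    idx = 0
    for i, dim in zip(indices, shape):
        idx = idx * dim + i
    return idx

# e.g. element [0, 2] of a (1, 1001) classification output
assert flat_index((0, 2), (1, 1001)) == 2
assert flat_index((1, 2, 3), (2, 3, 4)) == 1 * 12 + 2 * 4 + 3  # == 23
```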
Parameters
expr.convert Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | Var | (required) | Input variable to convert (typically the raw inference output in NC4HW4) |
| format | data_format | (required) | Target format: expr.NCHW, expr.NHWC, or expr.NC4HW4 |
expr.softmax Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | Var | (required) | Input variable containing raw logits |
| axis | int | -1 | Axis along which to compute softmax (typically the class dimension) |
Var.argmax Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| axis | [int] | [-1] | Axis along which to find the maximum value index |
Inputs
- Raw output Var from the model's forward pass, typically in NC4HW4 format with model-specific shape
Outputs
- Post-processed results, which may be:
- Class index (int): The predicted class from argmax
- Probability distribution (Var/ndarray): Softmax output with per-class probabilities
- numpy.ndarray: Raw output data in a standard Python-accessible format
Code Example
Classification Postprocessing (Complete Pipeline)
```python
import MNN.nn as nn
import MNN.cv as cv
import MNN.numpy as np
import MNN.expr as expr

# Load model and preprocess (see earlier pipeline steps)
net = nn.load_module_from_file('mobilenet_v1.mnn', ['data'], ['prob'])
image = cv.imread('cat.jpg')
image = cv.resize(image, (224, 224),
                  mean=[103.94, 116.78, 123.68],
                  norm=[0.017, 0.017, 0.017])
input_var = np.expand_dims(image, 0)
input_var = expr.convert(input_var, expr.NC4HW4)

# Execute inference
output_var = net.forward(input_var)

# Step 1: Convert from NC4HW4 to NHWC for postprocessing
output_var = expr.convert(output_var, expr.NHWC)

# Step 2: Get predicted class index
class_index = np.argmax(output_var)
print("Predicted class: {}".format(class_index))  # e.g., 282 (tabby cat)
```
Getting a Probability Distribution
```python
# If the model outputs logits (not probabilities), apply softmax first
output_var = expr.convert(output_var, expr.NHWC)
probabilities = expr.softmax(output_var, 1)

# Get top-1 class and confidence
class_index = np.argmax(probabilities)
confidence = probabilities.read_as_tuple()[class_index]
print("Class: {}, Confidence: {:.4f}".format(class_index, confidence))
```
Extracting to Numpy
```python
# Convert output to numpy for further processing
output_var = expr.convert(output_var, expr.NHWC)
numpy_output = output_var.read()  # Returns numpy.ndarray
print("Output shape:", numpy_output.shape)
print("Output dtype:", numpy_output.dtype)

# Use numpy operations for custom postprocessing
top5_indices = numpy_output.argsort()[0][-5:][::-1]
print("Top-5 classes:", top5_indices)
```
Using read_as_tuple (Portable)
```python
# For environments without numpy support
output_var = expr.convert(output_var, expr.NHWC)
output_tuple = output_var.read_as_tuple()  # Flat tuple of float values
max_idx = output_tuple.index(max(output_tuple))
print("Predicted class: {}".format(max_idx))
```
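Top-k extraction is also possible without numpy, using only the flat tuple. A pure-Python sketch for numpy-free environments:

```python
def top_k(values, k=5):
    # Sort indices by value, descending, and keep the first k
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

scores = (0.1, 0.7, 0.05, 0.9, 0.3)
assert top_k(scores, 3) == [3, 1, 4]
```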
C++ Implementation Details
The reduction and math operations used in postprocessing are implemented in express/MathOp.cpp:
_ReduceSum (Line 829)
```cpp
// express/MathOp.cpp:L829-831
VARP _ReduceSum(VARP input_variable, INTS axis, bool keepdims) {
    return _Reduce(input_variable, axis, ReductionType_SUM, keepdims);
}
```
_ReduceMean (Line 853)
```cpp
// express/MathOp.cpp:L853-855
VARP _ReduceMean(VARP input_variable, INTS axis, bool keepdims) {
    return _Reduce(input_variable, axis, ReductionType_MEAN, keepdims);
}
```
_ReduceMax (Line 892)
```cpp
// express/MathOp.cpp:L892-894
VARP _ReduceMax(VARP input_variable, INTS axis, bool keepdims) {
    return _Reduce(input_variable, axis, ReductionType_MAXIMUM, keepdims);
}
```
_ArgMax (Line 1062)
```cpp
// express/MathOp.cpp:L1062-1070
VARP _ArgMax(VARP input, int axis) {
    input = _checkNC4HW4(input);
    std::unique_ptr<OpT> op(new OpT);
    op->main.type = OpParameter_ArgMax;
    op->type = OpType_ArgMax;
    op->main.value = new ArgMaxT;
    op->main.AsArgMax()->axis = axis;
    op->main.AsArgMax()->outMaxVal = 0;
    // ...
}
```
Note that _ArgMax internally calls _checkNC4HW4 to handle format conversion before applying the operation, ensuring correct results even if the input is still in NC4HW4 format.
_Softmax (NeuralNetWorkOp.cpp Line 479)
```cpp
// express/NeuralNetWorkOp.cpp:L479-484
VARP _Softmax(VARP logits, int axis) {
    std::unique_ptr<OpT> softmax(new OpT);
    softmax->type = OpType_Softmax;
    softmax->main.type = OpParameter_Axis;
    softmax->main.value = new AxisT;
    softmax->main.AsAxis()->axis = axis;
    // ...
}
```
Edge Cases and Limitations
- Format conversion is mandatory before read(): Calling read() on a NC4HW4 Var will return data in the internal padded layout, which does not match the logical tensor shape. Always convert to NHWC or NCHW first.
- read() requires PYMNN_NUMPY_USABLE: On mobile platforms where numpy is not available, use read_as_tuple() instead. The returned tuple is a flat sequence regardless of the tensor shape.
- argmax on multi-batch output: When processing a batch of inputs, argmax reduces along the specified axis, producing one index per slice. Ensure the axis parameter matches the class dimension of the batched output shape.
- _checkNC4HW4 in reduction ops: Some C++ reduction operations internally convert NC4HW4 to NCHW before computing. While this makes them "safe" to call on NC4HW4 data, explicitly converting beforehand avoids unnecessary internal conversions.
- Softmax numerical stability: MNN's softmax implementation handles numerical stability internally by subtracting the maximum value before exponentiation.
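The first point above, why read() on NC4HW4 data is misleading, can be pictured with plain numpy. NC4HW4 pads channels up to a multiple of 4 and stores them in packs of 4; the shapes and padding below illustrate the layout idea and are not MNN internals:

```python
import numpy as np

# Logical NCHW tensor: 1 batch, 3 channels, 2x2 spatial
logical = np.arange(12, dtype=np.float32).reshape(1, 3, 2, 2)

# NC4HW4 pads channels to a multiple of 4 (3 -> 4) and packs them:
# shape becomes (N, ceil(C/4), H, W, 4)
padded = np.zeros((1, 1, 2, 2, 4), dtype=np.float32)
padded[0, 0, :, :, :3] = logical[0].transpose(1, 2, 0)  # move C into the pack axis

# The flat buffer now has 16 elements, not 12, and the element order differs,
# so reading it as if it were the logical tensor gives wrong results.
assert padded.size == 16 and logical.size == 12
# Element [c=2, h=1, w=0] lives at pack position [0, 0, 1, 0, 2]
assert padded[0, 0, 1, 0, 2] == logical[0, 2, 1, 0]
```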