Implementation: Alibaba MNN PyMNN Module Forward
| Field | Value |
|---|---|
| implementation_name | PyMNN_Module_Forward |
| schema_version | 0.1.0 |
| workflow | Python_Model_Inference |
| implementation_type | API_Doc |
| domain | Deep_Learning_Inference |
| scope | Executing a forward pass through a loaded _Module to obtain inference results |
| source_file | express/Expr.cpp:L308-434 |
| related_patterns | Computational_Graph_Evaluation, Operator_Dispatch, Shape_Inference |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation documents the _Module.forward() and _Module.__call__() APIs, which execute a neural network forward pass in MNN. The input is a preprocessed Var (typically in NC4HW4 format) and the output is a Var containing the raw inference results. The underlying graph evaluation logic is implemented in express/Expr.cpp, where Expr::requireInfo() (line 308) handles shape inference and Variable::create (line 373) manages variable creation with optional lazy evaluation and geometry decomposition.
API Signatures
_Module.forward
```python
module.forward(input) -> Var
```
Executes a forward pass through the model. Accepts a single Var or a list of Vars as input.
_Module.__call__
```python
module(input) -> Var
```
Equivalent to module.forward(input). Provides a callable syntax for convenience.
_Module.onForward
```python
module.onForward([input]) -> [Var]
```
Executes a forward pass for multi-output models. Accepts a list of input Vars and returns a list of output Vars.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| input | Var or [Var] | (required) | Preprocessed input variable(s). For single-input models, pass a Var in NC4HW4 format with shape [N, C, H, W]. For multi-input models, pass a list of Vars. |
Inputs
- Preprocessed Var -- A tensor in NC4HW4 (preferred) or NCHW format, with the correct shape and dtype (typically float32) for the model. This comes from the preprocessing step.
- Loaded _Module -- A model loaded via nn.load_module_from_file or nn.load_module, bound to a configured runtime.
Outputs
- Output Var -- A tensor containing the raw inference results. The shape depends on the model:
  - Classification models: [1, num_classes] (after format conversion)
  - Detection models: [1, num_detections, detection_fields]
  - Segmentation models: [1, num_classes, H, W]
- The output is typically in NC4HW4 format and must be converted to NHWC or NCHW for postprocessing.
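For a classification output, postprocessing after the format conversion usually reduces to a softmax plus argmax. The sketch below shows that step with plain NumPy on a dummy logits array standing in for a converted [1, num_classes] output (in PyMNN you would first run expr.convert on the output Var and read its data out as a NumPy array):

```python
import numpy as np

# Dummy raw output standing in for a converted [1, num_classes] Var
logits = np.array([[1.0, 3.0, 0.5, 2.0]], dtype=np.float32)

# Numerically stable softmax over the class axis
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# Top-1 class index
top1 = int(probs.argmax(axis=1)[0])
print(top1)  # 1 (index of the largest logit)
```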
Code Example
```python
import MNN.nn as nn
import MNN.cv as cv
import MNN.numpy as np
import MNN.expr as expr

# Load model
net = nn.load_module_from_file('mobilenet_v1.mnn', ['data'], ['prob'])

# Preprocess image
image = cv.imread('cat.jpg')
image = cv.resize(image, (224, 224),
                  mean=[103.94, 116.78, 123.68],
                  norm=[0.017, 0.017, 0.017])
input_var = np.expand_dims(image, 0)
input_var = expr.convert(input_var, expr.NC4HW4)

# Execute forward pass (two equivalent ways)
output_var = net.forward(input_var)
# or equivalently:
# output_var = net(input_var)

# output_var is in NC4HW4 format, needs conversion for postprocessing
print(output_var.shape)  # e.g., [1, 1001]
```
Multi-Output Model Example
```python
# For models with multiple outputs, use onForward with a list of inputs
outputs = net.onForward([input_var])
for i, out in enumerate(outputs):
    print(f"Output {i} shape: {out.shape}")
```
Custom Module with Forward
```python
import MNN
import MNN.nn as nn
import MNN.expr as expr

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.conv(1, 20, [5, 5])
        self.conv2 = nn.conv(20, 50, [5, 5])
        self.fc1 = nn.linear(800, 500)
        self.fc2 = nn.linear(500, 10)

    def forward(self, x):
        x = expr.relu(self.conv1(x))
        x = expr.max_pool(x, [2, 2], [2, 2])
        x = expr.relu(self.conv2(x))
        x = expr.max_pool(x, [2, 2], [2, 2])
        x = expr.convert(x, expr.NCHW)
        x = expr.reshape(x, [0, -1])
        x = expr.relu(self.fc1(x))
        x = self.fc2(x)
        x = expr.softmax(x, 1)
        return x
```
C++ Implementation Details
The graph evaluation logic underlying forward() is implemented in express/Expr.cpp:
Expr::requireInfo (Line 308-364)
Shape inference method that recursively resolves input shapes before computation:
```cpp
// express/Expr.cpp:L308-364 (simplified)
bool Expr::requireInfo() {
    if (!mInside->mInfoDirty) {
        return true; // Shape already computed
    }
    // Validate all inputs have known shapes
    for (int i = 0; i < mInputs.size(); ++i) {
        auto inputInfo = mInputs[i]->getInfo();
        if (nullptr == inputInfo) {
            mValid = false;
            return false; // Input not ready
        }
    }
    // Check if shape-dependent inputs have content
    for (int i = 0; i < mInputs.size(); ++i) {
        if (mInside->mReq.shapeNeedContent[i]) {
            auto ptr = mInputs[i]->readInternal(true);
            if (nullptr == ptr) {
                return false; // Content needed for shape
            }
        }
    }
    // Compute output shapes
    auto res = ExecutorScope::Current()->computeInfo(this);
    if (NO_ERROR == res) {
        mInside->mInfoDirty = false;
    }
    return NO_ERROR == res;
}
```
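The dirty-flag memoization in requireInfo() can be sketched in a few lines of Python: each node caches its inferred shape and only recomputes when marked dirty, failing fast if any input's shape is unknown. This is a toy model of the idea, not MNN's actual classes:

```python
class Node:
    """Toy expression node mirroring the dirty-flag logic of Expr::requireInfo."""
    def __init__(self, inputs=(), infer=None, shape=None):
        self.inputs = list(inputs)
        self.infer = infer          # callable: list of input shapes -> output shape
        self.shape = shape          # known shape for leaf (placeholder) nodes
        self.dirty = infer is not None

    def require_info(self):
        if not self.dirty:
            return self.shape is not None   # cached result, or a leaf
        # All inputs must resolve their shapes first (recursive)
        in_shapes = []
        for node in self.inputs:
            if not node.require_info():
                return False                # input not ready
            in_shapes.append(node.shape)
        self.shape = self.infer(in_shapes)  # compute output shape
        self.dirty = False                  # cache it
        return True

# A leaf with a known shape feeding a 5x5 "conv" (valid padding)
a = Node(shape=(1, 3, 224, 224))
conv = Node([a], infer=lambda s: (s[0][0], 20, s[0][2] - 4, s[0][3] - 4))
assert conv.require_info()
print(conv.shape)  # (1, 20, 220, 220)
```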
Variable::create (Line 373-439)
Creates output variables with optional lazy evaluation and geometry decomposition:
```cpp
// express/Expr.cpp:L373-390 (simplified)
VARP Variable::create(EXPRP expr, int index) {
    VARP res(new Variable(expr, index));
    auto executor = ExecutorScope::Current();
    if (!executor->lazyEval) {
        res.fix(VARP::CONSTANT); // Eager: compute immediately
        return res;
    }
    // Lazy mode: defer computation
    // CONTENT mode: decompose via GeometryComputer
    // ...
    return res;
}
```
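The eager/lazy split in Variable::create can be illustrated with a toy deferred value: in eager mode the result is computed at creation time, in lazy mode only when it is first read. Again, this is a sketch of the idea, not the real VARP type:

```python
class LazyVar:
    """Toy variable: defers its compute until first read, like a lazy VARP."""
    def __init__(self, compute, lazy=True):
        self._compute = compute
        self._value = None
        if not lazy:                 # eager: compute immediately at creation
            self._value = compute()

    def read(self):
        if self._value is None:      # lazy: compute on first access
            self._value = self._compute()
        return self._value

calls = []
v = LazyVar(lambda: calls.append("run") or 42, lazy=True)
assert calls == []         # nothing computed yet
assert v.read() == 42      # read() triggers the deferred compute
assert calls == ["run"]    # computed exactly once
```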
Edge Cases and Limitations
- Input shape mismatch: If the input Var's shape does not match the model's expected input shape, Expr::requireInfo() will fail and the forward pass will produce an invalid output Var
- NC4HW4 channel padding: When the number of channels is not a multiple of 4, NC4HW4 format pads the channels. This is handled transparently by MNN but affects memory layout
- Dynamic shapes: For models with dynamic input shapes, set dynamic=True and shape_mutable=True when loading the module
- GPU synchronization: On GPU backends, the forward() call may return before computation finishes. Accessing the output Var's data (via read()) triggers synchronization
- Memory reuse: MNN reuses intermediate tensor memory across forward calls for efficiency. Do not hold references to intermediate Vars across forward calls
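The NC4HW4 channel-padding behavior noted above can be reproduced with plain NumPy to make the memory layout concrete: channels are grouped into blocks of 4 and zero-padded when C is not a multiple of 4. This only illustrates the layout; MNN's expr.convert does the real repacking internally:

```python
import numpy as np

def nchw_to_nc4hw4(x):
    """Repack an NCHW tensor into NC4HW4 layout: [N, ceil(C/4), H, W, 4],
    zero-padding the channel tail (illustration only)."""
    n, c, h, w = x.shape
    c4 = (c + 3) // 4
    padded = np.zeros((n, c4 * 4, h, w), dtype=x.dtype)
    padded[:, :c] = x
    # (N, C4, 4, H, W) -> (N, C4, H, W, 4)
    return padded.reshape(n, c4, 4, h, w).transpose(0, 1, 3, 4, 2)

def nc4hw4_to_nchw(x, c):
    """Inverse repack, dropping the zero padding."""
    n, c4, h, w, _ = x.shape
    return x.transpose(0, 1, 4, 2, 3).reshape(n, c4 * 4, h, w)[:, :c]

x = np.random.rand(1, 3, 2, 2).astype(np.float32)  # C=3 is padded up to 4
packed = nchw_to_nc4hw4(x)
print(packed.shape)  # (1, 1, 2, 2, 4)
assert np.array_equal(nc4hw4_to_nchw(packed, 3), x)  # round-trip is lossless
```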