Implementation:Alibaba MNN PyMNN Module Forward

From Leeroopedia


Field Value
implementation_name PyMNN_Module_Forward
schema_version 0.1.0
workflow Python_Model_Inference
implementation_type API_Doc
domain Deep_Learning_Inference
scope Executing a forward pass through a loaded _Module to obtain inference results
source_file express/Expr.cpp:L308-434
related_patterns Computational_Graph_Evaluation, Operator_Dispatch, Shape_Inference
last_updated 2026-02-10 14:00 GMT

Summary

This implementation documents the _Module.forward() and _Module.__call__() APIs, which execute a neural network forward pass in MNN. The input is a preprocessed Var (typically in NC4HW4 format) and the output is a Var containing the raw inference results. The underlying graph evaluation is implemented in express/Expr.cpp: Expr::requireInfo() (line 308) performs shape inference, and Variable::create (line 373) constructs output variables with optional lazy evaluation and geometry decomposition.

API Signatures

_Module.forward

module.forward(input) -> Var

Executes a forward pass through the model. Accepts a single Var or a list of Vars as input.

_Module.__call__

module(input) -> Var

Equivalent to module.forward(input). Provides a callable syntax for convenience.

_Module.onForward

module.onForward(input) -> [Var]

Executes a forward pass and returns multiple output Vars (for multi-output models).

Parameters

Parameter Type Default Description
input Var or [Var] (required) Preprocessed input variable(s). For single-input models, pass a Var in NC4HW4 format with shape [N, C, H, W]. For multi-input models, pass a list of Vars.

Inputs

  • Preprocessed Var -- A tensor in NC4HW4 (preferred) or NCHW format, with the correct shape and dtype (typically float32) for the model. This comes from the preprocessing step.
  • Loaded _Module -- A model loaded via nn.load_module_from_file or nn.load_module, bound to a configured runtime.

Outputs

  • Output Var -- A tensor containing the raw inference results. The shape depends on the model:
    • Classification models: [1, num_classes] (after format conversion)
    • Detection models: [1, num_detections, detection_fields]
    • Segmentation models: [1, num_classes, H, W]
  • The output is typically in NC4HW4 format and must be converted to NHWC or NCHW for postprocessing.
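To make the NC4HW4-to-NCHW conversion concrete, the sketch below reproduces the layout transform in plain NumPy. This is illustrative only, not the MNN API: `nchw_to_nc4hw4` and `nc4hw4_to_nchw` are hypothetical helpers, and the assumed packed layout is [N, ceil(C/4), H, W, 4] with zero-padded channels.

```python
import numpy as np

def nchw_to_nc4hw4(x):
    # Illustrative only: pack an NCHW tensor into an assumed NC4HW4
    # layout [N, ceil(C/4), H, W, 4], zero-padding the channel axis.
    n, c, h, w = x.shape
    c4 = (c + 3) // 4
    padded = np.zeros((n, c4 * 4, h, w), dtype=x.dtype)
    padded[:, :c, :, :] = x
    # Split channels into groups of 4 and move the within-group index last
    return padded.reshape(n, c4, 4, h, w).transpose(0, 1, 3, 4, 2)

def nc4hw4_to_nchw(x, c):
    # Inverse transform: restore NCHW order and drop the channel padding.
    n, c4, h, w, _ = x.shape
    return x.transpose(0, 1, 4, 2, 3).reshape(n, c4 * 4, h, w)[:, :c, :, :]

x = np.arange(2 * 6 * 3 * 3, dtype=np.float32).reshape(2, 6, 3, 3)
packed = nchw_to_nc4hw4(x)
assert packed.shape == (2, 2, 3, 3, 4)               # 6 channels pad to 2 groups of 4
assert np.array_equal(nc4hw4_to_nchw(packed, 6), x)  # round-trip is lossless
```

In real code this conversion is a single call, e.g. expr.convert(output_var, expr.NCHW); the point of the sketch is that indexing a packed tensor directly would read interleaved channel groups, which is why postprocessing expects a converted view.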

Code Example

import MNN.nn as nn
import MNN.cv as cv
import MNN.numpy as np
import MNN.expr as expr

# Load model
net = nn.load_module_from_file('mobilenet_v1.mnn', ['data'], ['prob'])

# Preprocess image
image = cv.imread('cat.jpg')
image = cv.resize(image, (224, 224),
                  mean=[103.94, 116.78, 123.68],
                  norm=[0.017, 0.017, 0.017])
input_var = np.expand_dims(image, 0)
input_var = expr.convert(input_var, expr.NC4HW4)

# Execute forward pass (two equivalent ways)
output_var = net.forward(input_var)
# or equivalently:
# output_var = net(input_var)

# output_var is in NC4HW4 format, needs conversion for postprocessing
print(output_var.shape)  # e.g., [1, 1001]
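After converting the output out of NC4HW4, a typical postprocessing step for a classification model is softmax plus argmax over the class axis. The sketch below uses plain NumPy on a fabricated [1, 1001] tensor standing in for the converted output_var (MNN.numpy mirrors this interface, but the values here are invented for illustration):

```python
import numpy as np

# Dummy logits standing in for a converted output_var of shape [1, 1001]
logits = np.zeros((1, 1001), dtype=np.float32)
logits[0, 285] = 9.0  # pretend class 285 scored highest

# Numerically stable softmax over the class axis, then top-1 prediction
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
top1 = int(probs.argmax(axis=1)[0])
assert top1 == 285
```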

Multi-Output Model Example

# For models with multiple outputs, use onForward
outputs = net.onForward(input_var)
for i, out in enumerate(outputs):
    print(f"Output {i} shape: {out.shape}")

Custom Module with Forward

import MNN
import MNN.nn as nn
import MNN.expr as expr

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.conv(1, 20, [5, 5])
        self.conv2 = nn.conv(20, 50, [5, 5])
        self.fc1 = nn.linear(800, 500)
        self.fc2 = nn.linear(500, 10)

    def forward(self, x):
        x = expr.relu(self.conv1(x))
        x = expr.max_pool(x, [2, 2], [2, 2])
        x = expr.relu(self.conv2(x))
        x = expr.max_pool(x, [2, 2], [2, 2])
        x = expr.convert(x, expr.NCHW)
        x = expr.reshape(x, [0, -1])
        x = expr.relu(self.fc1(x))
        x = self.fc2(x)
        x = expr.softmax(x, 1)
        return x
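
The fc1 input width of 800 in the snippet above is fixed by the shape flow. Assuming a 28x28 single-channel input (an assumption; the snippet does not state its input size, but this matches a MNIST-style setup), the arithmetic can be traced directly:

```python
# Trace spatial sizes through Net for an assumed 28x28 input (e.g. MNIST).
def conv_out(size, kernel):
    # Valid convolution, stride 1, no padding (assumed defaults)
    return size - kernel + 1

def pool_out(size, kernel, stride):
    # Non-overlapping max pooling
    return (size - kernel) // stride + 1

s = 28
s = conv_out(s, 5)      # conv1: 28 -> 24
s = pool_out(s, 2, 2)   # pool:  24 -> 12
s = conv_out(s, 5)      # conv2: 12 -> 8
s = pool_out(s, 2, 2)   # pool:   8 -> 4
flattened = 50 * s * s  # 50 channels * 4 * 4 = 800
assert flattened == 800  # matches nn.linear(800, 500)
```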

C++ Implementation Details

The graph evaluation logic underlying forward() is implemented in express/Expr.cpp:

Expr::requireInfo (Line 308-364)

Shape inference method that recursively resolves input shapes before computation:

// express/Expr.cpp:L308-364 (simplified)
bool Expr::requireInfo() {
    if (!mInside->mInfoDirty) {
        return true;  // Shape already computed
    }
    // Validate all inputs have known shapes
    for (int i = 0; i < mInputs.size(); ++i) {
        auto inputInfo = mInputs[i]->getInfo();
        if (nullptr == inputInfo) {
            mValid = false;
            return false;  // Input not ready
        }
    }
    // Check if shape-dependent inputs have content
    for (int i = 0; i < mInputs.size(); ++i) {
        if (mInside->mReq.shapeNeedContent[i]) {
            auto ptr = mInputs[i]->readInternal(true);
            if (nullptr == ptr) {
                return false;  // Content needed for shape
            }
        }
    }
    // Compute output shapes
    auto res = ExecutorScope::Current()->computeInfo(this);
    if (NO_ERROR == res) {
        mInside->mInfoDirty = false;
    }
    return NO_ERROR == res;
}

Variable::create (Line 373-439)

Creates output variables with optional lazy evaluation and geometry decomposition:

// express/Expr.cpp:L373-390 (simplified)
VARP Variable::create(EXPRP expr, int index) {
    VARP res(new Variable(expr, index));
    auto executor = ExecutorScope::Current();
    if (!executor->lazyEval) {
        res.fix(VARP::CONSTANT);  // Eager: compute immediately
        return res;
    }
    // Lazy mode: defer computation
    // CONTENT mode: decompose via GeometryComputer
    // ...
    return res;
}

Edge Cases and Limitations

  • Input shape mismatch: If the input Var's shape does not match the model's expected input shape, Expr::requireInfo() will fail and the forward pass will produce an invalid output Var
  • NC4HW4 channel padding: When the number of channels is not a multiple of 4, NC4HW4 format pads the channels. This is handled transparently by MNN but affects memory layout
  • Dynamic shapes: For models with dynamic input shapes, set dynamic=True and shape_mutable=True when loading the module
  • GPU synchronization: On GPU backends, the forward() call may return before computation finishes. Accessing the output Var's data (via read()) triggers synchronization
  • Memory reuse: MNN reuses intermediate tensor memory across forward calls for efficiency. Do not hold references to intermediate Vars across forward calls
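The NC4HW4 channel-padding note above has a direct memory consequence that is easy to quantify. The sketch below is plain Python arithmetic, not an MNN API; it assumes only that channels are aligned up to the next multiple of 4:

```python
def nc4hw4_elements(n, c, h, w):
    # Element count of an NC4HW4 tensor: channels pad up to a multiple of 4
    c_aligned = ((c + 3) // 4) * 4
    return n * c_aligned * h * w

# A 3-channel 224x224 image pads to 4 channels: one third extra memory
logical = 1 * 3 * 224 * 224
padded = nc4hw4_elements(1, 3, 224, 224)
assert padded == 1 * 4 * 224 * 224

# Channel counts already divisible by 4 incur no padding
assert nc4hw4_elements(1, 64, 56, 56) == 1 * 64 * 56 * 56
```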
