Implementation:Alibaba MNN SourceModule
| Property | Value |
|---|---|
| Page Type | Implementation |
| Repository | Alibaba MNN |
| Source Files | codegen/SourceModule.cpp (179 lines), codegen/SourceModule.hpp (82 lines) |
| Language | C++ |
| Domains | Code_Generation, GPU_Computing, Operator_Fusion |
| Date | 2026-02-10 |
Overview
SourceModule is the GPU kernel source code generator within MNN's operator fusion codegen pipeline. It translates fused operator DAGs (directed acyclic graphs) into executable kernel source code strings. The module is responsible for:
- Variable scoping — Managing the declaration and lifetime of intermediate variables within the generated kernel.
- Topological sorting — Traversing the fused operator DAG in dependency order to ensure correct code emission sequence.
- Constant folding — Identifying and pre-evaluating constant expressions at code generation time.
- Target-specific code emission — Delegating platform-specific syntax generation to an abstract Target interface, enabling the same fusion logic to produce OpenCL, Vulkan, or Metal kernels.
The generated kernel code fuses multiple element-wise operations (BinaryOp, UnaryOp, ReLU, etc.) into a single GPU kernel launch, reducing memory bandwidth consumption and kernel dispatch overhead.
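The bandwidth saving can be illustrated on the CPU. The sketch below is not MNN code: the function names are hypothetical, and it assumes three equally sized float buffers. It computes relu(a + b) * c twice, once as three separate passes with full-size temporaries and once as a single fused pass that keeps intermediates in local variables.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused: every operation is its own pass and writes a full temporary buffer,
// mirroring one kernel launch per operator.
std::vector<float> unfusedAddReluMul(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     const std::vector<float>& c) {
    const std::size_t n = a.size();
    std::vector<float> sum(n), act(n), out(n);
    for (std::size_t i = 0; i < n; ++i) sum[i] = a[i] + b[i];            // pass 1: Add
    for (std::size_t i = 0; i < n; ++i) act[i] = std::max(sum[i], 0.0f); // pass 2: ReLU
    for (std::size_t i = 0; i < n; ++i) out[i] = act[i] * c[i];          // pass 3: Mul
    return out;
}

// Fused: one pass, intermediates live in scalars instead of buffers,
// mirroring a single generated kernel.
std::vector<float> fusedAddReluMul(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   const std::vector<float>& c) {
    const std::size_t n = a.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float sum = a[i] + b[i];
        const float act = std::max(sum, 0.0f);
        out[i] = act * c[i];
    }
    return out;
}
```

Both functions return the same values; the fused version simply never materializes the sum and act buffers, which is the effect the generated GPU kernel aims for.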
Code Reference
Source Files
| File | Lines | Contents |
|---|---|---|
| codegen/SourceModule.hpp | 82 | Class declarations for SourceModule, VarScope, Target (abstract), and Node. |
| codegen/SourceModule.cpp | 179 | Implementation of SourceModule::buildKernel, codegen, and supporting methods. |
Key Method
| Method | Location | Description |
|---|---|---|
| SourceModule::buildKernel | codegen/SourceModule.cpp:L91-162 | Core method that accepts a vector of fused operator nodes and produces the complete kernel source string. Performs topological traversal, variable allocation, and delegates per-node code emission to the Target. |
Class Signatures
// codegen/SourceModule.hpp
#include <string>
#include <vector>

class Node;  // forward declaration; the class is outlined below

class SourceModule {
public:
    // Build a complete kernel from a fused operator DAG
    std::string buildKernel(const std::vector<Node*>& nodes);
    // Generate code for the kernel body
    void codegen(const std::vector<Node*>& nodes);
    // Retrieve the generated kernel name
    std::string kernelName() const;
private:
    // Internal state for variable scoping, code buffer, etc.
};

class VarScope {
    // Manages variable declarations and scoping within generated code
};

class Target {
public:
    // Abstract interface for target-specific code emission
    virtual std::string emitNode(const Node* node) = 0;
    virtual std::string emitPrologue() = 0;
    virtual std::string emitEpilogue() = 0;
};

class Node {
    // Represents a single operation in the fused operator DAG
};
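As a rough illustration of how an abstract target lets one body of fusion logic serve several backends, the sketch below uses a simplified, hypothetical interface (SketchTarget) rather than the Target class outlined above. The method parameters, the two concrete classes, and the emitted kernel text are assumptions made for the example; the strings only resemble OpenCL C and Metal Shading Language and omit headers a real shader would need.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a target interface.
class SketchTarget {
public:
    virtual ~SketchTarget() = default;
    // Kernel signature plus whatever is needed to obtain the element index `i`.
    virtual std::string emitPrologue() const = 0;
    // Store of the final value and the closing brace.
    virtual std::string emitEpilogue(const std::string& result) const = 0;
};

class OpenCLSketchTarget : public SketchTarget {
public:
    std::string emitPrologue() const override {
        return "__kernel void fused(__global const float* a, "
               "__global const float* b, __global float* out) {\n"
               "    int i = get_global_id(0);\n";
    }
    std::string emitEpilogue(const std::string& result) const override {
        return "    out[i] = " + result + ";\n}\n";
    }
};

class MetalSketchTarget : public SketchTarget {
public:
    std::string emitPrologue() const override {
        // In Metal-style code the index arrives as a kernel argument.
        return "kernel void fused(device const float* a [[buffer(0)]], "
               "device const float* b [[buffer(1)]], "
               "device float* out [[buffer(2)]], "
               "uint i [[thread_position_in_grid]]) {\n";
    }
    std::string emitEpilogue(const std::string& result) const override {
        return "    out[i] = " + result + ";\n}\n";
    }
};

// Target-independent part: the fused body is identical for every backend.
std::string emitFusedAddRelu(const SketchTarget& target) {
    std::string body;
    body += "    float t0 = a[i] + b[i];\n";
    body += "    float t1 = fmax(t0, 0.0f);\n";
    return target.emitPrologue() + body + target.emitEpilogue("t1");
}
```

Calling emitFusedAddRelu with either target yields the same two body statements wrapped in a backend-specific signature, which is the separation of concerns the abstract interface is meant to provide.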
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| nodes | const std::vector<Node*>& | Topologically ordered list of operator nodes forming the fused DAG. Each Node represents a single element-wise operation (e.g., Add, Mul, ReLU, Sigmoid) with references to its input nodes. |
Outputs
| Output | Type | Description |
|---|---|---|
| Kernel source code | std::string | Complete GPU kernel source string ready for compilation by the target runtime (OpenCL, Vulkan SPIR-V compiler, Metal shader compiler). |
| InOutTensors | struct | Metadata describing the kernel's input and output tensor bindings, enabling the runtime to set up buffer arguments for kernel dispatch. |
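One way to picture how input and output bindings could be derived from a node list is shown below. This is a sketch under assumptions: GraphNode, InOutSketch, and findInOut are hypothetical names, and the actual layout of InOutTensors is not described on this page. Nodes with no inputs are treated as kernel inputs, and nodes that no other node consumes are treated as kernel outputs.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical node: only the links to producer nodes matter here.
struct GraphNode {
    std::vector<const GraphNode*> inputs;
};

// Hypothetical result type standing in for the binding metadata.
struct InOutSketch {
    std::vector<const GraphNode*> inputs;   // sources: no predecessors
    std::vector<const GraphNode*> outputs;  // sinks: no successors
};

InOutSketch findInOut(const std::vector<const GraphNode*>& nodes) {
    // Collect every node that is read by at least one other node.
    std::unordered_set<const GraphNode*> consumed;
    for (const GraphNode* n : nodes) {
        for (const GraphNode* in : n->inputs) {
            consumed.insert(in);
        }
    }
    InOutSketch result;
    for (const GraphNode* n : nodes) {
        if (n->inputs.empty()) result.inputs.push_back(n);
        if (consumed.count(n) == 0) result.outputs.push_back(n);
    }
    return result;
}
```

The order of the returned vectors follows the order of the input list, so a runtime consuming such metadata could assign buffer argument indices deterministically.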
Architecture
Code Generation Pipeline
The buildKernel method implements the following pipeline:
- DAG analysis — Walk the node graph to identify input tensors (sources with no predecessors) and output tensors (sinks with no successors).
- Topological sort — Order nodes so that each node is emitted after all of its dependencies.
- Variable allocation — Assign variable names to intermediate results using VarScope, reusing names when a value's last use has been emitted.
- Prologue emission — Delegate to Target::emitPrologue() to generate kernel function signature, input buffer declarations, and thread index computation.
- Body emission — Iterate through sorted nodes, calling Target::emitNode() for each to produce the operation's source code.
- Epilogue emission — Delegate to Target::emitEpilogue() to generate output buffer writes and closing braces.
- Assembly — Concatenate prologue, body, and epilogue into the final kernel source string.
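The pipeline steps can be sketched end to end in a few dozen lines. The code below is an illustration under stated assumptions, not the contents of SourceModule.cpp: SketchNode, visit, and buildKernelSketch are hypothetical names, only add, mul, and relu are handled, the emitted text is OpenCL-style, and temporaries are never reused (so the name-reuse part of variable allocation is left out).

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical node layout for the sketch.
struct SketchNode {
    std::string op;    // "input", "add", "mul", or "relu"
    std::string name;  // buffer name, used only when op == "input"
    std::vector<const SketchNode*> inputs;
};

// Depth-first post-order walk: a node is appended only after all its inputs.
static void visit(const SketchNode* n,
                  std::unordered_set<const SketchNode*>& seen,
                  std::vector<const SketchNode*>& order) {
    if (!seen.insert(n).second) return;  // already visited
    for (const SketchNode* in : n->inputs) visit(in, seen, order);
    order.push_back(n);
}

std::string buildKernelSketch(const SketchNode* output) {
    // Topological sort, starting from the single output node.
    std::unordered_set<const SketchNode*> seen;
    std::vector<const SketchNode*> order;
    visit(output, seen, order);

    // Variable allocation and body emission.
    std::unordered_map<const SketchNode*, std::string> var;
    std::string params, body;
    int nextTemp = 0;
    for (const SketchNode* n : order) {
        if (n->op == "input") {
            var[n] = n->name + "[i]";
            params += "__global const float* " + n->name + ", ";
            continue;
        }
        std::string rhs;
        if (n->op == "add") {
            rhs = var.at(n->inputs[0]) + " + " + var.at(n->inputs[1]);
        } else if (n->op == "mul") {
            rhs = var.at(n->inputs[0]) + " * " + var.at(n->inputs[1]);
        } else {  // "relu"
            rhs = "fmax(" + var.at(n->inputs[0]) + ", 0.0f)";
        }
        const std::string v = "t" + std::to_string(nextTemp++);
        var[n] = v;
        body += "    float " + v + " = " + rhs + ";\n";
    }

    // Prologue, epilogue, and assembly.
    const std::string prologue = "__kernel void fused(" + params +
                                 "__global float* out) {\n"
                                 "    int i = get_global_id(0);\n";
    const std::string epilogue = "    out[i] = " + var.at(output) + ";\n}\n";
    return prologue + body + epilogue;
}
```

For a graph computing relu(a + b) * c, the sketch emits three temporaries (t0, t1, t2) in dependency order between the prologue and the final store. A version closer to the description above would also track each value's last use so that names can be recycled.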
Fusible Operations
The following operation types can be fused into a single generated kernel:
| Operation Category | Examples | Fusion Benefit |
|---|---|---|
| UnaryOp | ReLU, Sigmoid, Tanh, Neg, Abs | Eliminates intermediate buffer write/read between activation and next operation |
| BinaryOp | Add, Mul, Sub, Div | Combines arithmetic chains into single-pass computation |
| Eltwise | Element-wise max, min | Reduces kernel launch count for simple pointwise ops |
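To make the table concrete, the sketch below maps each listed operation to a C-like expression string of the kind a generated kernel might contain. The enums and the functions unaryExpr and binaryExpr are hypothetical, and the chosen spellings (fmax, fmin, fabs, exp, tanh) are assumptions based on common GPU shading-language built-ins rather than MNN's actual emitted text.

```cpp
#include <string>

enum class UnaryKind { ReLU, Sigmoid, Tanh, Neg, Abs };
enum class BinaryKind { Add, Sub, Mul, Div, Max, Min };

// Expression text for a unary operation applied to sub-expression `x`.
std::string unaryExpr(UnaryKind kind, const std::string& x) {
    switch (kind) {
        case UnaryKind::ReLU:    return "fmax(" + x + ", 0.0f)";
        case UnaryKind::Sigmoid: return "(1.0f / (1.0f + exp(-(" + x + "))))";
        case UnaryKind::Tanh:    return "tanh(" + x + ")";
        case UnaryKind::Neg:     return "(-(" + x + "))";
        case UnaryKind::Abs:     return "fabs(" + x + ")";
    }
    return x;  // not reached for valid enum values
}

// Expression text for a binary or element-wise max/min operation.
std::string binaryExpr(BinaryKind kind, const std::string& a, const std::string& b) {
    switch (kind) {
        case BinaryKind::Add: return "(" + a + " + " + b + ")";
        case BinaryKind::Sub: return "(" + a + " - " + b + ")";
        case BinaryKind::Mul: return "(" + a + " * " + b + ")";
        case BinaryKind::Div: return "(" + a + " / " + b + ")";
        case BinaryKind::Max: return "fmax(" + a + ", " + b + ")";
        case BinaryKind::Min: return "fmin(" + a + ", " + b + ")";
    }
    return a;  // not reached for valid enum values
}
```

Because each function returns a fully parenthesized expression, calls can be nested to express a whole chain, for example binaryExpr(BinaryKind::Mul, unaryExpr(UnaryKind::ReLU, binaryExpr(BinaryKind::Add, "a", "b")), "c"), without any intermediate buffer appearing in the generated text.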
Usage Context
SourceModule is not invoked directly by users. It is called internally by MNN's operator fusion pass during model compilation. When the fusion pass identifies a chain of element-wise operations that can be merged, it constructs a Node DAG and passes it to SourceModule::buildKernel() to produce the fused kernel source.
The generated source is then compiled by the target GPU runtime:
- OpenCL — Compiled via clCreateProgramWithSource + clBuildProgram
- Vulkan — Compiled from GLSL to SPIR-V via glslang
- Metal — Compiled via MTLDevice::newLibraryWithSource
Related Pages
- Principle: Alibaba_MNN_Operator_Fusion_Codegen — Theoretical foundation of operator fusion and automated kernel code generation for reducing memory bandwidth and kernel launch overhead.