Implementation:Alibaba MNN SourceModule

Property	Value
Page Type	Implementation
Repository	Alibaba MNN
Source Files	`codegen/SourceModule.cpp` (179 lines), `codegen/SourceModule.hpp` (82 lines)
Language	C++
Domains	Code_Generation, GPU_Computing, Operator_Fusion
Date	2026-02-10

Overview

SourceModule is the GPU kernel source code generator within MNN's operator fusion codegen pipeline. It translates fused operator DAGs (directed acyclic graphs) into kernel source strings that the target GPU toolchain then compiles. The module is responsible for:

Variable scoping — Managing the declaration and lifetime of intermediate variables within the generated kernel.
Topological sorting — Traversing the fused operator DAG in dependency order to ensure correct code emission sequence.
Constant folding — Identifying and pre-evaluating constant expressions at code generation time.
Target-specific code emission — Delegating platform-specific syntax generation to an abstract Target interface, enabling the same fusion logic to produce OpenCL, Vulkan, or Metal kernels.
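The variable scoping responsibility can be pictured with a small name allocator. The following is only a sketch under stated assumptions: `VarScopeSketch`, `acquire`, `release`, and the `t0`, `t1`, ... naming scheme are hypothetical and are not taken from `SourceModule.hpp`. It shows one plausible way to hand out temporary names and recycle a name once its value is no longer needed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for VarScope (not the MNN class): hands out
// temporary variable names and recycles a name once the caller reports
// that the value it held is no longer needed.
class VarScopeSketch {
public:
    // Return a recycled name if one is free, otherwise mint a new one.
    std::string acquire() {
        if (!mFree.empty()) {
            std::string name = mFree.back();
            mFree.pop_back();
            return name;
        }
        return "t" + std::to_string(mNext++);
    }

    // Mark a name as reusable after the last use of its value was emitted.
    void release(const std::string& name) {
        assert(!name.empty());
        mFree.push_back(name);
    }

    // Number of distinct names minted so far.
    int distinctNames() const { return mNext; }

private:
    std::vector<std::string> mFree;  // names available for reuse
    int mNext = 0;                   // suffix of the next fresh name
};
```

With this scheme, acquiring two names, releasing the first, and acquiring again hands back the first name instead of minting a third one, which keeps the number of live temporaries in the generated kernel small.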

The generated kernel code fuses multiple element-wise operations (BinaryOp, UnaryOp, ReLU, etc.) into a single GPU kernel launch, reducing memory bandwidth consumption and kernel dispatch overhead.

Code Reference

Source Files

File	Lines	Contents
`codegen/SourceModule.hpp`	82	Class declarations for `SourceModule`, `VarScope`, `Target` (abstract), and `Node`.
`codegen/SourceModule.cpp`	179	Implementation of `SourceModule::buildKernel`, `codegen`, and supporting methods.

Key Method

Method	Location	Description
`SourceModule::buildKernel`	`codegen/SourceModule.cpp:L91-162`	Core method that accepts a vector of fused operator nodes and produces the complete kernel source string. Performs topological traversal, variable allocation, and delegates per-node code emission to the `Target`.

Class Signatures

// codegen/SourceModule.hpp

class SourceModule {
public:
    // Build a complete kernel from a fused operator DAG
    std::string buildKernel(const std::vector<Node*>& nodes);

    // Generate code for the kernel body
    void codegen(const std::vector<Node*>& nodes);

    // Retrieve the generated kernel name
    std::string kernelName() const;

private:
    // Internal state for variable scoping, code buffer, etc.
};

class VarScope {
    // Manages variable declarations and scoping within generated code
};

class Target {
    // Abstract interface for target-specific code emission
    virtual std::string emitNode(const Node* node) = 0;
    virtual std::string emitPrologue() = 0;
    virtual std::string emitEpilogue() = 0;
};

class Node {
    // Represents a single operation in the fused operator DAG
};
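To make the role of the `Target` hooks more concrete, here is a minimal sketch of what a backend implementation might look like. Everything in it is an assumption made for illustration: `SketchNode`, `SketchTarget`, `OpenCLLikeTarget`, the node fields, and the emitted text are hypothetical and do not reproduce MNN's actual targets. The hooks are declared public here so that a caller can invoke them through the base class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node layout; the real Node fields are not shown on this page.
struct SketchNode {
    std::string op;                   // "add", "mul", or "relu" in this sketch
    std::vector<std::string> inputs;  // names of the operand variables
    std::string output;               // name of the result variable
};

// Mirrors the three pure-virtual hooks listed above (assumed interface).
class SketchTarget {
public:
    virtual ~SketchTarget() = default;
    virtual std::string emitNode(const SketchNode* node) = 0;
    virtual std::string emitPrologue() = 0;
    virtual std::string emitEpilogue() = 0;
};

// Illustrative target that prints OpenCL-C-like text for two inputs.
class OpenCLLikeTarget : public SketchTarget {
public:
    std::string emitPrologue() override {
        return "__kernel void fused(__global const float* in0,\n"
               "                    __global const float* in1,\n"
               "                    __global float* out) {\n"
               "    int i = get_global_id(0);\n"
               "    float a = in0[i];\n"
               "    float b = in1[i];\n";
    }

    std::string emitNode(const SketchNode* n) override {
        assert(n != nullptr && !n->inputs.empty());
        const std::string lhs = "    float " + n->output + " = ";
        if (n->op == "relu") {
            return lhs + "fmax(" + n->inputs[0] + ", 0.0f);\n";
        }
        const char* sym = nullptr;
        if (n->op == "add") sym = " + ";
        else if (n->op == "mul") sym = " * ";
        assert(sym != nullptr && n->inputs.size() == 2);
        return lhs + n->inputs[0] + sym + n->inputs[1] + ";\n";
    }

    // This sketch assumes the final value is stored in a variable named r.
    std::string emitEpilogue() override { return "    out[i] = r;\n}\n"; }
};
```

Because the generator only talks to the abstract interface, swapping `OpenCLLikeTarget` for a GLSL-like or Metal-like class would change the emitted syntax without touching the traversal logic.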

I/O Contract

Inputs

Parameter	Type	Description
`nodes`	`const std::vector<Node*>&`	Operator nodes forming the fused DAG. `buildKernel` traverses them in topological order, as described under Code Generation Pipeline. Each `Node` represents a single element-wise operation (e.g., Add, Mul, ReLU, Sigmoid) with references to its input nodes.

Outputs

Output	Type	Description
Kernel source code	`std::string`	Complete GPU kernel source string ready for compilation by the target toolchain (the OpenCL runtime compiler, a GLSL-to-SPIR-V compiler for Vulkan, or the Metal shader compiler).
`InOutTensors`	struct	Metadata describing the kernel's input and output tensor bindings, enabling the runtime to set up buffer arguments for kernel dispatch.

Architecture

Code Generation Pipeline

The buildKernel method implements the following pipeline:

DAG analysis — Walk the node graph to identify input tensors (sources with no predecessors) and output tensors (sinks with no successors).
Topological sort — Order nodes so that each node is emitted after all of its dependencies.
Variable allocation — Assign variable names to intermediate results using VarScope, reusing names when a value's last use has been emitted.
Prologue emission — Delegate to Target::emitPrologue() to generate kernel function signature, input buffer declarations, and thread index computation.
Body emission — Iterate through sorted nodes, calling Target::emitNode() for each to produce the operation's source code.
Epilogue emission — Delegate to Target::emitEpilogue() to generate output buffer writes and closing braces.
Assembly — Concatenate prologue, body, and epilogue into the final kernel source string.
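The steps above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the code in `SourceModule.cpp`: `PNode`, `findSinks`, `topoSort`, and `buildKernelSketch` are hypothetical names, the sort uses Kahn's algorithm (the page does not say which algorithm MNN uses), every input of a node is assumed to be in the node list, and variable-name reuse is omitted for brevity.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical node: the real Node layout is not shown on this page.
struct PNode {
    std::string name;            // variable holding this node's result
    std::string expr;            // already-formatted right-hand side
    std::vector<PNode*> inputs;  // producers; empty for source nodes
};

// DAG analysis: sinks are nodes that no other node reads.
std::vector<PNode*> findSinks(const std::vector<PNode*>& nodes) {
    std::unordered_set<PNode*> consumed;
    for (PNode* n : nodes)
        for (PNode* in : n->inputs) consumed.insert(in);
    std::vector<PNode*> sinks;
    for (PNode* n : nodes)
        if (consumed.count(n) == 0) sinks.push_back(n);
    return sinks;
}

// Kahn's algorithm: a node becomes ready once all its producers are emitted.
std::vector<PNode*> topoSort(const std::vector<PNode*>& nodes) {
    std::unordered_map<PNode*, int> pending;  // unmet dependency counts
    std::unordered_map<PNode*, std::vector<PNode*>> users;
    for (PNode* n : nodes) {
        pending[n] = static_cast<int>(n->inputs.size());
        for (PNode* in : n->inputs) users[in].push_back(n);
    }
    std::queue<PNode*> ready;
    for (PNode* n : nodes)
        if (pending[n] == 0) ready.push(n);
    std::vector<PNode*> order;
    while (!ready.empty()) {
        PNode* n = ready.front();
        ready.pop();
        order.push_back(n);
        for (PNode* u : users[n])
            if (--pending[u] == 0) ready.push(u);
    }
    assert(order.size() == nodes.size());  // fails if the graph has a cycle
    return order;
}

// Assembly: prologue + one statement per sorted node + epilogue.
std::string buildKernelSketch(const std::vector<PNode*>& nodes) {
    std::string body;
    for (PNode* n : topoSort(nodes))
        body += "    float " + n->name + " = " + n->expr + ";\n";
    const std::string prologue = "void fused(/* buffers */) {\n";
    const std::string epilogue = "}\n";
    return prologue + body + epilogue;
}
```

In the real module the prologue, per-node statements, and epilogue would come from the `Target` hooks rather than from fixed strings; the fixed strings here only keep the sketch self-contained.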

Fusible Operations

The following operation types can be fused into a single generated kernel:

Operation Category	Examples	Fusion Benefit
UnaryOp	ReLU, Sigmoid, Tanh, Neg, Abs	Eliminates intermediate buffer write/read between activation and next operation
BinaryOp	Add, Mul, Sub, Div	Combines arithmetic chains into single-pass computation
Eltwise	Element-wise max, min	Reduces kernel launch count for simple pointwise ops
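The fusion benefit in the table can be illustrated with a CPU analogy. The sketch below is not MNN code and the function names are hypothetical; it contrasts an Add followed by a ReLU run as two separate passes with a full intermediate buffer against a single fused pass in which the sum only ever lives in a local variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: Add writes a full intermediate buffer, then ReLU reads it back.
std::vector<float> addThenReluUnfused(const std::vector<float>& a,
                                      const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> tmp(a.size());  // intermediate buffer in memory
    for (std::size_t i = 0; i < a.size(); ++i) tmp[i] = a[i] + b[i];
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = std::max(tmp[i], 0.0f);
    return out;
}

// Fused: one pass over the data, no intermediate buffer.
std::vector<float> addThenReluFused(const std::vector<float>& a,
                                    const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float s = a[i] + b[i];  // stays in a register-like local
        out[i] = std::max(s, 0.0f);
    }
    return out;
}
```

Both functions compute the same result; the fused version simply avoids one buffer write and one buffer read per element, which on a GPU also corresponds to one kernel launch instead of two.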

Usage Context

SourceModule is not invoked directly by users. It is called internally by MNN's operator fusion pass during model compilation. When the fusion pass identifies a chain of element-wise operations that can be merged, it constructs a Node DAG and passes it to SourceModule::buildKernel() to produce the fused kernel source.

The generated source is then compiled by the target GPU runtime:

OpenCL — Compiled via clCreateProgramWithSource + clBuildProgram
Vulkan — Compiled from GLSL to SPIR-V via glslang
Metal — Compiled via the MTLDevice method newLibraryWithSource:options:error:

Related Pages

Principle: Alibaba_MNN_Operator_Fusion_Codegen — Theoretical foundation of operator fusion and automated kernel code generation for reducing memory bandwidth and kernel launch overhead.
