Implementation:Alibaba MNN SourceModule

From Leeroopedia


| Property | Value |
| --- | --- |
| Page Type | Implementation |
| Repository | Alibaba MNN |
| Source Files | codegen/SourceModule.cpp (179 lines), codegen/SourceModule.hpp (82 lines) |
| Language | C++ |
| Domains | Code_Generation, GPU_Computing, Operator_Fusion |
| Date | 2026-02-10 |

Overview

SourceModule is the GPU kernel source code generator within MNN's operator fusion codegen pipeline. It translates fused operator DAGs (directed acyclic graphs) into kernel source strings, which the target GPU runtime then compiles. The module is responsible for:

  • Variable scoping — Managing the declaration and lifetime of intermediate variables within the generated kernel.
  • Topological sorting — Traversing the fused operator DAG in dependency order to ensure correct code emission sequence.
  • Constant folding — Identifying and pre-evaluating constant expressions at code generation time.
  • Target-specific code emission — Delegating platform-specific syntax generation to an abstract Target interface, enabling the same fusion logic to produce OpenCL, Vulkan, or Metal kernels.
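
To make the constant-folding responsibility concrete, the sketch below evaluates a binary operation at code generation time when both operands are known constants, and otherwise emits the expression text. The `Operand` type and the `render` and `foldBinary` functions are hypothetical names chosen for illustration; they are not taken from MNN's source.

```cpp
#include <string>

// Hypothetical operand: either a constant known at codegen time or a named variable.
struct Operand {
    bool isConstant;   // true when `value` is meaningful
    float value;       // constant value (ignored for variables)
    std::string name;  // variable or expression text (ignored for constants)
};

// Render an operand as kernel source text.
inline std::string render(const Operand& o) {
    return o.isConstant ? std::to_string(o.value) : o.name;
}

// Fold `lhs op rhs` into a constant when possible; otherwise build the expression text.
inline Operand foldBinary(char op, const Operand& lhs, const Operand& rhs) {
    if (lhs.isConstant && rhs.isConstant) {
        const float a = lhs.value, b = rhs.value;
        switch (op) {
            case '+': return {true, a + b, ""};
            case '-': return {true, a - b, ""};
            case '*': return {true, a * b, ""};
            case '/':
                if (b != 0.0f) return {true, a / b, ""};
                break;  // leave division by zero to the generated kernel
            default:
                break;  // unknown operator: emit it unchanged
        }
    }
    return {false, 0.0f, "(" + render(lhs) + " " + op + " " + render(rhs) + ")"};
}
```

Folding `2 * 3` this way yields the constant `6`, so the generated kernel carries a literal instead of a multiplication; an expression with a variable operand is passed through as text.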

The generated kernel code fuses multiple element-wise operations (BinaryOp, UnaryOp, ReLU, etc.) into a single GPU kernel launch, reducing memory bandwidth consumption and kernel dispatch overhead.
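
The bandwidth saving can be illustrated with a CPU-side analogy. The sketch below computes `relu(a * b + c)` twice: once as three separate passes that each write a full-size intermediate buffer, and once as a single fused loop where intermediates stay in local variables. This is plain C++ written for illustration (the function names are hypothetical), not code emitted by MNN, and it assumes all three inputs have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused: three passes, each reading and writing a full-size buffer.
std::vector<float> unfusedPasses(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c) {
    const std::size_t n = a.size();
    std::vector<float> mul(n), add(n), out(n);
    for (std::size_t i = 0; i < n; ++i) mul[i] = a[i] * b[i];             // pass 1
    for (std::size_t i = 0; i < n; ++i) add[i] = mul[i] + c[i];           // pass 2
    for (std::size_t i = 0; i < n; ++i) out[i] = std::max(add[i], 0.0f);  // pass 3
    return out;
}

// Fused: one pass, no intermediate buffers.
std::vector<float> fusedPass(const std::vector<float>& a,
                             const std::vector<float>& b,
                             const std::vector<float>& c) {
    const std::size_t n = a.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float t0 = a[i] * b[i];  // stays in a register
        const float t1 = t0 + c[i];    // stays in a register
        out[i] = std::max(t1, 0.0f);   // the only write to memory
    }
    return out;
}
```

On a GPU, each unfused pass would correspond to a separate kernel launch with its own global-memory round trip, which is the overhead the fused kernel avoids.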

Code Reference

Source Files

| File | Lines | Contents |
| --- | --- | --- |
| codegen/SourceModule.hpp | 82 | Class declarations for SourceModule, VarScope, Target (abstract), and Node. |
| codegen/SourceModule.cpp | 179 | Implementation of SourceModule::buildKernel, codegen, and supporting methods. |

Key Method

| Method | Location | Description |
| --- | --- | --- |
| SourceModule::buildKernel | codegen/SourceModule.cpp:L91-162 | Core method that accepts a vector of fused operator nodes and produces the complete kernel source string. Performs topological traversal, variable allocation, and delegates per-node code emission to the Target. |

Class Signatures

// codegen/SourceModule.hpp

#include <string>
#include <vector>

class Node;  // forward declaration: SourceModule refers to Node before it is defined

class SourceModule {
public:
    // Build a complete kernel from a fused operator DAG
    std::string buildKernel(const std::vector<Node*>& nodes);

    // Generate code for the kernel body
    void codegen(const std::vector<Node*>& nodes);

    // Retrieve the generated kernel name
    std::string kernelName() const;

private:
    // Internal state for variable scoping, code buffer, etc.
};

class VarScope {
    // Manages variable declarations and scoping within generated code
};

class Target {
public:
    // Abstract interface for target-specific code emission
    virtual ~Target() = default;  // allows safe deletion through a Target pointer
    virtual std::string emitNode(const Node* node) = 0;
    virtual std::string emitPrologue() = 0;
    virtual std::string emitEpilogue() = 0;
};

class Node {
    // Represents a single operation in the fused operator DAG
};
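
The VarScope declaration above elides its members. As a rough illustration of what a variable-name manager with reuse could look like, the sketch below hands out temporary names and recycles a name once its value is no longer needed. `VarScopeSketch` and its methods are hypothetical and are not MNN's actual VarScope interface.

```cpp
#include <string>
#include <vector>

// Hypothetical name allocator with a free list; not MNN's actual VarScope.
class VarScopeSketch {
public:
    // Return a recycled name if one is available, otherwise mint a new one.
    std::string acquire() {
        if (!mFree.empty()) {
            std::string name = mFree.back();
            mFree.pop_back();
            return name;
        }
        return "t" + std::to_string(mNext++);
    }

    // Mark a name as reusable once the last use of its value has been emitted.
    void release(const std::string& name) { mFree.push_back(name); }

    // Number of distinct names minted so far (i.e. declarations the kernel needs).
    int declared() const { return mNext; }

private:
    std::vector<std::string> mFree;  // names whose values are dead
    int mNext = 0;                   // index of the next fresh name
};
```

With this scheme, a kernel that recycles names declares each temporary once and assigns to it again on reuse, so the number of declarations is bounded by the number of values live at the same time rather than by the number of nodes.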

I/O Contract

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| nodes | std::vector<Node*> | Topologically ordered list of operator nodes forming the fused DAG. Each Node represents a single element-wise operation (e.g., Add, Mul, ReLU, Sigmoid) with references to its input nodes. |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Kernel source code | std::string | Complete GPU kernel source string, ready for compilation by the target runtime (the OpenCL program compiler, a GLSL-to-SPIR-V compiler for Vulkan, or the Metal shader compiler). |
| InOutTensors | struct | Metadata describing the kernel's input and output tensor bindings, enabling the runtime to set up buffer arguments for kernel dispatch. |
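
The layout of InOutTensors is not shown on this page. As an assumption-laden sketch of how such binding metadata could be derived, the code below classifies nodes with no inputs as kernel inputs and nodes that no other node consumes as kernel outputs. `IoNode`, `InOutSketch`, and `collectBindings` are hypothetical names, not MNN types.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical DAG node: a label plus the nodes it reads from.
struct IoNode {
    std::string name;
    std::vector<const IoNode*> inputs;
};

// Hypothetical stand-in for the binding metadata described above.
struct InOutSketch {
    std::vector<const IoNode*> inputs;   // nodes with no predecessors
    std::vector<const IoNode*> outputs;  // nodes with no successors
};

// Classify each node as a kernel input, a kernel output, or an internal value.
InOutSketch collectBindings(const std::vector<const IoNode*>& nodes) {
    std::unordered_set<const IoNode*> consumed;  // nodes read by some other node
    for (const IoNode* n : nodes)
        for (const IoNode* in : n->inputs) consumed.insert(in);

    InOutSketch io;
    for (const IoNode* n : nodes) {
        if (n->inputs.empty()) io.inputs.push_back(n);
        if (consumed.count(n) == 0) io.outputs.push_back(n);
    }
    return io;
}
```

A runtime could then bind one buffer per entry in `inputs` and `outputs`, in that order, when setting kernel arguments.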

Architecture

Code Generation Pipeline

The buildKernel method implements the following pipeline:

  1. DAG analysis — Walk the node graph to identify input tensors (sources with no predecessors) and output tensors (sinks with no successors).
  2. Topological sort — Order nodes so that each node is emitted after all of its dependencies.
  3. Variable allocation — Assign variable names to intermediate results using VarScope, reusing names when a value's last use has been emitted.
  4. Prologue emission — Delegate to Target::emitPrologue() to generate kernel function signature, input buffer declarations, and thread index computation.
  5. Body emission — Iterate through sorted nodes, calling Target::emitNode() for each to produce the operation's source code.
  6. Epilogue emission — Delegate to Target::emitEpilogue() to generate output buffer writes and closing braces.
  7. Assembly — Concatenate prologue, body, and epilogue into the final kernel source string.
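
The steps above can be sketched end to end in a few dozen lines. The example below uses Kahn's algorithm for the topological sort (a standard choice, not confirmed to be what MNN uses) and emits OpenCL-C-like text. `SketchNode`, `topoSort`, and `buildKernelSketch` are hypothetical names; writing each op as a function call such as `add(...)` is a placeholder for real target-specific emission, and the sketch assumes every input of a node is itself in the list and that the DAG has one output.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical node: an op name plus the nodes it depends on.
struct SketchNode {
    std::string op;
    std::vector<const SketchNode*> inputs;
};

// Kahn's algorithm: order nodes so that every dependency precedes its users.
std::vector<const SketchNode*> topoSort(const std::vector<const SketchNode*>& nodes) {
    std::unordered_map<const SketchNode*, int> pending;  // unmet dependency counts
    std::unordered_map<const SketchNode*, std::vector<const SketchNode*>> users;
    for (const SketchNode* n : nodes) {
        pending[n] = static_cast<int>(n->inputs.size());
        for (const SketchNode* in : n->inputs) users[in].push_back(n);
    }
    std::queue<const SketchNode*> ready;
    for (const SketchNode* n : nodes)
        if (pending[n] == 0) ready.push(n);

    std::vector<const SketchNode*> order;
    while (!ready.empty()) {
        const SketchNode* n = ready.front();
        ready.pop();
        order.push_back(n);
        for (const SketchNode* u : users[n])
            if (--pending[u] == 0) ready.push(u);
    }
    return order;
}

// Emit prologue, body, and epilogue, then concatenate them.
std::string buildKernelSketch(const std::vector<const SketchNode*>& nodes) {
    std::unordered_map<const SketchNode*, std::string> names;  // node -> expression
    std::string body, last;
    int inputCount = 0, tempCount = 0;
    for (const SketchNode* n : topoSort(nodes)) {
        if (n->inputs.empty()) {  // source node: read from an input buffer
            names[n] = "in" + std::to_string(inputCount++) + "[i]";
            continue;
        }
        std::string args;
        for (const SketchNode* in : n->inputs) {
            if (!args.empty()) args += ", ";
            args += names[in];
        }
        last = "t" + std::to_string(tempCount++);
        names[n] = last;
        body += "    float " + last + " = " + n->op + "(" + args + ");\n";
    }
    std::string prologue = "__kernel void fused(";
    for (int k = 0; k < inputCount; ++k)
        prologue += "__global const float* in" + std::to_string(k) + ", ";
    prologue += "__global float* out) {\n    int i = get_global_id(0);\n";
    const std::string epilogue = "    out[i] = " + last + ";\n}\n";
    return prologue + body + epilogue;
}
```

For a DAG of two inputs feeding `add` and then `relu`, this produces a kernel whose body declares `t0` from the two input reads, derives `t1` from `t0`, and writes `t1` to `out[i]`, regardless of the order in which the nodes were supplied.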

Fusible Operations

The following operation types can be fused into a single generated kernel:

| Operation Category | Examples | Fusion Benefit |
| --- | --- | --- |
| UnaryOp | ReLU, Sigmoid, Tanh, Neg, Abs | Eliminates intermediate buffer write/read between activation and next operation |
| BinaryOp | Add, Mul, Sub, Div | Combines arithmetic chains into single-pass computation |
| Eltwise | Element-wise max, min | Reduces kernel launch count for simple pointwise ops |
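
One way to picture per-operation emission is a mapping from each operation in the table to a C-like expression string. The sketch below is an assumption about how such a mapping might look, not MNN's emitter: the op-name strings and the `unaryExpr` and `binaryExpr` functions are hypothetical, while `fmax`, `fmin`, `fabs`, `exp`, and `tanh` are built-in math functions in OpenCL C.

```cpp
#include <stdexcept>
#include <string>

// Map a unary op name to an expression over operand text `x`.
std::string unaryExpr(const std::string& op, const std::string& x) {
    if (op == "ReLU")    return "fmax(" + x + ", 0.0f)";
    if (op == "Sigmoid") return "1.0f / (1.0f + exp(-(" + x + ")))";
    if (op == "Tanh")    return "tanh(" + x + ")";
    if (op == "Neg")     return "-(" + x + ")";
    if (op == "Abs")     return "fabs(" + x + ")";
    throw std::invalid_argument("unsupported unary op: " + op);
}

// Map a binary or element-wise op name to an expression over `a` and `b`.
std::string binaryExpr(const std::string& op, const std::string& a, const std::string& b) {
    if (op == "Add") return "(" + a + " + " + b + ")";
    if (op == "Sub") return "(" + a + " - " + b + ")";
    if (op == "Mul") return "(" + a + " * " + b + ")";
    if (op == "Div") return "(" + a + " / " + b + ")";
    if (op == "Max") return "fmax(" + a + ", " + b + ")";
    if (op == "Min") return "fmin(" + a + ", " + b + ")";
    throw std::invalid_argument("unsupported binary op: " + op);
}
```

Because each function returns a parenthesized or call-wrapped expression, results compose safely: nesting `binaryExpr("Add", "a", "b")` inside `unaryExpr("ReLU", ...)` gives `fmax((a + b), 0.0f)`, a single expression that needs no intermediate buffer.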

Usage Context

SourceModule is not invoked directly by users. It is called internally by MNN's operator fusion pass during model compilation. When the fusion pass identifies a chain of element-wise operations that can be merged, it constructs a Node DAG and passes it to SourceModule::buildKernel() to produce the fused kernel source.

The generated source is then compiled by the target GPU runtime:

  • OpenCL — Compiled via clCreateProgramWithSource + clBuildProgram
  • Vulkan — Compiled from GLSL to SPIR-V via glslang
  • Metal — Compiled via MTLDevice::newLibraryWithSource

Related Pages

  • Principle: Alibaba_MNN_Operator_Fusion_Codegen — Theoretical foundation of operator fusion and automated kernel code generation for reducing memory bandwidth and kernel launch overhead.
