# Implementation: Alibaba MNN Express Utils
## Metadata

| Field | Value |
|---|---|
| Source Repository | https://github.com/alibaba/MNN |
| Source Files | express/Utils.cpp (351 lines), express/Utils.hpp (93 lines) |
| Language | C++ |
| Namespace | MNN::Express |
| Domains | Tensor_Management, Execution |
| Last Updated | 2026-02-10 |
## Summary
Utils is the core utility library for the MNN Express framework. It implements tensor type conversions, memory management, data format conversions (NCHW/NHWC/NC4HW4), and computation caching via DFS-based graph execution. The module provides the low-level plumbing that connects the Express API's high-level variable abstractions to MNN's internal tensor and session infrastructure.
## Imports

```cpp
#include "Utils.hpp"
#include <map>
#include <set>
#include <stack>
#include <MNN/expr/ExecutorScope.hpp>
#include "MNN_generated.h"
#include "core/TensorUtils.hpp"
#include "core/OpCommonUtils.hpp"
#include "core/Session.hpp"
#include "core/MNNMemoryUtils.h"
#include "core/Backend.hpp"
#include "core/Execution.hpp"
#include "core/ConvolutionCommon.hpp"
```
## Key Classes and Structs

### Expr::Inside (Utils.hpp L19-33)
Manages the internal tensor state for an expression node. Each Expr in the computation graph owns an Inside object that holds output tensor information, dirty flags, and a reference to the compute cache.
```cpp
struct Expr::Inside {
    Inside(int outputSize);
    Inside(Tensor* tensor, bool own = false);
    ~Inside();
    std::vector<Variable::Info> mOutputInfos;
    std::vector<Tensor*> mOutputTensors;
    Executor::Requirement mReq;
    std::shared_ptr<Executor::ComputeCache> mCache;
    int mCacheOffset = 0;
    bool mInfoDirty = true;
    bool mContentDirty = true;
    bool mOwnTensor = true;
    Tensor* mHostTensor = nullptr;
    std::shared_ptr<Backend> mHoldBackend;
};
```
Constructor (outputSize):
```cpp
Expr::Inside::Inside(int outputSize) {
    mOutputInfos.resize(outputSize);
    mOutputTensors.resize(outputSize);
    for (int i = 0; i < outputSize; ++i) {
        mOutputTensors[i] = new Tensor;
        TensorUtils::getDescribe(mOutputTensors[i])->memoryType = Tensor::InsideDescribe::MEMORY_HOST;
    }
}
```
Allocates outputSize tensors, each initialized with host memory type. This is the standard path for creating expression outputs.
Constructor (Tensor*, bool):
```cpp
Expr::Inside::Inside(Tensor* tensor, bool own) {
    mOutputInfos.resize(1);
    mOutputTensors.resize(1);
    mOutputTensors[0] = tensor;
    Utils::copyTensorToInfo(&mOutputInfos[0], tensor);
    mOutputInfos[0].syncSize();
    mOwnTensor = own;
}
```
Wraps an existing tensor, optionally taking ownership. Used when an expression is created from a pre-existing tensor (e.g., model input).
### Executor::ComputeCache (Utils.hpp L43-70)
Implements computation caching with DFS-based graph execution. This class manages a session and tracks dirty state for both shape and content, ensuring computations are only re-executed when inputs change.
```cpp
class Executor::ComputeCache {
public:
    void setContentDirty();
    void* mapOutput(int offset, Tensor* dest);
    ~ComputeCache();
    ErrorCode compute();
    ErrorCode resize();
    ErrorCode resizeImpl();
    Session* getSession() { return mSession.get(); }
    friend class Executor;
private:
    std::set<std::shared_ptr<Expr::Inside>> mInputInside;
    std::set<std::shared_ptr<ComputeCache>> mInputs;
    std::shared_ptr<Session> mSession;
    bool mContentDirty = true;
    bool mShapeDirty = true;
    std::vector<std::shared_ptr<BufferStorage>> mCacheBuffers;
};
```
DFS-Based compute() (L229-296):
The compute() method uses an iterative DFS traversal of the compute cache graph to execute all dependent computations in topological order:
```cpp
ErrorCode Executor::ComputeCache::compute() {
    std::stack<ComputeCache*> dfsStack;
    std::set<ComputeCache*> visited;
    dfsStack.push(this);
    // ... first pass: check for dirty inputs that would block execution
    // ... second pass: execute sessions in dependency order
    while (!dfsStack.empty()) {
        auto cache = dfsStack.top();
        // ... handle shape resizing if needed
        if (!cache->mContentDirty) {
            visited.insert(cache);
            dfsStack.pop();
            continue;
        }
        if (hasUnvisitInput(cache)) {  // condensed: any input cache still dirty and unvisited
            for (auto c : cache->mInputs) {
                dfsStack.push(c.get());
            }
        } else {
            visited.insert(cache);
            dfsStack.pop();
            auto code = cache->mSession->run();
            if (NO_ERROR != code) {
                return code;
            }
            cache->mContentDirty = false;
        }
    }
    return NO_ERROR;
}
```
This two-pass approach first validates that no inputs have unresolvable dirty state (CALL_BACK_STOP), then executes each cache node's session in bottom-up order.
mapOutput() (L170-206):
```cpp
void* Executor::ComputeCache::mapOutput(int offset, Tensor* dest) {
    auto tensor = mSession->getTensor(offset);
    auto des = TensorUtils::getDescribe(tensor);
    if (0 == tensor->deviceId() && des->quantAttr.get() == nullptr) {
        auto ptr = tensor->host<void>();
        Utils::releaseMemoryForHostTensor(dest);
        TensorUtils::getDescribe(dest)->memoryType = Tensor::InsideDescribe::MEMORY_BACKEND;
        dest->buffer().host = (uint8_t*)ptr;
        return ptr;
    }
    // ... fallback: copy from device to host
    Utils::allocMemoryForHostTensor(dest);
    if (nullptr != dest->host<void>()) {
        tensor->copyToHostTensor(dest);
    }
    return dest->host<void>();
}
```
Maps a session output tensor to a destination tensor. When the data is already on the host and not quantized, it performs a zero-copy pointer reassignment. Otherwise, it allocates host memory and copies the data from the device.
### Utils (Static Methods, Utils.hpp L71-83)
The Utils class provides static utility methods for conversions and memory management.

#### Format Conversion
```cpp
int Utils::convertFormat(Dimensionformat format) {
    CONVERT(NCHW, MNN_DATA_FORMAT_NCHW, format);
    CONVERT(NHWC, MNN_DATA_FORMAT_NHWC, format);
    CONVERT(NC4HW4, MNN_DATA_FORMAT_NC4HW4, format);
    return MNN_DATA_FORMAT_UNKNOWN;
}

Express::Dimensionformat Utils::revertFormat(int format) {
    CONVERT(MNN_DATA_FORMAT_NCHW, Express::NCHW, format);
    CONVERT(MNN_DATA_FORMAT_NHWC, Express::NHWC, format);
    CONVERT(MNN_DATA_FORMAT_NC4HW4, Express::NC4HW4, format);
    return NCHW;
}
```
Bidirectional conversion between the Express API's Dimensionformat enum and MNN's internal MNN_DATA_FORMAT constants. Supports NCHW (channels-first), NHWC (channels-last), and NC4HW4 (4-channel packed) layouts.
#### Data Type Conversion

```cpp
DataType Utils::convertDataType(halide_type_t type) {
    return OpCommonUtils::convertDataType(type);
}

halide_type_t Utils::revertDataType(DataType dataType) {
    CONVERT(DataType_DT_FLOAT, halide_type_of<float>(), dataType);
    CONVERT(DataType_DT_INT32, halide_type_of<int32_t>(), dataType);
    CONVERT(DataType_DT_INT64, halide_type_of<int32_t>(), dataType);  // lossy: 64-bit narrowed to 32-bit
    CONVERT(DataType_DT_UINT8, halide_type_of<uint8_t>(), dataType);
    CONVERT(DataType_DT_INT8, halide_type_of<int8_t>(), dataType);
    CONVERT(DataType_DT_HALF, halide_type_of<float>(), dataType);     // half stored as 32-bit float on host
    CONVERT(DataType_DT_BFLOAT16, halide_type_t(halide_type_bfloat, 16), dataType);
    return halide_type_of<float>();
}
```
Maps between MNN's DataType enum and Halide's halide_type_t. Note that INT64 is narrowed to int32_t (a lossy conversion for values outside the 32-bit range), HALF is represented as 32-bit float on the host (a widening conversion), and BFLOAT16 maps to a genuine 16-bit bfloat halide type.
#### Tensor Info Conversion

```cpp
void Utils::copyInfoToTensor(Tensor* dest, const Variable::Info* source) {
    if (nullptr == source) {
        dest->buffer().dimensions = 0;
        return;
    }
    for (int i = 0; i < source->dim.size(); ++i) {
        dest->setLength(i, source->dim[i]);
    }
    dest->buffer().dimensions = (int)source->dim.size();
    dest->buffer().type = source->type;
    TensorUtils::getDescribe(dest)->dimensionFormat = (MNN_DATA_FORMAT)Utils::convertFormat(source->order);
    TensorUtils::setLinearLayout(dest);
}

void Utils::copyTensorToInfo(Variable::Info* shape, const Tensor* tensor) {
    shape->type = tensor->getType();
    shape->dim = tensor->shape();
    shape->size = tensor->elementSize();
    shape->order = Utils::revertFormat(TensorUtils::getDescribe(tensor)->dimensionFormat);
}
```
Bidirectional conversion between Variable::Info (the Express API's shape/type descriptor) and Tensor (MNN's internal tensor representation).
#### Memory Management

```cpp
bool Utils::allocMemoryForHostTensor(Tensor* dest) {
    if (nullptr != dest->buffer().host) {
        return true;
    }
    if (TensorUtils::getDescribe(dest)->memoryType != Tensor::InsideDescribe::MEMORY_HOST) {
        return false;
    }
    auto size = dest->usize();
    dest->buffer().host = (uint8_t*)MNNMemoryAllocAlign(size, MNN_MEMORY_ALIGN_DEFAULT);
    return dest->buffer().host != nullptr;
}

bool Utils::releaseMemoryForHostTensor(Tensor* dest) {
    if (nullptr == dest->buffer().host) {
        return true;
    }
    if (TensorUtils::getDescribe(dest)->memoryType != Tensor::InsideDescribe::MEMORY_HOST) {
        return false;
    }
    MNNMemoryFreeAlign(dest->buffer().host);
    dest->buffer().host = nullptr;
    return true;
}
```
Aligned memory allocation and deallocation for host tensors. Only operates on tensors with MEMORY_HOST type, refusing to touch backend-managed memory.
#### Variable-to-Tensor Extraction

```cpp
Tensor* Utils::getTensor(VARP var) {
    return (Tensor*)(var->getTensor());
}
```
Extracts the underlying Tensor* from a VARP (Variable pointer), bridging the Express API's variable abstraction to MNN's tensor layer.
#### Raster Operation Construction

```cpp
EXPRP Utils::makeRaster(const std::vector<VARP>& vars, const std::vector<int>& regions,
                        const std::vector<int>& shape, halide_type_t dataType,
                        MNN_DATA_FORMAT format) {
    std::unique_ptr<MNN::OpT> op(new MNN::OpT);
    op->type = OpType_Raster;
    // ... constructs Extra attributes for shape, region, data type, and format
    auto expr = Expr::create(std::move(op), vars);
    return expr;
}
```
Constructs an OpType_Raster expression node from input variables, region descriptors, and shape/type metadata. Raster operations enable memory layout transformations and tensor region copying.
## I/O Contract

| Function | Input | Output |
|---|---|---|
| copyInfoToTensor() | Variable::Info* | Populated Tensor* with matching shape, type, and format |
| copyTensorToInfo() | Tensor* | Populated Variable::Info* with matching shape, type, and format |
| convertFormat() | Dimensionformat (NCHW/NHWC/NC4HW4) | int (MNN internal format constant) |
| revertFormat() | int (MNN internal format) | Dimensionformat enum |
| convertDataType() | halide_type_t | DataType enum |
| revertDataType() | DataType enum | halide_type_t |
| allocMemoryForHostTensor() | Tensor* | bool (success); tensor host buffer allocated |
| releaseMemoryForHostTensor() | Tensor* | bool (success); tensor host buffer freed |
| getTensor() | VARP | Raw Tensor* pointer |
| makeRaster() | Variables, regions, shape, type, format | EXPRP raster expression node |
## Related Pages
- Alibaba_MNN_Neural_Network_Inference -- Core execution support that Utils enables at the Express API layer