# Implementation: Alibaba MNN Express Utils
## Metadata

| Field | Value |
|---|---|
| Source Repository | https://github.com/alibaba/MNN |
| Source Files | express/Utils.cpp (351 lines), express/Utils.hpp (93 lines) |
| Language | C++ |
| Namespace | MNN::Express |
| Domains | Tensor_Management, Execution |
| Last Updated | 2026-02-10 |
## Summary
Utils is the core utility library for the MNN Express framework. It implements tensor type conversions, memory management, data format conversions (NCHW/NHWC/NC4HW4), and computation caching via DFS-based graph execution. The module provides the low-level plumbing that connects the Express API's high-level variable abstractions to MNN's internal tensor and session infrastructure.
## Imports

```cpp
#include "Utils.hpp"
#include <map>
#include <set>
#include <stack>
#include <MNN/expr/ExecutorScope.hpp>
#include "MNN_generated.h"
#include "core/TensorUtils.hpp"
#include "core/OpCommonUtils.hpp"
#include "core/Session.hpp"
#include "core/MNNMemoryUtils.h"
#include "core/Backend.hpp"
#include "core/Execution.hpp"
#include "core/ConvolutionCommon.hpp"
```
## Key Classes and Structs

### Expr::Inside (Utils.hpp L19-33)
Manages the internal tensor state for an expression node. Each Expr in the computation graph owns an Inside object that holds output tensor information, dirty flags, and a reference to the compute cache.
```cpp
struct Expr::Inside {
    Inside(int outputSize);
    Inside(Tensor* tensor, bool own = false);
    ~Inside();
    std::vector<Variable::Info> mOutputInfos;
    std::vector<Tensor*> mOutputTensors;
    Executor::Requirement mReq;
    std::shared_ptr<Executor::ComputeCache> mCache;
    int mCacheOffset = 0;
    bool mInfoDirty = true;
    bool mContentDirty = true;
    bool mOwnTensor = true;
    Tensor* mHostTensor = nullptr;
    std::shared_ptr<Backend> mHoldBackend;
};
```
Constructor (outputSize):
```cpp
Expr::Inside::Inside(int outputSize) {
    mOutputInfos.resize(outputSize);
    mOutputTensors.resize(outputSize);
    for (int i = 0; i < outputSize; ++i) {
        mOutputTensors[i] = new Tensor;
        TensorUtils::getDescribe(mOutputTensors[i])->memoryType = Tensor::InsideDescribe::MEMORY_HOST;
    }
}
```
Allocates outputSize tensors, each initialized with host memory type. This is the standard path for creating expression outputs.
Constructor (Tensor*, bool):
```cpp
Expr::Inside::Inside(Tensor* tensor, bool own) {
    mOutputInfos.resize(1);
    mOutputTensors.resize(1);
    mOutputTensors[0] = tensor;
    Utils::copyTensorToInfo(&mOutputInfos[0], tensor);
    mOutputInfos[0].syncSize();
    mOwnTensor = own;
}
```
Wraps an existing tensor, optionally taking ownership. Used when an expression is created from a pre-existing tensor (e.g., model input).
### Executor::ComputeCache (Utils.hpp L43-70)
Implements computation caching with DFS-based graph execution. This class manages a session and tracks dirty state for both shape and content, ensuring computations are only re-executed when inputs change.
```cpp
class Executor::ComputeCache {
public:
    void setContentDirty();
    void* mapOutput(int offset, Tensor* dest);
    ~ComputeCache();
    ErrorCode compute();
    ErrorCode resize();
    ErrorCode resizeImpl();
    Session* getSession() { return mSession.get(); }
    friend class Executor;
private:
    std::set<std::shared_ptr<Expr::Inside>> mInputInside;
    std::set<std::shared_ptr<ComputeCache>> mInputs;
    std::shared_ptr<Session> mSession;
    bool mContentDirty = true;
    bool mShapeDirty = true;
    std::vector<std::shared_ptr<BufferStorage>> mCacheBuffers;
};
```
DFS-Based compute() (L229-296):
The compute() method uses an iterative DFS traversal of the compute cache graph to execute all dependent computations in topological order:
```cpp
ErrorCode Executor::ComputeCache::compute() {
    std::stack<ComputeCache*> dfsStack;
    std::set<ComputeCache*> visited;
    dfsStack.push(this);
    // ... first pass: check for dirty inputs that would block execution
    // ... second pass: execute sessions in dependency order
    while (!dfsStack.empty()) {
        auto cache = dfsStack.top();
        // ... handle shape resizing if needed
        if (!cache->mContentDirty) {
            visited.insert(cache);
            dfsStack.pop();
            continue;
        }
        if (hasUnvisitInput(cache)) {  // condensed: any input cache still dirty and unvisited
            for (auto c : cache->mInputs) {
                dfsStack.push(c.get());
            }
        } else {
            visited.insert(cache);
            dfsStack.pop();
            auto code = cache->mSession->run();
            if (NO_ERROR != code) {
                return code;
            }
            cache->mContentDirty = false;
        }
    }
    return NO_ERROR;
}
```
This two-pass approach first validates that no inputs have unresolvable dirty state (CALL_BACK_STOP), then executes each cache node's session in bottom-up order.
mapOutput() (L170-206):
```cpp
void* Executor::ComputeCache::mapOutput(int offset, Tensor* dest) {
    auto tensor = mSession->getTensor(offset);
    auto des = TensorUtils::getDescribe(tensor);
    if (0 == tensor->deviceId() && des->quantAttr.get() == nullptr) {
        auto ptr = tensor->host<void>();
        Utils::releaseMemoryForHostTensor(dest);
        TensorUtils::getDescribe(dest)->memoryType = Tensor::InsideDescribe::MEMORY_BACKEND;
        dest->buffer().host = (uint8_t*)ptr;
        return ptr;
    }
    // ... fallback: copy from device to host
    Utils::allocMemoryForHostTensor(dest);
    if (nullptr != dest->host<void>()) {
        tensor->copyToHostTensor(dest);
    }
    return dest->host<void>();
}
```
Maps a session output tensor to a destination tensor. When the data is already on the host and not quantized, it performs a zero-copy pointer reassignment. Otherwise, it allocates host memory and copies the data from the device.
### Utils (Static Methods, Utils.hpp L71-83)
The Utils class provides static utility methods for conversions and memory management.

#### Format Conversion
```cpp
int Utils::convertFormat(Dimensionformat format) {
    CONVERT(NCHW, MNN_DATA_FORMAT_NCHW, format);
    CONVERT(NHWC, MNN_DATA_FORMAT_NHWC, format);
    CONVERT(NC4HW4, MNN_DATA_FORMAT_NC4HW4, format);
    return MNN_DATA_FORMAT_UNKNOWN;
}

Express::Dimensionformat Utils::revertFormat(int format) {
    CONVERT(MNN_DATA_FORMAT_NCHW, Express::NCHW, format);
    CONVERT(MNN_DATA_FORMAT_NHWC, Express::NHWC, format);
    CONVERT(MNN_DATA_FORMAT_NC4HW4, Express::NC4HW4, format);
    return NCHW;
}
```
Bidirectional conversion between the Express API's Dimensionformat enum and MNN's internal MNN_DATA_FORMAT constants. Supports NCHW (channels-first), NHWC (channels-last), and NC4HW4 (4-channel packed) layouts.
#### Data Type Conversion

```cpp
DataType Utils::convertDataType(halide_type_t type) {
    return OpCommonUtils::convertDataType(type);
}

halide_type_t Utils::revertDataType(DataType dataType) {
    CONVERT(DataType_DT_FLOAT, halide_type_of<float>(), dataType);
    CONVERT(DataType_DT_INT32, halide_type_of<int32_t>(), dataType);
    CONVERT(DataType_DT_INT64, halide_type_of<int32_t>(), dataType);  // lossy: 64-bit narrowed to 32-bit
    CONVERT(DataType_DT_UINT8, halide_type_of<uint8_t>(), dataType);
    CONVERT(DataType_DT_INT8, halide_type_of<int8_t>(), dataType);
    CONVERT(DataType_DT_HALF, halide_type_of<float>(), dataType);     // half stored as 32-bit float on host
    CONVERT(DataType_DT_BFLOAT16, halide_type_t(halide_type_bfloat, 16), dataType);
    return halide_type_of<float>();
}
```
Maps between MNN's DataType enum and Halide's halide_type_t. Note that INT64 is narrowed to int32_t (a lossy conversion for values outside the 32-bit range), HALF is represented as 32-bit float on the host (a widening conversion), and BFLOAT16 maps to a genuine 16-bit bfloat halide type.
#### Tensor Info Conversion

```cpp
void Utils::copyInfoToTensor(Tensor* dest, const Variable::Info* source) {
    if (nullptr == source) {
        dest->buffer().dimensions = 0;
        return;
    }
    for (int i = 0; i < source->dim.size(); ++i) {
        dest->setLength(i, source->dim[i]);
    }
    dest->buffer().dimensions = (int)source->dim.size();
    dest->buffer().type = source->type;
    TensorUtils::getDescribe(dest)->dimensionFormat = (MNN_DATA_FORMAT)Utils::convertFormat(source->order);
    TensorUtils::setLinearLayout(dest);
}

void Utils::copyTensorToInfo(Variable::Info* shape, const Tensor* tensor) {
    shape->type = tensor->getType();
    shape->dim = tensor->shape();
    shape->size = tensor->elementSize();
    shape->order = Utils::revertFormat(TensorUtils::getDescribe(tensor)->dimensionFormat);
}
```
Bidirectional conversion between Variable::Info (the Express API's shape/type descriptor) and Tensor (MNN's internal tensor representation).
#### Memory Management

```cpp
bool Utils::allocMemoryForHostTensor(Tensor* dest) {
    if (nullptr != dest->buffer().host) {
        return true;
    }
    if (TensorUtils::getDescribe(dest)->memoryType != Tensor::InsideDescribe::MEMORY_HOST) {
        return false;
    }
    auto size = dest->usize();
    dest->buffer().host = (uint8_t*)MNNMemoryAllocAlign(size, MNN_MEMORY_ALIGN_DEFAULT);
    return dest->buffer().host != nullptr;
}

bool Utils::releaseMemoryForHostTensor(Tensor* dest) {
    if (nullptr == dest->buffer().host) {
        return true;
    }
    if (TensorUtils::getDescribe(dest)->memoryType != Tensor::InsideDescribe::MEMORY_HOST) {
        return false;
    }
    MNNMemoryFreeAlign(dest->buffer().host);
    dest->buffer().host = nullptr;
    return true;
}
```
Aligned memory allocation and deallocation for host tensors. Only operates on tensors with MEMORY_HOST type, refusing to touch backend-managed memory.
#### Variable-to-Tensor Extraction

```cpp
Tensor* Utils::getTensor(VARP var) {
    return (Tensor*)(var->getTensor());
}
```
Extracts the underlying Tensor* from a VARP (Variable pointer), bridging the Express API's variable abstraction to MNN's tensor layer.
#### Raster Operation Construction

```cpp
EXPRP Utils::makeRaster(const std::vector<VARP>& vars, const std::vector<int>& regions,
                        const std::vector<int>& shape, halide_type_t dataType,
                        MNN_DATA_FORMAT format) {
    std::unique_ptr<MNN::OpT> op(new MNN::OpT);
    op->type = OpType_Raster;
    // ... constructs Extra attributes for shape, region, data type, and format
    auto expr = Expr::create(std::move(op), vars);
    return expr;
}
```
Constructs an OpType_Raster expression node from input variables, region descriptors, and shape/type metadata. Raster operations enable memory layout transformations and tensor region copying.
## I/O Contract

| Function | Input | Output |
|---|---|---|
| copyInfoToTensor() | Variable::Info* | Populated Tensor* with matching shape, type, and format |
| copyTensorToInfo() | Tensor* | Populated Variable::Info* with matching shape, type, and format |
| convertFormat() | Dimensionformat (NCHW/NHWC/NC4HW4) | int (MNN internal format constant) |
| revertFormat() | int (MNN internal format) | Dimensionformat enum |
| convertDataType() | halide_type_t | DataType enum |
| revertDataType() | DataType enum | halide_type_t |
| allocMemoryForHostTensor() | Tensor* | bool (success); tensor host buffer allocated |
| releaseMemoryForHostTensor() | Tensor* | bool (success); tensor host buffer freed |
| getTensor() | VARP | Raw Tensor* pointer |
| makeRaster() | Variables, regions, shape, type, format | EXPRP raster expression node |
## Related Pages
- Alibaba_MNN_Neural_Network_Inference -- Core execution support that Utils enables at the Express API layer