
Principle:Alibaba MNN LLM Source Acquisition

From Leeroopedia


principle_name: LLM_Source_Acquisition
repository: Alibaba_MNN
workflow: LLM_Deployment_Pipeline
pipeline_stage: Source Acquisition
principle_type: Conceptual
last_updated: 2026-02-10 14:00 GMT

Overview

LLM Source Acquisition is the foundational step in the MNN LLM deployment pipeline. Before any on-device inference can occur, the pre-trained large language model weights, tokenizer files, and model configuration must be obtained from a model hub and prepared locally. This principle covers the theory and practice of acquiring LLM artifacts suitable for subsequent conversion to MNN format.

Theoretical Background

Modern large language models are distributed through model hubs such as HuggingFace Hub and ModelScope. These repositories contain all the artifacts necessary to reconstruct a model for inference:

  • Model weights: The learned parameters of the neural network, stored as safetensors (.safetensors) or PyTorch binary (.bin) files. These files are typically large (hundreds of megabytes to tens of gigabytes) and require Git LFS (Large File Storage) for proper download.
  • Tokenizer files: The vocabulary and encoding rules used to convert text into token sequences. Commonly stored as tokenizer.json, tokenizer_config.json, vocab.txt, or SentencePiece .model files.
  • Model configuration: A config.json file specifying the model architecture (hidden size, number of layers, number of attention heads, vocabulary size, etc.).
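Once these artifacts are on disk, a quick inspection confirms they are all present and readable. The sketch below assumes a hypothetical local checkout path; the config.json field names (model_type, hidden_size, etc.) are the standard HuggingFace conventions described above.

```python
import json
from pathlib import Path

def inspect_model_dir(model_dir):
    """Summarize the key artifacts in a locally cloned model repository."""
    model_dir = Path(model_dir)
    # The architecture metadata lives in config.json.
    config = json.loads((model_dir / "config.json").read_text())
    # Weight shards are safetensors or PyTorch .bin files alongside it.
    weights = sorted(model_dir.glob("*.safetensors")) or sorted(model_dir.glob("*.bin"))
    return {
        "model_type": config.get("model_type"),
        "hidden_size": config.get("hidden_size"),
        "num_hidden_layers": config.get("num_hidden_layers"),
        "vocab_size": config.get("vocab_size"),
        "weight_files": [w.name for w in weights],
    }

# Example (path is illustrative):
# inspect_model_dir("./Qwen2-7B-Instruct")
```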

Supported Model Families

The MNN export pipeline supports a broad range of transformer-based LLM architectures. The model mapper in MNN (transformers/llm/export/utils/model_mapper.py) registers architecture-specific mappings for the following families:

  • Qwen family: Qwen, Qwen2, Qwen3, Qwen3-MoE, Qwen2-VL, Qwen2.5-VL, Qwen3-VL, Qwen3-VL-MoE, Qwen2-Audio, Qwen2.5-Omni
  • Llama family: Llama, Llama-4-Text (and derivatives such as InternLM, MobileLLM)
  • Baichuan: Baichuan (with fused QKV projection via W_pack)
  • DeepSeek: DeepSeek-VL
  • ChatGLM family: ChatGLM (original), ChatGLM2/3/4
  • Phi family: Phi-MSFT, Phi-2/3
  • Gemma family: Gemma2, Gemma3, Gemma3-Text
  • Others: OpenELM, MiniCPM, MiniCPM-V, InternVL, Idefics3, SmolVLM, FastVLM, Hunyuan, MIMO, FunAudioChat, GPT-OSS

Any model whose config.json declares a model_type not explicitly registered will fall back to the default Llama-style mapping via AutoModelForCausalLM.
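The dispatch logic can be sketched as a simple registry lookup with a default. This is a simplified illustration, not the actual code in model_mapper.py; the mapper names here are invented placeholders.

```python
# Illustrative registry of architecture-specific mappings, keyed by the
# model_type string from config.json. Names are placeholders, not the
# identifiers MNN actually uses.
REGISTERED_MAPPERS = {
    "qwen2": "Qwen2Mapper",
    "llama": "LlamaMapper",
    "chatglm": "ChatGLMMapper",
    "baichuan": "BaichuanMapper",
}

def select_mapper(model_type):
    """Pick a registered mapping; unregistered architectures fall back to
    the default Llama-style mapping (loaded via AutoModelForCausalLM)."""
    return REGISTERED_MAPPERS.get(model_type, "LlamaMapper")
```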

Acquisition Workflow

The acquisition process follows these steps:

  1. Install Git LFS: Large model weight files are tracked by Git LFS. Without it, cloning will produce placeholder pointer files instead of actual weights.
  2. Clone the model repository: Use git clone to download the full repository from HuggingFace or ModelScope.
  3. Verify file integrity: After cloning, verify that weight files have realistic sizes (not a few hundred bytes, which would indicate LFS pointers were not resolved).
  4. Identify model type: Check config.json for the model_type field to confirm it is supported by MNN's export pipeline.
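Step 3 can be automated: an unresolved Git LFS pointer is a small text file (on the order of 130 bytes) beginning with a fixed version line, so a size-plus-prefix check catches it reliably. A minimal sketch:

```python
from pathlib import Path

# An unresolved Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def find_unresolved_lfs_pointers(model_dir):
    """Return weight files that are still LFS pointers, not real weights."""
    suspicious = []
    for path in Path(model_dir).iterdir():
        if path.suffix not in {".safetensors", ".bin"}:
            continue
        # Real weight shards are megabytes to gigabytes; pointers are tiny.
        if path.stat().st_size < 1024 and path.read_bytes().startswith(LFS_POINTER_PREFIX):
            suspicious.append(path.name)
    return sorted(suspicious)
```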

Key Considerations

  • Storage requirements: Full-precision LLM weights can be very large. A 7B parameter model at float16 requires approximately 14 GB of storage. Ensure sufficient disk space before cloning.
  • Network bandwidth: Downloading large models requires stable, high-bandwidth connections. Consider using --depth 1 for shallow clones if full git history is not needed.
  • Model provenance: Both HuggingFace and ModelScope host the same model families. ModelScope (modelscope.cn) may offer faster downloads for users in China.
  • Version compatibility: The MNN export tool relies on the HuggingFace Transformers library for model loading. Ensure the model's architecture is supported by the installed version of transformers.
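The storage figure above follows from a simple calculation: parameter count times bytes per parameter (float16 uses 2 bytes per parameter). A rough estimator:

```python
def weight_storage_gb(num_params, bytes_per_param=2):
    """Rough on-disk size of model weights; float16 = 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9

# A 7B-parameter model in float16:
print(f"{weight_storage_gb(7e9):.0f} GB")  # roughly 14 GB
```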
