Principle:Alibaba MNN LLM Source Acquisition
| Field | Value |
|---|---|
| principle_name | LLM_Source_Acquisition |
| repository | Alibaba_MNN |
| workflow | LLM_Deployment_Pipeline |
| pipeline_stage | Source Acquisition |
| principle_type | Conceptual |
| last_updated | 2026-02-10 14:00 GMT |
Overview
LLM Source Acquisition is the foundational step in the MNN LLM deployment pipeline. Before any on-device inference can occur, the pre-trained large language model weights, tokenizer files, and model configuration must be obtained from a model hub and prepared locally. This principle covers the theory and practice of acquiring LLM artifacts suitable for subsequent conversion to MNN format.
Theoretical Background
Modern large language models are distributed through model hubs such as HuggingFace Hub and ModelScope. These repositories contain all the artifacts necessary to reconstruct a model for inference:
- Model weights: The learned parameters of the neural network, stored as safetensors (.safetensors) or PyTorch binary (.bin) files. These files are typically large (hundreds of megabytes to tens of gigabytes) and require Git LFS (Large File Storage) for proper download.
- Tokenizer files: The vocabulary and encoding rules used to convert text into token sequences. Commonly stored as tokenizer.json, tokenizer_config.json, vocab.txt, or SentencePiece .model files.
- Model configuration: A config.json file specifying the model architecture (hidden size, number of layers, number of attention heads, vocabulary size, etc.).
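As a sketch of how these three artifact roles map onto files on disk, the helper below groups the contents of a downloaded model directory by role. The glob patterns are illustrative only; exact file sets vary from repository to repository.

```python
from pathlib import Path

# Illustrative patterns for each artifact role; real repositories may
# shard weights (model-00001-of-00004.safetensors) or omit some files.
ARTIFACT_GLOBS = {
    "weights": ["*.safetensors", "*.bin"],
    "tokenizer": ["tokenizer.json", "tokenizer_config.json", "vocab.txt", "*.model"],
    "config": ["config.json"],
}

def inventory(model_dir: str) -> dict:
    """Group files in a downloaded model directory by artifact role."""
    root = Path(model_dir)
    return {
        role: sorted(p.name for pattern in globs for p in root.glob(pattern))
        for role, globs in ARTIFACT_GLOBS.items()
    }
```

Running this against a freshly cloned repository gives a quick sanity check that all three artifact categories are present before conversion is attempted.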
Supported Model Families
The MNN export pipeline supports a broad range of transformer-based LLM architectures. The model mapper in MNN (transformers/llm/export/utils/model_mapper.py) registers architecture-specific mappings for the following families:
- Qwen family: Qwen, Qwen2, Qwen3, Qwen3-MoE, Qwen2-VL, Qwen2.5-VL, Qwen3-VL, Qwen3-VL-MoE, Qwen2-Audio, Qwen2.5-Omni
- Llama family: Llama, Llama-4-Text (and derivatives such as InternLM, MobileLLM)
- Baichuan: Baichuan (with fused QKV projection via W_pack)
- DeepSeek: DeepSeek-VL
- ChatGLM family: ChatGLM (original), ChatGLM2/3/4
- Phi family: Phi-MSFT, Phi-2/3
- Gemma family: Gemma2, Gemma3, Gemma3-Text
- Others: OpenELM, MiniCPM, MiniCPM-V, InternVL, Idefics3, SmolVLM, FastVLM, Hunyuan, MIMO, FunAudioChat, GPT-OSS
Any model whose config.json declares a model_type not explicitly registered will fall back to the default Llama-style mapping via AutoModelForCausalLM.
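This lookup-with-fallback behavior can be sketched as follows. The set of registered types here is a small hypothetical subset for illustration; the authoritative registry lives in transformers/llm/export/utils/model_mapper.py in the MNN repository.

```python
import json
from pathlib import Path

# Hypothetical subset of registered model_type values, for illustration only.
REGISTERED_TYPES = {"qwen2", "llama", "baichuan", "chatglm", "phi", "gemma2"}

def resolve_mapping(model_dir: str) -> str:
    """Return the mapping a model's config.json would resolve to (sketch)."""
    config = json.loads(Path(model_dir, "config.json").read_text())
    model_type = config.get("model_type", "")
    if model_type in REGISTERED_TYPES:
        return model_type
    # Unregistered architectures fall back to the default Llama-style
    # mapping, loaded through AutoModelForCausalLM.
    return "llama (default fallback)"
```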
Acquisition Workflow
The acquisition process follows these steps:
- Install Git LFS: Large model weight files are tracked by Git LFS. Without it, cloning will produce placeholder pointer files instead of actual weights.
- Clone the model repository: Use git clone to download the full repository from HuggingFace or ModelScope.
- Verify file integrity: After cloning, verify that weight files have realistic sizes (not a few hundred bytes, which would indicate LFS pointers were not resolved).
- Identify model type: Check config.json for the model_type field to confirm it is supported by MNN's export pipeline.
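The integrity check in step 3 can be automated. An unresolved Git LFS pointer is a tiny text file beginning with the LFS spec header rather than binary weight data, so a scan like the sketch below (function names are my own) can flag weight files that were never resolved:

```python
from pathlib import Path

# Header that Git LFS writes into pointer files in place of real content.
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """Heuristic: pointer files are tiny and start with the LFS header."""
    p = Path(path)
    if p.stat().st_size > 1024:  # real weight files are far larger
        return False
    return p.read_bytes().startswith(LFS_SIGNATURE)

def find_unresolved_pointers(repo_dir: str) -> list:
    """Scan a cloned model repo for weight files that are still pointers."""
    suspects = []
    for pattern in ("*.safetensors", "*.bin"):
        for f in Path(repo_dir).rglob(pattern):
            if is_lfs_pointer(f):
                suspects.append(str(f))
    return suspects
```

If the scan reports any suspects, running git lfs pull inside the repository should replace the pointers with the actual weights.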
Key Considerations
- Storage requirements: Unquantized LLM weights are large: a 7B parameter model stored in float16 requires approximately 14 GB of disk space. Ensure sufficient free space before cloning.
- Network bandwidth: Downloading large models requires a stable, high-bandwidth connection. Consider using --depth 1 for a shallow clone if full git history is not needed.
- Model provenance: Both HuggingFace and ModelScope host the same model families. ModelScope (modelscope.cn) may offer faster downloads for users in China.
- Version compatibility: The MNN export tool relies on the HuggingFace Transformers library for model loading. Ensure the model's architecture is compatible with the installed version of transformers.
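The storage figure above follows directly from parameter count times bytes per parameter. A minimal estimator, useful for checking free disk space before cloning:

```python
def weight_storage_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough raw-weight footprint: parameter count times bytes per
    parameter (2 for float16/bfloat16, 4 for float32).
    Ignores tokenizer files and per-file format overhead."""
    return num_params * bytes_per_param

# A 7B-parameter model in float16: 7e9 params * 2 bytes = 14 GB.
seven_b_gb = weight_storage_bytes(7e9) / 1e9
```

This matches the 14 GB figure quoted above; a float32 checkpoint of the same model would be twice that.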
Related Pages
- Implementation:Alibaba_MNN_HuggingFace_Model_Download
- Principle:Alibaba_MNN_LLM_Model_Export - Next stage: converting acquired models to MNN format