
Principle:Alibaba MNN LLM Source Acquisition

From Leeroopedia


principle_name: LLM_Source_Acquisition
repository: Alibaba_MNN
workflow: LLM_Deployment_Pipeline
pipeline_stage: Source Acquisition
principle_type: Conceptual
last_updated: 2026-02-10 14:00 GMT

Overview

LLM Source Acquisition is the foundational step in the MNN LLM deployment pipeline. Before any on-device inference can occur, the pre-trained large language model weights, tokenizer files, and model configuration must be obtained from a model hub and prepared locally. This principle covers the theory and practice of acquiring LLM artifacts suitable for subsequent conversion to MNN format.

Theoretical Background

Modern large language models are distributed through model hubs such as HuggingFace Hub and ModelScope. These repositories contain all the artifacts necessary to reconstruct a model for inference:

  • Model weights: The learned parameters of the neural network, stored as safetensors (.safetensors) or PyTorch binary (.bin) files. These files are typically large (hundreds of megabytes to tens of gigabytes) and require Git LFS (Large File Storage) for proper download.
  • Tokenizer files: The vocabulary and encoding rules used to convert text into token sequences. Commonly stored as tokenizer.json, tokenizer_config.json, vocab.txt, or SentencePiece .model files.
  • Model configuration: A config.json file specifying the model architecture (hidden size, number of layers, number of attention heads, vocabulary size, etc.).
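Once these artifacts are on disk, a quick inspection confirms they are all present and readable. The sketch below assumes a hypothetical local checkout path; the config.json field names (model_type, hidden_size, etc.) are the standard HuggingFace conventions described above.

```python
import json
from pathlib import Path

def inspect_model_dir(model_dir):
    """Summarize the key artifacts in a locally cloned model repository."""
    model_dir = Path(model_dir)
    # The architecture metadata lives in config.json.
    config = json.loads((model_dir / "config.json").read_text())
    # Weight shards are safetensors or PyTorch .bin files alongside it.
    weights = sorted(model_dir.glob("*.safetensors")) or sorted(model_dir.glob("*.bin"))
    return {
        "model_type": config.get("model_type"),
        "hidden_size": config.get("hidden_size"),
        "num_hidden_layers": config.get("num_hidden_layers"),
        "vocab_size": config.get("vocab_size"),
        "weight_files": [w.name for w in weights],
    }

# Example (path is illustrative):
# inspect_model_dir("./Qwen2-7B-Instruct")
```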

Supported Model Families

The MNN export pipeline supports a broad range of transformer-based LLM architectures. The model mapper in MNN (transformers/llm/export/utils/model_mapper.py) registers architecture-specific mappings for the following families:

  • Qwen family: Qwen, Qwen2, Qwen3, Qwen3-MoE, Qwen2-VL, Qwen2.5-VL, Qwen3-VL, Qwen3-VL-MoE, Qwen2-Audio, Qwen2.5-Omni
  • Llama family: Llama, Llama-4-Text (and derivatives such as InternLM, MobileLLM)
  • Baichuan: Baichuan (with fused QKV projection via W_pack)
  • DeepSeek: DeepSeek-VL
  • ChatGLM family: ChatGLM (original), ChatGLM2/3/4
  • Phi family: Phi-MSFT, Phi-2/3
  • Gemma family: Gemma2, Gemma3, Gemma3-Text
  • Others: OpenELM, MiniCPM, MiniCPM-V, InternVL, Idefics3, SmolVLM, FastVLM, Hunyuan, MIMO, FunAudioChat, GPT-OSS

Any model whose config.json declares a model_type not explicitly registered will fall back to the default Llama-style mapping via AutoModelForCausalLM.
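The dispatch logic can be sketched as a simple registry lookup with a default. This is a simplified illustration, not the actual code in model_mapper.py; the mapper names here are invented placeholders.

```python
# Illustrative registry of architecture-specific mappings, keyed by the
# model_type string from config.json. Names are placeholders, not the
# identifiers MNN actually uses.
REGISTERED_MAPPERS = {
    "qwen2": "Qwen2Mapper",
    "llama": "LlamaMapper",
    "chatglm": "ChatGLMMapper",
    "baichuan": "BaichuanMapper",
}

def select_mapper(model_type):
    """Pick a registered mapping; unregistered architectures fall back to
    the default Llama-style mapping (loaded via AutoModelForCausalLM)."""
    return REGISTERED_MAPPERS.get(model_type, "LlamaMapper")
```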

Acquisition Workflow

The acquisition process follows these steps:

  1. Install Git LFS: Large model weight files are tracked by Git LFS. Without it, cloning will produce placeholder pointer files instead of actual weights.
  2. Clone the model repository: Use git clone to download the full repository from HuggingFace or ModelScope.
  3. Verify file integrity: After cloning, verify that weight files have realistic sizes (not a few hundred bytes, which would indicate LFS pointers were not resolved).
  4. Identify model type: Check config.json for the model_type field to confirm it is supported by MNN's export pipeline.
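Step 3 can be automated: an unresolved Git LFS pointer is a small text file (on the order of 130 bytes) beginning with a fixed version line, so a size-plus-prefix check catches it reliably. A minimal sketch:

```python
from pathlib import Path

# An unresolved Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def find_unresolved_lfs_pointers(model_dir):
    """Return weight files that are still LFS pointers, not real weights."""
    suspicious = []
    for path in Path(model_dir).iterdir():
        if path.suffix not in {".safetensors", ".bin"}:
            continue
        # Real weight shards are megabytes to gigabytes; pointers are tiny.
        if path.stat().st_size < 1024 and path.read_bytes().startswith(LFS_POINTER_PREFIX):
            suspicious.append(path.name)
    return sorted(suspicious)
```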

Key Considerations

  • Storage requirements: Full-precision LLM weights can be very large. A 7B parameter model at float16 requires approximately 14 GB of storage. Ensure sufficient disk space before cloning.
  • Network bandwidth: Downloading large models requires stable, high-bandwidth connections. Consider using --depth 1 for shallow clones if full git history is not needed.
  • Model provenance: Both HuggingFace and ModelScope host the same model families. ModelScope (modelscope.cn) may offer faster downloads for users in China.
  • Version compatibility: The MNN export tool relies on the HuggingFace Transformers library for model loading. Ensure the model's architecture is supported by the installed version of transformers.
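The storage figure above follows from a simple calculation: parameter count times bytes per parameter (float16 uses 2 bytes per parameter). A rough estimator:

```python
def weight_storage_gb(num_params, bytes_per_param=2):
    """Rough on-disk size of model weights; float16 = 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9

# A 7B-parameter model in float16:
print(f"{weight_storage_gb(7e9):.0f} GB")  # roughly 14 GB
```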
