Implementation:Mlc ai Mlc llm Model Preset
| Knowledge Sources | |
|---|---|
| Domains | Model_Architecture, LLM |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Provides a dictionary of built-in model configuration presets covering the LLM architectures supported by MLC LLM. It serves as the reference configuration database for model compilation and testing.
Description
This module defines the MODEL_PRESETS dictionary, a large registry mapping preset names to their complete HuggingFace-compatible configuration dictionaries. Each preset entry contains all the hyperparameters needed to instantiate the corresponding model architecture, including hidden sizes, layer counts, attention head configurations, vocabulary sizes, RoPE parameters, and MLC-specific settings like context_window_size, prefill_chunk_size, and sliding_window_size.
The presets cover the following model families and variants:
Llama Family:
- llama2_7b, llama2_13b, llama2_70b: Llama 2 models
- codellama_7b, codellama_13b, codellama_34b: CodeLlama models
- tinyllama_1b_chat_v0.4, tinyllama_1b_chat_v1.0: TinyLlama models
- llama3_1_8b, llama3_1_70b: Llama 3.1 with llama3 RoPE scaling
- llama3_2_1b, llama3_2_3b: Llama 3.2 with tied word embeddings
- smollm_135m, smollm_360m, smollm_1_7b: SmolLM models
- smollm2_135m, smollm2_360m: SmolLM2 models
Mistral Family:
- mistral_7b: Mistral 7B v0.1 with sliding window (4096)
- mistral_7b_v03: Mistral 7B v0.3 without sliding window
- Mixtral-8x7B-v0.1: Mixtral MoE model
- ministral3_3b_reasoning_2512: Ministral 3 with YaRN RoPE and vision config
Google Models:
- gemma_2b: Gemma 2B
- gemma2_2b, gemma2_2b-jpn, gemma2_9b, gemma2_27b: Gemma 2 variants
- gemma3_1b_it: Gemma 3 1B instruction-tuned
Qwen Family:
- qwen: Qwen 1.0
- qwen2, qwen2_0_5b, qwen2_1_5b, qwen2.5_3b, qwen2_7b: Qwen 2 variants
- qwen2moe: Qwen2 MoE
- qwen3_0.6b, qwen3_1.7b: Qwen 3 variants
Microsoft Phi Family:
- phi-1_5, phi-2: Phi 1.5 and Phi 2
- phi-3_5: Phi 3.5 with LongRoPE scaling
- phi-3_5-vision: Phi 3.5 Vision with SU RoPE scaling
- phi-4: Phi 4 with LongRoPE scaling and tied embeddings
Other Architectures:
- gpt2, gpt2_medium: GPT-2 models
- gpt_bigcode: GPTBigCode (StarCoder)
- redpajama_3b_v1: RedPajama (GPT-NeoX)
- stablelm, stablelm-2-zephyr-1_6b: StableLM models
- baichuan: Baichuan
- internlm: InternLM 1.0
- internlm2, internlm2_5_7b: InternLM 2 and 2.5
- rwkv5_3b: RWKV5
- orion: Orion
- llava: LLaVA multimodal
- chatglm: ChatGLM
- snowflake-arctic-embed-m: Snowflake Arctic embedding (BERT)
- starcoder2: StarCoder2
- aya-23: Aya-23 (Cohere)
- minicpm_2b, minicpm_2b_sft_bf16, minicpm-moe-8x2b: MiniCPM models
- deepseek: DeepSeek
- deepseek_v2_lite: DeepSeek V2 Lite with MLA
- gpt_j: GPT-J
Usage
Use this module to look up default configurations for supported models, as a reference for expected configuration fields, or for automated testing of model compilation across all supported architectures. The presets are used internally by the MLC LLM compilation pipeline when a model does not provide a complete configuration.
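The testing use case described above can be sketched as a loop over the registry. The example below is a minimal illustration, not part of MLC LLM: SAMPLE_PRESETS is a hand-written stand-in for MODEL_PRESETS that reproduces only a few fields so the snippet runs without MLC LLM installed, and the helper names presets_by_model_type and presets_missing_keys are hypothetical.

```python
from typing import Any, Dict, List

# Hand-written stand-in for MODEL_PRESETS (assumption: only a few fields
# are reproduced here; the real entries contain many more keys).
SAMPLE_PRESETS: Dict[str, Dict[str, Any]] = {
    "llama2_7b": {
        "architectures": ["LlamaForCausalLM"],
        "model_type": "llama",
        "hidden_size": 4096,
        "num_hidden_layers": 32,
    },
    "mistral_7b": {
        "architectures": ["MistralForCausalLM"],
        "model_type": "mistral",
        "hidden_size": 4096,
        "num_hidden_layers": 32,
        "sliding_window_size": 4096,
        "attention_sink_size": 4,
    },
}


def presets_by_model_type(presets: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group preset names by their model_type field."""
    grouped: Dict[str, List[str]] = {}
    for name, config in presets.items():
        grouped.setdefault(config.get("model_type", "unknown"), []).append(name)
    return grouped


def presets_missing_keys(
    presets: Dict[str, Dict[str, Any]], required: List[str]
) -> Dict[str, List[str]]:
    """Map each preset name to the required keys it lacks (complete presets are omitted)."""
    report: Dict[str, List[str]] = {}
    for name, config in presets.items():
        missing = [key for key in required if key not in config]
        if missing:
            report[name] = missing
    return report


print(presets_by_model_type(SAMPLE_PRESETS))
print(presets_missing_keys(SAMPLE_PRESETS, ["architectures", "model_type", "sliding_window_size"]))
```

With MLC LLM installed, the same helpers should accept MODEL_PRESETS in place of SAMPLE_PRESETS, since both are plain dictionaries of dictionaries. The list of required keys is a choice for the test author; this page does not state that any key is mandatory in every preset.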
Code Reference
Source Location
- Repository: Mlc_ai_Mlc_llm
- File: python/mlc_llm/model/model_preset.py
Signature
MODEL_PRESETS: Dict[str, Any] = {
    "llama2_7b": { ... },
    "llama2_13b": { ... },
    "mistral_7b": { ... },
    "gemma3_1b_it": { ... },
    # ... 50+ model presets
}
Import
from mlc_llm.model.model_preset import MODEL_PRESETS
I/O Contract
Dictionary Structure
Each preset entry in MODEL_PRESETS is a Dict[str, Any] with HuggingFace-compatible keys:
| Key | Type | Description |
|---|---|---|
| architectures | List[str] | HuggingFace architecture class names (e.g., ["LlamaForCausalLM"]) |
| model_type | str | Model type identifier (e.g., "llama", "mistral", "gemma3_text") |
| hidden_size | int | Hidden dimension of the transformer |
| intermediate_size | int | MLP intermediate dimension |
| num_attention_heads | int | Number of query attention heads |
| num_hidden_layers | int | Number of transformer decoder layers |
| num_key_value_heads | int | Number of KV heads (for GQA) |
| vocab_size | int | Vocabulary size |
| rms_norm_eps | float | RMSNorm epsilon (for RMSNorm models) |
| rope_theta | float | RoPE base frequency |
| rope_scaling | Optional[Dict] | RoPE scaling configuration (llama3, longrope, yarn, etc.) |
| context_window_size | int | MLC-specific: maximum context window size |
| prefill_chunk_size | int | MLC-specific: prefill chunk size |
| sliding_window_size | int | MLC-specific: sliding window attention size |
| tie_word_embeddings | bool | Whether to share embedding and LM head weights |
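The types in the table above can be turned into a lightweight check. The following is a sketch under stated assumptions: EXPECTED_TYPES and type_mismatches are hypothetical names introduced here, not part of MLC LLM, and the check skips absent keys because not every preset carries every key.

```python
from typing import Any, Dict, List, Tuple

# Expected Python types for the common keys listed in the table.
# Assumption: rope_theta also accepts int, since a config may store
# 10000 rather than 10000.0.
EXPECTED_TYPES: Dict[str, Tuple[type, ...]] = {
    "architectures": (list,),
    "model_type": (str,),
    "hidden_size": (int,),
    "intermediate_size": (int,),
    "num_attention_heads": (int,),
    "num_hidden_layers": (int,),
    "num_key_value_heads": (int,),
    "vocab_size": (int,),
    "rms_norm_eps": (float,),
    "rope_theta": (int, float),
    "context_window_size": (int,),
    "prefill_chunk_size": (int,),
    "sliding_window_size": (int,),
    "tie_word_embeddings": (bool,),
}


def type_mismatches(config: Dict[str, Any]) -> List[str]:
    """Return the keys whose values do not match the expected type.

    Keys absent from the config are skipped rather than reported.
    """
    bad: List[str] = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and bool not in expected:
            bad.append(key)
        elif not isinstance(value, expected):
            bad.append(key)
    return bad


# Illustrative config with one deliberate error (a string layer count).
example = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": 4096,
    "num_hidden_layers": "32",
    "tie_word_embeddings": False,
}
print(type_mismatches(example))
```

A check like this could run over every entry of MODEL_PRESETS in a unit test. Whether the real presets satisfy it in all cases is not something this page confirms, so treat a reported mismatch as a prompt to inspect the preset, not as proof of a bug.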
Architecture-Specific Fields
| Field | Used By | Description |
|---|---|---|
| n_embd, n_head, n_layer, n_positions | GPT-2, GPTBigCode, Phi | GPT-2 style naming convention |
| head_dim | Gemma, Gemma2, Gemma3, Llama 3.2, Qwen3 | Explicit head dimension (not derived from hidden_size/num_heads) |
| sliding_window | Mistral, Gemma2, Starcoder2 | HuggingFace sliding window field |
| text_config | LLaVA, Ministral3 | Nested text model configuration for multimodal models |
| vision_config | LLaVA, Phi-3.5-Vision, Ministral3 | Vision encoder configuration |
| rope_parameters | Ministral3 | YaRN RoPE parameters (factor, mscale, rope_theta, etc.) |
| query_pre_attn_scalar | Gemma2, Gemma3 | Custom attention scaling factor |
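Because field names differ across architectures, code that reads presets generically has to normalize them. The sketch below shows one possible approach: it unwraps a nested text_config, falls back from HuggingFace names to GPT-2 style names, and derives head_dim only when it is not given explicitly. The helper names text_model_config and resolve_dims are hypothetical, and the sample values are illustrative, not copied from the presets.

```python
from typing import Any, Dict


def text_model_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the nested text_config if present, otherwise the config itself."""
    return config.get("text_config", config)


def resolve_dims(config: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize size fields across HuggingFace and GPT-2 style naming."""
    text = text_model_config(config)
    hidden = text.get("hidden_size", text.get("n_embd"))
    heads = text.get("num_attention_heads", text.get("n_head"))
    layers = text.get("num_hidden_layers", text.get("n_layer"))
    if hidden is None or heads is None:
        raise KeyError("config has neither HuggingFace nor GPT-2 style size fields")
    # Prefer an explicit head_dim; otherwise derive it from hidden_size / heads.
    head_dim = text.get("head_dim") or hidden // heads
    return {
        "hidden_size": hidden,
        "num_attention_heads": heads,
        "num_hidden_layers": layers,
        "head_dim": head_dim,
    }


# Illustrative inputs (assumed values, one per naming style).
gpt2_style = {"n_embd": 768, "n_head": 12, "n_layer": 12}
explicit_head_dim = {
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "num_hidden_layers": 28,
    "head_dim": 128,
}
multimodal = {
    "text_config": {"hidden_size": 4096, "num_attention_heads": 32, "num_hidden_layers": 32},
    "vision_config": {},
}

for cfg in (gpt2_style, explicit_head_dim, multimodal):
    print(resolve_dims(cfg))
```

The second input shows why an explicit head_dim matters: dividing 1024 by 16 would give 64, while the config declares 128, so deriving the value unconditionally would be wrong for such models.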
Usage Examples
from mlc_llm.model.model_preset import MODEL_PRESETS

# Look up a model preset
llama_config = MODEL_PRESETS["llama2_7b"]
print(llama_config["hidden_size"])  # 4096
print(llama_config["num_hidden_layers"])  # 32

# List all available presets
print(list(MODEL_PRESETS.keys()))

# Access a Mistral preset with sliding window
mistral_config = MODEL_PRESETS["mistral_7b"]
print(mistral_config["sliding_window_size"])  # 4096
print(mistral_config["attention_sink_size"])  # 4
Related Pages
- Implementation:Mlc_ai_Mlc_llm_Llama_Model
- Implementation:Mlc_ai_Mlc_llm_Mistral_Model
- Implementation:Mlc_ai_Mlc_llm_Gemma3_Model
- Implementation:Mlc_ai_Mlc_llm_GPT_BigCode_Model
- Implementation:Mlc_ai_Mlc_llm_InternLM2_Model
- Implementation:Mlc_ai_Mlc_llm_Ministral3_Model
- Implementation:Mlc_ai_Mlc_llm_OLMo_Model