Implementation:Mlc ai Mlc llm Model Preset

Knowledge Sources	Mlc_ai_Mlc_llm
Domains	Model_Architecture, LLM
Last Updated	2026-02-09 19:00 GMT

Overview

Provides a dictionary of built-in model configuration presets covering the LLM architectures supported by MLC LLM. It serves as the reference configuration database for model compilation and testing.

Description

This module defines the MODEL_PRESETS dictionary, a large registry mapping preset names to their complete HuggingFace-compatible configuration dictionaries. Each preset entry contains all the hyperparameters needed to instantiate the corresponding model architecture, including hidden sizes, layer counts, attention head configurations, vocabulary sizes, RoPE parameters, and MLC-specific settings like context_window_size, prefill_chunk_size, and sliding_window_size.

The presets cover the following model families and variants:

Llama Family:

llama2_7b, llama2_13b, llama2_70b -- Llama 2 models
codellama_7b, codellama_13b, codellama_34b -- CodeLlama models
tinyllama_1b_chat_v0.4, tinyllama_1b_chat_v1.0 -- TinyLlama models
llama3_1_8b, llama3_1_70b -- Llama 3.1 with llama3 RoPE scaling
llama3_2_1b, llama3_2_3b -- Llama 3.2 with tied word embeddings
smollm_135m, smollm_360m, smollm_1_7b -- SmolLM models
smollm2_135m, smollm2_360m -- SmolLM2 models

Mistral Family:

mistral_7b -- Mistral 7B v0.1 with sliding window (4096)
mistral_7b_v03 -- Mistral 7B v0.3 without sliding window
Mixtral-8x7B-v0.1 -- Mixtral MoE model
ministral3_3b_reasoning_2512 -- Ministral 3 with YaRN RoPE and vision config

Google Models:

gemma_2b -- Gemma 2B
gemma2_2b, gemma2_2b-jpn, gemma2_9b, gemma2_27b -- Gemma 2 variants
gemma3_1b_it -- Gemma 3 1B instruction-tuned

Qwen Family:

qwen -- Qwen 1.0
qwen2, qwen2_0_5b, qwen2_1_5b, qwen2.5_3b, qwen2_7b -- Qwen 2 variants
qwen2moe -- Qwen2 MoE
qwen3_0.6b, qwen3_1.7b -- Qwen 3 variants

Microsoft Phi Family:

phi-1_5, phi-2 -- Phi 1.5 and Phi 2
phi-3_5 -- Phi 3.5 with LongRoPE scaling
phi-3_5-vision -- Phi 3.5 Vision with SU RoPE scaling
phi-4 -- Phi 4 with LongRoPE scaling and tied embeddings

Other Architectures:

gpt2, gpt2_medium -- GPT-2 models
gpt_bigcode -- GPTBigCode (StarCoder)
redpajama_3b_v1 -- RedPajama (GPT-NeoX)
stablelm, stablelm-2-zephyr-1_6b -- StableLM models
baichuan -- Baichuan
internlm -- InternLM 1.0
internlm2, internlm2_5_7b -- InternLM 2 and 2.5
rwkv5_3b -- RWKV5
orion -- Orion
llava -- LLaVA multimodal
chatglm -- ChatGLM
snowflake-arctic-embed-m -- Snowflake Arctic embedding (BERT)
starcoder2 -- StarCoder2
aya-23 -- Aya-23 (Cohere)
minicpm_2b, minicpm_2b_sft_bf16, minicpm-moe-8x2b -- MiniCPM models
deepseek -- DeepSeek
deepseek_v2_lite -- DeepSeek V2 Lite with MLA
gpt_j -- GPT-J
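
Because every preset carries a `model_type` field, the family grouping above can be recovered programmatically. The following is a minimal sketch, not part of the module: `group_by_model_type` is a hypothetical helper, and `SAMPLE_PRESETS` is a small stand-in for the registry so the snippet runs without MLC LLM installed.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_model_type(presets: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each `model_type` value to the sorted preset names that declare it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, config in presets.items():
        # Use a placeholder so an entry without `model_type` stays visible.
        groups[config.get("model_type", "<unknown>")].append(name)
    return {model_type: sorted(names) for model_type, names in groups.items()}


# Illustrative stand-in, NOT the real registry: only `model_type` is filled in.
SAMPLE_PRESETS = {
    "llama2_7b": {"model_type": "llama"},
    "llama2_13b": {"model_type": "llama"},
    "mistral_7b": {"model_type": "mistral"},
    "gpt2": {"model_type": "gpt2"},
}

groups = group_by_model_type(SAMPLE_PRESETS)
for model_type, names in sorted(groups.items()):
    print(model_type, names)
```

With MLC LLM installed, passing the imported `MODEL_PRESETS` instead of the stand-in should give the grouping for the full registry.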

Usage

Use this module to look up default configurations for supported models, as a reference for the configuration fields each architecture expects, or to drive automated tests that compile every supported architecture. MLC LLM's tooling also accepts a preset name in place of a path to a HuggingFace config.json, in which case the matching MODEL_PRESETS entry supplies the configuration.
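
For the automated-testing use case, one simple check is that every preset declares a minimum set of keys. The sketch below assumes a hypothetical `REQUIRED_KEYS` tuple and a hypothetical `find_incomplete_presets` helper; neither exists in MLC LLM, and the right set of required keys depends on the architecture. The sample registry is a stand-in with one deliberately incomplete entry.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical minimum set of keys; real requirements vary per architecture.
REQUIRED_KEYS = ("architectures", "model_type", "vocab_size")


def find_incomplete_presets(
    presets: Dict[str, Dict[str, Any]],
    required: Iterable[str] = REQUIRED_KEYS,
) -> Dict[str, List[str]]:
    """Return {preset_name: [missing keys]} for presets lacking a required key."""
    required = tuple(required)
    report: Dict[str, List[str]] = {}
    for name, config in presets.items():
        missing = [key for key in required if key not in config]
        if missing:
            report[name] = missing
    return report


# Illustrative stand-in, NOT the real registry.
SAMPLE_PRESETS = {
    "llama2_7b": {
        "architectures": ["LlamaForCausalLM"],
        "model_type": "llama",
        "vocab_size": 32000,
    },
    # Deliberately incomplete entry to show what the report looks like.
    "broken_example": {
        "architectures": ["LlamaForCausalLM"],
        "model_type": "llama",
    },
}

report = find_incomplete_presets(SAMPLE_PRESETS)
print(report)
```

In a test suite the same function could be called on the imported `MODEL_PRESETS`, failing the test when the report is non-empty.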

Code Reference

Source Location

Repository: Mlc_ai_Mlc_llm
File: python/mlc_llm/model/model_preset.py

Signature

MODEL_PRESETS: Dict[str, Any] = {
    "llama2_7b": { ... },
    "llama2_13b": { ... },
    "mistral_7b": { ... },
    "gemma3_1b_it": { ... },
    # ... 50+ model presets
}

Import

from mlc_llm.model.model_preset import MODEL_PRESETS

I/O Contract

Dictionary Structure

Each preset entry in MODEL_PRESETS is a Dict[str, Any] with HuggingFace-compatible keys:

Key	Type	Description
`architectures`	List[str]	HuggingFace architecture class names (e.g., ["LlamaForCausalLM"])
`model_type`	str	Model type identifier (e.g., "llama", "mistral", "gemma3_text")
`hidden_size`	int	Hidden dimension of the transformer
`intermediate_size`	int	MLP intermediate dimension
`num_attention_heads`	int	Number of query attention heads
`num_hidden_layers`	int	Number of transformer decoder layers
`num_key_value_heads`	int	Number of key/value heads (smaller than `num_attention_heads` when grouped-query attention is used)
`vocab_size`	int	Vocabulary size
`rms_norm_eps`	float	RMSNorm epsilon (for RMSNorm models)
`rope_theta`	float	RoPE base frequency
`rope_scaling`	Optional[Dict]	RoPE scaling configuration (llama3, longrope, yarn, etc.)
`context_window_size`	int	MLC-specific: maximum context window size
`prefill_chunk_size`	int	MLC-specific: prefill chunk size
`sliding_window_size`	int	MLC-specific: sliding window attention size
`tie_word_embeddings`	bool	Whether to share embedding and LM head weights
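
Several quantities follow arithmetically from the keys in this table. As a sketch, the hypothetical helper `attention_shape` below derives the per-head dimension and the number of query heads sharing each KV head. It assumes HuggingFace-style key names and uses two stand-in dictionaries with the published Llama 2 7B and Mistral 7B shapes rather than the real registry entries.

```python
from typing import Any, Dict


def attention_shape(config: Dict[str, Any]) -> Dict[str, int]:
    """Derive head_dim and the query-to-KV head ratio from a HF-style config."""
    heads = config["num_attention_heads"]
    # Without `num_key_value_heads`, assume plain multi-head attention.
    kv_heads = config.get("num_key_value_heads", heads)
    # Prefer an explicit `head_dim`; otherwise derive it from the hidden size.
    if "head_dim" in config:
        head_dim = config["head_dim"]
    else:
        head_dim = config["hidden_size"] // heads
    return {
        "head_dim": head_dim,
        "kv_heads": kv_heads,
        "queries_per_kv_head": heads // kv_heads,
    }


# Stand-in dictionaries holding only the keys this helper reads.
LLAMA2_7B_LIKE = {"hidden_size": 4096, "num_attention_heads": 32, "num_key_value_heads": 32}
MISTRAL_7B_LIKE = {"hidden_size": 4096, "num_attention_heads": 32, "num_key_value_heads": 8}

print(attention_shape(LLAMA2_7B_LIKE))   # head_dim 128, one query head per KV head
print(attention_shape(MISTRAL_7B_LIKE))  # head_dim 128, four query heads per KV head
```

The helper raises `KeyError` for presets that use GPT-2 style names such as `n_embd`, so those need their keys mapped first.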

Architecture-Specific Fields

Field	Used By	Description
`n_embd`, `n_head`, `n_layer`, `n_positions`	GPT-2, GPTBigCode, Phi	GPT-2 style naming convention
`head_dim`	Gemma, Gemma2, Gemma3, Llama 3.2, Qwen3	Explicit head dimension, set directly rather than derived as hidden_size / num_attention_heads
`sliding_window`	Mistral, Gemma2, Starcoder2	HuggingFace sliding window field
`text_config`	LLaVA, Ministral3	Nested text model configuration for multimodal models
`vision_config`	LLaVA, Phi-3.5-Vision, Ministral3	Vision encoder configuration
`rope_parameters`	Ministral3	YaRN RoPE parameters (factor, mscale, rope_theta, etc.)
`query_pre_attn_scalar`	Gemma2, Gemma3	Custom attention scaling factor
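
Because naming conventions differ across architectures, code that walks the whole registry usually needs a small normalization layer. The sketch below is one possible approach, not MLC LLM's own logic: `GPT2_STYLE_ALIASES` and `lookup` are hypothetical names, and the two sample configs are stand-ins whose values are illustrative.

```python
from typing import Any, Dict

# Hypothetical alias table: HuggingFace-style key -> GPT-2 style key.
GPT2_STYLE_ALIASES = {
    "hidden_size": "n_embd",
    "num_attention_heads": "n_head",
    "num_hidden_layers": "n_layer",
    "max_position_embeddings": "n_positions",
}


def lookup(config: Dict[str, Any], key: str) -> Any:
    """Resolve `key` directly, via its GPT-2 style alias, or inside `text_config`."""
    if key in config:
        return config[key]
    alias = GPT2_STYLE_ALIASES.get(key)
    if alias is not None and alias in config:
        return config[alias]
    # Multimodal presets nest the language-model fields under `text_config`.
    text_config = config.get("text_config")
    if isinstance(text_config, dict):
        return lookup(text_config, key)
    raise KeyError(key)


# Stand-in configs, NOT real registry entries.
GPT2_LIKE = {"n_embd": 768, "n_head": 12, "n_layer": 12, "n_positions": 1024}
MULTIMODAL_LIKE = {
    "text_config": {"hidden_size": 4096, "num_hidden_layers": 32},
    "vision_config": {"hidden_size": 1024},
}

print(lookup(GPT2_LIKE, "hidden_size"))        # resolved through the n_embd alias
print(lookup(MULTIMODAL_LIKE, "hidden_size"))  # resolved through text_config
```

The lookup deliberately ignores `vision_config`, so a key such as `hidden_size` always refers to the language model rather than the vision encoder.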

Usage Examples

from mlc_llm.model.model_preset import MODEL_PRESETS

# Look up a model preset
llama_config = MODEL_PRESETS["llama2_7b"]
print(llama_config["hidden_size"])  # 4096
print(llama_config["num_hidden_layers"])  # 32

# List all available presets
print(list(MODEL_PRESETS.keys()))

# Access a Mistral preset with sliding window
mistral_config = MODEL_PRESETS["mistral_7b"]
print(mistral_config["sliding_window_size"])  # 4096
print(mistral_config["attention_sink_size"])  # 4
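
MODEL_PRESETS is a module-level dictionary, so editing an entry in place changes it for every other user in the same process. When a variant of a preset is needed, a safer pattern is to copy first and override afterwards. The sketch below uses a hypothetical `derive_config` helper and a stand-in dictionary in place of the real `mistral_7b` entry.

```python
import copy
import json
from typing import Any, Dict


def derive_config(preset: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return an independent copy of `preset` with `overrides` applied."""
    config = copy.deepcopy(preset)  # deep copy so nested lists/dicts are not shared
    config.update(overrides)
    return config


# Stand-in for a registry entry; only a few keys are shown.
BASE_PRESET = {
    "architectures": ["MistralForCausalLM"],
    "model_type": "mistral",
    "sliding_window_size": 4096,
    "attention_sink_size": 4,
    "tie_word_embeddings": False,
}

custom = derive_config(BASE_PRESET, sliding_window_size=1024)

# Python literals map onto JSON (False -> false), so the result can be
# written out in config.json form.
serialized = json.dumps(custom, indent=2)
print(serialized)
```

The original entry keeps its `sliding_window_size` of 4096, and nested values such as the `architectures` list are not shared between the two dictionaries.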
