Implementation:Mlc ai Mlc llm Model Preset

From Leeroopedia


Knowledge Sources
Domains: Model_Architecture, LLM
Last Updated: 2026-02-09 19:00 GMT

Overview

Provides a comprehensive dictionary of built-in model configuration presets for all supported LLM architectures in MLC LLM. It serves as the reference configuration database for model compilation and testing.

Description

This module defines the MODEL_PRESETS dictionary, a large registry mapping preset names to their complete HuggingFace-compatible configuration dictionaries. Each preset entry contains all the hyperparameters needed to instantiate the corresponding model architecture, including hidden sizes, layer counts, attention head configurations, vocabulary sizes, RoPE parameters, and MLC-specific settings like context_window_size, prefill_chunk_size, and sliding_window_size.
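To make the layout of an entry concrete, the sketch below builds one hypothetical preset and separates the MLC-specific runtime settings from the HuggingFace fields. The name `EXAMPLE_PRESET`, its values, and the helper `split_mlc_settings` are illustrative assumptions. They are not copied from the real MODEL_PRESETS, whose entries carry many more keys.

```python
from typing import Any, Dict, Tuple

# Hypothetical preset entry; the values are illustrative, not taken from MODEL_PRESETS.
EXAMPLE_PRESET: Dict[str, Any] = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 32,
    "vocab_size": 32000,
    "rms_norm_eps": 1e-5,
    "rope_theta": 10000.0,
    "tie_word_embeddings": False,
    # MLC-specific runtime settings (assumed values)
    "context_window_size": 4096,
    "prefill_chunk_size": 2048,
}

# The MLC-specific keys named in the description of this module.
MLC_SPECIFIC_KEYS = {"context_window_size", "prefill_chunk_size", "sliding_window_size"}


def split_mlc_settings(preset: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split a preset into HuggingFace fields and MLC-specific fields."""
    hf_fields = {k: v for k, v in preset.items() if k not in MLC_SPECIFIC_KEYS}
    mlc_fields = {k: v for k, v in preset.items() if k in MLC_SPECIFIC_KEYS}
    return hf_fields, mlc_fields


hf, mlc = split_mlc_settings(EXAMPLE_PRESET)
print(sorted(mlc))  # ['context_window_size', 'prefill_chunk_size']
```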

The presets cover the following model families and variants:

Llama Family:

  • llama2_7b, llama2_13b, llama2_70b -- Llama 2 models
  • codellama_7b, codellama_13b, codellama_34b -- CodeLlama models
  • tinyllama_1b_chat_v0.4, tinyllama_1b_chat_v1.0 -- TinyLlama models
  • llama3_1_8b, llama3_1_70b -- Llama 3.1 with llama3 RoPE scaling
  • llama3_2_1b, llama3_2_3b -- Llama 3.2 with tied word embeddings
  • smollm_135m, smollm_360m, smollm_1_7b -- SmolLM models
  • smollm2_135m, smollm2_360m -- SmolLM2 models

Mistral Family:

  • mistral_7b -- Mistral 7B v0.1 with sliding window (4096)
  • mistral_7b_v03 -- Mistral 7B v0.3 without sliding window
  • Mixtral-8x7B-v0.1 -- Mixtral MoE model
  • ministral3_3b_reasoning_2512 -- Ministral 3 with YaRN RoPE and vision config

Google Models:

  • gemma_2b -- Gemma 2B
  • gemma2_2b, gemma2_2b-jpn, gemma2_9b, gemma2_27b -- Gemma 2 variants
  • gemma3_1b_it -- Gemma 3 1B instruction-tuned

Qwen Family:

  • qwen -- Qwen 1.0
  • qwen2, qwen2_0_5b, qwen2_1_5b, qwen2.5_3b, qwen2_7b -- Qwen 2 variants
  • qwen2moe -- Qwen2 MoE
  • qwen3_0.6b, qwen3_1.7b -- Qwen 3 variants

Microsoft Phi Family:

  • phi-1_5, phi-2 -- Phi 1.5 and Phi 2
  • phi-3_5 -- Phi 3.5 with LongRoPE scaling
  • phi-3_5-vision -- Phi 3.5 Vision with SU RoPE scaling
  • phi-4 -- Phi 4 with LongRoPE scaling and tied embeddings

Other Architectures:

  • gpt2, gpt2_medium -- GPT-2 models
  • gpt_bigcode -- GPTBigCode (StarCoder)
  • redpajama_3b_v1 -- RedPajama (GPT-NeoX)
  • stablelm, stablelm-2-zephyr-1_6b -- StableLM models
  • baichuan -- Baichuan
  • internlm -- InternLM 1.0
  • internlm2, internlm2_5_7b -- InternLM 2 and 2.5
  • rwkv5_3b -- RWKV5
  • orion -- Orion
  • llava -- LLaVA multimodal
  • chatglm -- ChatGLM
  • snowflake-arctic-embed-m -- Snowflake Arctic embedding (BERT)
  • starcoder2 -- StarCoder2
  • aya-23 -- Aya-23 (Cohere)
  • minicpm_2b, minicpm_2b_sft_bf16, minicpm-moe-8x2b -- MiniCPM models
  • deepseek -- DeepSeek
  • deepseek_v2_lite -- DeepSeek V2 Lite with MLA
  • gpt_j -- GPT-J

Usage

Use this module to look up default configurations for supported models, to consult the expected configuration fields, or to test model compilation automatically across all supported architectures. The presets are used internally by the MLC LLM compilation pipeline when a model does not provide a complete configuration.
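A rough way to picture how a preset can stand in for a model's own config: look the name up, copy the entry, and write it out as a `config.json` that later stages can read like any other model config. This is a minimal sketch assuming a tiny stand-in registry (`PRESETS`) and a hypothetical `resolve_config` helper. It is not MLC LLM's actual resolution code, which may add its own bookkeeping.

```python
import json
import os
import tempfile
from typing import Any, Dict

# Stand-in for MODEL_PRESETS: one hypothetical, heavily truncated entry.
PRESETS: Dict[str, Dict[str, Any]] = {
    "tiny_llama_like": {"model_type": "llama", "hidden_size": 64, "num_hidden_layers": 2},
}


def resolve_config(name_or_path: str, out_dir: str) -> str:
    """Hypothetical helper: map a preset name or an existing file path to a config file path."""
    if name_or_path in PRESETS:
        content = dict(PRESETS[name_or_path])  # copy so the registry itself is never mutated
        path = os.path.join(out_dir, "config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(content, f, indent=2)
        return path
    if os.path.isfile(name_or_path):
        return name_or_path
    raise ValueError(f"{name_or_path!r} is neither a known preset nor an existing config file")


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = resolve_config("tiny_llama_like", tmp)
    with open(cfg_path, encoding="utf-8") as f:
        print(json.load(f)["model_type"])  # llama
```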

Code Reference

Source Location

Signature

```python
from typing import Any, Dict

MODEL_PRESETS: Dict[str, Any] = {
    "llama2_7b": { ... },  # each value is a full config dict (elided here)
    "llama2_13b": { ... },
    "mistral_7b": { ... },
    "gemma3_1b_it": { ... },
    # ... 50+ model presets
}
```

Import

```python
from mlc_llm.model.model_preset import MODEL_PRESETS
```

I/O Contract

Dictionary Structure

Each preset entry in MODEL_PRESETS is a Dict[str, Any] with HuggingFace-compatible keys:

| Key | Type | Description |
|---|---|---|
| architectures | List[str] | HuggingFace architecture class names (e.g., ["LlamaForCausalLM"]) |
| model_type | str | Model type identifier (e.g., "llama", "mistral", "gemma3_text") |
| hidden_size | int | Hidden dimension of the transformer |
| intermediate_size | int | MLP intermediate dimension |
| num_attention_heads | int | Number of query attention heads |
| num_hidden_layers | int | Number of transformer decoder layers |
| num_key_value_heads | int | Number of KV heads (for GQA) |
| vocab_size | int | Vocabulary size |
| rms_norm_eps | float | RMSNorm epsilon (for RMSNorm models) |
| rope_theta | float | RoPE base frequency |
| rope_scaling | Optional[Dict] | RoPE scaling configuration (llama3, longrope, yarn, etc.) |
| context_window_size | int | MLC-specific: maximum context window size |
| prefill_chunk_size | int | MLC-specific: prefill chunk size |
| sliding_window_size | int | MLC-specific: sliding window attention size |
| tie_word_embeddings | bool | Whether to share embedding and LM head weights |
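The key list above can be checked mechanically. The sketch below is a hypothetical `check_preset` helper that verifies a handful of keys and types and applies the grouped-query attention constraint that the query head count must be a multiple of the KV head count. Which keys are mandatory is an assumption here (GPT-2-style presets, for instance, use `n_embd`/`n_head`/`n_layer` instead), so treat it as a starting point rather than MLC LLM's own validation.

```python
from typing import Any, Dict, List

# Assumed subset of required keys and their expected Python types.
REQUIRED_KEYS: Dict[str, type] = {
    "architectures": list,
    "model_type": str,
    "hidden_size": int,
    "num_attention_heads": int,
    "num_hidden_layers": int,
    "vocab_size": int,
}


def check_preset(preset: Dict[str, Any]) -> List[str]:
    """Hypothetical checker: return a list of problems (empty if the entry looks valid)."""
    problems: List[str] = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in preset:
            problems.append(f"missing {key}")
        elif not isinstance(preset[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    heads = preset.get("num_attention_heads")
    kv_heads = preset.get("num_key_value_heads", heads)  # no GQA field means one KV head per query head
    if isinstance(heads, int) and isinstance(kv_heads, int):
        # Under GQA every KV head serves an equal-sized group of query heads.
        if kv_heads <= 0 or heads % kv_heads != 0:
            problems.append("num_attention_heads must be divisible by num_key_value_heads")
    return problems


# Illustrative values only.
ok = {"architectures": ["LlamaForCausalLM"], "model_type": "llama", "hidden_size": 2048,
      "num_attention_heads": 32, "num_hidden_layers": 16, "num_key_value_heads": 8,
      "vocab_size": 128256}
bad = dict(ok, num_key_value_heads=6)
print(check_preset(ok))   # []
print(check_preset(bad))  # ['num_attention_heads must be divisible by num_key_value_heads']
```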

Architecture-Specific Fields

| Field | Used By | Description |
|---|---|---|
| n_embd, n_head, n_layer, n_positions | GPT-2, GPTBigCode, Phi | GPT-2 style naming convention |
| head_dim | Gemma, Gemma2, Gemma3, Llama 3.2, Qwen3 | Explicit head dimension (not derived from hidden_size/num_heads) |
| sliding_window | Mistral, Gemma2, Starcoder2 | HuggingFace sliding window field |
| text_config | LLaVA, Ministral3 | Nested text model configuration for multimodal models |
| vision_config | LLaVA, Phi-3.5-Vision, Ministral3 | Vision encoder configuration |
| rope_parameters | Ministral3 | YaRN RoPE parameters (factor, mscale, rope_theta, etc.) |
| query_pre_attn_scalar | Gemma2, Gemma3 | Custom attention scaling factor |
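Because these fields vary by architecture, code that reads presets generally has to normalize them first. The sketch below, with hypothetical helpers `normalized` and `head_dim`, unwraps a nested `text_config`, maps GPT-2-style names onto the standard names that HuggingFace's `GPT2Config` aliases them to (`n_embd` to `hidden_size`, and so on), and prefers an explicit `head_dim` over `hidden_size // num_attention_heads`. The example configs are illustrative fragments, not real preset entries, and real multimodal `text_config` blocks may omit fields that fall back to defaults.

```python
from typing import Any, Dict

# GPT-2-style names and the standard names they alias
# (mirrors the attribute_map of transformers' GPT2Config).
GPT2_ALIASES = {
    "n_embd": "hidden_size",
    "n_head": "num_attention_heads",
    "n_layer": "num_hidden_layers",
    "n_positions": "max_position_embeddings",
}


def normalized(preset: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the text-model fields under standard names (input is not mutated)."""
    # Multimodal presets (e.g. LLaVA) nest the language model under text_config.
    cfg = dict(preset.get("text_config", preset))
    for old, new in GPT2_ALIASES.items():
        if old in cfg and new not in cfg:
            cfg[new] = cfg[old]
    return cfg


def head_dim(preset: Dict[str, Any]) -> int:
    """An explicit head_dim wins; otherwise derive it as hidden_size // num_attention_heads."""
    cfg = normalized(preset)
    if "head_dim" in cfg:
        return cfg["head_dim"]
    return cfg["hidden_size"] // cfg["num_attention_heads"]


# Illustrative fragments, not real preset entries.
gpt2_style = {"n_embd": 768, "n_head": 12, "n_layer": 12, "n_positions": 1024}
explicit = {"hidden_size": 2304, "num_attention_heads": 8, "head_dim": 256}
multimodal = {"text_config": {"hidden_size": 4096, "num_attention_heads": 32}}

print(head_dim(gpt2_style), head_dim(explicit), head_dim(multimodal))  # 64 256 128
```

Note that in the `explicit` fragment the derived value would be 2304 // 8 = 288, which is why an explicit `head_dim` has to take precedence.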

Usage Examples

```python
from mlc_llm.model.model_preset import MODEL_PRESETS

# Look up a model preset
llama_config = MODEL_PRESETS["llama2_7b"]
print(llama_config["hidden_size"])  # 4096
print(llama_config["num_hidden_layers"])  # 32

# List all available presets
print(list(MODEL_PRESETS.keys()))

# Access a Mistral preset with sliding window
mistral_config = MODEL_PRESETS["mistral_7b"]
print(mistral_config["sliding_window_size"])  # 4096
print(mistral_config["attention_sink_size"])  # 4
```
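Beyond single lookups, it can be handy to query the registry by feature, for example to find every preset that ties its embeddings or sets a RoPE scaling type. The helpers `find_presets` and `rope_type` below are hypothetical. The demo runs on a stand-in dict so it works without MLC LLM installed; with the package available, the same call should work on `MODEL_PRESETS` itself, with results depending on the actual entries. Reading the scaling type from either a `rope_type` or an older `type` key is an assumption about how individual configs spell it.

```python
from typing import Any, Callable, Dict, List, Optional

Preset = Dict[str, Any]


def rope_type(cfg: Preset) -> Optional[str]:
    """Return the RoPE scaling type, accepting either 'rope_type' or 'type' (assumed spellings)."""
    scaling = cfg.get("rope_scaling") or {}
    return scaling.get("rope_type", scaling.get("type"))


def find_presets(presets: Dict[str, Preset], predicate: Callable[[Preset], bool]) -> List[str]:
    """Return the sorted names of presets whose config satisfies `predicate`."""
    return sorted(name for name, cfg in presets.items() if predicate(cfg))


# Stand-in registry with illustrative fragments (not real preset contents).
demo: Dict[str, Preset] = {
    "llama3_2_like": {"tie_word_embeddings": True,
                      "rope_scaling": {"rope_type": "llama3", "factor": 32.0}},
    "phi_like": {"rope_scaling": {"type": "longrope"}},
    "gpt2_like": {},
}

print(find_presets(demo, lambda c: c.get("tie_word_embeddings", False)))  # ['llama3_2_like']
print(find_presets(demo, lambda c: rope_type(c) is not None))  # ['llama3_2_like', 'phi_like']
```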
