Principle:Mit han lab Llm awq Prompt Template Configuration

Overview

A system that maps each model architecture to the prompt format it expects, so that instruction-tuned models follow instructions correctly during interactive chat.

Description

Different LLM families (LLaMA-2, LLaMA-3, Vicuna, Falcon, MPT, Qwen) expect different prompt formats with specific system messages, delimiters, and role tags. Using the wrong template causes degraded responses or nonsensical output. A prompt template factory detects the model variant from the model type and path, then returns the appropriate formatter that wraps user inputs in the correct template structure.

For example, LLaMA-2 chat models expect prompts wrapped in [INST] and [/INST] delimiters with a <<SYS>> block for the system message, while Vicuna models use a simple USER: / ASSISTANT: format. LLaMA-3 introduces <|start_header_id|> and <|end_header_id|> tags. Each family also defines different stop tokens that signal the end of a generated response.
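To make the differences concrete, here is a minimal sketch of single-turn formatters for the three families mentioned above. The delimiters follow the publicly documented chat formats; the function names, the default system messages, and the `STOP_STRINGS` table are illustrative assumptions and are not taken from the llm-awq repository. The beginning-of-sequence token is left to the tokenizer.

```python
def llama2_prompt(user: str, system: str = "You are a helpful assistant.") -> str:
    # [INST] ... [/INST] wraps the turn; <<SYS>> ... <</SYS>> wraps the system message.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def vicuna_prompt(user: str, system: str = "A chat between a user and an assistant.") -> str:
    # Plain role tags; the model continues writing after "ASSISTANT:".
    return f"{system} USER: {user} ASSISTANT:"


def llama3_prompt(user: str) -> str:
    # Role names sit between header tags; <|eot_id|> closes each turn.
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Illustrative stop strings per family (assumed values for this sketch).
STOP_STRINGS = {
    "llama2": ["</s>"],
    "vicuna": ["</s>"],
    "llama3": ["<|eot_id|>"],
}
```

The same user text produces three structurally different strings, which is why feeding one family's format to another family's model tends to degrade the output.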

The prompt template factory pattern centralizes this logic so that the rest of the chat pipeline (tokenization, generation, streaming) remains model-agnostic. The factory inspects the model_type and model_path strings to determine which prompter subclass to instantiate.
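The factory idea can be sketched as follows. This is a simplified illustration, not the repository's implementation: `select_prompter`, the class bodies, and the substring rules are hypothetical, and the real `get_prompter` function may use a different signature and different matching logic.

```python
class BasePrompter:
    """Hypothetical base class: subclasses only supply a template string."""

    template = "{prompt}"

    def insert_prompt(self, user_input: str) -> str:
        # Wrap the raw user text in the family-specific template.
        return self.template.format(prompt=user_input)


class Llama2Prompter(BasePrompter):
    template = "[INST] {prompt} [/INST]"


class Llama3Prompter(BasePrompter):
    template = (
        "<|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


class VicunaPrompter(BasePrompter):
    template = "USER: {prompt} ASSISTANT:"


def select_prompter(model_type: str, model_path: str) -> BasePrompter:
    """Pick a prompter from model_type and model_path (assumed matching rules)."""
    kind, path = model_type.lower(), model_path.lower()
    if kind == "llama":
        # Several families share the LLaMA architecture, so the path disambiguates.
        if "vicuna" in path:
            return VicunaPrompter()
        if "llama-3" in path or "llama3" in path:
            return Llama3Prompter()
        return Llama2Prompter()
    raise ValueError(f"no prompt template registered for model_type={model_type!r}")
```

Because callers only ever see the `insert_prompt()` interface, supporting a new family means adding one subclass and one branch in the factory, with no change to tokenization or generation code.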

Usage

Apply this principle whenever an LLM is deployed for interactive chat. The prompt template must be configured before the generation loop starts: it is selected once during initialization and then reused for every turn in the conversation:

1. Detect the model variant from model_type and model_path
2. Instantiate the corresponding prompter subclass
3. Use insert_prompt() to format each user input before tokenization
4. Use update_template() for multi-turn conversation state management
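The per-turn steps above can be sketched as a small loop. The method names `insert_prompt()` and `update_template()` come from this page, but the class `MultiTurnPrompter`, its attributes, the Vicuna-style format, and the `chat` helper are assumptions made for illustration; the actual signatures in llm-awq may differ.

```python
class MultiTurnPrompter:
    """Hypothetical Vicuna-style prompter that keeps the conversation so far."""

    def __init__(self) -> None:
        self.history = ""      # all completed turns
        self.model_input = ""  # full prompt for the pending turn

    def insert_prompt(self, user_input: str) -> None:
        # Append the new user turn to the history and leave the assistant slot open.
        self.model_input = f"{self.history}USER: {user_input} ASSISTANT:"

    def update_template(self, output: str) -> None:
        # Fold the finished turn back into the history for the next call.
        self.history = f"{self.model_input} {output.strip()}</s>"


def chat(prompter, generate, user_inputs):
    """Run one generation per user input; `generate` maps a prompt string to text."""
    replies = []
    for text in user_inputs:
        prompter.insert_prompt(text)            # format before tokenization
        reply = generate(prompter.model_input)  # stand-in for tokenize + generate
        prompter.update_template(reply)         # carry state into the next turn
        replies.append(reply)
    return replies
```

Here `generate` stands in for the tokenization and decoding pipeline. Passing any callable that maps a prompt string to a reply is enough to see how the history grows from turn to turn.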

Related Pages

Implementation:Mit_han_lab_Llm_awq_Get_prompter

Knowledge Sources

Repo: llm-awq (https://github.com/mit-han-lab/llm-awq)

Domains

NLP
Deployment
