Principle:Mit han lab Llm awq Prompt Template Configuration

From Leeroopedia

Overview

A system that maps model architectures to their expected prompt formats, so that models follow instructions correctly during interactive chat.

Description

Different LLM families (LLaMA-2, LLaMA-3, Vicuna, Falcon, MPT, Qwen) expect different prompt formats with specific system messages, delimiters, and role tags. Using the wrong template causes degraded responses or nonsensical output. A prompt template factory detects the model variant from the model type and path, then returns the appropriate formatter that wraps user inputs in the correct template structure.

For example, LLaMA-2 chat models expect prompts wrapped in [INST] and [/INST] delimiters with a <<SYS>> block for the system message, while Vicuna models use a simple USER: / ASSISTANT: format. LLaMA-3 introduces <|start_header_id|> and <|end_header_id|> tags. Each family also defines different stop tokens that signal the end of a generated response.
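To make the differences concrete, here is a minimal sketch of the three formats as Python format strings. The template constants and the `format_prompt` helper are hypothetical names introduced for illustration. The layouts follow the commonly published formats for each family, and the `<|begin_of_text|>` and `<|eot_id|>` tokens in the LLaMA-3 string are part of that published format rather than something stated on this page. The exact system messages and whitespace used by the project may differ.

```python
# Illustrative templates only; the project's real strings may differ in
# whitespace, default system message, and special tokens.
LLAMA2_TEMPLATE = "[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

VICUNA_TEMPLATE = "{system} USER: {user} ASSISTANT:"

LLAMA3_TEMPLATE = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)


def format_prompt(template: str, system: str, user: str) -> str:
    """Fill a template with a system message and a single user turn."""
    return template.format(system=system, user=user)


if __name__ == "__main__":
    for name, tpl in [
        ("llama-2", LLAMA2_TEMPLATE),
        ("vicuna", VICUNA_TEMPLATE),
        ("llama-3", LLAMA3_TEMPLATE),
    ]:
        print(f"--- {name} ---")
        print(format_prompt(tpl, "You are a helpful assistant.", "Hello!"))
```

The same user input produces three different strings, which is why a model given another family's format tends to respond poorly.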

The prompt template factory pattern centralizes this logic so that the rest of the chat pipeline (tokenization, generation, streaming) remains model-agnostic. The factory inspects the model_type and model_path strings to determine which prompter subclass to instantiate.
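The factory pattern can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: the class names, the `get_prompter` signature, the simplified single-turn templates, and the substring checks on `model_path` are all hypothetical.

```python
class BasePrompter:
    """Hypothetical base class: wraps user input in a family-specific template."""

    template = "{user}"

    def insert_prompt(self, user_input: str) -> str:
        return self.template.format(user=user_input)


class Llama2Prompter(BasePrompter):
    template = "[INST] {user} [/INST]"


class Llama3Prompter(BasePrompter):
    template = (
        "<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


class VicunaPrompter(BasePrompter):
    template = "USER: {user} ASSISTANT:"


def get_prompter(model_type: str, model_path: str = "") -> BasePrompter:
    """Pick a prompter from the model type and hints in the checkpoint path.

    The matching rules here are assumptions made for illustration.
    """
    model_type = model_type.lower()
    path = model_path.lower()
    if model_type == "llama":
        # Several chat variants share the LLaMA architecture, so the path
        # is used to tell them apart.
        if "vicuna" in path:
            return VicunaPrompter()
        if "llama-3" in path or "llama3" in path:
            return Llama3Prompter()
        return Llama2Prompter()
    raise ValueError(f"No prompt template registered for model_type={model_type!r}")


if __name__ == "__main__":
    prompter = get_prompter("llama", "checkpoints/vicuna-7b-v1.5")
    print(prompter.insert_prompt("Hello!"))
```

Because callers only see `insert_prompt()`, adding a new model family means adding one subclass and one branch in the factory, with no changes to tokenization or generation code.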

Usage

Apply this principle when deploying any LLM for interactive chat. The prompt template must be configured before the generation loop starts: it is selected once during initialization and then used for every turn in the conversation:

  • Detect the model variant from model_type and model_path
  • Instantiate the corresponding prompter subclass
  • Use insert_prompt() to format each user input before tokenization
  • Use update_template() for multi-turn conversation state management
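The steps above can be sketched as a small chat loop. The method names `insert_prompt()` and `update_template()` come from this page, but their signatures and behavior here are assumptions, as are the `ChatPrompter` class, the Vicuna-style layout, and the `fake_generate` stand-in for the model.

```python
class ChatPrompter:
    """Hypothetical Vicuna-style prompter that tracks multi-turn state."""

    def __init__(self, system: str = "A chat between a user and an assistant."):
        self.history = system   # everything said so far
        self.model_input = ""   # full prompt for the current turn

    def insert_prompt(self, user_input: str) -> None:
        # Wrap the new user turn and append it to the running history.
        self.model_input = f"{self.history} USER: {user_input} ASSISTANT:"

    def update_template(self, output: str) -> None:
        # Fold the finished reply into the history for the next turn.
        self.history = f"{self.model_input} {output.strip()}"


def fake_generate(prompt: str) -> str:
    """Stand-in for tokenization plus model generation."""
    return "Hello!"


def run_chat(turns: list[str]) -> ChatPrompter:
    prompter = ChatPrompter()               # selected once, before the loop
    for user_input in turns:
        prompter.insert_prompt(user_input)  # format before tokenization
        reply = fake_generate(prompter.model_input)
        prompter.update_template(reply)     # carry state into the next turn
    return prompter


if __name__ == "__main__":
    print(run_chat(["Hi", "Who are you?"]).history)
```

Keeping the conversation state inside the prompter means the generation loop never needs to know how a given family delimits turns.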

Domains

  • NLP
  • Deployment
