Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Princeton nlp SimPO Chat Template Application

From Leeroopedia


Knowledge Sources
Domains NLP, Data_Preprocessing
Last Updated 2026-02-08 04:30 GMT

Overview

A text formatting step that converts structured message lists into model-specific prompt strings for preference optimization training.

Description

Chat template application transforms OpenAI-format message lists (with role and content keys) into the specific text format expected by each model's tokenizer. For SimPO training, this step must handle three distinct text segments: the prompt (all turns except the last), the chosen response (preferred final turn), and the rejected response (dispreferred final turn). A critical detail in SimPO's template application is BOS token stripping — the beginning-of-sequence token is removed from chosen and rejected responses to prevent double-BOS when the prompt already ends with one. The system also supports model-specific templates (e.g., Mistral's custom template) and optional empty system message insertion for models whose templates expect a system turn.

Usage

Use this principle immediately after dataset loading and before feeding data to the SimPO trainer. The chat template must match the target model's expected format. This is a necessary preprocessing step for any preference optimization pipeline.

Theoretical Basis

Chat templates follow the structural formatting principle for instruction-tuned models:

  1. Message decomposition — Split the conversation into prompt (N-1 turns) and response (final turn)
  2. Template application — Apply the model's Jinja2 chat template to convert messages to text
  3. BOS deduplication — Strip leading BOS tokens from response text to avoid double-BOS artifacts
  4. System message insertion — Optionally prepend an empty system message if the template expects one

Pseudo-code:

# Abstract algorithm (NOT real implementation)
prompt_messages = conversation[:-1]  # All turns except last
chosen_message = chosen[-1:]         # Last turn only
rejected_message = rejected[-1:]     # Last turn only

text_prompt = tokenizer.apply_chat_template(prompt_messages)
text_chosen = tokenizer.apply_chat_template(chosen_message)
text_rejected = tokenizer.apply_chat_template(rejected_message)

# Strip BOS from responses (SimPO-specific)
if text_chosen.startswith(BOS):
    text_chosen = text_chosen[len(BOS):]
if text_rejected.startswith(BOS):
    text_rejected = text_rejected[len(BOS):]

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment