Principle:Sail sg LongSpec Prompt Formatting
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Prompt_Engineering |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
Principle for constructing task-appropriate prompts from raw benchmark data using template-based formatting and model-specific chat templates.
Description
Prompt Formatting bridges raw evaluation data and model input by applying task-specific templates. Two distinct formatting patterns are used:
- LongBench tasks: Simple string templates with {context} and {input} placeholders. Each task (gov_report, qmsum, multi_news, lcc, repobench-p) has a dedicated prompt template that provides task instructions and formats the source material.
- AIME/QwQ tasks: Qwen2 chat template format with system, user, and assistant roles using special tokens (<|im_start|>, <|im_end|>). The math problem is wrapped in a conversational structure that triggers chain-of-thought reasoning.
After formatting, each prompt is tokenized with the target model's tokenizer, and the resulting input IDs are moved to the CUDA device for inference.
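The two formatting patterns described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the template wording, the `LONGBENCH_TEMPLATES` dictionary, the function names, and the default system message are all assumptions made for the example. Only the `{context}`/`{input}` placeholders and the `<|im_start|>`/`<|im_end|>` tokens come from the description.

```python
# Hypothetical template text; the real per-task templates are not reproduced here.
LONGBENCH_TEMPLATES = {
    "gov_report": (
        "Summarize the following report.\n\n"
        "Report:\n{context}\n\n{input}Summary:"
    ),
}


def format_longbench(task: str, sample: dict) -> str:
    """Fill a task's string template with the sample's context and input."""
    template = LONGBENCH_TEMPLATES[task]
    # Only the template is parsed for placeholders, so braces inside the
    # inserted context (e.g. source code) are copied through unchanged.
    return template.format(context=sample["context"], input=sample["input"])


def format_chatml(problem: str, system: str = "You are a helpful assistant.") -> str:
    """Wrap a problem in Qwen2-style chat markup (system message is assumed)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{problem}<|im_end|>\n"
        # The open assistant turn cues the model to begin its answer.
        "<|im_start|>assistant\n"
    )
```

Note that a template containing literal braces of its own would need them doubled (`{{` and `}}`) for `str.format` to leave them intact.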
Usage
Apply this principle when preparing prompts for evaluation. The prompt format must match the format the target model was trained on: Llama-based models use plain-text templates, while Qwen2-based models (QwQ) require the chat template format.
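One way to keep the format aligned with the model family is a small dispatch function. The sketch below is illustrative only: `build_prompt`, the substring check on the model name, the placeholder system message, and the assumption that `sample["input"]` holds the math problem are all hypothetical and not taken from the source.

```python
def build_prompt(model_name: str, sample: dict, template: str) -> str:
    """Pick the prompt format from the model family (hypothetical name check)."""
    name = model_name.lower()
    if "qwq" in name or "qwen2" in name:
        # Qwen2-style chat markup; the system message is a placeholder.
        return (
            "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
            f"<|im_start|>user\n{sample['input']}<|im_end|>\n"
            "<|im_start|>assistant\n"
        )
    # Plain-text template for models that were not trained on chat markup.
    return template.format(context=sample["context"], input=sample["input"])
```

When a Hugging Face tokenizer is available, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is usually a safer way to produce the chat markup than building the string by hand, since it reads the template shipped with the model.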
Theoretical Basis
Prompt formatting follows the template interpolation pattern where task context is inserted into a fixed instruction frame:
# Abstract pattern (not actual implementation)
# Fill the task template with the sample's fields.
formatted = template.format(context=raw_data["context"], input=raw_data["input"])
# Tokenize the prompt and move the input IDs to the GPU.
input_ids = tokenizer(formatted, return_tensors="pt").input_ids.cuda()
# Number of prompt tokens; generated tokens start at this index.
prompt_length = input_ids.shape[1]
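The recorded `prompt_length` is typically used to separate the completion from the prompt, because decoder-only generation usually returns the prompt tokens followed by the new tokens. The sketch below uses plain Python lists in place of tensors so it runs without a model; `strip_prompt` is a hypothetical helper name.

```python
from typing import List


def strip_prompt(output_ids: List[int], prompt_length: int) -> List[int]:
    """Return only the tokens generated after the prompt."""
    # Positions [0, prompt_length) hold the prompt; the rest is the completion.
    return output_ids[prompt_length:]


prompt_ids = [101, 7592, 2088]        # stand-in for tokenized prompt IDs
output_ids = prompt_ids + [2023, 2003]  # stand-in for prompt + generated IDs
completion_ids = strip_prompt(output_ids, len(prompt_ids))
```

With a batched tensor of shape `(1, sequence_length)`, the equivalent slice would be `output_ids[0, prompt_length:]`.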