Implementation:Sail sg LongSpec Dataset Prompt Templates

Knowledge Sources	LongSpec
Domains	Evaluation, Prompt_Engineering
Last Updated	2026-02-14 05:00 GMT

Overview

Concrete tool for formatting evaluation prompts: task-specific templates for LongBench tasks and a Qwen2 chat template for AIME mathematical reasoning.

Description

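The dataset2prompt dictionary in inference_long-bench.py maps each LongBench task name to a prompt template string containing a {context} placeholder and, for query-based tasks such as qmsum, an {input} placeholder. For QwQ inference, inference_qwq.py wraps each math problem in a Qwen2-specific chat template with system, user, and assistant role markers.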

This is a Pattern Doc: the templates are defined inline in the evaluation scripts, so there is no separate module to import.

Usage

Used after data loading and before tokenization in evaluation scripts.
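
The ordering described above (load, format, tokenize) can be sketched as follows. This is a minimal illustration, not the repository's code: load_items and toy_tokenize are hypothetical stand-ins for the real dataset loader and model tokenizer, and only the lcc template string is taken from this page.

```python
# Hypothetical pipeline sketch: load -> format -> tokenize.
# Only TEMPLATE is copied from this page; the two helpers are stand-ins.
TEMPLATE = "Please complete the code given below.\n{context}"


def load_items():
    # Stand-in for data loading; the real scripts read LongBench samples.
    return [{"context": "def add(a, b):\n    return"}]


def toy_tokenize(text):
    # Stand-in for a model tokenizer; it only splits on whitespace.
    return text.split()


# Formatting happens after loading and before tokenization.
prompts = [TEMPLATE.format(context=item["context"]) for item in load_items()]
token_lists = [toy_tokenize(p) for p in prompts]
```

In the actual scripts the tokenizer would be the model's own; the whitespace split here only marks where that step sits.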

Code Reference

Source Location

Repository: LongSpec
File (LongBench): longspec/test/inference_long-bench.py
Lines: L8-39
File (QwQ): longspec/test/inference_qwq.py
Lines: L55-67

Signature

# Pattern Doc: Template dictionary and chat template formatting

dataset2prompt = {
    "gov_report": "You are given a report by a government agency. "
                  "Write a one-page summary of the report.\n\n"
                  "Report:\n{context}\n\nNow, write a one-page summary of the report.\n\nSummary:",
    "qmsum": "You are given a meeting transcript and a query containing a question or instruction. "
             "Answer the query in one or more sentences.\n\n"
             "Transcript:\n{context}\n\nQuery: {input}\n\nAnswer:",
    "multi_news": "You are given several news passages. Write a one-page summary of all news.\n\n"
                  "{context}\n\nNow, write a one-page summary of all the news.\n\nSummary:",
    "lcc": "Please complete the code given below.\n{context}",
    "repobench-p": "Please complete the code given below.\n{context}",
}

# QwQ chat template ({problem} is filled later with str.format):
qwq_prompt = (
    "<|im_start|>system\nYou are a helpful and harmless assistant. "
    "You should think step-by-step.<|im_end|>\n"
    "<|im_start|>user\n{problem}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

Import

# No separate import — templates are defined inline in evaluation scripts

I/O Contract

Inputs

Name	Type	Required	Description
task	str	Yes (LongBench)	Task name selecting the template
context	str	Yes (LongBench)	Document/article text to include
input	str	No	Optional query/question (used by qmsum)
problem	str	Yes (AIME)	Math problem text for QwQ chat template

Outputs

Name	Type	Description
formatted_prompt	str	Complete prompt string ready for tokenization
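
The LongBench half of the contract above can be wrapped in a small helper. The sketch below is an assumption for illustration: build_prompt is a hypothetical name that does not appear in the evaluation scripts, and only two of the templates from this page are reproduced to keep it short.

```python
# Two templates copied from this page; the helper itself is hypothetical.
dataset2prompt = {
    "qmsum": (
        "You are given a meeting transcript and a query containing a question or instruction. "
        "Answer the query in one or more sentences.\n\n"
        "Transcript:\n{context}\n\nQuery: {input}\n\nAnswer:"
    ),
    "lcc": "Please complete the code given below.\n{context}",
}


def build_prompt(task: str, context: str, input: str = "") -> str:
    """Return the filled template for `task`; raise KeyError for unknown tasks."""
    if task not in dataset2prompt:
        raise KeyError(f"no prompt template for task {task!r}")
    # str.format ignores unused keyword arguments, so passing `input` is
    # harmless for templates that only contain {context}.
    return dataset2prompt[task].format(context=context, input=input)


code_prompt = build_prompt("lcc", "x = 1")
query_prompt = build_prompt("qmsum", "A: Hello.\nB: Hi.", input="Who spoke first?")
```

Passing input unconditionally keeps the call site uniform across tasks, which matches how the usage example on this page calls format.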

Usage Examples

LongBench Prompt Formatting

task = "gov_report"
item = {"context": "The Federal Reserve reported...", "input": ""}

prompt = dataset2prompt[task].format(
    context=item["context"],
    input=item.get("input", ""),
)
# Result: "You are given a report by a government agency. Write a one-page summary..."

QwQ Chat Template

problem = "Find the sum of all positive integers n such that..."
prompt = (
    "<|im_start|>system\n"
    "You are a helpful and harmless assistant. You should think step-by-step.<|im_end|>\n"
    f"<|im_start|>user\n{problem}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
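
The same prompt can also be produced from a reusable template string instead of an inline f-string. This is a sketch under that assumption: QWQ_TEMPLATE and format_qwq are hypothetical names, while the role markers and system message are the ones shown on this page.

```python
# Hypothetical reusable form of the QwQ chat template shown above.
QWQ_TEMPLATE = (
    "<|im_start|>system\n"
    "You are a helpful and harmless assistant. You should think step-by-step.<|im_end|>\n"
    "<|im_start|>user\n{problem}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def format_qwq(problem: str) -> str:
    # str.format does not re-parse the substituted value, so braces inside
    # the problem text (set notation, LaTeX) pass through unchanged.
    return QWQ_TEMPLATE.format(problem=problem)


qwq_prompt_text = format_qwq("Let S = {1, 2, 3}. Find the sum of the elements of S.")
```

The prompt ends with the opening assistant marker, so generation continues from the start of the assistant turn.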

Related Pages

Implements Principle

Principle:Sail_sg_LongSpec_Prompt_Formatting
