Implementation:Volcengine Verl Chat Message Template
| Field | Value |
|---|---|
| Knowledge Sources | verl source code, data preprocessing examples |
| Domains | Prompt Engineering, Data Preprocessing, Chat Format |
| Last Updated | 2026-02-07 |
Overview
Description
This pattern documents the construction of OpenAI chat-format message lists used throughout verl's data preprocessing pipeline. Every training example must include a prompt field containing a list of message dictionaries with role and content keys. This format is required by the processor.apply_chat_template() method that converts messages into model-specific token sequences.
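To make the role of the chat template concrete, the following is a minimal sketch of what a template does with a message list. The `render_chatml` function is a hypothetical, hand-written stand-in that assumes a ChatML-style layout. Real templates are model-specific and ship with the tokenizer or processor, so this is not what `apply_chat_template()` itself executes.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a message list in a simplified ChatML-style layout.

    Hypothetical illustration only: real chat templates are defined per model.
    """
    parts = []
    for message in messages:
        # Each message becomes one delimited block tagged with its role.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant block so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = [{"role": "user", "content": "What is 2 + 3?"}]
rendered = render_chatml(prompt)
print(rendered)
```

The point of the sketch is that the preprocessing script only needs to produce the role/content list; the model-specific delimiters are added later by the template.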
The pattern has two primary variants:
- Single-turn -- A single "user" message containing the question plus an instruction-following suffix (e.g., 'Let\'s think step by step and output the final answer after "####".'). This is the standard format for math benchmarks like GSM8K and MATH.
- Multi-turn with system message -- A "system" message defining tool availability and behavior, followed by a "user" message. The system message typically describes available tools in a structured format. This variant is used for multi-turn tool-calling scenarios.
The message list is stored in the "prompt" column of the parquet dataset and consumed by RLHFDataset at training time.
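The two variants can be sketched as small builder functions. The names `single_turn_prompt` and `system_user_prompt` are hypothetical helpers introduced here for illustration; the preprocessing scripts construct the lists inline instead.

```python
def single_turn_prompt(question, instruction_following):
    """Single-turn variant: one user message with the suffix appended."""
    return [{"role": "user", "content": f"{question} {instruction_following}"}]


def system_user_prompt(system_text, user_text):
    """System-message variant: a system message followed by a user message."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]


gsm8k_suffix = 'Let\'s think step by step and output the final answer after "####".'
single = single_turn_prompt("What is 2 + 3?", gsm8k_suffix)
with_system = system_user_prompt(
    "You are a helpful assistant with access to a calculator tool.",
    "What is 123 * 456?",
)
```

Either return value is what would be stored under the "prompt" key of a record.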
Usage
This pattern is applied during data preprocessing (before training begins). Each dataset-specific preprocessing script constructs the appropriate message list and stores it alongside reward model configuration and extra information.
Code Reference
| Field | Value |
|---|---|
| Source Location | examples/data_preprocess/gsm8k.py, Lines 57-84 |
| Pattern Type | Pure Python dict construction (no special import needed) |
| Consumer | RLHFDataset.__getitem__ reads the "prompt" column and passes it to processor.apply_chat_template() |
I/O Contract
Inputs
| Field | Type | Description |
|---|---|---|
| question | str | The raw question text from the dataset. |
| instruction_following | str | A suffix appended to the question that guides the model's output format. |
Outputs
| Field | Type | Description |
|---|---|---|
| prompt | list[dict[str, str]] | A list of message dicts, each with "role" and "content" keys. Compatible with OpenAI's chat completion API format. |
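The output contract can be checked before a dataset is written. The following is a minimal sketch; `validate_prompt` is a hypothetical helper, not part of verl, and it assumes the text-only contract in the table above, where both values are strings.

```python
def validate_prompt(prompt):
    """Check that prompt is a non-empty list of {"role": str, "content": str} dicts.

    Hypothetical helper; returns the prompt unchanged when it is valid.
    """
    if not isinstance(prompt, list) or not prompt:
        raise ValueError("prompt must be a non-empty list of message dicts")
    for index, message in enumerate(prompt):
        if not isinstance(message, dict):
            raise ValueError(f"message {index} is not a dict")
        missing = {"role", "content"} - message.keys()
        if missing:
            raise ValueError(f"message {index} is missing keys: {sorted(missing)}")
        for key in ("role", "content"):
            if not isinstance(message[key], str):
                raise ValueError(f"message {index} has a non-string {key!r}")
    return prompt


checked = validate_prompt([{"role": "user", "content": "What is 2 + 3?"}])
```

Running a check like this inside the preprocessing script surfaces malformed records early rather than at training time.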
Usage Examples
GSM8K single-turn format (math problem):
# From examples/data_preprocess/gsm8k.py, Lines 57-84
instruction_following = 'Let\'s think step by step and output the final answer after "####".'


def make_map_fn(split):
    def process_fn(example, idx):
        # Append the output-format instruction to the raw question.
        question_raw = example.pop("question")
        question = question_raw + " " + instruction_following

        # extract_solution is not shown in this excerpt; it pulls the final
        # answer out of the raw GSM8K answer text.
        answer_raw = example.pop("answer")
        solution = extract_solution(answer_raw)
        data = {
            "data_source": "openai/gsm8k",
            # Single-turn prompt: one user message.
            "prompt": [
                {
                    "role": "user",
                    "content": question,
                }
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": solution},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
            },
        }
        return data

    return process_fn
Multi-turn format with system message for tools:
# Multi-turn prompt with system message describing available tools
data = {
    "data_source": "openai/gsm8k",
    "prompt": [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to a calculator tool. "
                "When you need to perform calculations, use the calculate function."
            ),
        },
        {
            "role": "user",
            "content": "What is 123 * 456? Use the calculator to solve this.",
        },
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "56088"},
}
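When the system message describes tools "in a structured format", one option is to generate it from tool descriptions rather than writing it by hand. The sketch below assumes a hypothetical `build_tool_system_message` helper and a made-up calculator schema; it does not reflect how verl itself configures or registers tools.

```python
import json


def build_tool_system_message(tools):
    """Build a system message that lists each tool as one JSON line.

    Hypothetical helper; the schema fields used below are illustrative.
    """
    lines = ["You are a helpful assistant with access to the following tools:"]
    for tool in tools:
        # sort_keys keeps the rendered text stable across runs.
        lines.append(json.dumps(tool, sort_keys=True))
    lines.append("Call a tool when it helps answer the question.")
    return {"role": "system", "content": "\n".join(lines)}


# Illustrative tool description, not a schema required by any library.
calculator = {
    "name": "calculate",
    "description": "Evaluate an arithmetic expression.",
    "parameters": {"expression": "string"},
}

tool_prompt = [
    build_tool_system_message([calculator]),
    {"role": "user", "content": "What is 123 * 456? Use the calculator to solve this."},
]
```

Generating the text from data keeps the system message consistent across records when the tool list changes.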
MATH dataset format (with boxed answer):
# From examples/data_preprocess/math_dataset.py
instruction_following = "Let's think step by step and output the final answer within \\boxed{}."

# question, solution, split, and idx are not defined in this excerpt; they are
# assumed to come from the surrounding mapping function, as in the GSM8K example.
data = {
    "data_source": "DigitalLearningGmbH/MATH-lighteval",
    "prompt": [{"role": "user", "content": question + " " + instruction_following}],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": solution},
    "extra_info": {"split": split, "index": idx},
}
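The GSM8K and MATH records differ mainly in the data source and the instruction suffix, so the shared shape can be sketched as one function keyed by data source. `INSTRUCTION_SUFFIXES` and `build_record` are hypothetical names used for illustration; the actual scripts are separate files that each hard-code their own suffix.

```python
# Suffixes copied from the two examples above, keyed by data source.
INSTRUCTION_SUFFIXES = {
    "openai/gsm8k": 'Let\'s think step by step and output the final answer after "####".',
    "DigitalLearningGmbH/MATH-lighteval": (
        "Let's think step by step and output the final answer within \\boxed{}."
    ),
}


def build_record(data_source, question, solution, split, idx):
    """Assemble one single-turn training record (hypothetical helper)."""
    suffix = INSTRUCTION_SUFFIXES[data_source]
    return {
        "data_source": data_source,
        "prompt": [{"role": "user", "content": question + " " + suffix}],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": solution},
        "extra_info": {"split": split, "index": idx},
    }


record = build_record(
    "DigitalLearningGmbH/MATH-lighteval", "Compute 1 + 1.", "2", "train", 0
)
```

An unknown data source raises `KeyError` here, which is a deliberate choice in this sketch so that a record is never written without its output-format instruction.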