Implementation:Volcengine Verl Chat Message Template

Field	Value
Knowledge Sources	verl source code, data preprocessing examples
Domains	Prompt Engineering, Data Preprocessing, Chat Format
Last Updated	2026-02-07

Overview

Description

This pattern documents how verl's data preprocessing scripts build OpenAI chat-format message lists. Every training example must include a prompt field holding a list of message dictionaries, each with role and content keys. The processor.apply_chat_template() method requires this format to convert the messages into model-specific token sequences.

The pattern has two primary variants:

Single-turn -- A single "user" message containing the question plus an instruction-following suffix such as: Let's think step by step and output the final answer after "####". This is the standard format for math benchmarks like GSM8K and MATH.

Multi-turn with system message -- A "system" message defining tool availability and behavior, followed by a "user" message. The system message typically describes available tools in a structured format. This variant is used for multi-turn tool-calling scenarios.
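
The two variants differ only in whether a "system" message precedes the "user" message. A minimal sketch of that difference follows; build_prompt is a hypothetical helper written for illustration and is not part of verl.

```python
from typing import Optional

Message = dict[str, str]


def build_prompt(
    question: str,
    suffix: str = "",
    system: Optional[str] = None,
) -> list[Message]:
    """Return a chat-format message list (hypothetical helper, not a verl API)."""
    messages: list[Message] = []
    # Variant 2: an optional system message comes first.
    if system is not None:
        messages.append({"role": "system", "content": system})
    # Both variants: the user message carries the question plus the optional suffix.
    content = f"{question} {suffix}" if suffix else question
    messages.append({"role": "user", "content": content})
    return messages


single_turn = build_prompt(
    "What is 2 + 3?",
    suffix='Let\'s think step by step and output the final answer after "####".',
)
with_system = build_prompt(
    "What is 123 * 456? Use the calculator to solve this.",
    system="You are a helpful assistant with access to a calculator tool.",
)
```

Here single_turn holds one "user" message, while with_system holds a "system" message followed by a "user" message.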

The message list is stored in the "prompt" column of the parquet dataset and consumed by RLHFDataset at training time.

Usage

This pattern is applied during data preprocessing (before training begins). Each dataset-specific preprocessing script constructs the appropriate message list and stores it alongside reward model configuration and extra information.

Code Reference

Field	Value
Source Location	`examples/data_preprocess/gsm8k.py`, Lines 57-84
Pattern Type	Pure Python dict construction (no special import needed)
Consumer	`RLHFDataset.__getitem__` reads the `"prompt"` column and passes it to `processor.apply_chat_template()`
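
To give a rough idea of what the consumer does with the message list, the sketch below renders messages into a single string using a simplified ChatML-style layout. This is an assumption made for illustration only: real chat templates are model-specific and come from the tokenizer or processor, and render_chatml is a hypothetical stand-in, not verl's or any library's implementation.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Toy ChatML-style renderer (illustrative stand-in for a real chat template)."""
    parts = []
    for message in messages:
        # Each message becomes one delimited block tagged with its role.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant block so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = [{"role": "user", "content": "What is 2 + 3?"}]
rendered = render_chatml(prompt)
```

Because the template is applied at training time, the preprocessing scripts can stay model-agnostic and store only the role/content structure.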

I/O Contract

Inputs

Field	Type	Description
`question`	`str`	The raw question text from the dataset.
`instruction_following`	`str`	A suffix appended to the question that guides the model's output format.

Outputs

Field	Type	Description
`prompt`	`list[dict[str, str]]`	A list of message dicts, each with `"role"` and `"content"` keys. Compatible with OpenAI's chat completion API format.
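
The output contract can be checked before a dataset is written. The following is a minimal sketch; validate_prompt and the ALLOWED_ROLES set are assumptions made for illustration (the source only specifies the "role" and "content" keys), not part of verl.

```python
# Assumed role set; the source only shows "system" and "user".
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_prompt(prompt):
    """Raise ValueError if prompt is not a non-empty list of role/content dicts."""
    if not isinstance(prompt, list) or not prompt:
        raise ValueError("prompt must be a non-empty list of messages")
    for i, message in enumerate(prompt):
        if not isinstance(message, dict):
            raise ValueError(f"message {i} is not a dict")
        missing = {"role", "content"} - message.keys()
        if missing:
            raise ValueError(f"message {i} is missing keys: {sorted(missing)}")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unknown role {message['role']!r}")
        if not isinstance(message["content"], str):
            raise ValueError(f"message {i} content must be a string")


# A well-formed single-turn prompt passes silently.
validate_prompt([{"role": "user", "content": "What is 2 + 3?"}])
```

Running such a check inside the preprocessing script surfaces malformed rows early, rather than at training time when the chat template is applied.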

Usage Examples

GSM8K single-turn format (math problem):

# From examples/data_preprocess/gsm8k.py, Lines 57-84

instruction_following = 'Let\'s think step by step and output the final answer after "####".'

def make_map_fn(split):
    def process_fn(example, idx):
        question_raw = example.pop("question")
        question = question_raw + " " + instruction_following
        answer_raw = example.pop("answer")
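        # extract_solution is a helper from the same script (not shown here)
        # that pulls the final answer out of the raw answer text.
        solution = extract_solution(answer_raw)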

        data = {
            "data_source": "openai/gsm8k",
            "prompt": [
                {
                    "role": "user",
                    "content": question,
                }
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": solution},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
            },
        }
        return data

    return process_fn

Multi-turn format with system message for tools:

# Multi-turn prompt with system message describing available tools

data = {
    "data_source": "openai/gsm8k",
    "prompt": [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to a calculator tool. "
                "When you need to perform calculations, use the calculate function."
            ),
        },
        {
            "role": "user",
            "content": "What is 123 * 456? Use the calculator to solve this.",
        },
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "56088"},
}

MATH dataset format (with boxed answer):

# From examples/data_preprocess/math_dataset.py
# Excerpt: question, solution, split, and idx are set by the surrounding map function.

instruction_following = "Let's think step by step and output the final answer within \\boxed{}."

data = {
    "data_source": "DigitalLearningGmbH/MATH-lighteval",
    "prompt": [{"role": "user", "content": question + " " + instruction_following}],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": solution},
    "extra_info": {"split": split, "index": idx},
}
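
The GSM8K and MATH records shown above share the same shape and differ mainly in data_source and in the instruction suffix. A sketch of a shared builder makes that explicit; make_record is a hypothetical helper for illustration, not a function from the verl scripts.

```python
GSM8K_SUFFIX = 'Let\'s think step by step and output the final answer after "####".'
MATH_SUFFIX = "Let's think step by step and output the final answer within \\boxed{}."


def make_record(question, solution, *, data_source, suffix, split, idx):
    """Assemble one training row (hypothetical helper, not part of verl)."""
    return {
        "data_source": data_source,
        # Single-turn prompt: the question followed by the dataset-specific suffix.
        "prompt": [{"role": "user", "content": question + " " + suffix}],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": solution},
        "extra_info": {"split": split, "index": idx},
    }


record = make_record(
    "What is 1 + 1?",
    "2",
    data_source="DigitalLearningGmbH/MATH-lighteval",
    suffix=MATH_SUFFIX,
    split="train",
    idx=0,
)
```

Swapping in GSM8K_SUFFIX and "openai/gsm8k" yields a row with the same keys as the GSM8K example, apart from the extra fields that script stores under extra_info.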

Related Pages

Principle:Volcengine_Verl_Prompt_Template_Design
