Implementation:Volcengine Verl Chat Message Template

From Leeroopedia


Field | Value
Knowledge Sources | verl source code, data preprocessing examples
Domains | Prompt Engineering, Data Preprocessing, Chat Format
Last Updated | 2026-02-07

Overview

Description

This pattern documents the construction of OpenAI chat-format message lists used throughout verl's data preprocessing pipeline. Every training example must include a prompt field containing a list of message dictionaries with role and content keys. This format is required by the processor.apply_chat_template() method that converts messages into model-specific token sequences.

The pattern has two primary variants:

  • Single-turn -- A single "user" message containing the question plus an instruction-following suffix (e.g., 'Let\'s think step by step and output the final answer after "####".'). This is the standard format for math benchmarks like GSM8K and MATH.
  • Multi-turn with system message -- A "system" message defining tool availability and behavior, followed by a "user" message. The system message typically describes available tools in a structured format. This variant is used for multi-turn tool-calling scenarios.
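The two variants can be sketched with one small helper. This is a minimal illustration rather than code from verl: build_prompt and its system parameter are hypothetical names introduced here, and the question text is made up.

```python
from typing import Optional


def build_prompt(question: str, instruction_following: str,
                 system: Optional[str] = None) -> list[dict[str, str]]:
    """Build a chat-format message list (hypothetical helper, not part of verl)."""
    messages = []
    if system is not None:
        # System-message variant: the system message precedes the user message.
        messages.append({"role": "system", "content": system})
    # Both variants end with a user message: the question plus the format suffix.
    messages.append({"role": "user", "content": question + " " + instruction_following})
    return messages


suffix = 'Let\'s think step by step and output the final answer after "####".'

# Single-turn variant: one user message.
single_turn = build_prompt("What is 2 + 3?", suffix)

# System-message variant: system message first, then the user message.
with_system = build_prompt(
    "What is 2 + 3?",
    suffix,
    system="You are a helpful assistant with access to a calculator tool.",
)
```

Keeping the suffix as a separate argument mirrors the preprocessing scripts, where the same instruction string is appended to every question in a dataset.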

The message list is stored in the "prompt" column of the parquet dataset and consumed by RLHFDataset at training time.

Usage

This pattern is applied during data preprocessing, before training begins. Each dataset-specific preprocessing script constructs the appropriate message list and stores it in the same record as the reward configuration ("reward_model") and auxiliary metadata ("extra_info").

Code Reference

Field | Value
Source Location | examples/data_preprocess/gsm8k.py, Lines 57-84
Pattern Type | Pure Python dict construction (no special import needed)
Consumer | RLHFDataset.__getitem__ reads the "prompt" column and passes it to processor.apply_chat_template()
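To make the consumer step more concrete, the sketch below renders a message list into a single string using a ChatML-style layout. It only illustrates what a chat template does. render_chatml is a hypothetical function, the <|im_start|> and <|im_end|> markers are an assumed layout, and the actual output of apply_chat_template() depends on the template shipped with each model.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Toy ChatML-style renderer (illustrative only, not the real template)."""
    parts = []
    for message in messages:
        # Each message becomes one delimited block: role header, then content.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant block so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = [{"role": "user", "content": "What is 2 + 3?"}]
rendered = render_chatml(prompt)
```

Because the template owns all model-specific markers, the stored "prompt" column can stay a plain list of role/content dicts and be reused across models.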

I/O Contract

Inputs

Field | Type | Description
question | str | The raw question text from the dataset.
instruction_following | str | A suffix appended to the question that guides the model's output format.

Outputs

Field | Type | Description
prompt | list[dict[str, str]] | A list of message dicts, each with "role" and "content" keys. Compatible with OpenAI's chat completion API format.
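A preprocessing script could check this contract before writing records to disk. The following is a minimal sketch, assuming text-only content as the type above states. check_prompt is a hypothetical helper, not a verl function.

```python
def check_prompt(prompt):
    """Return a list of problems found in a chat-format prompt (empty if none)."""
    if not isinstance(prompt, list) or not prompt:
        return ["prompt must be a non-empty list"]
    problems = []
    for i, message in enumerate(prompt):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not a dict")
            continue
        # Every message needs string-valued "role" and "content" keys.
        for key in ("role", "content"):
            if not isinstance(message.get(key), str):
                problems.append(f"message {i}: {key!r} must be a string")
    return problems


good = [{"role": "user", "content": "What is 2 + 3?"}]
bad = [{"role": "user"}, "not a dict"]

good_report = check_prompt(good)  # no problems expected
bad_report = check_prompt(bad)    # one problem per malformed message
```

Returning a list of problems instead of raising on the first one makes it easier to report every malformed record in a single preprocessing pass.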

Usage Examples

GSM8K single-turn format (math problem):

# From examples/data_preprocess/gsm8k.py, Lines 57-84

instruction_following = 'Let\'s think step by step and output the final answer after "####".'

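def make_map_fn(split):
    # Returns a per-example function; `split` is recorded in extra_info.
    def process_fn(example, idx):
        question_raw = example.pop("question")
        # Append the output-format instruction to the raw question.
        question = question_raw + " " + instruction_following
        answer_raw = example.pop("answer")
        # extract_solution is a helper defined elsewhere in the same script (not shown here).
        solution = extract_solution(answer_raw)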

        data = {
            "data_source": "openai/gsm8k",
            "prompt": [
                {
                    "role": "user",
                    "content": question,
                }
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": solution},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
            },
        }
        return data

    return process_fn
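The excerpt above can be exercised end to end with a self-contained sketch. Several pieces here are assumptions: extract_solution is a simplified stand-in for the script's helper, the sample record is made up in the GSM8K answer style, and the list comprehension stands in for mapping the function over a dataset with indices (with Hugging Face datasets this would typically be Dataset.map with with_indices=True).

```python
import re

instruction_following = 'Let\'s think step by step and output the final answer after "####".'


def extract_solution(answer_raw):
    """Stand-in helper: take the number after '#### ' and drop thousands separators."""
    match = re.search(r"#### (-?[0-9.,]+)", answer_raw)
    if match is None:
        raise ValueError("no '#### <number>' marker found")
    return match.group(1).replace(",", "")


def make_map_fn(split):
    def process_fn(example, idx):
        question_raw = example.pop("question")
        answer_raw = example.pop("answer")
        return {
            "data_source": "openai/gsm8k",
            "prompt": [
                {"role": "user", "content": question_raw + " " + instruction_following}
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": extract_solution(answer_raw)},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
            },
        }

    return process_fn


# Made-up record in the GSM8K answer style (not taken from the dataset).
records = [
    {
        "question": "Tom has 1,000 apples and buys 234 more. How many apples does he have?",
        "answer": "1000 + 234 = 1234\n#### 1,234",
    }
]

# Plain-Python equivalent of mapping with indices; dict(record) copies the
# record because process_fn pops keys from its input.
process_fn = make_map_fn("train")
rows = [process_fn(dict(record), idx) for idx, record in enumerate(records)]
```

Each resulting row carries the chat-format prompt together with the rule-based ground truth, so the reward function can compare the model's final answer against "1234" without re-parsing the original answer text.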

Multi-turn format with system message for tools:

# Multi-turn prompt with system message describing available tools

data = {
    "data_source": "openai/gsm8k",
    "prompt": [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to a calculator tool. "
                "When you need to perform calculations, use the calculate function."
            ),
        },
        {
            "role": "user",
            "content": "What is 123 * 456? Use the calculator to solve this.",
        },
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "56088"},
}

MATH dataset format (with boxed answer):

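# From examples/data_preprocess/math_dataset.py
# Fragment: question, solution, split, and idx are assumed to be bound
# inside a process_fn, as in the GSM8K example above.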

instruction_following = "Let's think step by step and output the final answer within \\boxed{}."

data = {
    "data_source": "DigitalLearningGmbH/MATH-lighteval",
    "prompt": [{"role": "user", "content": question + " " + instruction_following}],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": solution},
    "extra_info": {"split": split, "index": idx},
}
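The doubled backslash in the MATH instruction string matters. A short check, written here as an illustration and not taken from the verl scripts, shows that "\\boxed{}" yields a literal backslash, that a raw string is an equivalent spelling, and that a single backslash would instead be parsed as the "\b" backspace escape.

```python
# Escaped form, as used in the example above.
escaped = "Let's think step by step and output the final answer within \\boxed{}."

# Equivalent raw-string form: the backslash is kept literally.
raw = r"Let's think step by step and output the final answer within \boxed{}."

# Pitfall: in a normal string literal, "\b" is the backspace character (0x08),
# so this string does not contain the text "\boxed" at all.
unescaped = "within \boxed{}."

same = escaped == raw
has_backspace = "\x08" in unescaped
```

Either the escaped or the raw form keeps the LaTeX command intact in the prompt text that reaches the model.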
