Implementation:Volcengine Verl Multi Turn Data Preprocessing


| Field | Value |
|---|---|
| Knowledge Sources | API Doc (verl data preprocessing) |
| Domains | Data Preprocessing, Multi-Turn Conversation, Tool Use, Math Reasoning |
| Last Updated | 2026-02-07 |

Overview

Description

This implementation preprocesses the GSM8K dataset into a multi-turn, tool-augmented Parquet format for reinforcement learning training with verl. Unlike the standard single-turn GSM8K preprocessing, this variant builds a two-message prompt: a system message that defines the model as a math expert that uses tools, and a user message that contains the question. The extra_info field sets need_tools_kwargs=True and carries the tools_kwargs and interaction_kwargs dictionaries, which configure the calc_gsm8k_reward tool for multi-turn interaction during rollout.

The extract_solution(solution_str) function is identical to the single-turn version: it locates the final answer with the regex r"#### (\-?[0-9\.\,]+)", which matches the #### marker followed by an optionally negative number that may contain dots and commas. The make_map_fn(split) closure differs from the single-turn version by adding a system prompt that instructs the model to reason step by step, call the reward calculation tool, and refine its answer before producing a final #### <answer> output.
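As a minimal sketch of the extraction step described above, the snippet below applies the quoted regex to a GSM8K-style answer string. Only the pattern comes from this page; the error handling and the comma stripping are assumptions made for illustration and may not match the script exactly.

```python
import re

# Pattern quoted in the description above.
ANSWER_PATTERN = re.compile(r"#### (\-?[0-9\.\,]+)")


def extract_solution(solution_str: str) -> str:
    """Return the number that follows the '#### ' marker."""
    match = ANSWER_PATTERN.search(solution_str)
    if match is None:
        # Assumption: fail loudly when the marker is missing.
        raise ValueError("no '#### <answer>' marker found")
    # Assumption: drop thousands separators so '1,234' becomes '1234'.
    return match.group(1).replace(",", "")


print(extract_solution("The total is 3 + 4 = 7\n#### 7"))
```

The returned string is what ends up as ground_truth in the examples further down this page.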

Usage

Execute the script directly from the command line:

python examples/data_preprocess/gsm8k_multiturn_w_tool.py --local_save_dir ~/data/gsm8k_multiturn

Code Reference

| Attribute | Detail |
|---|---|
| Source Location | examples/data_preprocess/gsm8k_multiturn_w_tool.py, Lines 29-129 |
| Signature (extract) | def extract_solution(solution_str) -> str |
| Signature (map fn) | def make_map_fn(split) -> Callable[[dict, int], dict] |
| Import | Script executed directly: python examples/data_preprocess/gsm8k_multiturn_w_tool.py |

I/O Contract

Inputs

| Parameter | Type | Description |
|---|---|---|
| --local_save_dir | str | Directory where output Parquet files are saved (default: ~/data/gsm8k) |
| --local_dataset_path | str (optional) | Local path to a pre-downloaded GSM8K dataset |
| --hdfs_dir | str (optional) | HDFS directory for remote copy of output files |
| HuggingFace dataset | openai/gsm8k | Source dataset with question and answer columns |
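The command-line inputs listed above could be declared roughly as follows. This is a sketch under stated assumptions: the flag names and the ~/data/gsm8k default come from the table, while build_parser, the help strings, and the None defaults for the optional flags are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper that declares the three flags from the inputs table."""
    parser = argparse.ArgumentParser(description="GSM8K multi-turn preprocessing (sketch)")
    parser.add_argument("--local_save_dir", default="~/data/gsm8k",
                        help="Directory where output Parquet files are saved")
    # Assumption: both optional flags default to None when omitted.
    parser.add_argument("--local_dataset_path", default=None,
                        help="Local path to a pre-downloaded GSM8K dataset")
    parser.add_argument("--hdfs_dir", default=None,
                        help="HDFS directory for a remote copy of the output files")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--local_save_dir", "~/data/gsm8k_multiturn"])
print(args.local_save_dir)
```

Note that the path is kept as typed; a script would normally expand the leading ~ (for example with os.path.expanduser) before writing files.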

Outputs

| Output | Type | Description |
|---|---|---|
| train.parquet | Parquet file | Training split with multi-turn schema |
| test.parquet | Parquet file | Test split with multi-turn schema |

Output column schema:

| Column | Type | Description |
|---|---|---|
| data_source | str | Always "openai/gsm8k" |
| prompt | list[dict] | Two-message prompt: system message (tool-use instructions) + user message (question) |
| ability | str | Always "math" |
| reward_model | dict | {"style": "rule", "ground_truth": solution} |
| extra_info | dict | Contains split, index, answer, question, need_tools_kwargs, tools_kwargs, interaction_kwargs |
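To show how one raw GSM8K row could be turned into this schema, here is a hedged sketch of a make_map_fn-style closure. The field names follow the schema on this page; the abbreviated SYSTEM_PROMPT, the INSTRUCTION constant, the inner name process_fn, and the choice to pass the raw question as interaction_kwargs["query"] are assumptions for illustration, not a copy of the script.

```python
import re

# Assumption: abbreviated stand-in for the full system prompt.
SYSTEM_PROMPT = "You are a math expert..."
# Assumption: suffix appended to each question in the user message.
INSTRUCTION = "Let's think step by step and output the final answer after `####`."


def extract_solution(solution_str):
    # Same regex as quoted in the description; comma stripping is an assumption.
    match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    if match is None:
        raise ValueError("no '#### <answer>' marker found")
    return match.group(1).replace(",", "")


def make_map_fn(split):
    """Return a function that maps (example, index) to one output record."""
    def process_fn(example, idx):
        question = example["question"]
        answer = example["answer"]
        solution = extract_solution(answer)
        return {
            "data_source": "openai/gsm8k",
            "prompt": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question + " " + INSTRUCTION},
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": solution},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer,
                "question": question,
                "need_tools_kwargs": True,
                "tools_kwargs": {
                    "calc_gsm8k_reward": {"create_kwargs": {"ground_truth": solution}},
                },
                # Assumption: the real script may pass the instruction-augmented text here.
                "interaction_kwargs": {"query": question, "ground_truth": solution},
            },
        }
    return process_fn


row = {"question": "Janet has 3 apples...", "answer": "The total is 3 + 4 = 7\n#### 7"}
record = make_map_fn("train")(row, 0)
print(record["reward_model"])
```

With the Hugging Face datasets library, a closure of this shape is typically applied with dataset.map(function=make_map_fn("train"), with_indices=True), which supplies the row index as the second argument.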

Usage Examples

Example 1: Multi-turn prompt structure

# The prompt field for multi-turn tool use:
prompt = [
    {
        "role": "system",
        "content": (
            "You are a math expert. You are given a question and you need to solve it step by step. "
            "Reasoning step by step before any tool call. "
            "You should use the `calc_gsm8k_reward` tool after step by step solving the question, "
            "before generate final answer at least once and refine your answer if necessary. "
            "Put your final answer in the format of `#### <answer>`."
        ),
    },
    {
        "role": "user",
        "content": "Janet has 3 apples... Let's think step by step and output the final answer after `####`.",
    },
]

Example 2: Tool kwargs in extra_info

extra_info = {
    "split": "train",
    "index": 42,
    "answer": "The total is 3 + 4 = 7\n#### 7",
    "question": "Janet has 3 apples...",
    "need_tools_kwargs": True,
    "tools_kwargs": {
        "calc_gsm8k_reward": {
            "create_kwargs": {"ground_truth": "7"},
            # "execute_kwargs": {},
            # "calc_reward_kwargs": {},
            # "release_kwargs": {},
        },
    },
    "interaction_kwargs": {
        "query": "Janet has 3 apples...",
        "ground_truth": "7",
    },
}

Example 3: Full record after transformation

record = {
    "data_source": "openai/gsm8k",
    "prompt": [
        {"role": "system", "content": "You are a math expert..."},
        {"role": "user", "content": "Janet has 3 apples..."},
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "7"},
    "extra_info": {
        "split": "train",
        "index": 0,
        "answer": "The total is 3 + 4 = 7\n#### 7",
        "question": "Janet has 3 apples...",
        "need_tools_kwargs": True,
        "tools_kwargs": {"calc_gsm8k_reward": {"create_kwargs": {"ground_truth": "7"}}},
        "interaction_kwargs": {"query": "Janet has 3 apples...", "ground_truth": "7"},
    },
}
