Implementation:Volcengine Verl Multi Turn Data Preprocessing

Field	Value
Knowledge Sources	API Doc (verl data preprocessing)
Domains	Data Preprocessing, Multi-Turn Conversation, Tool Use, Math Reasoning
Last Updated	2026-02-07

Overview

Description

This implementation preprocesses the GSM8K dataset into a multi-turn, tool-augmented Parquet format for reinforcement learning training with verl. Unlike the standard single-turn GSM8K preprocessing, this variant builds a two-message prompt: a system message that casts the model as a math expert that uses tools, and a user message that carries the question. The extra_info field sets need_tools_kwargs=True and adds the tools_kwargs and interaction_kwargs dictionaries, which configure the calc_gsm8k_reward tool for multi-turn interaction during rollout.

The extract_solution(solution_str) function is identical to the single-turn version: it locates the final answer with the regex r"#### (\-?[0-9\.\,]+)". The make_map_fn(split) closure differs in that it adds a system prompt instructing the model to reason step by step, call the reward calculation tool, and refine its answer before emitting the final #### <answer> line.
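The two functions described above can be sketched roughly as follows. This is a minimal reconstruction from the schema and examples on this page, not a copy of the source script: the inner function name process_fn, the constant names, the comma stripping in extract_solution, and the choice to pass the suffixed question as interaction_kwargs["query"] are all assumptions.

```python
import re

# System prompt text as shown in Example 1 on this page.
SYSTEM_PROMPT = (
    "You are a math expert. You are given a question and you need to solve it step by step. "
    "Reasoning step by step before any tool call. "
    "You should use the `calc_gsm8k_reward` tool after step by step solving the question, "
    "before generate final answer at least once and refine your answer if necessary. "
    "Put your final answer in the format of `#### <answer>`."
)
# Suffix appended to the user question, as shown in Example 1.
INSTRUCTION = "Let's think step by step and output the final answer after `####`."


def extract_solution(solution_str):
    """Return the number that follows '#### ' in a GSM8K answer string."""
    match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    if match is None:
        raise ValueError("no '#### <answer>' marker found")
    # Assumption: thousands separators are dropped, so "1,234" becomes "1234".
    return match.group(1).replace(",", "")


def make_map_fn(split):
    """Build a per-example mapper for the given split name."""

    def process_fn(example, idx):  # hypothetical name for the inner function
        question_raw = example["question"]
        answer_raw = example["answer"]
        question = question_raw + " " + INSTRUCTION
        solution = extract_solution(answer_raw)
        return {
            "data_source": "openai/gsm8k",
            "prompt": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": solution},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
                "need_tools_kwargs": True,
                "tools_kwargs": {
                    "calc_gsm8k_reward": {"create_kwargs": {"ground_truth": solution}},
                },
                # Assumption: the query carries the instruction suffix.
                "interaction_kwargs": {"query": question, "ground_truth": solution},
            },
        }

    return process_fn
```

With the Hugging Face datasets library, a mapper of this shape would typically be applied with dataset.map(function=make_map_fn("train"), with_indices=True), which supplies the (example, idx) pair the closure expects.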

Usage

Execute the script directly from the command line:

python examples/data_preprocess/gsm8k_multiturn_w_tool.py --local_save_dir ~/data/gsm8k_multiturn

Code Reference

Attribute	Detail
Source Location	`examples/data_preprocess/gsm8k_multiturn_w_tool.py`, Lines 29-129
Signature (extract)	`def extract_solution(solution_str) -> str`
Signature (map fn)	`def make_map_fn(split) -> Callable[[dict, int], dict]`
Import	Script executed directly: `python examples/data_preprocess/gsm8k_multiturn_w_tool.py`

I/O Contract

Inputs

Parameter	Type	Description
`--local_save_dir`	`str`	Directory where output Parquet files are saved (default: `~/data/gsm8k`)
`--local_dataset_path`	`str` (optional)	Local path to a pre-downloaded GSM8K dataset
`--hdfs_dir`	`str` (optional)	HDFS directory for remote copy of output files
HuggingFace dataset	`openai/gsm8k`	Source dataset with `question` and `answer` columns
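The command-line inputs in the table above could be declared with argparse along these lines. This is an illustrative sketch only: the build_parser helper is hypothetical, the help strings paraphrase the table, and the None defaults for the two optional flags are assumptions (only the default for --local_save_dir is stated on this page).

```python
import argparse
import os


def build_parser():
    """Declare the three flags listed in the Inputs table."""
    parser = argparse.ArgumentParser(
        description="Preprocess GSM8K into the multi-turn tool-use Parquet format."
    )
    parser.add_argument(
        "--local_save_dir",
        default="~/data/gsm8k",
        help="Directory where output Parquet files are saved.",
    )
    parser.add_argument(
        "--local_dataset_path",
        default=None,  # assumed default: download from the Hub when unset
        help="Local path to a pre-downloaded GSM8K dataset.",
    )
    parser.add_argument(
        "--hdfs_dir",
        default=None,  # assumed default: skip the remote copy when unset
        help="HDFS directory for a remote copy of the output files.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--local_save_dir", "~/data/gsm8k_multiturn"])
# Expand "~" before using the path on disk.
save_dir = os.path.expanduser(args.local_save_dir)
```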

Outputs

Output	Type	Description
`train.parquet`	Parquet file	Training split with multi-turn schema
`test.parquet`	Parquet file	Test split with multi-turn schema

Output column schema:

Column	Type	Description
`data_source`	`str`	Always `"openai/gsm8k"`
`prompt`	`list[dict]`	Two-message prompt: system message (tool-use instructions) + user message (question)
`ability`	`str`	Always `"math"`
`reward_model`	`dict`	`{"style": "rule", "ground_truth": solution}`
`extra_info`	`dict`	Contains `split`, `index`, `answer`, `question`, `need_tools_kwargs`, `tools_kwargs`, `interaction_kwargs`
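As a quick sanity check on the column schema above, an in-memory record can be validated before it is written to Parquet. The check_record helper below is hypothetical and not part of verl; it only encodes the columns and extra_info keys listed in the table. It targets plain Python dicts, since list-valued columns may come back as array types when a Parquet file is read with other libraries.

```python
# Column names and Python types taken from the schema table above.
EXPECTED_COLUMNS = {
    "data_source": str,
    "prompt": list,
    "ability": str,
    "reward_model": dict,
    "extra_info": dict,
}
EXPECTED_EXTRA_INFO_KEYS = {
    "split", "index", "answer", "question",
    "need_tools_kwargs", "tools_kwargs", "interaction_kwargs",
}


def check_record(record):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"{column} should be {expected_type.__name__}")

    extra_info = record.get("extra_info")
    if isinstance(extra_info, dict):
        for key in sorted(EXPECTED_EXTRA_INFO_KEYS - set(extra_info)):
            problems.append(f"extra_info missing key: {key}")

    prompt = record.get("prompt")
    if isinstance(prompt, list):
        roles = [m.get("role") for m in prompt if isinstance(m, dict)]
        if roles != ["system", "user"]:
            problems.append("prompt should be a system message followed by a user message")
    return problems
```

An empty return value means the record matches the documented layout; any other result lists what is missing or mistyped.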

Usage Examples

Example 1: Multi-turn prompt structure

# The prompt field for multi-turn tool use:
prompt = [
    {
        "role": "system",
        "content": (
            "You are a math expert. You are given a question and you need to solve it step by step. "
            "Reasoning step by step before any tool call. "
            "You should use the `calc_gsm8k_reward` tool after step by step solving the question, "
            "before generate final answer at least once and refine your answer if necessary. "
            "Put your final answer in the format of `#### <answer>`."
        ),
    },
    {
        "role": "user",
        "content": "Janet has 3 apples... Let's think step by step and output the final answer after `####`.",
    },
]

Example 2: Tool kwargs in extra_info

extra_info = {
    "split": "train",
    "index": 42,
    "answer": "The total is 3 + 4 = 7\n#### 7",
    "question": "Janet has 3 apples...",
    "need_tools_kwargs": True,
    "tools_kwargs": {
        "calc_gsm8k_reward": {
            "create_kwargs": {"ground_truth": "7"},
            # "execute_kwargs": {},
            # "calc_reward_kwargs": {},
            # "release_kwargs": {},
        },
    },
    "interaction_kwargs": {
        "query": "Janet has 3 apples...",
        "ground_truth": "7",
    },
}

Example 3: Full record after transformation

record = {
    "data_source": "openai/gsm8k",
    "prompt": [
        {"role": "system", "content": "You are a math expert..."},
        {"role": "user", "content": "Janet has 3 apples..."},
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "7"},
    "extra_info": {
        "split": "train",
        "index": 0,
        "answer": "The total is 3 + 4 = 7\n#### 7",
        "question": "Janet has 3 apples...",
        "need_tools_kwargs": True,
        "tools_kwargs": {"calc_gsm8k_reward": {"create_kwargs": {"ground_truth": "7"}}},
        "interaction_kwargs": {"query": "Janet has 3 apples...", "ground_truth": "7"},
    },
}

Related Pages

Principle:Volcengine_Verl_Multi_Turn_Data_Preparation
examples/data_preprocess/gsm8k_multiturn_w_tool.py -- Source script
Implementation:Volcengine_Verl_GSM8K_Data_Preprocessing -- Single-turn GSM8K preprocessing
Implementation:Volcengine_Verl_Dataset_To_Parquet -- Parquet export wrapper
