Implementation:Volcengine Verl Multi Turn Data Preprocessing
| Field | Value |
|---|---|
| Knowledge Sources | API Doc (verl data preprocessing) |
| Domains | Data Preprocessing, Multi-Turn Conversation, Tool Use, Math Reasoning |
| Last Updated | 2026-02-07 |
Overview
Description
This implementation preprocesses the GSM8K dataset into a multi-turn, tool-augmented Parquet format for reinforcement learning training with verl. Unlike the standard single-turn GSM8K preprocessing, this variant constructs a two-message prompt containing a system message (defining the model as a math expert that uses tools) and a user message (with the question). The extra_info field includes need_tools_kwargs=True along with tools_kwargs and interaction_kwargs dictionaries that configure a calc_gsm8k_reward tool for multi-turn interaction during rollout.
The extract_solution(solution_str) function is identical to the single-turn version, using regex r"#### (\-?[0-9\.\,]+)". The make_map_fn(split) closure differs by including a system prompt that instructs the model to reason step by step, use the reward calculation tool, and refine its answer before producing a final #### <answer> output.
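The extraction step described above can be sketched as follows. This is a minimal illustration built only from the regex quoted in this page; the comma stripping and the `ValueError` on a missing marker are assumptions about the script's behavior, not confirmed details.

```python
import re


def extract_solution(solution_str: str) -> str:
    """Return the final numeric answer that follows '#### ' in a GSM8K solution."""
    match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    if match is None:
        # Assumption: a solution without the marker is treated as an error.
        raise ValueError("no '#### <answer>' marker found in solution string")
    # Assumption: thousands separators are dropped, so "1,234" becomes "1234".
    return match.group(1).replace(",", "")


answer = extract_solution("The total is 3 + 4 = 7\n#### 7")
```

Here `answer` holds only the text captured after the `#### ` marker, which is the value stored as `ground_truth`.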
Usage
Execute the script directly from the command line:

    python examples/data_preprocess/gsm8k_multiturn_w_tool.py --local_save_dir ~/data/gsm8k_multiturn
Code Reference
| Attribute | Detail |
|---|---|
| Source Location | examples/data_preprocess/gsm8k_multiturn_w_tool.py, Lines 29-129 |
| Signature (extract) | def extract_solution(solution_str) -> str |
| Signature (map fn) | def make_map_fn(split) -> Callable[[dict, int], dict] |
| Import | Script executed directly: python examples/data_preprocess/gsm8k_multiturn_w_tool.py |
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| --local_save_dir | str | Directory where output Parquet files are saved (default: ~/data/gsm8k) |
| --local_dataset_path | str (optional) | Local path to a pre-downloaded GSM8K dataset |
| --hdfs_dir | str (optional) | HDFS directory for remote copy of output files |
| HuggingFace dataset | openai/gsm8k | Source dataset with question and answer columns |
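The command-line inputs listed above could be parsed along these lines. This is a sketch, not the script's actual parser: the flag names and the `~/data/gsm8k` default come from the table, while the `None` defaults, the help strings, and the `build_parser` helper name are assumptions made for illustration.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper that mirrors the documented command-line inputs."""
    parser = argparse.ArgumentParser(
        description="Preprocess GSM8K into a multi-turn, tool-augmented Parquet format."
    )
    parser.add_argument(
        "--local_save_dir",
        default="~/data/gsm8k",
        help="Directory where output Parquet files are saved.",
    )
    # Assumption: both optional inputs default to None when omitted.
    parser.add_argument(
        "--local_dataset_path",
        default=None,
        help="Local path to a pre-downloaded GSM8K dataset.",
    )
    parser.add_argument(
        "--hdfs_dir",
        default=None,
        help="HDFS directory for a remote copy of the output files.",
    )
    return parser


# Parse the same arguments used in the Usage section.
args = build_parser().parse_args(["--local_save_dir", "~/data/gsm8k_multiturn"])
# Expand "~" before writing files, since open() does not do this on its own.
save_dir = os.path.expanduser(args.local_save_dir)
```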
Outputs
| Output | Type | Description |
|---|---|---|
| train.parquet | Parquet file | Training split with multi-turn schema |
| test.parquet | Parquet file | Test split with multi-turn schema |
Output column schema:
| Column | Type | Description |
|---|---|---|
| data_source | str | Always "openai/gsm8k" |
| prompt | list[dict] | Two-message prompt: system message (tool-use instructions) + user message (question) |
| ability | str | Always "math" |
| reward_model | dict | {"style": "rule", "ground_truth": solution} |
| extra_info | dict | Contains split, index, answer, question, need_tools_kwargs, tools_kwargs, interaction_kwargs |
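A closure that produces records with this schema might look like the sketch below. It is assembled from the field descriptions on this page rather than copied from the script. The constant names, the comma stripping in `extract_solution`, and the choice to pass the suffixed question as the interaction `query` are assumptions.

```python
import re
from typing import Callable

DATA_SOURCE = "openai/gsm8k"
SYSTEM_PROMPT = (
    "You are a math expert. You are given a question and you need to solve it step by step. "
    "Reasoning step by step before any tool call. "
    "You should use the `calc_gsm8k_reward` tool after step by step solving the question, "
    "before generate final answer at least once and refine your answer if necessary. "
    "Put your final answer in the format of `#### <answer>`."
)
INSTRUCTION = "Let's think step by step and output the final answer after `####`."


def extract_solution(solution_str: str) -> str:
    match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    if match is None:
        raise ValueError("no '#### <answer>' marker found")
    return match.group(1).replace(",", "")  # assumption: commas are stripped


def make_map_fn(split: str) -> Callable[[dict, int], dict]:
    """Return a per-example function suitable for an indexed dataset map."""

    def process_fn(example: dict, idx: int) -> dict:
        question_raw = example["question"]
        answer_raw = example["answer"]
        question = f"{question_raw} {INSTRUCTION}"
        solution = extract_solution(answer_raw)
        return {
            "data_source": DATA_SOURCE,
            "prompt": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": solution},
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
                "need_tools_kwargs": True,
                "tools_kwargs": {
                    "calc_gsm8k_reward": {"create_kwargs": {"ground_truth": solution}},
                },
                # Assumption: the query carries the same text as the user message.
                "interaction_kwargs": {"query": question, "ground_truth": solution},
            },
        }

    return process_fn


record = make_map_fn("train")(
    {"question": "Janet has 3 apples and buys 4 more. How many does she have?",
     "answer": "The total is 3 + 4 = 7\n#### 7"},
    0,
)
```

Returning a closure lets the same function body tag each record with its split name while keeping the `(example, index)` call shape that an indexed map expects.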
Usage Examples
Example 1: Multi-turn prompt structure

    # The prompt field for multi-turn tool use:
    prompt = [
        {
            "role": "system",
            "content": (
                "You are a math expert. You are given a question and you need to solve it step by step. "
                "Reasoning step by step before any tool call. "
                "You should use the `calc_gsm8k_reward` tool after step by step solving the question, "
                "before generate final answer at least once and refine your answer if necessary. "
                "Put your final answer in the format of `#### <answer>`."
            ),
        },
        {
            "role": "user",
            "content": "Janet has 3 apples... Let's think step by step and output the final answer after `####`.",
        },
    ]

Example 2: Tool kwargs in extra_info

    extra_info = {
        "split": "train",
        "index": 42,
        "answer": "The total is 3 + 4 = 7\n#### 7",
        "question": "Janet has 3 apples...",
        "need_tools_kwargs": True,
        "tools_kwargs": {
            "calc_gsm8k_reward": {
                "create_kwargs": {"ground_truth": "7"},
                # "execute_kwargs": {},
                # "calc_reward_kwargs": {},
                # "release_kwargs": {},
            },
        },
        "interaction_kwargs": {
            "query": "Janet has 3 apples...",
            "ground_truth": "7",
        },
    }

Example 3: Full record after transformation

    record = {
        "data_source": "openai/gsm8k",
        "prompt": [
            {"role": "system", "content": "You are a math expert..."},
            {"role": "user", "content": "Janet has 3 apples..."},
        ],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": "7"},
        "extra_info": {
            "split": "train",
            "index": 0,
            "answer": "The total is 3 + 4 = 7\n#### 7",
            "question": "Janet has 3 apples...",
            "need_tools_kwargs": True,
            "tools_kwargs": {"calc_gsm8k_reward": {"create_kwargs": {"ground_truth": "7"}}},
            "interaction_kwargs": {"query": "Janet has 3 apples...", "ground_truth": "7"},
        },
    }

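A record of this shape can be sanity-checked before training with a small helper such as the one below. The function `find_schema_problems` and its checks are hypothetical and not part of verl. In particular, the rule that the tool's `ground_truth` must equal the reward model's is an assumption inferred from the examples on this page.

```python
REQUIRED_COLUMNS = {"data_source", "prompt", "ability", "reward_model", "extra_info"}
REQUIRED_EXTRA_INFO = {
    "split", "index", "answer", "question",
    "need_tools_kwargs", "tools_kwargs", "interaction_kwargs",
}


def find_schema_problems(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = []
    missing = REQUIRED_COLUMNS - set(record)
    if missing:
        # Stop early: the remaining checks need every top-level column.
        return [f"missing columns: {sorted(missing)}"]

    roles = [message.get("role") for message in record["prompt"]]
    if roles != ["system", "user"]:
        problems.append(f"unexpected prompt roles: {roles}")

    extra = record["extra_info"]
    missing_extra = REQUIRED_EXTRA_INFO - set(extra)
    if missing_extra:
        problems.append(f"missing extra_info keys: {sorted(missing_extra)}")
        return problems

    # Assumption: the tool and the rule-based reward share one ground truth.
    tool_truth = (
        extra["tools_kwargs"]
        .get("calc_gsm8k_reward", {})
        .get("create_kwargs", {})
        .get("ground_truth")
    )
    if tool_truth != record["reward_model"].get("ground_truth"):
        problems.append("tool ground_truth differs from reward_model ground_truth")
    return problems


sample = {
    "data_source": "openai/gsm8k",
    "prompt": [
        {"role": "system", "content": "You are a math expert..."},
        {"role": "user", "content": "Janet has 3 apples..."},
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "7"},
    "extra_info": {
        "split": "train",
        "index": 0,
        "answer": "The total is 3 + 4 = 7\n#### 7",
        "question": "Janet has 3 apples...",
        "need_tools_kwargs": True,
        "tools_kwargs": {"calc_gsm8k_reward": {"create_kwargs": {"ground_truth": "7"}}},
        "interaction_kwargs": {"query": "Janet has 3 apples...", "ground_truth": "7"},
    },
}
problems = find_schema_problems(sample)
```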
Related Pages
- Principle:Volcengine_Verl_Multi_Turn_Data_Preparation
- examples/data_preprocess/gsm8k_multiturn_w_tool.py -- Source script
- Implementation:Volcengine_Verl_GSM8K_Data_Preprocessing -- Single-turn GSM8K preprocessing
- Implementation:Volcengine_Verl_Dataset_To_Parquet -- Parquet export wrapper