Principle:Volcengine Verl Multi Turn Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Agentic_AI, Tool_Use |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Multi-turn data preparation builds datasets for multi-turn agentic RL training. Each data row carries a system prompt with tool instructions and the tool-calling keyword arguments that apply to that row.
Description
Multi-Turn Data Preparation extends standard RL data preparation with additional fields needed for agentic training where the model interacts with external tools across multiple conversation turns.
Key additions beyond standard RL data:
- System prompt: Instructions telling the model about available tools and expected output format
- tools_kwargs: Configuration for tool instantiation, including per-row parameters (e.g., ground truth for a calculator tool)
- interaction_kwargs: Parameters controlling the multi-turn interaction (e.g., max turns)
The tool configuration is embedded directly in each data row, allowing different rows to have different tool setups.
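As a minimal sketch of per-row tool configuration, the example below builds two rows whose `tools_kwargs` differ and reads the creation arguments back out. The tool names `calculator` and `search`, the `top_k` parameter, and the helper `create_kwargs_for` are hypothetical and chosen only for illustration. The nesting under `extra_info` and `create_kwargs` follows the schema shown under Theoretical Basis.

```python
from typing import Any


def create_kwargs_for(row: dict[str, Any], tool_name: str) -> dict[str, Any]:
    """Return the row's creation arguments for one tool, or {} if the row does not configure it."""
    tools_kwargs = row.get("extra_info", {}).get("tools_kwargs", {})
    return tools_kwargs.get(tool_name, {}).get("create_kwargs", {})


# Two rows with different tool setups (hypothetical tool names and parameters).
rows = [
    {
        "extra_info": {
            "need_tools_kwargs": True,
            "tools_kwargs": {"calculator": {"create_kwargs": {"ground_truth": "72"}}},
            "interaction_kwargs": {"max_turns": 5},
        }
    },
    {
        "extra_info": {
            "need_tools_kwargs": True,
            "tools_kwargs": {"search": {"create_kwargs": {"top_k": 3}}},
            "interaction_kwargs": {"max_turns": 3},
        }
    },
]

for row in rows:
    # Each row names its own tools, so a rollout loop can instantiate them per sample.
    print(sorted(row["extra_info"]["tools_kwargs"]))
```

Because the lookup falls back to an empty dict, a row that does not mention a given tool simply yields no creation arguments for it.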
Usage
Use multi-turn data preparation when training models for:
- Tool-calling capabilities (calculator, code execution, search)
- Multi-step reasoning with external feedback
- Agentic workflows where the model must decide when to use tools
Theoretical Basis
Multi-turn data extends the standard schema with tool configuration:
    # Abstract multi-turn data preparation
    for row in dataset:
        prompt = [
            {"role": "system", "content": tool_use_instructions},
            {"role": "user", "content": row["question"]},
        ]
        extra_info = {
            "need_tools_kwargs": True,
            "tools_kwargs": {
                "tool_name": {
                    "create_kwargs": {"ground_truth": extract_answer(row)}
                }
            },
            "interaction_kwargs": {"max_turns": 5},
        }
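A runnable version of the abstract loop above might look like the following. It is a sketch under stated assumptions, not the project's own preprocessing script: the system prompt text, the `calculator` tool name, the `prepare_row` helper, and the in-memory `dataset` are all hypothetical, and `extract_answer` assumes GSM8K-style answers that end with a `#### <number>` marker.

```python
import re
from typing import Any

# Hypothetical system prompt; real wording depends on the tool and chat template.
TOOL_USE_INSTRUCTIONS = (
    "You may call the `calculator` tool to check arithmetic. "
    "Give the final answer after `####`."
)


def extract_answer(row: dict[str, str]) -> str:
    """Pull the final answer from a GSM8K-style '#### <number>' marker (assumed format)."""
    match = re.search(r"####\s*(-?[\d,.]+)", row["answer"])
    if match is None:
        raise ValueError("no '#### <answer>' marker found")
    # Drop thousands separators so '1,234' becomes '1234'.
    return match.group(1).replace(",", "")


def prepare_row(row: dict[str, str], max_turns: int = 5) -> dict[str, Any]:
    """Attach the system prompt and per-row tool configuration to one raw example."""
    return {
        "prompt": [
            {"role": "system", "content": TOOL_USE_INSTRUCTIONS},
            {"role": "user", "content": row["question"]},
        ],
        "extra_info": {
            "need_tools_kwargs": True,
            "tools_kwargs": {
                "calculator": {"create_kwargs": {"ground_truth": extract_answer(row)}}
            },
            "interaction_kwargs": {"max_turns": max_turns},
        },
    }


# Tiny in-memory stand-in for a real dataset.
dataset = [
    {"question": "What is 6 * 12?", "answer": "6 * 12 = 72\n#### 72"},
    {"question": "What is 1,000 + 234?", "answer": "#### 1,234"},
]
prepared = [prepare_row(row) for row in dataset]
```

Keeping the per-row logic in one function makes it straightforward to apply the same transformation through whatever dataset mapping utility the training pipeline uses.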