Principle:Volcengine Verl Multi Turn Data Preparation

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Agentic_AI, Tool_Use
Last Updated 2026-02-07 14:00 GMT

Overview

Multi-turn data preparation is the process of building datasets for multi-turn agentic RL training. Each data row carries a system prompt with tool instructions and the keyword arguments used for tool calling.

Description

Multi-Turn Data Preparation extends standard RL data preparation with additional fields needed for agentic training where the model interacts with external tools across multiple conversation turns.

Key additions beyond standard RL data:

  • System prompt: Instructions telling the model about available tools and expected output format
  • tools_kwargs: Configuration for tool instantiation, including per-row parameters (e.g., ground truth for a calculator tool)
  • interaction_kwargs: Parameters controlling the multi-turn interaction (e.g., max turns)

The tool configuration is embedded directly in each data row, allowing different rows to have different tool setups.
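As a minimal sketch of per-row tool configuration, the snippet below builds two rows that each configure a different tool. The tool names `calculator` and `code_runner`, the `timeout_s` parameter, and the helpers `make_row` and `tools_for_row` are hypothetical names chosen for illustration; they are not taken from the verl codebase.

```python
from typing import Any


def make_row(question: str, tools_kwargs: dict[str, Any], max_turns: int) -> dict[str, Any]:
    """Build one data row that carries its own tool configuration."""
    return {
        "question": question,
        "extra_info": {
            # True only when the row actually configures at least one tool
            "need_tools_kwargs": bool(tools_kwargs),
            "tools_kwargs": tools_kwargs,
            "interaction_kwargs": {"max_turns": max_turns},
        },
    }


# Hypothetical tool names; each row configures a different tool.
rows = [
    make_row(
        "What is 17 * 23?",
        {"calculator": {"create_kwargs": {"ground_truth": "391"}}},
        max_turns=3,
    ),
    make_row(
        "Write a function that reverses a string.",
        {"code_runner": {"create_kwargs": {"timeout_s": 10}}},
        max_turns=5,
    ),
]


def tools_for_row(row: dict[str, Any]) -> list[str]:
    """Return the names of the tools configured for a single row."""
    return sorted(row["extra_info"]["tools_kwargs"])


for row in rows:
    print(tools_for_row(row), row["extra_info"]["interaction_kwargs"])
```

Because the configuration travels with the row, a single dataset can mix tasks that need different tools or different turn budgets without any global switch.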

Usage

Use multi-turn data preparation when training models for:

  • Tool-calling capabilities (calculator, code execution, search)
  • Multi-step reasoning with external feedback
  • Agentic workflows where the model must decide when to use tools

Theoretical Basis

Multi-turn data extends the standard schema with tool configuration:

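# Abstract multi-turn data preparation.
# dataset, tool_use_instructions, and extract_answer are placeholders
# supplied by the caller.
prepared_rows = []
for row in dataset:
    # Conversation seed: tool instructions first, then the task itself
    prompt = [
        {"role": "system", "content": tool_use_instructions},
        {"role": "user", "content": row["question"]},
    ]
    extra_info = {
        # Marks the row as carrying per-row tool configuration
        "need_tools_kwargs": True,
        "tools_kwargs": {
            # Keyed by tool name; "tool_name" is a placeholder
            "tool_name": {
                "create_kwargs": {"ground_truth": extract_answer(row)},
            },
        },
        # Parameters controlling the multi-turn interaction
        "interaction_kwargs": {"max_turns": 5},
    }
    # Collect the result so the prepared rows can be saved afterwards
    prepared_rows.append({"prompt": prompt, "extra_info": extra_info})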
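
The abstract schema can be made concrete with a small self-contained sketch. It assumes a toy in-memory dataset, a tool named `calculator`, a system prompt of our own wording, and an answer format in which the final answer follows a `####` marker; none of these details come from the page itself, and `prepare_row` is a hypothetical helper.

```python
import json
from typing import Any

# Assumed system prompt text; real instructions depend on the tool in use.
TOOL_USE_INSTRUCTIONS = (
    "You may call the `calculator` tool. "
    "Give the final answer after '####'."
)


def extract_answer(row: dict[str, str]) -> str:
    """Take the text after the last '####' marker (assumed answer format)."""
    return row["answer"].rsplit("####", 1)[-1].strip()


def prepare_row(row: dict[str, str], max_turns: int = 5) -> dict[str, Any]:
    """Turn one raw question/answer pair into a multi-turn training row."""
    return {
        "prompt": [
            {"role": "system", "content": TOOL_USE_INSTRUCTIONS},
            {"role": "user", "content": row["question"]},
        ],
        "extra_info": {
            "need_tools_kwargs": True,
            "tools_kwargs": {
                "calculator": {
                    "create_kwargs": {"ground_truth": extract_answer(row)}
                }
            },
            "interaction_kwargs": {"max_turns": max_turns},
        },
    }


# Toy in-memory dataset standing in for a real one.
dataset = [
    {"question": "What is 6 * 7?", "answer": "6 * 7 = 42\n#### 42"},
    {"question": "What is 10 - 3?", "answer": "10 - 3 = 7\n#### 7"},
]

prepared = [prepare_row(row) for row in dataset]

# Serialize as one JSON object per line for inspection.
jsonl = "\n".join(json.dumps(row) for row in prepared)
print(jsonl)
```

Keeping the per-row logic in one function makes it easy to apply the same transformation through whatever mapping utility the dataset library provides, and the JSON Lines output is only a convenient way to inspect the result.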

Related Pages

Implemented By
