Implementation:Volcengine Verl SandboxTool

From Leeroopedia


Knowledge Sources
Domains: Agentic_RL, Tool_Use, Code_Execution
Last Updated: 2026-02-07 18:00 GMT

Overview

Concrete tool for sandboxed code execution in agent loop training, extending the BaseTool interface provided by the verl framework.

Description

The SandboxTool class implements a code execution tool for use in verl's agent loop (agentic RL) training. It extends BaseTool and provides:

  • code_interpreter(): an async method that sends Python code to a sandbox execution service via HTTP POST and returns the concatenated stdout and stderr
  • get_openai_tool_schema(): generates an OpenAI-compatible function-calling schema using the get_json_schema utility from HuggingFace transformers
  • execute(): the main entry point called by the agent loop; it extracts code from markdown code blocks, ensures the last expression is printed, and calls the sandbox

Before execution, the tool uses a regular expression matching fenced blocks of the form ```py...``` to extract the code from the model-generated response.
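The two preprocessing steps described above (extracting the fenced code, then making sure the last expression is printed) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the class's actual code: the exact pattern and the helper names extract_code and ensure_last_line_printed are hypothetical, and the fence string is built programmatically only to keep the example readable.

```python
import re

# Three backticks, built at runtime so this example does not nest code fences.
FENCE = "`" * 3

# Assumed pattern: a fence followed by "py", captured non-greedily across lines.
CODE_PATTERN = re.compile(re.escape(FENCE) + r"py(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_code(text: str) -> str:
    """Return the body of the first py-fenced block, or the text unchanged if none exists."""
    match = CODE_PATTERN.search(text)
    return match.group(1).strip() if match else text


def ensure_last_line_printed(code: str) -> str:
    """Wrap the last non-empty line in print(...) unless it already starts with print."""
    lines = code.split("\n")
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "":
            continue  # skip trailing blank lines
        if not lines[i].startswith("print"):
            lines[i] = f"print({lines[i]})"
        break
    return "\n".join(lines)


model_output = FENCE + "py\nx = 2\nx + 2\n" + FENCE
prepared = ensure_last_line_printed(extract_code(model_output))
print(prepared)
```

The heuristic in this sketch is deliberately naive: it would produce invalid code if the last line were indented or were a statement rather than an expression, and the assumed pattern would also match the prefix of a fence tagged with a longer language name. A production tool would likely need a sturdier approach.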

Usage

Use this class when building an agent loop tutorial or training setup that requires sandboxed code execution as a tool. It serves as the reference implementation for custom tools extending BaseTool.

Code Reference

Source Location

Signature

class SandboxTool(BaseTool):
    def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
        """
        Args:
            config: Tool configuration dict, must contain 'sandbox_fusion_url'.
            tool_schema: OpenAI function tool schema definition.
        """

    async def code_interpreter(self, code: str) -> str:
        """Execute code in the sandbox and return stdout+stderr."""

    def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
        """Generate OpenAI-compatible function calling schema."""

    async def execute(
        self, instance_id: str, parameters: dict, **kwargs
    ) -> tuple[str, float, dict]:
        """Execute tool call from agent loop.

        Args:
            instance_id: Unique identifier for this execution instance.
            parameters: Dict with 'code' key containing the code to execute.

        Returns:
            Tuple of (ToolResponse, reward_float, info_dict). The first element
            is a ToolResponse object, although the annotation above says str.
        """
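
As a rough illustration of what code_interpreter() is described as doing (an HTTP POST to the sandbox, returning stdout plus stderr), the following standalone sketch uses only the standard library. The request payload shape {"code": ...} and the response keys "stdout" and "stderr" are assumptions made for this example, as are the helper name _post_json and the injectable post parameter; the real service and class may differ.

```python
import asyncio
import json
import urllib.request


def _post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """Blocking JSON POST using only the standard library (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


async def code_interpreter(url: str, code: str, post=_post_json) -> str:
    """Send code to the sandbox URL and return stdout followed by stderr.

    The payload and response field names are assumptions for this sketch.
    """
    # Run the blocking POST in a worker thread so the event loop stays responsive.
    result = await asyncio.to_thread(post, url, {"code": code})
    return result.get("stdout", "") + result.get("stderr", "")
```

Passing the transport in as the post argument keeps the sketch testable without a running sandbox: a fake callable that returns a dict with the assumed keys can stand in for the network call.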

Import

from examples.tutorial.agent_loop_get_started.sandbox import SandboxTool

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Must contain "sandbox_fusion_url" pointing to the execution service
tool_schema | OpenAIFunctionToolSchema | Yes | Schema definition for tool registration
parameters["code"] | str | Yes | Python code string (may contain markdown code blocks)
instance_id | str | Yes | Unique execution instance identifier

Outputs

Name | Type | Description
ToolResponse | ToolResponse | Contains text output (stdout + stderr from the sandbox)
reward | float | Always 0.0 (the reward is computed separately by the reward manager)
info | dict | Empty dict (no additional metadata)
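
The output contract above can be illustrated without verl or a sandbox. In this sketch, ToolResponse is a stand-in dataclass that models only the text field, and EchoTool is a hypothetical tool invented for the example; neither is part of the verl API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToolResponse:
    """Stand-in for verl's ToolResponse; only the text field is modeled here."""
    text: str


class EchoTool:
    """Hypothetical tool that mirrors the execute() return contract without a sandbox."""

    async def execute(self, instance_id: str, parameters: dict, **kwargs):
        output = f"[{instance_id}] {parameters['code']}"
        # Reward is fixed at 0.0 and info is empty, matching the Outputs table.
        return ToolResponse(text=output), 0.0, {}


response, reward, info = asyncio.run(
    EchoTool().execute(instance_id="demo-001", parameters={"code": "1 + 1"})
)
print(response.text)
```

A caller that unpacks the three values this way can ignore reward and info for this tool, since the reward is computed elsewhere.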

Usage Examples

Registering SandboxTool in Tool Config

from examples.tutorial.agent_loop_get_started.sandbox import SandboxTool
from verl.tools.base_tool import OpenAIFunctionToolSchema

# Create tool instance
config = {"sandbox_fusion_url": "http://localhost:8080/execute"}
schema = OpenAIFunctionToolSchema(
    type="function",
    function={"name": "code_interpreter", "description": "Execute Python code"}
)
tool = SandboxTool(config=config, tool_schema=schema)

# Get the auto-generated OpenAI schema
openai_schema = tool.get_openai_tool_schema()

Using in Agent Loop

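import asyncio

# Execute code via the tool. Assumes `tool` is the SandboxTool instance created
# in the previous example and that the sandbox service is reachable.
response, reward, info = asyncio.run(
    tool.execute(
        instance_id="test-001",
        parameters={"code": "```py\n2 + 2\n```"}
    )
)
# response.text should contain "4\n": execute() ensures the last expression
# is printed before the code is sent to the sandbox.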
