Principle:Volcengine Verl Sandbox Code Execution
| Knowledge Sources | |
|---|---|
| Domains | Agentic_RL, Tool_Use, Code_Execution |
| Last Updated | 2026-02-07 18:00 GMT |
Overview
A tool integration pattern that enables LLM agents to execute generated code in a sandboxed environment during reinforcement learning training.
Description
Sandbox Code Execution is a pattern used in agentic RL training where the model generates code as part of its reasoning process, and that code is executed in an isolated sandbox to produce results. The pattern involves:
- Code extraction — Parsing the model's text output to find code blocks (typically in markdown fences)
- Sandbox submission — Sending the extracted code to an external execution service via HTTP
- Result capture — Collecting stdout and stderr from the execution and returning them to the agent loop
- Safety isolation — The code runs in a remote sandbox, not on the training host
This pattern is central to training agents that solve math or programming problems by writing code, observing its output, and revising it over several turns.
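The code extraction step can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name mirrors the `extract_code_blocks` placeholder used in the pseudo-code on this page, and the regular expression and the `language` parameter are assumptions made for the example. It assumes code arrives in triple-backtick fences that are either untagged or tagged with a language name.

```python
import re
from typing import List

# Built programmatically so the example nests cleanly inside a markdown fence.
FENCE = "`" * 3

# Assumed format: opening fence, optional language tag, newline, body, closing fence.
_CODE_BLOCK = re.compile(
    FENCE + r"[ \t]*([A-Za-z0-9_+-]*)[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code_blocks(model_output: str, language: str = "python") -> List[str]:
    """Return the bodies of fenced blocks that are untagged or tagged `language`."""
    blocks = []
    for tag, body in _CODE_BLOCK.findall(model_output):
        if tag.lower() in ("", language):
            blocks.append(body.strip("\n"))
    return blocks


if __name__ == "__main__":
    reply = "Let me compute.\n" + FENCE + "python\nprint(1 + 1)\n" + FENCE + "\nDone."
    print(extract_code_blocks(reply))
```

A response with no fenced block yields an empty list, which lets the caller treat the turn as a plain-text answer instead of a tool call.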
Usage
Use this principle when building agentic RL training pipelines where the model needs to execute code as a tool during multi-turn rollouts. It requires an external sandbox execution service (e.g., Sandbox Fusion) running at a configured URL.
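A rough sketch of the sandbox submission step, using only the Python standard library, might look like the following. The request payload (`code`, `language`) and the response fields (`stdout`, `stderr`) are hypothetical and are not taken from Sandbox Fusion or any other service; check the schema of the service you deploy. The `post` parameter is also an assumption, added so the function can be exercised without a running sandbox.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class SandboxResult:
    stdout: str
    stderr: str


def _post_json(url: str, payload: dict, timeout: float) -> dict:
    """POST a JSON payload and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_in_sandbox(code: str, url: str, timeout: float = 30.0,
                   post=_post_json) -> SandboxResult:
    """Send code to the sandbox at `url` and capture its output streams."""
    # Hypothetical payload shape; adapt to the real service's API.
    payload = {"code": code, "language": "python"}
    try:
        reply = post(url, payload, timeout)
    except OSError as exc:
        # URLError and socket timeouts are OSError subclasses. Reporting the
        # failure as stderr keeps a rollout alive when the service is down.
        return SandboxResult(stdout="", stderr=f"sandbox request failed: {exc}")
    return SandboxResult(
        stdout=reply.get("stdout", ""),
        stderr=reply.get("stderr", ""),
    )
```

Surfacing transport failures as `stderr` text rather than raising is one possible design choice: it keeps a single unreachable-sandbox error from aborting a whole batch of rollouts, at the cost of showing the model an error it did not cause.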
Theoretical Basis
Pseudo-code Logic:
# Abstract sandbox execution pattern (NOT real implementation)
def execute_tool(model_output: str) -> str:
    # 1. Extract code from model response
    code = extract_code_blocks(model_output)
    # 2. Submit to sandbox
    result = sandbox_service.execute(code)
    # 3. Return execution output
    return result.stdout + result.stderr
The key design decision is that the tool returns raw execution output without interpreting it, allowing the model to reason about errors and iterate in subsequent turns.
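To illustrate why returning raw output matters, here is a toy multi-turn loop. Everything in it is hypothetical: `rollout`, `scripted_policy`, and `fake_execute` are stand-ins invented for this sketch, with a scripted function in place of the model and canned results in place of a real sandbox. The point is only that the error text reaches the next turn unchanged, so the policy can react to it.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces for this sketch:
#   policy(transcript) -> (response text, code to run or None when finished)
#   execute(code)      -> (stdout, stderr)
Policy = Callable[[List[str]], Tuple[str, Optional[str]]]
Executor = Callable[[str], Tuple[str, str]]


def rollout(policy: Policy, execute: Executor, prompt: str,
            max_turns: int = 4) -> List[str]:
    """Alternate model turns and tool observations until the policy stops."""
    transcript = [prompt]
    for _ in range(max_turns):
        text, code = policy(transcript)
        transcript.append(text)
        if code is None:  # no tool call: treat this turn as the final answer
            break
        stdout, stderr = execute(code)
        # Raw output is appended verbatim, with no success/failure label.
        transcript.append(stdout + stderr)
    return transcript


def scripted_policy(transcript: List[str]) -> Tuple[str, Optional[str]]:
    """Stand-in for the model: reacts to the last message in the transcript."""
    last = transcript[-1]
    if "ZeroDivisionError" in last:
        return "Dividing by zero failed; retrying.", "print(1 / 2)"
    if last.strip() == "0.5":
        return "The answer is 0.5.", None
    return "Let me compute it.", "print(1 / 0)"


def fake_execute(code: str) -> Tuple[str, str]:
    """Stand-in for the sandbox: returns canned output, runs nothing."""
    if "1 / 0" in code:
        return "", "ZeroDivisionError: division by zero\n"
    return "0.5\n", ""


if __name__ == "__main__":
    for message in rollout(scripted_policy, fake_execute, "What is 1/2?"):
        print(repr(message))
```

Because the observation is the unmodified `stdout + stderr`, the second turn sees the exception name itself; a tool that collapsed the result into a generic "execution failed" message would give the policy nothing specific to correct.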