Principle:OpenHands OpenHands Sandbox Command Execution
| Knowledge Sources | |
|---|---|
| Domains | Cloud_Infrastructure, Runtime_Management |
| Last Updated | 2026-02-11 21:00 GMT |
Overview
Sandbox Command Execution is the principle of executing shell commands and code inside a cloud sandbox environment, where agent actions are serialized, transmitted to the sandbox, and results are returned as observations.
Description
The core purpose of a cloud sandbox is to execute agent-generated commands in an isolated environment. OpenHands uses the command pattern to decouple action generation (by the agent) from action execution (inside the sandbox). Each agent action (e.g., run a shell command, execute Python code, edit a file) is represented as a serializable action object. The runtime transmits this action to the sandbox, where it is executed, and the result is captured as an observation object returned to the agent.
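As a rough illustration of what "serializable action object" means here, the following minimal sketch round-trips an action through JSON. The dataclasses are simplified stand-ins declared locally, not the OpenHands classes themselves, and the `action` type tag and field names are assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict


# Simplified stand-ins for illustration; not imported from OpenHands.
@dataclass
class CmdRunAction:
    command: str
    action: str = "run"  # hypothetical type tag used when deserializing


@dataclass
class CmdOutputObservation:
    content: str
    exit_code: int


def serialize_action(action: CmdRunAction) -> str:
    """Turn an action object into a JSON string that can cross a process boundary."""
    return json.dumps(asdict(action))


def deserialize_action(payload: str) -> CmdRunAction:
    """Rebuild the action on the receiving side, rejecting unknown type tags."""
    data = json.loads(payload)
    if data.get("action") != "run":
        raise ValueError(f"unsupported action type: {data.get('action')!r}")
    return CmdRunAction(command=data["command"])


# The requester builds and serializes; the executor restores the same object.
wire = serialize_action(CmdRunAction("ls -la"))
restored = deserialize_action(wire)
```

Because the action is plain data, the side that creates it and the side that executes it only need to agree on the wire format, not share any code path.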
There are two distinct execution strategies across the four providers:
Strategy 1: Action Server HTTP Dispatch (Daytona, Modal, Runloop)
These three runtimes rely on the action execution server running inside the sandbox. The runtime serializes the action object into JSON, sends it as an HTTP POST request to the server's /execute_action endpoint, and deserializes the response into an observation. All three inherit run() and run_ipython() from ActionExecutionClient, which handles the HTTP communication.
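The HTTP round trip can be sketched with the Python standard library alone. This is an assumption-laden illustration, not the ActionExecutionClient implementation: `StubActionServer` and `dispatch` are hypothetical names, the JSON payload shape is invented for the example, and the stub server only echoes the command instead of executing it. Only the /execute_action path comes from the description above.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubActionServer(BaseHTTPRequestHandler):
    """Stand-in for the in-sandbox action server; it fakes execution."""

    def do_POST(self):
        if self.path != "/execute_action":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        action = json.loads(self.rfile.read(length))
        # A real server would run the command; this stub only echoes it.
        observation = {"content": f"ran: {action['command']}", "exit_code": 0}
        body = json.dumps(observation).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def dispatch(base_url: str, action: dict, timeout: float = 10.0) -> dict:
    """Serialize the action, POST it to /execute_action, decode the observation."""
    request = urllib.request.Request(
        f"{base_url}/execute_action",
        data=json.dumps(action).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), StubActionServer)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    observation = dispatch(
        f"http://127.0.0.1:{server.server_port}",
        {"action": "run", "command": "ls -la"},
    )
finally:
    server.shutdown()
    server.server_close()
```

The caller sees only a dictionary-shaped observation; everything about transport stays inside `dispatch`.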
Strategy 2: Direct SDK Execution (E2B)
E2B bypasses the action server entirely. Instead, it uses the E2B SDK's native execution API through the E2BBox wrapper class. Shell commands are executed via E2BBox.execute(), which calls the E2B sandbox's process API directly and returns the exit code and output. IPython code is executed via E2BRuntime.run_ipython(), which writes code to a temporary file and runs it through the sandbox's IPython kernel.
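The direct path boils down to "call an execute function, get back an exit code and output, wrap them in an observation". The sketch below shows that shape under loud assumptions: `LocalBox` is a hypothetical stand-in that runs the command on the local machine with `subprocess`, so it provides no isolation and does not use the E2B SDK, and `CmdOutputObservation` is a simplified local dataclass.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CmdOutputObservation:  # simplified stand-in, not the OpenHands class
    content: str
    exit_code: int


class LocalBox:
    """Hypothetical wrapper with an execute() method.

    It runs commands locally via subprocess, NOT inside an isolated sandbox.
    """

    def execute(self, cmd: str, timeout: Optional[float] = None) -> Tuple[int, str]:
        completed = subprocess.run(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout,
        )
        return completed.returncode, completed.stdout


def run(box: LocalBox, command: str, timeout: float = 30.0) -> CmdOutputObservation:
    """Execute directly through the box and wrap the result as an observation."""
    exit_code, output = box.execute(command, timeout)
    return CmdOutputObservation(content=output, exit_code=exit_code)


observation = run(LocalBox(), "echo hello")
```

No serialization step appears anywhere in this path; the runtime holds the wrapper object and calls it in-process.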
Both strategies produce the same result from the agent's perspective: an observation object containing the command output, exit code, and any error information.
Usage
Sandbox Command Execution is the primary runtime operation. It is invoked every time the agent produces an action that requires execution in the sandbox, which occurs repeatedly throughout an agent session. The orchestrator calls runtime.run(action) or runtime.run_ipython(action) without needing to know which execution strategy is used.
Theoretical Basis
The command pattern separates the requester of an action from its executor. Actions are first-class objects that can be serialized, transmitted, and executed remotely.
COMMAND PATTERN FLOW:
  Agent -> Action Object -> Runtime -> Sandbox -> Observation Object -> Agent
STRATEGY 1: HTTP Dispatch (Daytona, Modal, Runloop)
  1. Agent creates CmdRunAction("ls -la")
  2. Runtime serializes action to JSON
  3. Runtime sends HTTP POST /execute_action with JSON body
  4. Action server inside sandbox deserializes action
  5. Action server executes "ls -la" in sandbox shell
  6. Action server serializes CmdOutputObservation(output, exit_code)
  7. Runtime receives HTTP response and deserializes observation
  8. Agent receives CmdOutputObservation
STRATEGY 2: Direct SDK Execution (E2B)
  1. Agent creates CmdRunAction("ls -la")
  2. Runtime calls E2BBox.execute("ls -la", timeout)
  3. E2BBox calls e2b_sandbox.process.start("ls -la")
  4. E2BBox waits for process completion
  5. E2BBox returns (exit_code, stdout + stderr)
  6. Runtime wraps result in CmdOutputObservation
  7. Agent receives CmdOutputObservation
COMMON CONTRACT:
  INPUT: Action object (CmdRunAction, IPythonRunCellAction, ...)
  OUTPUT: Observation object (CmdOutputObservation, IPythonRunCellObservation, ...)
  INVARIANT: exit_code == 0 indicates success
  INVARIANT: output contains combined stdout/stderr
The key design insight is that the two strategies are interchangeable at the runtime interface level. The agent and orchestrator never need to distinguish between HTTP-dispatched and SDK-executed commands.
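One way to picture this interchangeability is a shared abstract interface with two implementations that differ only in how the command reaches the executor. Everything in this sketch is hypothetical: `Runtime`, `HttpStyleRuntime`, `DirectStyleRuntime`, `fake_shell`, and `orchestrator_step` are illustrative names, and the "HTTP" variant only simulates the JSON boundary in-process.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass
class CmdRunAction:  # simplified stand-in
    command: str


@dataclass
class CmdOutputObservation:  # simplified stand-in
    content: str
    exit_code: int


def fake_shell(command: str) -> Tuple[int, str]:
    """Pretend sandbox shell: returns (exit_code, combined output)."""
    return 0, f"ran: {command}"


class Runtime(ABC):
    @abstractmethod
    def run(self, action: CmdRunAction) -> CmdOutputObservation:
        ...


class HttpStyleRuntime(Runtime):
    """Mimics strategy 1: action and observation cross a JSON boundary."""

    def run(self, action: CmdRunAction) -> CmdOutputObservation:
        request_body = json.dumps(asdict(action))      # runtime side
        command = json.loads(request_body)["command"]  # server side
        exit_code, output = fake_shell(command)
        response_body = json.dumps({"content": output, "exit_code": exit_code})
        return CmdOutputObservation(**json.loads(response_body))


class DirectStyleRuntime(Runtime):
    """Mimics strategy 2: call the execution API directly and wrap the result."""

    def run(self, action: CmdRunAction) -> CmdOutputObservation:
        exit_code, output = fake_shell(action.command)
        return CmdOutputObservation(content=output, exit_code=exit_code)


def orchestrator_step(runtime: Runtime, action: CmdRunAction) -> CmdOutputObservation:
    """The caller depends only on the Runtime interface, never on the strategy."""
    return runtime.run(action)


results = [
    orchestrator_step(runtime, CmdRunAction("ls -la"))
    for runtime in (HttpStyleRuntime(), DirectStyleRuntime())
]
```

Both runtimes return equal observations for the same action, which is the property the orchestrator relies on when it treats providers as drop-in replacements.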