Principle:Haosulab ManiSkill Sim2Real Bridge
| Field | Value |
|---|---|
| Principle Name | Sim2Real Bridge |
| Domain | Sim2Real |
| Overview | Bridging simulation and physical robot through a unified environment interface |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
The Sim2Real Bridge principle describes how ManiSkill provides a unified Gymnasium-compatible environment interface that can seamlessly switch between simulation physics and real hardware I/O. This abstraction ensures that a trained policy can be deployed on a physical robot without modifying the inference code -- the same env.step(action) and env.reset() calls work identically whether backed by a physics simulator or by real motors and cameras.
Description
The bridge operates by wrapping a simulation environment and a real robot agent together into a single Sim2RealEnv object. The key design decisions are:
Observation Space Alignment
The bridge ensures that observations returned in the real environment have the same structure, shape, and data type as those from simulation:
- Proprioception: Joint positions and velocities are read from the real robot's motors and formatted identically to the simulated robot's qpos and qvel.
- Visual data: Camera images from real sensors are center-cropped and resized to match the simulation camera resolution, then structured in the same dictionary format (rgb, depth).
- Extra observations: If the simulation environment computes additional features (e.g., TCP pose), the bridge delegates to the simulation's _get_obs_extra method, which uses forward kinematics from the (synced) simulated robot.
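The center-crop step for visual data can be sketched as follows. This is a minimal illustration, not the bridge's actual preprocessing code: the function name center_crop_box is hypothetical, and it assumes the crop keeps the largest centered region with the simulation camera's aspect ratio before resizing.

```python
def center_crop_box(src_h, src_w, dst_h, dst_w):
    """Return (top, left, crop_h, crop_w) for the largest centered region of a
    src_h x src_w image whose aspect ratio matches dst_w / dst_h.

    Hypothetical helper for illustration; not part of ManiSkill.
    """
    # Compare aspect ratios by cross-multiplication to stay in integer arithmetic.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = (src_h * dst_w) // dst_h
    else:
        # Source is taller than (or matches) the target: keep full width.
        crop_w = src_w
        crop_h = (src_w * dst_h) // dst_w
    top = (src_h - crop_h) // 2
    left = (src_w - crop_w) // 2
    return top, left, crop_h, crop_w


# A 640x480 (width x height) camera frame cropped for a square 128x128 sim camera.
top, left, crop_h, crop_w = center_crop_box(480, 640, 128, 128)
```

The cropped region would then be resized to the target resolution with an image library of choice. Cropping before resizing avoids stretching the image, so objects keep the proportions the policy saw during training.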
Action Space Alignment
The bridge reuses the simulation's controller to compute joint targets from high-level actions:
- The action is passed to the simulation agent's controller (e.g., pd_joint_pos).
- The controller computes the desired drive targets or velocities.
- These targets are sent to the real robot's motors.
This ensures that the same action semantics (e.g., delta joint positions) are respected in both domains.
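The three steps above can be sketched with toy classes. Everything here is hypothetical and only illustrates the data flow: DeltaJointPosController stands in for a simulation controller, FakeRealRobot stands in for a hardware interface, and the method names are not ManiSkill APIs.

```python
class DeltaJointPosController:
    """Hypothetical stand-in for a simulation controller with delta joint position actions."""

    def __init__(self, lower, upper, max_delta):
        self.lower = lower          # per-joint lower limits (rad)
        self.upper = upper          # per-joint upper limits (rad)
        self.max_delta = max_delta  # largest joint change per control step (rad)

    def compute_targets(self, qpos, action):
        """Map a normalized action in [-1, 1] to absolute joint targets."""
        targets = []
        for q, a, lo, hi in zip(qpos, action, self.lower, self.upper):
            a = max(-1.0, min(1.0, a))                # clip the normalized action
            target = q + a * self.max_delta           # apply the scaled delta
            targets.append(max(lo, min(hi, target)))  # respect joint limits
        return targets


class FakeRealRobot:
    """Hypothetical hardware interface that only records the last command."""

    def __init__(self, qpos):
        self.qpos = list(qpos)
        self.last_command = None

    def read_qpos(self):
        return list(self.qpos)

    def send_joint_targets(self, targets):
        self.last_command = list(targets)


def real_step(controller, robot, action):
    """Compute targets with the (simulation) controller, then command the real robot."""
    targets = controller.compute_targets(robot.read_qpos(), action)
    robot.send_joint_targets(targets)
    return targets
```

Because the mapping from action to joint targets lives in one controller object, the clipping, scaling, and limit handling the policy experienced in training also apply to the commands sent to hardware.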
Control Frequency Enforcement
The bridge enforces the simulation's control frequency on the real robot by measuring elapsed wall-clock time between control steps and sleeping if the step completes faster than the control period. If a step takes longer than the control period, a warning is logged.
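A minimal sketch of this timing logic, using only the Python standard library. The class name ControlRateLimiter and its injectable clock and sleep parameters are assumptions made for illustration and testing; the bridge's own implementation may differ.

```python
import time
import warnings


class ControlRateLimiter:
    """Hypothetical helper that holds a loop to a fixed control frequency."""

    def __init__(self, control_freq, clock=time.perf_counter, sleep=time.sleep):
        self.period = 1.0 / control_freq  # control period in seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None                 # timestamp of the previous control step

    def wait(self):
        """Sleep out the remainder of the control period, or warn on overrun.

        Returns the wall-clock time the step took before any sleeping.
        """
        now = self._clock()
        if self._last is None:
            # First call: nothing to compare against yet.
            self._last = now
            return 0.0
        elapsed = now - self._last
        if elapsed < self.period:
            self._sleep(self.period - elapsed)
        elif elapsed > self.period:
            warnings.warn(
                f"Control step took {elapsed:.4f}s, longer than the "
                f"{self.period:.4f}s control period"
            )
        self._last = self._clock()
        return elapsed
```

Calling wait() once per loop iteration, right before sending the next command, keeps the real loop close to the period the policy was trained with.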
Wrapper Compatibility
Any Gymnasium wrappers applied to the simulation environment (e.g., observation normalization, action scaling) are automatically applied in the real environment too. The bridge achieves this by temporarily swapping the innermost wrapper's environment reference to point to a RealEnvStepReset object that delegates to real hardware.
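The swap can be illustrated with plain Python objects. This is a sketch of the idea only: SimEnv, RealBackend, ScaleAction, and swapped_backend are hypothetical names, and the toy wrapper mimics just the .env attribute convention that Gymnasium wrappers follow.

```python
from contextlib import contextmanager


class SimEnv:
    """Toy base environment backed by 'simulation'."""

    def step(self, action):
        return {"source": "sim", "action": action}


class RealBackend:
    """Hypothetical stand-in for an object that delegates to real hardware."""

    def step(self, action):
        return {"source": "real", "action": action}


class ScaleAction:
    """Minimal wrapper: scales the action, then forwards to self.env."""

    def __init__(self, env, scale):
        self.env = env
        self.scale = scale

    def step(self, action):
        return self.env.step(action * self.scale)


def innermost_wrapper(env):
    """Return the wrapper whose .env is the base (unwrapped) environment."""
    if not hasattr(env, "env"):
        raise ValueError("environment is not wrapped")
    wrapper = env
    while hasattr(wrapper.env, "env"):
        wrapper = wrapper.env
    return wrapper


@contextmanager
def swapped_backend(wrapped_env, backend):
    """Temporarily point the innermost wrapper at a different backend."""
    wrapper = innermost_wrapper(wrapped_env)
    original = wrapper.env
    wrapper.env = backend
    try:
        yield wrapped_env
    finally:
        wrapper.env = original  # always restore the simulation reference


env = ScaleAction(ScaleAction(SimEnv(), 2), 3)
with swapped_backend(env, RealBackend()) as real_env:
    result = real_env.step(1)  # both wrappers still run, then the real backend
```

The wrapper chain is left untouched, so every transformation applied during training also runs when the call ends at the hardware-facing object.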
Data Validation
On initialization (unless skip_data_checks=True), the bridge runs both the simulation and real environments once and compares the observations recursively to verify matching shapes and data types.
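A recursive comparison of this kind might look like the sketch below. The function compare_obs and the fake_array helper are hypothetical; leaves are assumed to be array-like objects exposing shape and dtype attributes (as NumPy arrays and PyTorch tensors do), and the observation keys in the example are made up.

```python
from types import SimpleNamespace


def compare_obs(sim_obs, real_obs, path="obs"):
    """Return a list of mismatch descriptions between two nested observations."""
    if isinstance(sim_obs, dict) or isinstance(real_obs, dict):
        if not (isinstance(sim_obs, dict) and isinstance(real_obs, dict)):
            return [f"{path}: one side is a dict and the other is not"]
        errors = []
        for key in sorted(set(sim_obs) | set(real_obs)):
            if key not in sim_obs:
                errors.append(f"{path}.{key}: missing in sim")
            elif key not in real_obs:
                errors.append(f"{path}.{key}: missing in real")
            else:
                errors.extend(compare_obs(sim_obs[key], real_obs[key], f"{path}.{key}"))
        return errors
    # Leaves: compare shape and dtype only, never the values themselves.
    errors = []
    sim_shape = tuple(getattr(sim_obs, "shape", ()))
    real_shape = tuple(getattr(real_obs, "shape", ()))
    if sim_shape != real_shape:
        errors.append(f"{path}: shape {sim_shape} (sim) != {real_shape} (real)")
    sim_dtype = str(getattr(sim_obs, "dtype", type(sim_obs).__name__))
    real_dtype = str(getattr(real_obs, "dtype", type(real_obs).__name__))
    if sim_dtype != real_dtype:
        errors.append(f"{path}: dtype {sim_dtype} (sim) != {real_dtype} (real)")
    return errors


def fake_array(shape, dtype):
    """Stand-in for an array so the example runs without extra dependencies."""
    return SimpleNamespace(shape=shape, dtype=dtype)


sim = {"agent": {"qpos": fake_array((1, 6), "float32")},
       "sensor_data": {"cam": {"rgb": fake_array((1, 128, 128, 3), "uint8")}}}
real = {"agent": {"qpos": fake_array((1, 6), "float32")},
        "sensor_data": {"cam": {"rgb": fake_array((1, 480, 640, 3), "uint8")}}}
mismatches = compare_obs(sim, real)
```

Reporting the full key path of each mismatch makes it easier to trace a problem back to a specific sensor or preprocessing step before the robot moves.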
Usage
from mani_skill.envs.sim2real_env import Sim2RealEnv

sim2real_env = Sim2RealEnv(
    sim_env=sim_env,    # The wrapped simulation environment
    agent=real_agent,   # A BaseRealAgent instance controlling real hardware
    control_freq=30,    # Control frequency in Hz
)

obs, info = sim2real_env.reset()
for _ in range(100):
    action = policy(obs)
    obs, reward, terminated, truncated, info = sim2real_env.step(action)
sim2real_env.close()
Theoretical Basis
- Sim-to-real transfer: The bridge is the runtime component that supports zero-shot sim-to-real transfer. By keeping the observation and action spaces identical across both domains, it lets the trained policy perceive the real world through the same "lens" it was trained with.
- Environment abstraction: The Gymnasium interface (reset, step, observation_space, action_space) provides a clean abstraction boundary. The bridge implements this interface for real hardware, making the execution backend (simulation vs. real) an implementation detail invisible to the policy.
- Sensor preprocessing: Center-cropping and resizing real camera images to match simulation resolution is a standard practice in visual sim2real, minimizing distribution shift from resolution differences while preserving the spatial layout of the scene.
- Controller reuse: By reusing the simulation's PD controller to compute joint targets, the bridge avoids reimplementing control logic for the real robot and ensures that control dynamics (gains, limits, interpolation) are consistent.
- Real-time control: Enforcing the control frequency is critical for real robot safety and for matching the temporal dynamics that the policy was trained with. Deviations in control frequency can lead to different effective gains and motion behaviors.