Principle:Haosulab ManiSkill Digital Twin Construction
| Field | Value |
|---|---|
| Principle Name | Digital Twin Construction |
| Domain | Sim2Real |
| Overview | Building simulation environments that visually and physically replicate real-world workspaces |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
The Digital Twin Construction principle describes how ManiSkill creates simulation environments that closely replicate the visual appearance and physical layout of real-world robot workspaces. This is a foundational step in the sim-to-real transfer pipeline: the closer the simulation matches reality, the smaller the domain gap that policies must bridge when deployed on physical hardware.
Description
ManiSkill's digital twin approach is based on the SIMPLER framework (Simulated Manipulation Policy Evaluation; Li et al., 2024). Its key technique is greenscreen compositing, which replaces the background of rendered simulation images with real-world photographs taken from the same camera viewpoint.
The process works as follows:
- Scene setup: The digital twin environment loads the same robot model, objects, and workspace geometry as the real setup, positioning them to match the physical layout.
- Background overlay: A photograph of the real workspace (with the robot and manipulated objects removed via a green screen or manual masking) is loaded as the background overlay image.
- Segmentation-based compositing: During rendering, the environment produces both RGB and segmentation images. Objects that should not be greenscreened (the robot, the target objects) are identified by their segmentation IDs. The compositor replaces all pixels not belonging to these objects with the corresponding pixels from the overlay image.
- Result: The final observation image shows the real-world background with the simulation-rendered robot and objects overlaid. Because every background pixel now comes from real imagery, the visual domain gap for camera-based policies is limited to the rendered foreground objects.
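The compositing step above can be sketched in a few lines of NumPy. This is a minimal illustration and not ManiSkill's actual implementation: the function name `composite_with_overlay`, the argument names, and the toy segmentation IDs are all assumptions made for the example.

```python
import numpy as np

def composite_with_overlay(rgb, segmentation, overlay, keep_ids):
    """Keep simulation pixels for the listed segmentation IDs; use the overlay elsewhere.

    All names here are hypothetical and chosen for illustration only.
    rgb, overlay: (H, W, 3) arrays; segmentation: (H, W) integer array.
    """
    # True where the pixel belongs to an object excluded from greenscreening.
    keep_mask = np.isin(segmentation, list(keep_ids))
    # Broadcast the (H, W) mask across the colour channels.
    return np.where(keep_mask[..., None], rgb, overlay)

# Toy 2x2 image: ID 0 is background, ID 1 the robot, ID 2 a cube (assumed IDs).
rgb = np.full((2, 2, 3), 10, dtype=np.uint8)       # simulation render
overlay = np.full((2, 2, 3), 200, dtype=np.uint8)  # real-world photograph
segmentation = np.array([[0, 1], [2, 0]])

composited = composite_with_overlay(rgb, segmentation, overlay, keep_ids={1, 2})
```

In this sketch the two background pixels take the overlay value and the robot and cube pixels keep their rendered value. The overlay is assumed to have already been resized to the camera resolution.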
The greenscreen compositing supports three modes:
- background (default): Full greenscreen replacement of background pixels.
- debug: 50/50 opacity blend of simulation and overlay for visual inspection.
- none: No greenscreening (pure simulation rendering).
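As a rough illustration of how the three modes differ, the sketch below dispatches on a mode string. The function `apply_overlay_mode` and its parameters are hypothetical, and the `debug` branch assumes the 50/50 blend is applied to the whole image; the actual ManiSkill code may differ in these details.

```python
import numpy as np

def apply_overlay_mode(rgb, overlay, keep_mask, mode="background"):
    """Hypothetical helper showing the three greenscreen modes.

    rgb, overlay: (H, W, 3) uint8 arrays; keep_mask: (H, W) bool array that is
    True for pixels of objects excluded from greenscreening.
    """
    if mode == "none":
        # Pure simulation rendering.
        return rgb
    if mode == "debug":
        # Assumed behaviour: blend the full frame at equal opacity.
        blended = 0.5 * rgb.astype(np.float32) + 0.5 * overlay.astype(np.float32)
        return blended.astype(rgb.dtype)
    if mode == "background":
        # Replace every pixel outside the kept objects with the overlay.
        return np.where(keep_mask[..., None], rgb, overlay)
    raise ValueError(f"unknown overlay mode: {mode!r}")

rgb = np.full((1, 2, 3), 100, dtype=np.uint8)
overlay = np.full((1, 2, 3), 200, dtype=np.uint8)
keep_mask = np.array([[True, False]])

out_none = apply_overlay_mode(rgb, overlay, keep_mask, mode="none")
out_debug = apply_overlay_mode(rgb, overlay, keep_mask, mode="debug")
out_background = apply_overlay_mode(rgb, overlay, keep_mask, mode="background")
```

The `debug` output makes misalignment between the simulated scene and the photograph easy to spot, because both layers stay visible at once.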
Usage
To create a digital twin environment, a task class inherits from BaseDigitalTwinEnv and specifies the overlay image paths:
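# Import path as found in the ManiSkill 3 source tree; adjust if your version differs.
from mani_skill.envs.tasks.digital_twins.base_env import BaseDigitalTwinEnv


class MyDigitalTwinTask(BaseDigitalTwinEnv):
    def __init__(self, **kwargs):
        # Map each camera name to the real-world background image for that viewpoint.
        # Set this before calling super().__init__() so the base class can load it.
        self.rgb_overlay_paths = {
            "camera_name": "path/to/greenscreen/image.png"
        }
        super().__init__(**kwargs)

    def _load_scene(self, options: dict):
        # Load objects as usual (_build_cube stands in for your own object-building code)
        self.cube = self._build_cube()
        # Exclude robot and manipulated objects from greenscreen
        self.remove_object_from_greenscreen(self.robot)
        self.remove_object_from_greenscreen(self.cube)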
Theoretical Basis
- Digital twin methodology: A digital twin is a virtual replica of a physical system that mirrors its geometry, physics, and (ideally) appearance. In robotics, digital twins enable safe training and evaluation before real-world deployment.
- Visual domain gap reduction: Policies trained on simulation images often fail in the real world due to differences in lighting, textures, and backgrounds. Greenscreen compositing directly addresses the background component of this gap by replacing it with real imagery, as demonstrated by the SIMPLER framework (Li et al., 2024).
- Greenscreen compositing: A technique borrowed from film and video production where a uniform-color background is replaced with a different image. In ManiSkill's case, segmentation masks (rather than color keying) are used to separate foreground objects from the background, which is more robust to lighting variations.
- Observation-space alignment: For sim-to-real transfer, the observation distributions in simulation and reality should overlap as much as possible. The digital twin approach ensures that visual observations from simulation approximate those from the real camera.
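To make the contrast between color keying and segmentation masks concrete, the sketch below builds a background mask both ways on a toy image. Both helper functions, the key color, and the tolerance value are assumptions chosen for illustration; neither is taken from ManiSkill.

```python
import numpy as np

def chroma_key_mask(rgb, key_color, tolerance):
    """Hypothetical color keying: background is any pixel close to the key color."""
    diff = np.abs(rgb.astype(np.int32) - np.asarray(key_color, dtype=np.int32))
    return diff.max(axis=-1) <= tolerance

def segmentation_background_mask(segmentation, keep_ids):
    """Background is any pixel whose segmentation ID is not in keep_ids."""
    return ~np.isin(segmentation, list(keep_ids))

# Toy 1x3 image: green backdrop (ID 0), a green cube (ID 2), a grey robot link (ID 1).
rgb = np.array([[[0, 255, 0], [10, 240, 10], [128, 128, 128]]], dtype=np.uint8)
segmentation = np.array([[0, 2, 1]])

color_background = chroma_key_mask(rgb, key_color=(0, 255, 0), tolerance=20)
seg_background = segmentation_background_mask(segmentation, keep_ids={1, 2})
```

Here the color-keyed mask also marks the green cube as background, so the cube would be erased from the composite, while the segmentation-based mask keeps it because it depends only on object identity and not on pixel color.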