Principle:Haosulab ManiSkill Scene Object Loading

Field	Value
Page Type	Principle
Title	ManiSkill Scene and Object Loading
Domain	Simulation, Robotics, Environment_Design, Physics_Simulation
Related Implementation	Implementation:Haosulab_ManiSkill_ActorBuilder_TableSceneBuilder
Date	2026-02-15
Repository	Haosulab/ManiSkill

Overview

Description

Building a simulation scene in ManiSkill means populating it with physics objects: actors (rigid bodies) and articulations (jointed multi-body objects like robots or drawers), often placed on top of a scene template (a pre-built composition of common furniture and fixtures). The task developer populates the scene by overriding the _load_scene() method of BaseEnv, which is called during environment reconfiguration.

ManiSkill provides two complementary approaches for loading objects:

Builder pattern (ActorBuilder): A fluent API inherited from SAPIEN that lets you programmatically construct actors by chaining calls to add collision shapes, visual shapes, and physics properties. This is used for custom procedural objects (cubes, spheres, targets) and for loading mesh-based assets from files. The builder produces an Actor object that is tracked across all parallel sub-scenes.

Scene builder pattern (SceneBuilder / TableSceneBuilder): Pre-built scene templates that encapsulate common workspace setups. TableSceneBuilder, for example, loads a table mesh, positions it so that z=0 is at the table surface, builds a ground plane, and provides an initialize() method that sets reasonable robot initial configurations for many supported robots. Scene builders handle the boilerplate of workspace construction so that task developers can focus on task-specific objects.

Each actor or articulation built into the scene is automatically replicated across all parallel sub-scenes (GPU environments). The ActorBuilder supports selective instantiation via set_scene_idxs(), which restricts the actor to specific parallel environments -- useful for tasks that load different assets in different environments.
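The chaining and selective-instantiation behaviour described above can be sketched with a small pure-Python model. This is an illustration only: ToyActorBuilder and ToyActor are hypothetical stand-ins, not the ManiSkill or SAPIEN classes, and they merely record which sub-scenes would receive a copy of the actor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyActor:
    """Stand-in for a built actor: records which sub-scenes hold a copy."""
    name: str
    shapes: List[Tuple[str, tuple]]
    scene_idxs: List[int]


class ToyActorBuilder:
    """Hypothetical miniature of a fluent actor builder (not the ManiSkill class)."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self.shapes: List[Tuple[str, tuple]] = []
        self.scene_idxs: Optional[List[int]] = None  # None means "all sub-scenes"

    def add_box_collision(self, half_size):
        self.shapes.append(("box_collision", tuple(half_size)))
        return self  # returning self is what makes chaining possible

    def add_box_visual(self, half_size):
        self.shapes.append(("box_visual", tuple(half_size)))
        return self

    def set_scene_idxs(self, idxs):
        self.scene_idxs = list(idxs)
        return self

    def build(self, name: str) -> ToyActor:
        # Without an explicit selection, the actor is replicated everywhere.
        if self.scene_idxs is None:
            idxs = list(range(self.num_envs))
        else:
            idxs = self.scene_idxs
        return ToyActor(name=name, shapes=list(self.shapes), scene_idxs=idxs)


half = (0.02, 0.02, 0.02)
# Default: one copy in each of the 4 parallel sub-scenes.
cube = ToyActorBuilder(num_envs=4).add_box_collision(half).add_box_visual(half).build("cube")
# Selective: this asset exists only in sub-scenes 0 and 2.
mug = ToyActorBuilder(num_envs=4).add_box_collision(half).set_scene_idxs([0, 2]).build("mug")
```

In this model, cube.scene_idxs covers every sub-scene while mug.scene_idxs is limited to the two selected ones, which mirrors how a task could load different assets in different environments.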

Objects in ManiSkill have three physics body types:

dynamic: Objects that respond to forces and can be moved by the robot or other objects. Used for manipulation targets (cubes, bottles, etc.).
kinematic: Objects that can be programmatically moved but are not affected by physics forces. Used for goal markers, movable platforms, and animated obstacles.
static: Objects that are completely fixed in place. Used for tables, walls, and other immovable fixtures.
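As a rough illustration of how the three body types differ, the toy model below moves bodies along a single vertical axis. It is a simplified sketch, not how SAPIEN or PhysX is implemented: ToyBody, BodyType, and set_kinematic_target are hypothetical names, and there is no collision handling, so the dynamic body simply falls.

```python
from enum import Enum


class BodyType(Enum):
    DYNAMIC = "dynamic"
    KINEMATIC = "kinematic"
    STATIC = "static"


GRAVITY = -9.81  # m/s^2 along z


class ToyBody:
    """One-dimensional stand-in for a rigid body (hypothetical, for illustration)."""

    def __init__(self, body_type: BodyType, z: float = 1.0):
        self.body_type = body_type
        self.z = z
        self.vz = 0.0
        self.target_z = None  # only meaningful for kinematic bodies

    def set_kinematic_target(self, z: float):
        if self.body_type is not BodyType.KINEMATIC:
            raise ValueError("only kinematic bodies follow scripted targets")
        self.target_z = z

    def step(self, dt: float):
        if self.body_type is BodyType.DYNAMIC:
            # Dynamic: forces (here only gravity) change velocity and position.
            self.vz += GRAVITY * dt
            self.z += self.vz * dt
        elif self.body_type is BodyType.KINEMATIC and self.target_z is not None:
            # Kinematic: ignores forces, moves only where the program puts it.
            self.z = self.target_z
        # Static: never moves, so there is nothing to integrate.


cube = ToyBody(BodyType.DYNAMIC)
marker = ToyBody(BodyType.KINEMATIC)
table = ToyBody(BodyType.STATIC)
marker.set_kinematic_target(1.5)
for body in (cube, marker, table):
    for _ in range(10):
        body.step(0.01)
```

After ten steps the dynamic cube has dropped below its starting height, the kinematic marker sits at its scripted target, and the static table has not moved.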

Usage

Scene and object loading is performed inside the _load_scene() method of a custom task. This method is called during reconfiguration (typically at the first reset() call). The developer:

Optionally instantiates a SceneBuilder (e.g., TableSceneBuilder) and calls its .build() method to create the workspace.
Creates additional actors via self.scene.create_actor_builder() or convenience functions like actors.build_cube().
Loads task-specific articulated objects such as faucets or drawers. Robots are not loaded here; they are loaded separately in _load_agent().
Stores references to key objects as instance attributes (e.g., self.obj, self.goal_region) for later use in initialization, reward, and observation methods.

All objects must have unique names within the scene. Each object should also be given an initial pose that does not overlap other objects, since bodies left intersecting at load time can cause physics instabilities during GPU simulation setup.
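The steps above might look like the following sketch of a _load_scene() override. The import paths, keyword arguments, and the build_red_white_target helper are recalled from ManiSkill 3's push-cube style example tasks and should be treated as assumptions to check against the installed version. The class name PushCubeSketch and the factory make_push_task_class are hypothetical, and registration, robot selection, rewards, and observations are omitted. Imports are deferred into the factory so that the snippet can be loaded without ManiSkill installed.

```python
def make_push_task_class():
    """Return a task class sketch; ManiSkill is only imported when this is called."""
    import numpy as np
    import sapien
    from mani_skill.envs.sapien_env import BaseEnv
    from mani_skill.utils.building import actors
    from mani_skill.utils.scene_builder.table import TableSceneBuilder

    class PushCubeSketch(BaseEnv):
        def _load_scene(self, options: dict):
            # Step 1: build the workspace (table and ground) from a scene template.
            self.table_scene = TableSceneBuilder(env=self)
            self.table_scene.build()

            # Step 2: a dynamic cube resting on the table surface (z=0 is the table top).
            self.obj = actors.build_cube(
                self.scene,
                half_size=0.02,
                color=np.array([12, 42, 160, 255]) / 255,
                name="cube",  # names must be unique within the scene
                body_type="dynamic",
                initial_pose=sapien.Pose(p=[0, 0, 0.02]),
            )

            # Step 3: a kinematic goal marker without collision shapes.
            self.goal_region = actors.build_red_white_target(
                self.scene,
                radius=0.1,
                thickness=1e-5,
                name="goal_region",
                add_collision=False,
                body_type="kinematic",
                initial_pose=sapien.Pose(),
            )

    return PushCubeSketch
```

The references stored on self (self.obj, self.goal_region) are what later initialization, reward, and observation code would read.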

Theoretical Basis

The scene construction approach in ManiSkill is grounded in several design patterns and simulation concepts:

Builder design pattern: The ActorBuilder follows the classic builder pattern from object-oriented design. Rather than constructing a complex object in a single constructor call, the builder accumulates configuration (collision shapes, visual shapes, physics type) through a sequence of method calls, then produces the final object via a terminal .build() call. This enables flexible, readable object construction.

Scene graph architecture: The simulation scene is organized as a hierarchical structure where the global ManiSkillScene manages multiple SAPIEN sub-scenes (one per parallel environment in GPU simulation, or a single one in CPU simulation). Actors and articulations are tracked at the scene level, enabling batched operations across all environments.
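A minimal sketch of this layering, assuming nothing about ManiSkill's internals: ToyParallelScene is a hypothetical class in which a top-level scene owns one dictionary per sub-scene, rejects duplicate actor names, and applies a batched position update with one entry per environment.

```python
class ToyParallelScene:
    """Hypothetical top-level scene that fans operations out to sub-scenes."""

    def __init__(self, num_envs: int):
        # One sub-scene per parallel environment: actor name -> position.
        self.sub_scenes = [dict() for _ in range(num_envs)]
        self.actor_names = set()

    def add_actor(self, name, position):
        if name in self.actor_names:
            raise ValueError(f"duplicate actor name: {name}")
        self.actor_names.add(name)
        # Replicate the actor into every sub-scene.
        for sub_scene in self.sub_scenes:
            sub_scene[name] = tuple(position)

    def set_positions(self, name, positions):
        """Batched update: positions[i] applies to sub-scene i."""
        if len(positions) != len(self.sub_scenes):
            raise ValueError("need exactly one position per sub-scene")
        for sub_scene, pos in zip(self.sub_scenes, positions):
            sub_scene[name] = tuple(pos)

    def get_positions(self, name):
        return [sub_scene[name] for sub_scene in self.sub_scenes]


scene = ToyParallelScene(num_envs=3)
scene.add_actor("cube", (0.0, 0.0, 0.02))
scene.set_positions("cube", [(0.1, 0.0, 0.02), (0.2, 0.0, 0.02), (0.3, 0.0, 0.02)])
```

Because the actor is tracked once at the top level, a single call updates all environments, which is the property that batched GPU simulation relies on.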

URDF/MJCF asset loading: Articulated objects (robots, mechanisms) are loaded from standard robotics description formats -- URDF (Unified Robot Description Format) and MJCF (MuJoCo XML). These formats describe link geometries, joint types, and physical properties in a declarative way, decoupling asset authoring from simulation code.
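To show what "declarative" means here, the sketch below reads a tiny hand-written URDF with Python's standard XML parser and lists its links and joints. The toy_drawer model and the summarize_urdf helper are made up for illustration; loading such a file into a simulation is the job of ManiSkill and SAPIEN and is not shown.

```python
import xml.etree.ElementTree as ET

# A minimal URDF: two links connected by one sliding (prismatic) joint.
URDF_TEXT = """
<robot name="toy_drawer">
  <link name="cabinet"/>
  <link name="drawer"/>
  <joint name="slide" type="prismatic">
    <parent link="cabinet"/>
    <child link="drawer"/>
    <axis xyz="1 0 0"/>
    <limit lower="0.0" upper="0.3" effort="10" velocity="1"/>
  </joint>
</robot>
"""


def summarize_urdf(text):
    """Return the robot name, link names, and a summary of each joint."""
    root = ET.fromstring(text.strip())
    links = [link.get("name") for link in root.findall("link")]
    joints = []
    for joint in root.findall("joint"):
        joints.append({
            "name": joint.get("name"),
            "type": joint.get("type"),
            "parent": joint.find("parent").get("link"),
            "child": joint.find("child").get("link"),
        })
    return root.get("name"), links, joints


name, links, joints = summarize_urdf(URDF_TEXT)
```

The file states only what the mechanism is (links, a joint, its axis and limits); nothing in it depends on the simulator that will consume it.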

Physics body type taxonomy: The distinction between dynamic, kinematic, and static objects is fundamental to rigid-body simulation in engines such as PhysX, which SAPIEN uses. Correct classification affects both performance (static objects never move, so the engine does not have to integrate their motion) and correctness (kinematic objects ignore forces and move only when their pose is set programmatically).

Template method pattern: The SceneBuilder classes use the template method pattern -- build() creates the scene structure, and initialize() sets per-episode configurations. Subclasses can override these to customize workspace geometry while inheriting robot-specific initialization logic.
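The split between build() and initialize() can be sketched as follows. All names are hypothetical (ToySceneBuilder, ToyTableScene, ToyShelfScene, and the REST_QPOS table with made-up robot ids); the point is only that a subclass changes the workspace geometry while reusing the inherited per-episode robot setup.

```python
class ToySceneBuilder:
    """Hypothetical base class: geometry is a hook, robot setup is shared."""

    # Made-up rest configurations keyed by made-up robot ids.
    REST_QPOS = {"arm7": [0.0] * 7, "arm6": [0.0] * 6}

    def __init__(self):
        self.objects = []
        self.robot_qpos = None

    def build(self):
        """Called once when the scene is loaded; subclasses define the geometry."""
        raise NotImplementedError

    def initialize(self, robot_uid: str):
        """Called at the start of each episode; shared by all subclasses."""
        if robot_uid not in self.REST_QPOS:
            raise KeyError(f"no rest configuration for robot {robot_uid!r}")
        self.robot_qpos = list(self.REST_QPOS[robot_uid])


class ToyTableScene(ToySceneBuilder):
    def build(self):
        self.objects = ["ground", "table"]


class ToyShelfScene(ToyTableScene):
    def build(self):
        # Extend the table workspace instead of rewriting it.
        super().build()
        self.objects.append("shelf")


shelf_scene = ToyShelfScene()
shelf_scene.build()             # load time: geometry
shelf_scene.initialize("arm7")  # every episode: robot configuration
```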

Related Pages

Implementation:Haosulab_ManiSkill_ActorBuilder_TableSceneBuilder -- Concrete builder implementations
Principle:Haosulab_ManiSkill_Environment_Registration -- Registering the environment before loading scenes
Principle:Haosulab_ManiSkill_Episode_Initialization -- Randomizing object poses after scene loading
Principle:Haosulab_ManiSkill_Observation_Definition -- Observing the loaded scene
Heuristic:Haosulab_ManiSkill_Initial_Pose_Performance
