
Implementation:Haosulab ManiSkill Get Obs Extra CameraConfig

From Leeroopedia
Field Value
Page Type Implementation (Pattern Doc)
Title ManiSkill _get_obs_extra and CameraConfig
Domain Simulation, Robotics, Environment_Design, Computer_Vision
Related Principle Principle:Haosulab_ManiSkill_Observation_Definition
Source Files mani_skill/envs/sapien_env.py (L558-560), mani_skill/sensors/camera.py (L32-62)
Date 2026-02-15
Repository Haosulab/ManiSkill

Overview

Description

This document describes two APIs for defining observations in a custom ManiSkill task:

  • _get_obs_extra(): A method on BaseEnv that task developers override to inject task-specific observation data (goal positions, relative poses, grasp indicators) into the observation dictionary.
  • CameraConfig: A dataclass used to configure camera sensors for visual observation modes. Camera configurations are returned by the _default_sensor_configs and _default_human_render_camera_configs properties.

Together, these two mechanisms define what the agent observes: _get_obs_extra() provides the semantic layer (what task-relevant facts are exposed), while CameraConfig provides the perceptual layer (what visual data is captured).

Usage

from mani_skill.sensors.camera import CameraConfig

Override _get_obs_extra() in your BaseEnv subclass to add task-specific observations, and override the _default_sensor_configs property to configure cameras.

Code Reference

_get_obs_extra Interface (sapien_env.py L558-560)

def _get_obs_extra(self, info: dict) -> dict:
    """Get task-relevant extra observations. Usually defined on a task by task basis.

    Args:
        info (dict): The info dictionary from self.evaluate(). Contains
            success/fail flags and any other computed data.

    Returns:
        dict: Mapping of observation names to torch.Tensor values.
            Each tensor should have shape (num_envs, ...).
            Returns empty dict by default.
    """
    return dict()

CameraConfig Dataclass (camera.py L32-62)

@dataclass
class CameraConfig(BaseSensorConfig):

    uid: str
    """Unique id of the camera."""

    pose: Pose
    """Pose of the camera (sapien.Pose or Pose object)."""

    width: int
    """Width of the rendered image in pixels."""

    height: int
    """Height of the rendered image in pixels."""

    fov: float = None
    """Field of view in radians. Either fov or intrinsic must be given."""

    near: float = 0.01
    """Near clipping plane distance."""

    far: float = 100
    """Far clipping plane distance."""

    intrinsic: Array = None
    """Camera intrinsics matrix (3x3). Either fov or intrinsic must be given."""

    entity_uid: Optional[str] = None
    """UID of the entity to mount the camera on. Used by agent classes for
    defining mounted cameras (e.g., wrist cameras)."""

    mount: Union[Actor, Link] = None
    """The Actor or Link to mount the camera on. The camera's global pose
    becomes mount.pose * local_pose."""

    shader_pack: Optional[str] = "minimal"
    """Shader for rendering. Options: 'minimal' (fastest), 'default', 'rt' (ray-tracing)."""

    shader_config: Optional[ShaderConfig] = None
    """Explicit shader config. Overrides shader_pack if given."""

Sensor Config Properties (sapien_env.py)

@property
def _default_sensor_configs(self) -> Union[
    BaseSensorConfig, Sequence[BaseSensorConfig], dict[str, BaseSensorConfig]
]:
    """Return sensor configurations for agent observation cameras.
    Override to add task-specific cameras. Returns list, dict, or single config."""
    return []

@property
def _default_human_render_camera_configs(self) -> Union[
    CameraConfig, Sequence[CameraConfig], dict[str, CameraConfig]
]:
    """Return camera configurations for human rendering (render_mode='rgb_array').
    Typically higher resolution than sensor cameras."""
    return []

I/O Contract

_get_obs_extra

Parameter Type Description
info dict Info dictionary from self.evaluate(). Contains keys like "success", "fail", and any task-specific computed data.

Returns: dict mapping string keys to torch.Tensor values. Each tensor must have batch dimension self.num_envs as the first axis.

Note: Use self.obs_mode_struct.use_state to conditionally include ground-truth information only in state-based observation modes. This prevents leaking privileged state info in visual observation modes.
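A minimal sketch of the batching contract, with numpy arrays standing in for torch tensors and an invented num_envs:

```python
import numpy as np

num_envs = 4  # hypothetical number of parallel environments

# Every value returned from _get_obs_extra must have the batch
# dimension (num_envs) as its first axis.
obs = dict(
    tcp_pose=np.zeros((num_envs, 7)),    # position (3) + quaternion (4)
    is_grasped=np.zeros((num_envs, 1)),  # per-env flags as a column vector
)

# Scalar-per-env quantities are typically given an explicit trailing
# axis (num_envs, 1) rather than left as flat (num_envs,) vectors.
assert all(v.shape[0] == num_envs for v in obs.values())
```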

CameraConfig

Field Type Required Default Description
uid str Yes -- Unique camera identifier
pose Pose or sapien.Pose Yes -- Camera pose in world frame (or local frame if mounted)
width int Yes -- Image width in pixels
height int Yes -- Image height in pixels
fov float Conditional None Field of view in radians. Required if intrinsic is not set.
near float No 0.01 Near clipping plane
far float No 100 Far clipping plane
intrinsic Array Conditional None 3x3 intrinsics matrix. Required if fov is not set.
entity_uid str No None Entity UID for mounting (agent camera use)
mount Actor or Link No None Object to mount camera on
shader_pack str No "minimal" Rendering shader: "minimal", "default", or "rt"

Constraint: Exactly one of fov or intrinsic must be provided (not both, not neither).
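For reference, the standard pinhole model connects a field of view to an equivalent 3x3 intrinsics matrix, which clarifies why exactly one of the two is needed. The helper below is a hypothetical name, not a ManiSkill API, and it assumes a vertical fov, square pixels, and a centered principal point; check the renderer documentation for the exact convention.

```python
import numpy as np

def fov_to_intrinsic(fov: float, width: int, height: int) -> np.ndarray:
    """Standard pinhole relation: focal length in pixels from a vertical fov.

    fy = height / (2 * tan(fov / 2)); fx = fy under the square-pixel
    assumption; the principal point is taken as the image center.
    """
    fy = height / (2.0 * np.tan(fov / 2.0))
    fx = fy
    return np.array([
        [fx,  0.0, width / 2.0],
        [0.0, fy,  height / 2.0],
        [0.0, 0.0, 1.0],
    ])

# With fov = pi/2 and a 128x128 image, tan(fov/2) = 1, so fx = fy = 64.
K = fov_to_intrinsic(np.pi / 2, 128, 128)
```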

Usage Examples

Task-Specific Observations with State/Visual Branching

def _get_obs_extra(self, info: dict):
    # Always include TCP pose (available in all modes)
    obs = dict(
        tcp_pose=self.agent.tcp.pose.raw_pose,
    )
    if self.obs_mode_struct.use_state:
        # Only include ground-truth object/goal info in state modes
        obs.update(
            goal_pos=self.goal_region.pose.p,
            obj_pose=self.obj.pose.raw_pose,
        )
    return obs

Configuring a Sensor Camera

import numpy as np

from mani_skill.sensors.camera import CameraConfig
from mani_skill.utils import sapien_utils

@property
def _default_sensor_configs(self):
    # Create a camera looking at the workspace
    pose = sapien_utils.look_at(eye=[0.3, 0, 0.6], target=[-0.1, 0, 0.1])
    return [
        CameraConfig(
            "base_camera",
            pose=pose,
            width=128,
            height=128,
            fov=np.pi / 2,
            near=0.01,
            far=100,
        )
    ]

Configuring a Human Render Camera

@property
def _default_human_render_camera_configs(self):
    # Higher resolution camera for video recording
    pose = sapien_utils.look_at([0.6, 0.7, 0.6], [0.0, 0.0, 0.35])
    return CameraConfig(
        "render_camera",
        pose=pose,
        width=512,
        height=512,
        fov=1,
        near=0.01,
        far=100,
    )

Using Info Dict to Avoid Recomputation

def evaluate(self):
    # Compute grasp state (expensive)
    is_grasped = self.agent.is_grasping(self.obj)
    obj_to_goal = torch.linalg.norm(
        self.obj.pose.p - self.goal_pose.p, axis=1
    )
    success = (obj_to_goal < 0.05) & is_grasped
    return {
        "success": success,
        "is_grasped": is_grasped,
        "obj_to_goal_dist": obj_to_goal,
    }

def _get_obs_extra(self, info: dict):
    obs = dict(tcp_pose=self.agent.tcp.pose.raw_pose)
    if self.obs_mode_struct.use_state:
        obs["obj_pose"] = self.obj.pose.raw_pose
        obs["goal_pose"] = self.goal_pose.raw_pose
        # Reuse computed data from evaluate() via info
        obs["is_grasped"] = info["is_grasped"].float().unsqueeze(-1)
    return obs

Multiple Cameras (Sensor + Wrist)

import numpy as np
import sapien

@property
def _default_sensor_configs(self):
    overhead = CameraConfig(
        "overhead_cam",
        pose=sapien_utils.look_at([0, 0, 1.0], [0, 0, 0]),
        width=128,
        height=128,
        fov=np.pi / 3,
    )
    # Wrist camera (mounted on robot hand link)
    wrist = CameraConfig(
        "wrist_cam",
        pose=sapien.Pose(p=[0, 0, 0.05]),
        width=84,
        height=84,
        fov=np.pi / 2,
        entity_uid="panda_hand",  # mounts on this link of the robot
    )
    return [overhead, wrist]
