Heuristic:ARISE Initiative Robosuite Observation Key Selection

Knowledge Sources	robosuite GymWrapper Key Selection
Domains	Reinforcement_Learning, Robotics_Simulation
Last Updated	2026-02-15 07:00 GMT

Overview

When using GymWrapper for RL training, select observation keys deliberately to balance information content against observation space size. The default keys cover proprioceptive state and object state; images are added only when `use_camera_obs=True` is set on the base environment.

Description

The `GymWrapper` converts robosuite's dictionary-based observations into a flat vector (or dictionary) compatible with Gymnasium-based RL libraries. The `keys` parameter determines which observation modalities are included. By default, if no keys are specified, the wrapper automatically includes `object-state` (if object observations are enabled) and `robot{idx}_proprio-state` for each robot. Image observations are only included if `use_camera_obs=True` is set on the base environment, in which case camera images are added as `{camera_name}_image` keys. The wrapper can also preserve the dictionary structure by setting `flatten_obs=False`.
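
The key-selection and flattening step described above can be illustrated with a small standalone sketch. This is not the wrapper's actual implementation: the helper names `flatten_value` and `flatten_obs` are hypothetical, and the observation values and sizes are made up for illustration.

```python
from typing import Dict, List


def flatten_value(value) -> List[float]:
    """Recursively flatten a number or a nested list into a flat list of floats."""
    if isinstance(value, (int, float)):
        return [float(value)]
    flat: List[float] = []
    for item in value:
        flat.extend(flatten_value(item))
    return flat


def flatten_obs(obs_dict: Dict[str, object], keys: List[str]) -> List[float]:
    """Concatenate the selected entries of obs_dict, in key order, into one vector."""
    flat: List[float] = []
    for key in keys:
        flat.extend(flatten_value(obs_dict[key]))
    return flat


# Hypothetical observation dictionary (values and sizes are illustrative only).
obs = {
    "robot0_proprio-state": [0.1, 0.2, 0.3, 0.4],
    "object-state": [1.0, 2.0, 3.0],
    "agentview_image": [[[0, 0, 0], [255, 255, 255]]],  # a tiny 1x2 RGB image
}

# State-only selection versus state plus image.
state_vec = flatten_obs(obs, ["object-state", "robot0_proprio-state"])
full_vec = flatten_obs(
    obs, ["object-state", "robot0_proprio-state", "agentview_image"]
)
print(len(state_vec), len(full_vec))
```

Even with a 1x2 image, the flattened vector nearly doubles in length, which hints at how quickly realistic image sizes come to dominate the observation.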

Usage

Use this heuristic when:

Setting up a GymWrapper for RL training and deciding which observation keys to include
Experiencing unexpectedly large observation spaces or memory issues during training
Debugging RL agents that fail to learn and suspecting observation quality issues
Combining proprioceptive and visual observations

The Insight (Rule of Thumb)

Action: For state-based RL, use the default keys (no explicit `keys` argument): this automatically includes `robot{idx}_proprio-state` and `object-state`.
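Action: For image-based RL, set `use_camera_obs=True` on the base environment. The default keys then include a `{camera_name}_image` entry for every configured camera; to restrict the selection, pass the keys explicitly, e.g. `keys=["robot0_proprio-state", "agentview_image"]`.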
Action: To have all image observations concatenated together, set `macros.CONCATENATE_IMAGES = True`. By default, observations are concatenated by modality, but images are skipped to save memory.
Value: The default observation includes robot proprioception (joint positions, velocities, gripper state) and object state (object positions, orientations). This is typically sufficient for tabletop manipulation tasks.
Trade-off: Including image observations dramatically increases observation size and memory usage. The `CONCATENATE_IMAGES` macro is `False` by default specifically because image concatenation is expensive.
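Warning: If `use_object_obs=False` and no explicit keys are provided, `object-state` is omitted from the default key list. The per-robot `robot{idx}_proprio-state` keys are still added, so the agent observes only proprioception (plus images if `use_camera_obs=True`), which may not be enough to locate task objects.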
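
The two setups in the actions above might look roughly as follows. This is a sketch under assumptions: the task name `"Lift"`, the robot `"Panda"`, the `agentview` camera, and the 84x84 resolution are illustrative choices, and argument names may vary between robosuite versions. The imports sit inside the functions so the snippet can be loaded without robosuite installed.

```python
# Explicit key list for the image-based setup (camera name is an assumption).
IMAGE_KEYS = ["robot0_proprio-state", "agentview_image"]


def make_state_env():
    """State-based setup: rely on GymWrapper's default key selection."""
    import robosuite as suite
    from robosuite.wrappers import GymWrapper

    env = suite.make(
        "Lift",
        robots="Panda",
        has_renderer=False,
        has_offscreen_renderer=False,
        use_camera_obs=False,  # no images in the observation
        use_object_obs=True,   # keep object-state in the default keys
    )
    return GymWrapper(env)  # keys=None selects the defaults


def make_image_env():
    """Image-based setup: enable camera observations and pick keys explicitly."""
    import robosuite as suite
    from robosuite.wrappers import GymWrapper

    env = suite.make(
        "Lift",
        robots="Panda",
        has_renderer=False,
        has_offscreen_renderer=True,  # required to render camera images
        use_camera_obs=True,
        camera_names="agentview",
        camera_heights=84,
        camera_widths=84,
    )
    return GymWrapper(env, keys=IMAGE_KEYS)
```

Inspecting `observation_space` on the returned wrapper before training is a cheap way to confirm the selected keys produce the size you expect.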

Reasoning

RL algorithms have different observation requirements. Model-free algorithms like SAC and PPO work well with compact proprioceptive observations for simpler tasks. Vision-based methods (e.g., DrQ, RAD) need image observations but these significantly increase training time and memory requirements. The GymWrapper's default behavior is designed for the common case of state-based RL training.
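
To make the memory argument concrete, here is a back-of-the-envelope estimate. The shapes are hypothetical (actual proprioceptive and object-state sizes depend on the robot and task), and the replay buffer size of one million transitions is an assumption.

```python
def flat_size(shapes):
    """Total number of scalar entries across observations with the given shapes."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


# Hypothetical shapes, chosen for illustration only.
state_shapes = {"robot0_proprio-state": (32,), "object-state": (10,)}
image_shapes = {"agentview_image": (84, 84, 3)}

state_dim = flat_size(state_shapes)
image_dim = flat_size(image_shapes)
ratio = image_dim / state_dim

# Rough replay-buffer footprint for one observation per transition.
transitions = 1_000_000
state_gb = transitions * state_dim * 4 / 1e9  # float32: 4 bytes per entry
image_gb = transitions * image_dim * 1 / 1e9  # uint8: 1 byte per entry

print(f"state dim: {state_dim}, image dim: {image_dim}, ratio: {ratio:.0f}x")
print(f"buffer: state ~{state_gb:.3f} GB, image ~{image_gb:.3f} GB")
```

Under these assumptions, a single small camera view is several hundred times larger than the state vector, and storing images as floats instead of bytes would multiply the buffer size by a further factor of four.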

The `all-{name}` camera convention (documented in `robot_env.py`) automatically includes all robot-specific cameras, which is useful for multi-view training but can explode the observation space.

Code evidence from `robosuite/wrappers/gym_wrapper.py:55-66`:

if keys is None:
    keys = []
    # Add object obs if requested
    if self.env.use_object_obs:
        keys += ["object-state"]
    # Add image obs if requested
    if self.env.use_camera_obs:
        keys += [f"{cam_name}_image" for cam_name in self.env.camera_names]
    # Iterate over all robots to add to state
    for idx in range(len(self.env.robots)):
        keys += ["robot{}_proprio-state".format(idx)]
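
The same selection logic can be reproduced as a standalone function to check which keys a given configuration yields. The function name `default_keys` and its parameters are hypothetical stand-ins for the environment attributes used in the excerpt above.

```python
def default_keys(use_object_obs, use_camera_obs, camera_names, num_robots):
    """Mirror the default key selection from the excerpt using plain arguments."""
    keys = []
    # Object state is included only if object observations are enabled.
    if use_object_obs:
        keys.append("object-state")
    # One image key per configured camera, only if camera observations are enabled.
    if use_camera_obs:
        keys.extend(f"{cam_name}_image" for cam_name in camera_names)
    # Proprioceptive state is always added, once per robot.
    for idx in range(num_robots):
        keys.append(f"robot{idx}_proprio-state")
    return keys


print(default_keys(True, False, [], 1))
# ['object-state', 'robot0_proprio-state']
print(default_keys(False, False, [], 1))
# ['robot0_proprio-state']
print(default_keys(True, True, ["agentview"], 2))
# ['object-state', 'agentview_image', 'robot0_proprio-state', 'robot1_proprio-state']
```

Note that the proprioceptive keys are added unconditionally, so disabling object observations shrinks the default key list but does not empty it.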

Code evidence from `robosuite/macros.py:30-33`:

# Image concatenation
# In general, observations are concatenated together by modality. However, image observations are expensive
# memory-wise, so we skip concatenating all images together by default, unless this flag is set to True
CONCATENATE_IMAGES = False
