Heuristic:ARISE Initiative Robosuite Observation Key Selection
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Robotics_Simulation |
| Last Updated | 2026-02-15 07:00 GMT |
Overview
When using `GymWrapper` for RL training, choose observation keys to balance information content against observation size. The default keys cover each robot's proprioceptive state and, when object observations are enabled, object state; images are included only when `use_camera_obs=True` is set on the base environment.
Description
The `GymWrapper` converts robosuite's dictionary-based observations into a flat vector (or dictionary) compatible with Gymnasium-based RL libraries. The `keys` parameter determines which observation modalities are included. By default, if no keys are specified, the wrapper automatically includes `object-state` (if object observations are enabled) and `robot{idx}_proprio-state` for each robot. Image observations are only included if `use_camera_obs=True` is set on the base environment, in which case camera images are added as `{camera_name}_image` keys. The wrapper can also preserve the dictionary structure by setting `flatten_obs=False`.
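The key selection and flattening described above can be illustrated with a small standalone sketch. It does not use robosuite; `flatten_value`, `select_and_flatten`, and the toy observation values are hypothetical names and data chosen for illustration, and the real wrapper may differ in details such as array types.

```python
def flatten_value(value):
    """Recursively flatten nested lists/tuples of numbers into one flat list."""
    if isinstance(value, (list, tuple)):
        flat = []
        for item in value:
            flat.extend(flatten_value(item))
        return flat
    return [float(value)]


def select_and_flatten(obs_dict, keys, flatten_obs=True):
    """Keep only the requested keys; optionally concatenate them into one vector."""
    missing = [k for k in keys if k not in obs_dict]
    if missing:
        raise KeyError(f"Observation keys not found: {missing}")
    if not flatten_obs:
        # Preserve the dictionary structure, restricted to the chosen keys.
        return {k: obs_dict[k] for k in keys}
    flat = []
    for k in keys:
        flat.extend(flatten_value(obs_dict[k]))
    return flat


# Toy observation dictionary; the values and sizes are made up for illustration.
toy_obs = {
    "robot0_proprio-state": [0.1, 0.2, 0.3],
    "object-state": [1.0, 2.0],
    "agentview_image": [[[0, 0, 0], [255, 255, 255]]],  # 1 x 2 x 3 "image"
}

state_vec = select_and_flatten(toy_obs, ["object-state", "robot0_proprio-state"])
dict_obs = select_and_flatten(toy_obs, ["robot0_proprio-state"], flatten_obs=False)
```

The order of `keys` fixes the layout of the flat vector, so the same key list should be used for training and evaluation.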
Usage
Use this heuristic when:
- Setting up a `GymWrapper` for RL training and deciding which observation keys to include
- Seeing unexpectedly large observation spaces or memory issues during training
- Debugging RL agents that fail to learn when the observations themselves are a suspected cause
- Combining proprioceptive and visual observations
The Insight (Rule of Thumb)
- Action: For state-based RL, use the default keys (no explicit `keys` argument): this automatically includes `robot{idx}_proprio-state` and `object-state`.
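- Action: For image-based RL, set `use_camera_obs=True` on the base environment. The default key list then adds every camera automatically; to control exactly what the agent sees, pass the keys explicitly, for example `keys=["robot0_proprio-state", "agentview_image"]`.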
- Action: To have robosuite concatenate all image observations together into a single image-modality observation, set `macros.CONCATENATE_IMAGES = True`. By default, images are not concatenated, to save memory.
- Value: The default observation includes robot proprioception (joint positions, velocities, gripper state) and object state (object positions, orientations). This is typically sufficient for tabletop manipulation tasks.
- Trade-off: Including image observations dramatically increases observation size and memory usage. The `CONCATENATE_IMAGES` macro is `False` by default specifically because image concatenation is expensive.
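- Warning: If `use_object_obs=False` and no explicit keys are provided, the default key list omits `object-state`. The `robot{idx}_proprio-state` keys are still added for every robot, so the observation is not empty, but the agent receives no object information unless camera observations are enabled.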
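The state-based and image-based setups from the list above might be configured as in the sketch below. `make_wrapped_env` is a hypothetical helper, and the `"Lift"` task, `"Panda"` robot, and 84x84 image size are example choices, not recommendations from the source. robosuite is imported inside the function so that the configuration dictionaries can be inspected without it installed.

```python
# Keyword arguments for robosuite's `suite.make`. Task, robot, and image size
# are illustrative assumptions.
STATE_ENV_KWARGS = dict(
    env_name="Lift",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
    use_object_obs=True,
)

IMAGE_ENV_KWARGS = dict(
    env_name="Lift",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=True,  # camera observations need offscreen rendering
    use_camera_obs=True,
    use_object_obs=False,
    camera_names="agentview",
    camera_heights=84,
    camera_widths=84,
)
# Explicit keys: proprioception plus a single camera.
IMAGE_KEYS = ["robot0_proprio-state", "agentview_image"]


def make_wrapped_env(env_kwargs, keys=None):
    """Build a robosuite environment and wrap it (hypothetical helper).

    With keys=None the wrapper falls back to its default key list.
    """
    import robosuite as suite
    from robosuite.wrappers import GymWrapper

    env = suite.make(**env_kwargs)
    return GymWrapper(env, keys=keys)
```

A state-based run would call `make_wrapped_env(STATE_ENV_KWARGS)`, and an image-based run `make_wrapped_env(IMAGE_ENV_KWARGS, keys=IMAGE_KEYS)`.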
Reasoning
RL algorithms have different observation requirements. Model-free algorithms like SAC and PPO work well with compact proprioceptive observations for simpler tasks. Vision-based methods (e.g., DrQ, RAD) need image observations but these significantly increase training time and memory requirements. The GymWrapper's default behavior is designed for the common case of state-based RL training.
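To make the memory argument concrete, the following back-of-the-envelope sketch compares a compact state vector with flattened images. The state dimension of 50, the image resolutions, and the one-million-transition buffer are assumptions picked for illustration; actual sizes depend on the task, robot, and camera settings.

```python
def image_obs_size(height, width, channels=3, num_cameras=1):
    """Number of scalar entries contributed by flattened camera images."""
    return height * width * channels * num_cameras


def replay_buffer_gib(obs_size, num_transitions, bytes_per_value):
    """Approximate memory for storing one observation per transition, in GiB."""
    return obs_size * num_transitions * bytes_per_value / 2**30


# Hypothetical state size; the real value depends on the task and robot.
STATE_DIM = 50

single_cam = image_obs_size(84, 84)                       # 84 * 84 * 3 entries
two_cams_hi_res = image_obs_size(256, 256, num_cameras=2)

# One observation per transition, one million transitions.
state_gib = replay_buffer_gib(STATE_DIM, 1_000_000, bytes_per_value=4)   # float32
image_gib = replay_buffer_gib(single_cam, 1_000_000, bytes_per_value=1)  # uint8
```

Under these assumptions a single 84x84 RGB camera already contributes 21,168 entries per observation, several hundred times the assumed state vector.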
The `all-{name}` camera convention (documented in `robot_env.py`) automatically includes all robot-specific cameras, which is useful for multi-view training but can greatly enlarge the observation space, since every added camera contributes a full image.
Code evidence from `robosuite/wrappers/gym_wrapper.py:55-66`:
```python
if keys is None:
    keys = []
    # Add object obs if requested
    if self.env.use_object_obs:
        keys += ["object-state"]
    # Add image obs if requested
    if self.env.use_camera_obs:
        keys += [f"{cam_name}_image" for cam_name in self.env.camera_names]
    # Iterate over all robots to add to state
    for idx in range(len(self.env.robots)):
        keys += ["robot{}_proprio-state".format(idx)]
```
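The default-key logic in the excerpt above can be mirrored as a standalone function to check which keys a given configuration would produce. `default_observation_keys` is a hypothetical name; it takes the environment flags as plain arguments instead of reading them from `self.env`.

```python
def default_observation_keys(use_object_obs, use_camera_obs, camera_names, num_robots):
    """Reproduce the default key list from the excerpt (standalone sketch)."""
    keys = []
    # Object state is added only when object observations are enabled.
    if use_object_obs:
        keys.append("object-state")
    # One image key per camera, only when camera observations are enabled.
    if use_camera_obs:
        keys.extend(f"{cam_name}_image" for cam_name in camera_names)
    # Proprioceptive state is always added, once per robot.
    for idx in range(num_robots):
        keys.append(f"robot{idx}_proprio-state")
    return keys


state_only = default_observation_keys(True, False, [], 1)
no_object = default_observation_keys(False, False, [], 1)
two_arm_vision = default_observation_keys(True, True, ["agentview"], 2)
```

The `no_object` case shows that disabling object observations leaves the proprioceptive keys in place, so the default list is never empty as long as the environment has at least one robot.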
Code evidence from `robosuite/macros.py:30-33`:
```python
# Image concatenation
# In general, observations are concatenated together by modality. However, image observations are expensive
# memory-wise, so we skip concatenating all images together by default, unless this flag is set to True
CONCATENATE_IMAGES = False
```