Principle: ARISE Initiative Robosuite Gymnasium Environment Wrapping
Metadata
| Property | Value |
|---|---|
| Sources | robosuite, Gymnasium |
| Domains | Reinforcement_Learning, API_Compatibility |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
Adapter pattern for wrapping robosuite environments to conform to the Gymnasium interface standard (the maintained successor to OpenAI Gym) for compatibility with RL libraries.
Description
Robosuite environments use a custom interface with OrderedDict observations and custom reset/step method signatures that differ from standard reinforcement learning frameworks. Most modern RL libraries, including Stable-Baselines3, CleanRL, and RLlib, expect the standardized Gymnasium API (the successor to OpenAI Gym).
The GymWrapper provides an adaptation layer that transforms robosuite environments to provide:
- Box observation_space and action_space: Standard Gymnasium space definitions derived from environment specifications
- Flattened numpy array observations: Converts OrderedDict observations to 1D numpy arrays for compatibility
- Gymnasium 5-tuple step return: Returns (obs, reward, terminated, truncated, info) instead of the legacy 4-tuple format
- Seed-aware reset: Supports the modern reset(seed, options) signature for reproducibility
This adapter enables direct use of robosuite environments with any Gymnasium-compatible RL algorithm without modification to the underlying environment or the training code.
Usage
Use the Gymnasium Environment Wrapping pattern whenever:
- Training RL policies with standard RL libraries that expect the Gymnasium interface
- Integrating robosuite environments into existing RL training pipelines
- Benchmarking robosuite tasks against other Gymnasium environments
- Requiring compatibility with Gymnasium-based tools for logging, monitoring, or evaluation
Theoretical Basis
Adapter Pattern
The Gymnasium Environment Wrapping implements the Adapter Pattern from software design, which allows incompatible interfaces to work together. The adapter (GymWrapper) acts as a translator between:
- Adaptee: The robosuite MujocoEnv with its custom interface
- Target: The Gymnasium API expected by RL libraries
- Adapter: The GymWrapper that implements the Target interface while delegating to the Adaptee
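The pattern can be sketched at the language level with a toy Adaptee and Adapter (all class and key names below are illustrative stand-ins, not robosuite's actual API):

```python
import numpy as np
from collections import OrderedDict

class DictEnv:
    """Adaptee: returns OrderedDict observations (stand-in for a robosuite MujocoEnv)."""
    def reset(self):
        return OrderedDict(proprio=np.zeros(3), obj=np.ones(2))

class FlatObsAdapter:
    """Adapter: exposes the Target (flat-array) interface, delegating to the Adaptee."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        # Delegate to the Adaptee, then translate its output to the Target format
        obs = self.env.reset()
        return np.concatenate([v.ravel() for v in obs.values()])

flat = FlatObsAdapter(DictEnv()).reset()
print(flat.shape)  # (5,)
```

The training code only ever sees the Adapter's flat-array interface; the dict-producing environment is never modified.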
Observation Space Transformation
Robosuite environments return observations as OrderedDict objects with multiple keys (e.g., 'robot0_proprio-state', 'object-state', 'image'). The wrapper performs:
- Key Selection: Choose which observation keys to include (default: proprio-state and object-state)
- Flattening: Concatenate all selected observation arrays into a single 1D numpy array
- Space Definition: Create a Box space with appropriate bounds and shape
The observation space is computed by:
```python
# Concatenate selected observation keys
obs_arrays = [obs_dict[key].flatten() for key in selected_keys]
flat_obs = np.concatenate(obs_arrays)

# Define a Box space with infinite bounds
low = np.full(flat_obs.shape, -np.inf)
high = np.full(flat_obs.shape, np.inf)
observation_space = Box(low=low, high=high, dtype=np.float32)
```
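This recipe can be checked in isolation with mock data (the key names and shapes below are hypothetical; the Box construction itself is omitted so the sketch needs only numpy):

```python
import numpy as np
from collections import OrderedDict

# Mock robosuite-style observation dict (shapes are illustrative)
obs_dict = OrderedDict([
    ("robot0_proprio-state", np.zeros(4)),
    ("object-state", np.arange(3.0)),
])
selected_keys = ["robot0_proprio-state", "object-state"]

# Flatten and concatenate the selected keys
obs_arrays = [obs_dict[key].flatten() for key in selected_keys]
flat_obs = np.concatenate(obs_arrays)

# Bounds for the resulting Box space
low = np.full(flat_obs.shape, -np.inf)
high = np.full(flat_obs.shape, np.inf)

print(flat_obs.shape)  # (7,)
```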
Action Space Transformation
The action space is derived from the environment's action specification bounds:
```python
# Extract action bounds from the environment's (low, high) action_spec tuple
low, high = env.action_spec

# Create the Box action space
action_space = Box(low=low, high=high, dtype=np.float32)
```
Step Return Conversion
Robosuite environments return a 4-tuple from step(): (observation, reward, done, info). The Gymnasium API expects a 5-tuple that separates termination conditions:
```python
# Robosuite step (legacy 4-tuple)
obs_dict, reward, done, info = env.step(action)

# Convert to the Gymnasium 5-tuple
flat_obs = flatten_obs(obs_dict)
terminated = done   # Episode ended due to a terminal state
truncated = False   # Time-limit truncation is handled separately
return flat_obs, reward, terminated, truncated, info
```
The distinction between terminated and truncated is important for proper value function learning:
- terminated: Episode ended because a terminal state was reached (success or failure)
- truncated: Episode ended due to time limit (not a true terminal state)
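A minimal sketch of how a rollout loop might derive truncated from an episode horizon (the stub step function and horizon handling here are assumptions for illustration; robosuite can also enforce its own horizon setting):

```python
import numpy as np

HORIZON = 5  # illustrative time limit

def step_stub(t):
    # Stand-in for env.step: this toy episode never reaches a true terminal state
    obs, reward, done, info = np.zeros(2), 0.0, False, {}
    return obs, reward, done, info

for t in range(1, HORIZON + 1):
    obs, reward, done, info = step_stub(t)
    terminated = done                               # true terminal state
    truncated = (t >= HORIZON) and not terminated   # time limit only
    if terminated or truncated:
        break

print(terminated, truncated)  # False True
```

Because the episode ends only by hitting the horizon, a value-learning algorithm should still bootstrap from the final state, which is exactly what the terminated/truncated split lets it decide.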
Pseudocode
```
class GymWrapper(Adapter):
    def __init__(self, robosuite_env, observation_keys, flatten_obs):
        self.env = robosuite_env
        self.keys = observation_keys or default_keys
        self.flatten = flatten_obs
        # Compute adapted spaces
        self.observation_space = self._compute_obs_space()
        self.action_space = self._compute_action_space()

    def _compute_obs_space(self):
        # Get a sample observation
        sample_obs = self.env.reset()
        # Flatten the selected keys
        flat_obs = self._flatten_obs(sample_obs)
        # Return a Box space with infinite bounds
        return Box(low=-inf, high=inf, shape=flat_obs.shape)

    def _flatten_obs(self, obs_dict):
        # Concatenate the selected observation keys
        arrays = [obs_dict[key].flatten() for key in self.keys]
        return np.concatenate(arrays)

    def reset(self, seed=None, options=None):
        # Set the seed if provided
        if seed is not None:
            self.env.seed(seed)
        # Reset the underlying environment
        obs_dict = self.env.reset()
        # Return the Gymnasium format: (obs, info)
        return self._flatten_obs(obs_dict), {}

    def step(self, action):
        # Call the robosuite step (4-tuple)
        obs_dict, reward, done, info = self.env.step(action)
        # Convert to the Gymnasium format (5-tuple)
        obs = self._flatten_obs(obs_dict)
        terminated = done
        truncated = False
        return obs, reward, terminated, truncated, info
```
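The pseudocode above can be exercised end to end against a stub environment (the stub, its observation keys, and the numpy-based seeding below are purely illustrative stand-ins for a real robosuite env):

```python
import numpy as np
from collections import OrderedDict

class StubEnv:
    """Minimal stand-in for a robosuite env: OrderedDict obs, legacy 4-tuple step."""
    action_spec = (np.full(2, -1.0), np.full(2, 1.0))

    def reset(self):
        return OrderedDict([("proprio-state", np.zeros(3)),
                            ("object-state", np.ones(2))])

    def step(self, action):
        return self.reset(), 1.0, False, {}

class MiniGymWrapper:
    """Toy adapter mirroring the pseudocode: flat obs, seed-aware reset, 5-tuple step."""
    def __init__(self, env, keys=("proprio-state", "object-state")):
        self.env, self.keys = env, list(keys)

    def _flatten_obs(self, obs_dict):
        return np.concatenate([obs_dict[k].ravel() for k in self.keys])

    def reset(self, seed=None, options=None):
        if seed is not None:
            np.random.seed(seed)  # real robosuite seeding differs; illustrative only
        return self._flatten_obs(self.env.reset()), {}

    def step(self, action):
        obs_dict, reward, done, info = self.env.step(action)
        return self._flatten_obs(obs_dict), reward, done, False, info

env = MiniGymWrapper(StubEnv())
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(np.zeros(2))
print(obs.shape, reward, terminated, truncated)  # (5,) 1.0 False False
```

The reset and step signatures match what Gymnasium-based training loops expect, which is the property the real GymWrapper provides for robosuite environments.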