
Principle:ARISE Initiative Robosuite Gymnasium Environment Wrapping

From Leeroopedia

Metadata

Property Value
Sources robosuite, Gymnasium
Domains Reinforcement_Learning, API_Compatibility
Last Updated 2026-02-15 12:00 GMT

Overview

Adapter pattern for wrapping robosuite environments to conform to the Gymnasium (OpenAI Gym) interface standard for compatibility with RL libraries.

Description

Robosuite environments use a custom interface with OrderedDict observations and custom reset/step method signatures that differ from standard reinforcement learning frameworks. Most modern RL libraries, including Stable-Baselines3, CleanRL, and RLlib, expect the standardized Gymnasium API (the successor to OpenAI Gym).

The GymWrapper provides an adaptation layer that transforms robosuite environments to provide:

  • Box observation_space and action_space: Standard Gymnasium space definitions derived from environment specifications
  • Flattened numpy array observations: Converts OrderedDict observations to 1D numpy arrays for compatibility
  • Gymnasium 5-tuple step return: Returns (obs, reward, terminated, truncated, info) instead of the legacy 4-tuple format
  • Seed-aware reset: Supports the modern reset(seed, options) signature for reproducibility

This adapter enables direct use of robosuite environments with any Gymnasium-compatible RL algorithm without modification to the underlying environment or the training code.

Usage

Use the Gymnasium Environment Wrapping pattern whenever:

  • Training RL policies with standard RL libraries that expect the Gymnasium interface
  • Integrating robosuite environments into existing RL training pipelines
  • Benchmarking robosuite tasks against other Gymnasium environments
  • Requiring compatibility with Gymnasium-based tools for logging, monitoring, or evaluation

Theoretical Basis

Adapter Pattern

The Gymnasium Environment Wrapping pattern implements the Adapter Pattern from software design, which allows incompatible interfaces to work together. The adapter (GymWrapper) acts as a translator between three roles:

  • Adaptee: The robosuite MujocoEnv with its custom interface
  • Target: The Gymnasium API expected by RL libraries
  • Adapter: The GymWrapper that implements the Target interface while delegating to the Adaptee
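The triad above can be illustrated with toy classes. All names here are invented for illustration; the real GymWrapper carries considerably more machinery:

```python
from collections import OrderedDict
import numpy as np

class ToyRobosuiteEnv:
    """Adaptee: dict observations and a legacy 4-tuple step."""
    def reset(self):
        return OrderedDict([("a", np.zeros(3)), ("b", np.ones(2))])
    def step(self, action):
        obs = OrderedDict([("a", np.zeros(3)), ("b", np.ones(2))])
        return obs, 1.0, False, {}  # legacy 4-tuple

class ToyGymAdapter:
    """Adapter: exposes the Gymnasium (Target) API, delegating to the Adaptee."""
    def __init__(self, env):
        self.env = env
    def reset(self, seed=None, options=None):
        obs = self.env.reset()
        return np.concatenate([v.ravel() for v in obs.values()]), {}
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        flat = np.concatenate([v.ravel() for v in obs.values()])
        return flat, reward, done, False, info  # Gymnasium 5-tuple

env = ToyGymAdapter(ToyRobosuiteEnv())
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(np.zeros(1))
```

Training code written against the Target interface never sees the Adaptee's dict observations or 4-tuple returns.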

Observation Space Transformation

Robosuite environments return observations as OrderedDict objects with multiple keys (e.g., 'robot0_proprio-state', 'object-state', 'image'). The wrapper performs:

  1. Key Selection: Choose which observation keys to include (default: proprio-state and object-state)
  2. Flattening: Concatenate all selected observation arrays into a single 1D numpy array
  3. Space Definition: Create a Box space with appropriate bounds and shape

The observation space is computed by:

import numpy as np
from gymnasium.spaces import Box

# Concatenate selected observation keys
obs_arrays = [obs_dict[key].flatten() for key in selected_keys]
flat_obs = np.concatenate(obs_arrays)

# Define Box space with infinite bounds matching the flattened shape
low = np.full(flat_obs.shape, -np.inf)
high = np.full(flat_obs.shape, np.inf)
observation_space = Box(low=low, high=high, dtype=np.float32)
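For instance, with a hypothetical two-key observation dictionary (the key names follow robosuite's naming convention, but the values are made up), the flattening step yields:

```python
from collections import OrderedDict
import numpy as np

# Hypothetical observation dict in robosuite's key naming style
obs_dict = OrderedDict([
    ("robot0_proprio-state", np.arange(4, dtype=np.float32)),
    ("object-state", np.arange(3, dtype=np.float32)),
])
selected_keys = ["robot0_proprio-state", "object-state"]

# Flatten each selected key and concatenate into a single 1D array
obs_arrays = [obs_dict[key].flatten() for key in selected_keys]
flat_obs = np.concatenate(obs_arrays)

print(flat_obs.shape)  # (7,) — 4 proprio dims + 3 object dims
```

The resulting shape fixes the Box observation space for the lifetime of the wrapper, which is why key selection happens once at construction time.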

Action Space Transformation

The action space is derived from the environment's action specification bounds:

import numpy as np
from gymnasium.spaces import Box

# Extract action bounds from the environment's action specification
low, high = env.action_spec  # (lower bounds, upper bounds)

# Create Box action space
action_space = Box(low=low, high=high, dtype=np.float32)
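A numpy-only sketch of how per-dimension bounds behave in practice (the bound values here are made up; real bounds come from the controller configuration):

```python
import numpy as np

# Hypothetical per-dimension action bounds, in the (low, high) form
# that an action specification would provide
low = np.array([-1.0, -1.0, -0.5], dtype=np.float32)
high = np.array([1.0, 1.0, 0.5], dtype=np.float32)

# RL libraries sample actions within these bounds; clipping enforces
# them for any out-of-range action a policy might emit
raw_action = np.array([0.3, -2.0, 0.7], dtype=np.float32)
action = np.clip(raw_action, low, high)  # clipped to [0.3, -1.0, 0.5]
```

Exposing the true bounds matters: algorithms such as SAC squash their policy output into the action space's range, so incorrect bounds silently rescale actions.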

Step Return Conversion

Robosuite environments return a 4-tuple from step(): (observation, reward, done, info). The Gymnasium API expects a 5-tuple that separates termination conditions:

# Robosuite step (legacy 4-tuple)
obs_dict, reward, done, info = env.step(action)

# Convert to Gymnasium 5-tuple
flat_obs = flatten_obs(obs_dict)
terminated = done  # Episode ended due to terminal state
truncated = False  # Episode ended due to time limit (handled separately)

return flat_obs, reward, terminated, truncated, info

The distinction between terminated and truncated is important for proper value function learning:

  • terminated: Episode ended because a terminal state was reached (success or failure)
  • truncated: Episode ended due to time limit (not a true terminal state)
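The practical consequence shows up in the bootstrapped TD target: on truncation the value of the next state is still bootstrapped, while on true termination it is not. A sketch with made-up numbers:

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    # Bootstrap from V(s') unless the episode truly terminated.
    # A truncated episode did not reach a terminal state, so the
    # remaining return is still estimated via next_value.
    if terminated:
        return reward
    return reward + gamma * next_value

# Same transition, different endings:
print(td_target(1.0, 10.0, terminated=True, truncated=False))   # 1.0
print(td_target(1.0, 10.0, terminated=False, truncated=True))   # 10.9
```

Treating a time-limit truncation as termination would teach the value function that the truncated state is worth only the final reward, biasing learning on any task with a horizon cap.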

Pseudocode

class GymWrapper(Adapter):
    def __init__(self, robosuite_env, observation_keys, flatten_obs):
        self.env = robosuite_env
        self.keys = observation_keys or default_keys
        self.flatten = flatten_obs

        # Compute adapted spaces
        self.observation_space = self._compute_obs_space()
        self.action_space = self._compute_action_space()

    def _compute_obs_space(self):
        # Get sample observation
        sample_obs = self.env.reset()

        # Flatten selected keys
        flat_obs = self._flatten_obs(sample_obs)

        # Return Box space
        return Box(low=-inf, high=inf, shape=flat_obs.shape)

    def _flatten_obs(self, obs_dict):
        # Concatenate selected observation keys
        arrays = [obs_dict[key].flatten() for key in self.keys]
        return np.concatenate(arrays)

    def reset(self, seed=None, options=None):
        # Seed the underlying RNG if provided (the exact mechanism
        # depends on the robosuite version)
        if seed is not None:
            self.env.seed(seed)

        # Reset environment
        obs_dict = self.env.reset()

        # Return Gymnasium format: (obs, info)
        return self._flatten_obs(obs_dict), {}

    def step(self, action):
        # Call robosuite step (4-tuple)
        obs_dict, reward, done, info = self.env.step(action)

        # Convert to Gymnasium format (5-tuple)
        obs = self._flatten_obs(obs_dict)
        terminated = done
        truncated = False

        return obs, reward, terminated, truncated, info
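The pseudocode above can be exercised end-to-end against a stand-in environment. The stub class and its observation keys are invented for illustration; in real use a robosuite environment supplies them:

```python
from collections import OrderedDict
import numpy as np

class StubEnv:
    """Stand-in for a robosuite MujocoEnv: dict observations, 4-tuple step."""
    action_spec = (np.full(2, -1.0), np.full(2, 1.0))
    def reset(self):
        return OrderedDict([("proprio", np.zeros(4)), ("object", np.ones(3))])
    def step(self, action):
        obs = OrderedDict([("proprio", np.zeros(4)), ("object", np.ones(3))])
        return obs, 0.5, False, {}

class MiniGymWrapper:
    """Minimal adapter following the pseudocode's structure."""
    def __init__(self, env, keys):
        self.env = env
        self.keys = keys
    def _flatten_obs(self, obs_dict):
        return np.concatenate([obs_dict[k].ravel() for k in self.keys])
    def reset(self, seed=None, options=None):
        if seed is not None:
            np.random.seed(seed)  # simplistic; real wrappers seed the env's RNG
        return self._flatten_obs(self.env.reset()), {}
    def step(self, action):
        obs_dict, reward, done, info = self.env.step(action)
        return self._flatten_obs(obs_dict), reward, done, False, info

env = MiniGymWrapper(StubEnv(), keys=["proprio", "object"])
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(np.zeros(2))
print(obs.shape, reward, terminated, truncated)  # (7,) 0.5 False False
```

Because the adapter exposes only the Gymnasium surface, the same training loop works unchanged whether the wrapped environment is this stub or a full robosuite task.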
