
Implementation:ARISE Initiative Robosuite GymWrapper

From Leeroopedia

Metadata

Property Value
Sources robosuite, Gymnasium
Domains Reinforcement_Learning, API_Compatibility
Last Updated 2026-02-15 12:00 GMT

Overview

A concrete adapter, provided by the robosuite wrappers module, that exposes robosuite environments through a Gymnasium-compatible interface.

Description

The GymWrapper class provides a concrete implementation of the Gymnasium Environment Wrapping adapter pattern. It inherits from both the robosuite Wrapper base class and the Gymnasium Env interface, creating a dual-interface object that can be used seamlessly with both robosuite and Gymnasium ecosystems.

Key implementation details:

  • Observation Space Computation: The wrapper computes a Box observation_space by sampling the environment, selecting specified observation keys, flattening them into a 1D array, and creating a Box space with appropriate dimensions
  • Action Space Computation: The Box action_space is derived directly from the environment's action_spec bounds
  • Observation Flattening: The _flatten_obs() method concatenates selected observation keys (default: 'robot0_proprio-state' and 'object-state') into a single 1D numpy array
  • Gymnasium Reset Signature: The reset() method accepts optional seed and options parameters and returns a 2-tuple (obs, info)
  • Gymnasium Step Signature: The step() method returns a 5-tuple (obs, reward, terminated, truncated, info), splitting the legacy 'done' flag into separate termination conditions

The wrapper is designed to be transparent, adding minimal overhead while ensuring full compatibility with the Gymnasium API standard.
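The observation-flattening step above can be sketched without robosuite. This is a minimal illustration, not the wrapper's exact code: the observation dict, key names, and shapes below are made up, and the Box bounds are set to ±inf to mirror the approach described in the bullet list.

```python
from collections import OrderedDict

import numpy as np

# Hypothetical stand-ins for a robosuite observation dict; real key names
# and array shapes depend on the environment and robot.
obs = OrderedDict(
    [("robot0_proprio-state", np.arange(3.0)), ("object-state", np.arange(2.0))]
)

def flatten_obs(obs_dict, keys):
    """Concatenate the selected observation keys into a single 1D array."""
    return np.concatenate([np.asarray(obs_dict[k]).ravel() for k in keys])

keys = ["robot0_proprio-state", "object-state"]
flat = flatten_obs(obs, keys)

# The Box observation space is sized from one flattened sample,
# with unbounded limits in each dimension.
low = np.full(flat.shape, -np.inf)
high = np.full(flat.shape, np.inf)
print(flat.shape)  # (5,)
```

Sampling the environment once and flattening the result is enough to fix the space's dimensionality, which is why the wrapper can build the Box at construction time.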

Usage

Wrap any robosuite environment to use with Gymnasium-compatible RL libraries such as:

  • Stable-Baselines3: For training PPO, SAC, TD3, and other standard RL algorithms
  • CleanRL: For single-file RL implementations with minimal dependencies
  • RLlib: For distributed RL training at scale
  • Custom Training Loops: Any code that expects the standard Gymnasium interface

The wrapper handles all interface translation automatically, allowing you to focus on algorithm development rather than environment compatibility.

Code Reference

Property Value
Source robosuite
File robosuite/wrappers/gym_wrapper.py
Lines L26-184
Import from robosuite.wrappers import GymWrapper

Constructor Signature

class GymWrapper(Wrapper, Env):
    def __init__(self, env, keys=None, flatten_obs=True):
        """
        Initialize the Gymnasium wrapper for robosuite environments.

        Args:
            env (MujocoEnv): The environment to wrap
            keys (None or list of str): Observation keys to include. If None, defaults
                to proprio-state and object-state keys.
            flatten_obs (bool): Whether to flatten observation dict to 1d array.
                Defaults to True.
        """

Reset Method (L127-143)

def reset(self, seed=None, options=None):
    """
    Reset the environment to initial state.

    Args:
        seed (int, optional): Random seed for reproducibility
        options (dict, optional): Additional options for reset

    Returns:
        2-tuple: (np.array observations, dict info)
            - observations: Flattened observation array
            - info: Dictionary with auxiliary diagnostic information
    """

Step Method (L145-163)

def step(self, action):
    """
    Execute one timestep of the environment dynamics.

    Args:
        action (np.array): Action to take in the environment

    Returns:
        5-tuple: (np.array obs, float reward, bool terminated, bool truncated, dict info)
            - obs: Flattened observation array
            - reward: Reward signal from the environment
            - terminated: True if episode ended due to terminal state
            - truncated: True if episode ended due to time limit
            - info: Dictionary with auxiliary diagnostic information
    """

Close Method (L180-184)

def close(self):
    """
    Wrapper for calling underlying environment close function.
    Performs any necessary cleanup of environment resources.
    """

I/O Contract

Constructor Inputs

Parameter Type Required Default Description
env MujocoEnv Yes N/A The robosuite environment instance to wrap
keys list of str No ['robot0_proprio-state', 'object-state'] Observation keys to include in flattened observation
flatten_obs bool No True Whether to flatten OrderedDict observations to 1D array

Constructor Outputs

Attribute Type Description
observation_space gymnasium.spaces.Box Box space defining valid observations (continuous, 1D)
action_space gymnasium.spaces.Box Box space defining valid actions (continuous)

reset() Method

Inputs:

Parameter Type Required Description
seed int No Random seed for environment reproducibility
options dict No Additional environment-specific reset options

Outputs:

Element Type Description
observations np.array Flattened observation array matching observation_space
info dict Diagnostic information (typically empty on reset)

step() Method

Inputs:

Parameter Type Required Description
action np.array Yes Action array matching action_space dimensions

Outputs:

Element Type Description
obs np.array Next observation after taking action
reward float Reward signal for the transition
terminated bool True if episode reached terminal state (success/failure)
truncated bool True if episode ended due to time/step limit
info dict Diagnostic information (may include success flag, etc.)
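The terminated/truncated split in the table above follows the general Gymnasium convention: an episode that ends exactly at the horizon is a truncation, while any earlier ending is a genuine terminal state. The helper below is a sketch of that convention, not the wrapper's exact internals.

```python
def split_done(done, step_count, horizon):
    """Map a legacy 'done' flag to Gymnasium's (terminated, truncated) pair.

    Ending at or past the horizon counts as truncation; any earlier
    ending is treated as reaching a terminal state.
    """
    truncated = done and step_count >= horizon
    terminated = done and not truncated
    return terminated, truncated

print(split_done(True, 50, 200))   # ended early -> (True, False)
print(split_done(True, 200, 200))  # hit the time limit -> (False, True)
print(split_done(False, 10, 200))  # still running -> (False, False)
```

Keeping the two flags separate matters for value bootstrapping: algorithms should bootstrap from the final observation on truncation but not on termination.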

Usage Examples

Example 1: Basic GymWrapper Usage with Random Actions

import robosuite as suite
from robosuite.wrappers import GymWrapper

# Create robosuite environment
env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

# Wrap with GymWrapper
gym_env = GymWrapper(env)

# Print space information
print(f"Observation space: {gym_env.observation_space}")
print(f"Action space: {gym_env.action_space}")

# Run episode with random actions
obs, info = gym_env.reset(seed=42)
print(f"Initial observation shape: {obs.shape}")

for step in range(100):
    # Sample random action
    action = gym_env.action_space.sample()

    # Execute action
    obs, reward, terminated, truncated, info = gym_env.step(action)

    # Check if episode ended
    if terminated or truncated:
        print(f"Episode ended at step {step}")
        obs, info = gym_env.reset()

gym_env.close()

Example 2: Using with Standard RL Training Pattern

import numpy as np
import robosuite as suite
from robosuite.wrappers import GymWrapper

# Create and wrap environment
env = suite.make(
    env_name="Stack",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
    horizon=200,
)

gym_env = GymWrapper(
    env,
    keys=["robot0_proprio-state", "object-state"],
    flatten_obs=True
)

# Training loop compatible with any Gymnasium-based RL library
num_episodes = 10
for episode in range(num_episodes):
    obs, info = gym_env.reset(seed=episode)
    episode_reward = 0
    episode_length = 0

    done = False
    while not done:
        # In practice, replace with policy network
        action = gym_env.action_space.sample()

        # Standard Gymnasium step
        obs, reward, terminated, truncated, info = gym_env.step(action)

        episode_reward += reward
        episode_length += 1

        # Episode ends if terminated OR truncated
        done = terminated or truncated

        # Check for success (if provided in info)
        if done and info.get("success", False):
            print(f"Episode {episode}: SUCCESS!")

    print(f"Episode {episode}: reward={episode_reward:.2f}, length={episode_length}")

gym_env.close()

Example 3: Integration with Stable-Baselines3

import robosuite as suite
from robosuite.wrappers import GymWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

# Create wrapped environment
env = suite.make(
    env_name="Door",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

gym_env = GymWrapper(env)

# Verify Gymnasium compatibility
check_env(gym_env)

# Train PPO agent (works seamlessly due to GymWrapper)
model = PPO("MlpPolicy", gym_env, verbose=1)
model.learn(total_timesteps=10000)

# Evaluate trained agent
obs, info = gym_env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = gym_env.step(action)

    if terminated or truncated:
        obs, info = gym_env.reset()

gym_env.close()
