
Implementation:ARISE Initiative Robosuite GymWrapper

From Leeroopedia

Metadata

Property Value
Sources robosuite, Gymnasium
Domains Reinforcement_Learning, API_Compatibility
Last Updated 2026-02-15 12:00 GMT

Overview

A concrete adapter, provided by the robosuite wrappers module, that exposes robosuite environments through a Gymnasium-compatible interface.

Description

The GymWrapper class provides a concrete implementation of the Gymnasium Environment Wrapping adapter pattern. It inherits from both the robosuite Wrapper base class and the Gymnasium Env interface, creating a dual-interface object that can be used seamlessly with both robosuite and Gymnasium ecosystems.

Key implementation details:

  • Observation Space Computation: The wrapper computes a Box observation_space by sampling the environment, selecting specified observation keys, flattening them into a 1D array, and creating a Box space with appropriate dimensions
  • Action Space Computation: The Box action_space is derived directly from the environment's action_spec bounds
  • Observation Flattening: The _flatten_obs() method concatenates selected observation keys (default: 'robot0_proprio-state' and 'object-state') into a single 1D numpy array
  • Gymnasium Reset Signature: The reset() method accepts optional seed and options parameters and returns a 2-tuple (obs, info)
  • Gymnasium Step Signature: The step() method returns a 5-tuple (obs, reward, terminated, truncated, info), splitting the legacy 'done' flag into separate termination conditions

The wrapper is designed to be transparent, adding minimal overhead while ensuring full compatibility with the Gymnasium API standard.
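The observation-flattening step above can be sketched without robosuite. This is a minimal illustration, not the wrapper's exact code: the observation dict, key names, and shapes below are made up, and the Box bounds are set to ±inf to mirror the approach described in the bullet list.

```python
from collections import OrderedDict

import numpy as np

# Hypothetical stand-ins for a robosuite observation dict; real key names
# and array shapes depend on the environment and robot.
obs = OrderedDict(
    [("robot0_proprio-state", np.arange(3.0)), ("object-state", np.arange(2.0))]
)

def flatten_obs(obs_dict, keys):
    """Concatenate the selected observation keys into a single 1D array."""
    return np.concatenate([np.asarray(obs_dict[k]).ravel() for k in keys])

keys = ["robot0_proprio-state", "object-state"]
flat = flatten_obs(obs, keys)

# The Box observation space is sized from one flattened sample,
# with unbounded limits in each dimension.
low = np.full(flat.shape, -np.inf)
high = np.full(flat.shape, np.inf)
print(flat.shape)  # (5,)
```

Sampling the environment once and flattening the result is enough to fix the space's dimensionality, which is why the wrapper can build the Box at construction time.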

Usage

Wrap any robosuite environment to use with Gymnasium-compatible RL libraries such as:

  • Stable-Baselines3: For training PPO, SAC, TD3, and other standard RL algorithms
  • CleanRL: For single-file RL implementations with minimal dependencies
  • RLlib: For distributed RL training at scale
  • Custom Training Loops: Any code that expects the standard Gymnasium interface

The wrapper handles all interface translation automatically, allowing you to focus on algorithm development rather than environment compatibility.

Code Reference

Property Value
Source robosuite
File robosuite/wrappers/gym_wrapper.py
Lines L26-184
Import from robosuite.wrappers import GymWrapper

Constructor Signature

class GymWrapper(Wrapper, Env):
    def __init__(self, env, keys=None, flatten_obs=True):
        """
        Initialize the Gymnasium wrapper for robosuite environments.

        Args:
            env (MujocoEnv): The environment to wrap
            keys (None or list of str): Observation keys to include. If None, defaults
                to proprio-state and object-state keys.
            flatten_obs (bool): Whether to flatten observation dict to 1d array.
                Defaults to True.
        """

Reset Method (L127-143)

def reset(self, seed=None, options=None):
    """
    Reset the environment to initial state.

    Args:
        seed (int, optional): Random seed for reproducibility
        options (dict, optional): Additional options for reset

    Returns:
        2-tuple: (np.array observations, dict info)
            - observations: Flattened observation array
            - info: Dictionary with auxiliary diagnostic information
    """

Step Method (L145-163)

def step(self, action):
    """
    Execute one timestep of the environment dynamics.

    Args:
        action (np.array): Action to take in the environment

    Returns:
        5-tuple: (np.array obs, float reward, bool terminated, bool truncated, dict info)
            - obs: Flattened observation array
            - reward: Reward signal from the environment
            - terminated: True if episode ended due to terminal state
            - truncated: True if episode ended due to time limit
            - info: Dictionary with auxiliary diagnostic information
    """

Close Method (L180-184)

def close(self):
    """
    Wrapper for calling underlying environment close function.
    Performs any necessary cleanup of environment resources.
    """

I/O Contract

Constructor Inputs

Parameter Type Required Default Description
env MujocoEnv Yes N/A The robosuite environment instance to wrap
keys list of str No ['robot0_proprio-state', 'object-state'] Observation keys to include in flattened observation
flatten_obs bool No True Whether to flatten OrderedDict observations to 1D array

Constructor Outputs

Attribute Type Description
observation_space gymnasium.spaces.Box Box space defining valid observations (continuous, 1D)
action_space gymnasium.spaces.Box Box space defining valid actions (continuous)

reset() Method

Inputs:

Parameter Type Required Description
seed int No Random seed for environment reproducibility
options dict No Additional environment-specific reset options

Outputs:

Element Type Description
observations np.array Flattened observation array matching observation_space
info dict Diagnostic information (typically empty on reset)

step() Method

Inputs:

Parameter Type Required Description
action np.array Yes Action array matching action_space dimensions

Outputs:

Element Type Description
obs np.array Next observation after taking action
reward float Reward signal for the transition
terminated bool True if episode reached terminal state (success/failure)
truncated bool True if episode ended due to time/step limit
info dict Diagnostic information (may include success flag, etc.)
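The terminated/truncated split in the table above follows the general Gymnasium convention: an episode that ends exactly at the horizon is a truncation, while any earlier ending is a genuine terminal state. The helper below is a sketch of that convention, not the wrapper's exact internals.

```python
def split_done(done, step_count, horizon):
    """Map a legacy 'done' flag to Gymnasium's (terminated, truncated) pair.

    Ending at or past the horizon counts as truncation; any earlier
    ending is treated as reaching a terminal state.
    """
    truncated = done and step_count >= horizon
    terminated = done and not truncated
    return terminated, truncated

print(split_done(True, 50, 200))   # ended early -> (True, False)
print(split_done(True, 200, 200))  # hit the time limit -> (False, True)
print(split_done(False, 10, 200))  # still running -> (False, False)
```

Keeping the two flags separate matters for value bootstrapping: algorithms should bootstrap from the final observation on truncation but not on termination.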

Usage Examples

Example 1: Basic GymWrapper Usage with Random Actions

import robosuite as suite
from robosuite.wrappers import GymWrapper

# Create robosuite environment
env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

# Wrap with GymWrapper
gym_env = GymWrapper(env)

# Print space information
print(f"Observation space: {gym_env.observation_space}")
print(f"Action space: {gym_env.action_space}")

# Run episode with random actions
obs, info = gym_env.reset(seed=42)
print(f"Initial observation shape: {obs.shape}")

for step in range(100):
    # Sample random action
    action = gym_env.action_space.sample()

    # Execute action
    obs, reward, terminated, truncated, info = gym_env.step(action)

    # Check if episode ended
    if terminated or truncated:
        print(f"Episode ended at step {step}")
        obs, info = gym_env.reset()

gym_env.close()

Example 2: Using with Standard RL Training Pattern

import numpy as np
import robosuite as suite
from robosuite.wrappers import GymWrapper

# Create and wrap environment
env = suite.make(
    env_name="Stack",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
    horizon=200,
)

gym_env = GymWrapper(
    env,
    keys=["robot0_proprio-state", "object-state"],
    flatten_obs=True
)

# Training loop compatible with any Gymnasium-based RL library
num_episodes = 10
for episode in range(num_episodes):
    obs, info = gym_env.reset(seed=episode)
    episode_reward = 0
    episode_length = 0

    done = False
    while not done:
        # In practice, replace with policy network
        action = gym_env.action_space.sample()

        # Standard Gymnasium step
        obs, reward, terminated, truncated, info = gym_env.step(action)

        episode_reward += reward
        episode_length += 1

        # Episode ends if terminated OR truncated
        done = terminated or truncated

        # Check for success (if provided in info)
        if done and info.get("success", False):
            print(f"Episode {episode}: SUCCESS!")

    print(f"Episode {episode}: reward={episode_reward:.2f}, length={episode_length}")

gym_env.close()

Example 3: Integration with Stable-Baselines3

import robosuite as suite
from robosuite.wrappers import GymWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

# Create wrapped environment
env = suite.make(
    env_name="Door",
    robots="Panda",
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

gym_env = GymWrapper(env)

# Verify Gymnasium compatibility
check_env(gym_env)

# Train PPO agent (works seamlessly due to GymWrapper)
model = PPO("MlpPolicy", gym_env, verbose=1)
model.learn(total_timesteps=10000)

# Evaluate trained agent
obs, info = gym_env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = gym_env.step(action)

    if terminated or truncated:
        obs, info = gym_env.reset()

gym_env.close()
