
Implementation:ARISE Initiative Robosuite MujocoEnv Reset And Step

From Leeroopedia
Property Value
Sources robosuite
Domains Robotics_Simulation, Reinforcement_Learning
Last Updated 2026-02-15 12:00 GMT

Overview

Concrete simulation loop methods for environment reset, action execution, and rendering provided by the robosuite MujocoEnv base class.

Description

This implementation documents the core MujocoEnv methods that realize the simulation loop principle in robosuite environments:

  • reset(): Initializes or reinitializes the simulation state and returns the initial observations
  • step(action): Advances the simulation by one timestep using the provided action and returns (observations, reward, done flag, info dictionary)
  • action_spec property: Returns the valid action bounds as a tuple (low, high) of numpy arrays
  • render(): Displays on-screen visualization of the current simulation state

These methods form the complete interface for interacting with any robosuite environment and follow the classic OpenAI Gym API convention, in which step() returns a 4-tuple (note that the newer Gymnasium API instead splits the done flag into terminated and truncated).

Usage

These methods are called on any robosuite environment instance after creation via robosuite.make() (the module is commonly imported as suite). The typical usage pattern is:

  1. Create environment using suite.make()
  2. Call reset() to initialize
  3. Query action_spec to determine valid action ranges
  4. Loop: call step(action) and optionally render()
  5. Continue until the done flag is True
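The five-step pattern above can be condensed into a small driver function. This is an illustrative sketch, not part of the robosuite API: the name run_episode and the max_steps cutoff are assumptions, and it works against any object exposing the reset()/step()/action_spec interface described here.

```python
import numpy as np

def run_episode(env, max_steps=200, render=False):
    """Drive one episode of an env exposing reset()/step()/action_spec.

    Returns (total_reward, step_count). Hypothetical helper, not robosuite API.
    """
    low, high = env.action_spec              # step 3: query valid action bounds
    obs = env.reset()                        # step 2: initialize the episode
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = np.random.uniform(low, high)       # sample within bounds
        obs, reward, done, info = env.step(action)  # step 4: advance one timestep
        total_reward += reward
        steps += 1
        if render:
            env.render()                     # optional on-screen visualization
    return total_reward, steps
```

With a robosuite environment this would be called as run_episode(suite.make("Lift", robots="Panda"), render=False).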

Code Reference

Source Information

  • Repository: robosuite
  • Primary Module: robosuite/environments/base.py
  • Base Class: MujocoEnv
  • Additional Module: robosuite/environments/robot_env.py (for action_spec)

Method Signatures

reset() method (robosuite/environments/base.py, Lines 277-347):

def reset(self):
    """
    Resets simulation.

    Returns:
        OrderedDict: Environment observation space after reset occurs
    """
    # Implementation performs:
    # - Reset mujoco simulation state
    # - Reset environment variables
    # - Reset robot controllers
    # - Reinitialize object positions
    # - Clear episode history
    # - Return initial observations

step() method (robosuite/environments/base.py, Lines 467-521):

def step(self, action):
    """
    Takes a step in simulation with control command @action.

    Args:
        action (np.array): Action to execute within the environment

    Returns:
        4-tuple:
            - OrderedDict: observations from the environment
            - float: reward from the environment
            - bool: whether the current episode is completed or not
            - dict: misc information
    """
    # Implementation performs:
    # - Validate action dimensions
    # - Apply action through robot controller
    # - Step mujoco physics simulation
    # - Compute reward based on task
    # - Check termination conditions
    # - Collect observations
    # - Return (obs, reward, done, info)

action_spec property (robosuite/environments/robot_env.py, Lines 270-285):

@property
def action_spec(self):
    """
    Action space (low, high) for this environment

    Returns:
        2-tuple:
            - np.array: minimum (low) action values
            - np.array: maximum (high) action values
    """
    # Returns valid bounds for action sampling
    # Shape depends on robot controller and DOF

render() method (robosuite/environments/base.py):

def render(self):
    """
    Renders to an on-screen window.

    Displays the current simulation state using the MuJoCo viewer.
    Requires has_renderer=True during environment creation.
    """

Import and Usage

import robosuite

# Create environment instance
env = robosuite.make(
    env_name="Lift",           # Task name
    robots="Panda",            # Robot model
    has_renderer=True,         # Enable on-screen rendering
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

# Methods are now accessible on env instance:
# env.reset()
# env.step(action)
# env.action_spec
# env.render()

I/O Contract

Input Specifications

Method Parameter Type Required Description
step() action np.array Yes Action vector with dimensions matching action_spec bounds. Each element must be within [low[i], high[i]] range.
reset() none - - No parameters required
render() none - - No parameters required
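Since step() expects every action element to lie within the [low[i], high[i]] range reported by action_spec, out-of-range commands (e.g. from an untrained policy) can be clamped elementwise with np.clip before being passed in. The 7-dimensional bounds below are illustrative placeholders, not values queried from a real environment:

```python
import numpy as np

# Stand-in bounds with the same shape action_spec returns: (low, high) arrays
low = np.full(7, -1.0)
high = np.full(7, 1.0)

# A raw policy output with some components outside the valid range
raw_action = np.array([0.5, -2.0, 0.0, 1.5, 0.0, 0.0, 3.0])

# Clamp each element into [low[i], high[i]] before calling env.step()
safe_action = np.clip(raw_action, low, high)
print(safe_action)
```

In practice, low and high would come from `low, high = env.action_spec` on a live environment.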

Output Specifications

reset() outputs:

Return Type Description
OrderedDict Observation dictionary with keys including:
  • robot0_proprio-state: Robot joint positions, velocities, gripper state
  • object-state: Object positions, orientations, velocities
  • robot0_eef_pos: End-effector position
  • robot0_eef_quat: End-effector orientation (quaternion)
  • Optional: Camera images if use_camera_obs=True
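Because the observation is a flat OrderedDict of numpy arrays, a quick way to inspect it is to map each key to its array shape. The helper below is a hypothetical sketch; the keys and shapes in the stand-in dict are illustrative, as the real contents vary by task, robot, and observation settings:

```python
from collections import OrderedDict
import numpy as np

def describe_obs(obs):
    """Summarize an observation dict as {key: array shape} for inspection."""
    return {key: np.asarray(value).shape for key, value in obs.items()}

# Illustrative stand-in for what reset()/step() return; real shapes differ
obs = OrderedDict([
    ("robot0_proprio-state", np.zeros(32)),
    ("object-state", np.zeros(10)),
])
print(describe_obs(obs))
```

On a live environment this would be describe_obs(env.reset()).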

step() outputs:

Return Position Type Description
1st OrderedDict Observations with same structure as reset()
2nd float Scalar reward signal for the current timestep
3rd bool Terminal flag (True if episode should end)
4th dict Additional info (may include success flags, diagnostic data)

action_spec outputs:

Return Position Type Description
1st np.array Lower bounds for each action dimension
2nd np.array Upper bounds for each action dimension

render() outputs:

Return Type Description
None Side effect: displays visualization in on-screen window

Usage Examples

Complete Simulation Loop Example

import robosuite as suite
import numpy as np

# 1. Create environment instance
env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=True,
    has_offscreen_renderer=False,
    use_camera_obs=False,
    control_freq=20,
)

# 2. Query action space bounds
low, high = env.action_spec
print(f"Action space dimension: {len(low)}")
print(f"Action bounds: [{low[0]:.2f}, {high[0]:.2f}]")

# 3. Reset environment to initial state
obs = env.reset()
print(f"Observation keys: {list(obs.keys())}")
print(f"Robot proprioceptive state shape: {obs['robot0_proprio-state'].shape}")

# 4. Execute simulation loop with random actions
done = False
total_reward = 0
step_count = 0

while not done:
    # 5. Sample random action within bounds
    action = np.random.uniform(low, high)

    # 6. Execute action and get results
    obs, reward, done, info = env.step(action)

    # 7. Accumulate metrics
    total_reward += reward
    step_count += 1

    # 8. Render visualization
    env.render()

    # 9. Check termination (example: max steps)
    if step_count >= 200:
        done = True

# 10. Report episode results
print(f"Episode finished after {step_count} steps")
print(f"Total reward: {total_reward:.3f}")

Policy Evaluation Example

import robosuite as suite
import numpy as np

def simple_policy(obs):
    """Example: move gripper down and close"""
    # Action typically: [dx, dy, dz, droll, dpitch, dyaw, gripper]
    action = np.array([0.0, 0.0, -0.1, 0.0, 0.0, 0.0, 1.0])
    return action

env = suite.make("Lift", robots="Panda", has_renderer=True, use_camera_obs=False)

# Run multiple episodes
num_episodes = 5
for episode in range(num_episodes):
    obs = env.reset()
    done = False
    episode_reward = 0

    while not done:
        # Use policy instead of random actions
        action = simple_policy(obs)
        obs, reward, done, info = env.step(action)
        episode_reward += reward
        env.render()

    print(f"Episode {episode+1}: Reward = {episode_reward:.3f}")

Action Space Inspection Example

import robosuite as suite
import numpy as np

# Create environment
env = suite.make("TwoArmLift", robots=["Panda", "Sawyer"])  # two-arm task: one robot per arm

# Inspect action space
low, high = env.action_spec
action_dim = len(low)

print(f"Action space dimensionality: {action_dim}")
print("\nAction bounds per dimension:")
for i in range(action_dim):
    print(f"  Dimension {i}: [{low[i]:6.2f}, {high[i]:6.2f}]")

# Safe random action sampling
safe_action = np.random.uniform(low, high)
print(f"\nSample valid action: {safe_action}")
