Implementation: ARISE Initiative Robosuite MujocoEnv Reset and Step
| Property | Value |
|---|---|
| Sources | robosuite |
| Domains | Robotics_Simulation, Reinforcement_Learning |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
Concrete simulation loop methods for environment reset, action execution, and rendering provided by the robosuite MujocoEnv base class.
Description
This implementation documents the core MujocoEnv methods that realize the simulation loop principle in robosuite environments:
- reset(): Initializes or reinitializes the simulation state and returns the initial observations
- step(action): Advances the simulation by one timestep using the provided action and returns (observations, reward, done flag, info dictionary)
- action_spec property: Returns the valid action bounds as a tuple (low, high) of numpy arrays
- render(): Displays on-screen visualization of the current simulation state
These methods form the complete interface for interacting with any robosuite environment and follow the classic OpenAI Gym API convention (note that step() returns a 4-tuple, unlike Gymnasium's 5-tuple of observations, reward, terminated, truncated, info).
Usage
These methods are called on any robosuite environment instance after creation via robosuite.make(). The typical usage pattern is:
- Create environment using suite.make()
- Call reset() to initialize
- Query action_spec to determine valid action ranges
- Loop: call step(action) and optionally render()
- Repeat until done flag is True
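The pattern above can be sketched against a minimal stand-in environment that mimics the MujocoEnv interface; the FakeEnv class below is purely illustrative (not part of robosuite) so the loop structure can be seen in isolation:

```python
from collections import OrderedDict
import numpy as np

class FakeEnv:
    """Illustrative stand-in mimicking the MujocoEnv interface (not robosuite code)."""
    def __init__(self, horizon=5, action_dim=7):
        self.horizon = horizon
        self._t = 0
        self._low = -np.ones(action_dim)
        self._high = np.ones(action_dim)

    @property
    def action_spec(self):
        # (low, high) bounds, as in robosuite's action_spec property
        return self._low, self._high

    def reset(self):
        self._t = 0
        return OrderedDict(observation=np.zeros(3))

    def step(self, action):
        self._t += 1
        done = self._t >= self.horizon
        return OrderedDict(observation=np.zeros(3)), 0.1, done, {}

env = FakeEnv()
low, high = env.action_spec              # query valid action bounds
obs = env.reset()                        # initialize and get first observation
done = False
steps = 0
while not done:
    action = np.random.uniform(low, high)      # sample within bounds
    obs, reward, done, info = env.step(action)
    steps += 1
print(steps)  # loop ends when the done flag is True
```

With a real robosuite environment, only the construction line changes (suite.make instead of FakeEnv); the reset/step/action_spec calls are identical.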
Code Reference
Source Information
- Repository: robosuite
- Primary Module: robosuite/environments/base.py
- Base Class: MujocoEnv
- Additional Module: robosuite/environments/robot_env.py (for action_spec)
Method Signatures
reset() method (robosuite/environments/base.py, Lines 277-347):
```python
def reset(self):
    """
    Resets simulation.

    Returns:
        OrderedDict: Environment observation space after reset occurs
    """
    # Implementation performs:
    # - Reset mujoco simulation state
    # - Reset environment variables
    # - Reset robot controllers
    # - Reinitialize object positions
    # - Clear episode history
    # - Return initial observations
```
step() method (robosuite/environments/base.py, Lines 467-521):
```python
def step(self, action):
    """
    Takes a step in simulation with control command @action.

    Args:
        action (np.array): Action to execute within the environment

    Returns:
        4-tuple:
            - OrderedDict: observations from the environment
            - float: reward from the environment
            - bool: whether the current episode is completed or not
            - dict: misc information
    """
    # Implementation performs:
    # - Validate action dimensions
    # - Apply action through robot controller
    # - Step mujoco physics simulation
    # - Compute reward based on task
    # - Check termination conditions
    # - Collect observations
    # - Return (obs, reward, done, info)
```
action_spec property (robosuite/environments/robot_env.py, Lines 270-285):
```python
@property
def action_spec(self):
    """
    Action space (low, high) for this environment

    Returns:
        2-tuple:
            - np.array: minimum (low) action values
            - np.array: maximum (high) action values
    """
    # Returns valid bounds for action sampling
    # Shape depends on robot controller and DOF
```
render() method (robosuite/environments/base.py):
```python
def render(self):
    """
    Renders to an on-screen window.

    Displays the current simulation state using the MuJoCo viewer.
    Requires has_renderer=True during environment creation.
    """
```
Import and Usage
```python
import robosuite

# Create environment instance
env = robosuite.make(
    env_name="Lift",               # Task name
    robots="Panda",                # Robot model
    has_renderer=True,             # Enable on-screen rendering
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

# Methods are now accessible on env instance:
# env.reset()
# env.step(action)
# env.action_spec
# env.render()
```
I/O Contract
Input Specifications
| Method | Parameter | Type | Required | Description |
|---|---|---|---|---|
| step() | action | np.array | Yes | Action vector with dimensions matching action_spec bounds. Each element must be within [low[i], high[i]] range. |
| reset() | none | - | - | No parameters required |
| render() | none | - | - | No parameters required |
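Because step() expects every element of the action to lie within [low[i], high[i]], out-of-range commands are commonly clipped to the bounds before being passed in. A minimal numpy sketch (the bounds here are placeholder values; in practice they come from env.action_spec):

```python
import numpy as np

# Placeholder bounds; with a live environment use: low, high = env.action_spec
low = np.array([-1.0, -1.0, -1.0])
high = np.array([1.0, 1.0, 1.0])

raw_action = np.array([0.5, -2.0, 3.0])    # second and third elements out of range
safe_action = np.clip(raw_action, low, high)
print(safe_action)  # -> [ 0.5 -1.   1. ]
```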
Output Specifications
reset() outputs:
| Return Type | Description |
|---|---|
| OrderedDict | Observation dictionary; keys depend on the environment configuration (e.g. robot0_proprio-state, and object-state when use_camera_obs=False) |
step() outputs:
| Return Position | Type | Description |
|---|---|---|
| 1st | OrderedDict | Observations with same structure as reset() |
| 2nd | float | Scalar reward signal for the current timestep |
| 3rd | bool | Terminal flag (True if episode should end) |
| 4th | dict | Additional info (may include success flags, diagnostic data) |
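The 4-tuple above is conventionally unpacked in a single line, with the float rewards accumulated into an episode return. A small sketch using a stubbed step function in place of a live environment (the stub is hypothetical; it only mirrors the (obs, reward, done, info) contract):

```python
def fake_step(action):
    """Stub with the same (obs, reward, done, info) contract as env.step()."""
    fake_step.t += 1
    return {}, 1.0, fake_step.t >= 3, {"success": False}
fake_step.t = 0

episode_return = 0.0
done = False
while not done:
    obs, reward, done, info = fake_step(None)   # unpack the 4-tuple
    episode_return += reward
print(episode_return)  # -> 3.0
```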
action_spec outputs:
| Return Position | Type | Description |
|---|---|---|
| 1st | np.array | Lower bounds for each action dimension |
| 2nd | np.array | Upper bounds for each action dimension |
render() outputs:
| Return Type | Description |
|---|---|
| None | Side effect: displays visualization in on-screen window |
Usage Examples
Complete Simulation Loop Example
```python
import robosuite as suite
import numpy as np

# 1. Create environment instance
env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=True,
    has_offscreen_renderer=False,
    use_camera_obs=False,
    control_freq=20,
)

# 2. Query action space bounds
low, high = env.action_spec
print(f"Action space dimension: {len(low)}")
print(f"Action bounds: [{low[0]:.2f}, {high[0]:.2f}]")

# 3. Reset environment to initial state
obs = env.reset()
print(f"Observation keys: {list(obs.keys())}")
print(f"Robot proprioceptive state shape: {obs['robot0_proprio-state'].shape}")

# 4. Execute simulation loop with random actions
done = False
total_reward = 0
step_count = 0
while not done:
    # 5. Sample random action within bounds
    action = np.random.uniform(low, high)
    # 6. Execute action and get results
    obs, reward, done, info = env.step(action)
    # 7. Accumulate metrics
    total_reward += reward
    step_count += 1
    # 8. Render visualization
    env.render()
    # 9. Check termination (example: max steps)
    if step_count >= 200:
        done = True

# 10. Report episode results
print(f"Episode finished after {step_count} steps")
print(f"Total reward: {total_reward:.3f}")
```
Policy Evaluation Example
```python
import robosuite as suite
import numpy as np

def simple_policy(obs):
    """Example: move gripper down and close."""
    # Action typically: [dx, dy, dz, droll, dpitch, dyaw, gripper]
    return np.array([0.0, 0.0, -0.1, 0.0, 0.0, 0.0, 1.0])

env = suite.make("Lift", robots="Panda", has_renderer=True)

# Run multiple episodes
num_episodes = 5
for episode in range(num_episodes):
    obs = env.reset()
    done = False
    episode_reward = 0
    while not done:
        # Use policy instead of random actions
        action = simple_policy(obs)
        obs, reward, done, info = env.step(action)
        episode_reward += reward
        env.render()
    print(f"Episode {episode+1}: Reward = {episode_reward:.3f}")
```
Action Space Inspection Example
```python
import robosuite as suite
import numpy as np

# Create environment (PickPlace is a single-arm task)
env = suite.make("PickPlace", robots="Panda")

# Inspect action space
low, high = env.action_spec
action_dim = len(low)
print(f"Action space dimensionality: {action_dim}")
print("\nAction bounds per dimension:")
for i in range(action_dim):
    print(f"  Dimension {i}: [{low[i]:6.2f}, {high[i]:6.2f}]")

# Safe random action sampling
safe_action = np.random.uniform(low, high)
print(f"\nSample valid action: {safe_action}")
```