Implementation:Farama Foundation Gymnasium PusherEnv V5
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Pusher v5 MuJoCo environment provided by Gymnasium.
Description
The Pusher v5 environment models a multi-jointed robot arm similar to a human arm. The goal is to push a cylinder (the object) to a goal position using the robot's end effector (fingertip). The arm has shoulder, elbow, forearm, and wrist joints, for 7 actuated joints in total. The observation has 23 elements: joint positions (7), joint velocities (7), fingertip position (3), object position (3), and goal position (3). The reward is the sum reward_dist + reward_ctrl + reward_near, with each term scaled by its own configurable weight. Compared with earlier versions, v5 raises the object's density above that of air, computes the reward from the state after the physics step, and exposes the reward weights as constructor arguments.
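The reward sum described above can be sketched in plain NumPy. This is a minimal illustration, not the environment's source: it assumes each term is a penalty (reward_near is the negative fingertip-to-object distance, reward_dist the negative object-to-goal distance, reward_ctrl the negative sum of squared actions), each multiplied by its weight. Those per-term formulas are not stated on this page and should be checked against pusher_v5.py. The function name `pusher_reward` is hypothetical.

```python
import numpy as np


def pusher_reward(fingertip, obj, goal, action,
                  near_weight=0.5, dist_weight=1.0, control_weight=0.1):
    """Sketch of the weighted three-term reward (assumed per-term formulas)."""
    fingertip, obj, goal, action = (
        np.asarray(x, dtype=float) for x in (fingertip, obj, goal, action)
    )
    # Assumption: every term is a non-positive penalty scaled by its weight.
    reward_near = -np.linalg.norm(obj - fingertip) * near_weight
    reward_dist = -np.linalg.norm(obj - goal) * dist_weight
    reward_ctrl = -np.square(action).sum() * control_weight
    terms = {
        "reward_dist": reward_dist,
        "reward_ctrl": reward_ctrl,
        "reward_near": reward_near,
    }
    return reward_dist + reward_ctrl + reward_near, terms


if __name__ == "__main__":
    total, terms = pusher_reward(
        fingertip=[0.0, 0.0, 0.0],
        obj=[3.0, 4.0, 0.0],
        goal=[3.0, 4.0, 0.0],
        action=np.zeros(7),
    )
    print(total, terms)
```

Under these assumptions the default weights match the constructor defaults listed in the signature, so an object resting on the goal with zero torque is penalized only for the fingertip's distance from the object.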
Usage
Use this environment to benchmark RL algorithms on a robotic manipulation task. Pusher tests an agent's ability to coordinate a multi-joint arm to push an object to a target location. The v5 version is recommended for new research because of its bug fixes and configurable reward weights.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/pusher_v5.py
Signature
class PusherEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "pusher_v5.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        reward_near_weight: float = 0.5,
        reward_dist_weight: float = 1,
        reward_control_weight: float = 0.1,
        **kwargs,
    ):
        ...
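Because `gym.make` forwards extra keyword arguments to the environment constructor, the reward weights in the signature above can be overridden at creation time. The sketch below shows one way to do this. The helper names `pusher_make_kwargs` and `make_pusher` are hypothetical, and the non-negative check is a choice made for this example rather than something Gymnasium enforces. Creating the environment requires Gymnasium with its MuJoCo dependencies installed.

```python
def pusher_make_kwargs(reward_near_weight=0.5,
                       reward_dist_weight=1.0,
                       reward_control_weight=0.1):
    """Build constructor kwargs for Pusher-v5 (hypothetical helper)."""
    weights = {
        "reward_near_weight": float(reward_near_weight),
        "reward_dist_weight": float(reward_dist_weight),
        "reward_control_weight": float(reward_control_weight),
    }
    # Example-only guard: reject negative weights to catch sign mistakes early.
    for name, value in weights.items():
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    return weights


def make_pusher(**weights):
    """Create Pusher-v5 with custom reward weights."""
    # Imported lazily so the kwargs helper works without MuJoCo installed.
    import gymnasium as gym

    return gym.make("Pusher-v5", **pusher_make_kwargs(**weights))


if __name__ == "__main__":
    # Double the control penalty and keep the other defaults.
    print(pusher_make_kwargs(reward_control_weight=0.2))
```

A call such as `make_pusher(reward_control_weight=0.2)` would then return an environment that penalizes large torques more heavily than the default.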
Import
import gymnasium as gym
env = gym.make("Pusher-v5")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (7,) | Yes | Torques applied to the 7 shoulder, elbow, forearm, and wrist joints; each component lies in [-2, 2] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (23,) | State vector: qpos[:7], qvel[:7], tips_arm position (3), object position (3), goal position (3) |
| reward | float | reward_dist + reward_ctrl + reward_near (weighted by configurable weights) |
| terminated | bool | Always False (Pusher never terminates) |
| truncated | bool | True when the episode reaches its time limit (set by the TimeLimit wrapper, default 100 timesteps) |
| info | dict | Contains reward_dist, reward_ctrl, reward_near |
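The observation layout in the table above can be unpacked into named slices. The following is a small sketch that relies only on the ordering given in the table; the function name `split_observation` and the dictionary keys are hypothetical labels chosen for this example.

```python
import numpy as np


def split_observation(obs):
    """Split a 23-element Pusher observation into named parts (views, not copies)."""
    obs = np.asarray(obs)
    if obs.shape != (23,):
        raise ValueError(f"expected shape (23,), got {obs.shape}")
    return {
        "joint_pos": obs[0:7],        # qpos[:7]
        "joint_vel": obs[7:14],       # qvel[:7]
        "fingertip_pos": obs[14:17],  # tips_arm position
        "object_pos": obs[17:20],     # object position
        "goal_pos": obs[20:23],       # goal position
    }


if __name__ == "__main__":
    parts = split_observation(np.arange(23))
    for name, values in parts.items():
        print(name, values)
```

Naming the slices once keeps index arithmetic out of downstream code, for example when logging the object-to-goal distance during training.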
Usage Examples
import gymnasium as gym

env = gym.make("Pusher-v5")
observation, info = env.reset(seed=42)

for _ in range(100):
    # Sample a random 7-dimensional torque vector
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Pusher never terminates, so only truncation triggers a reset here
    if terminated or truncated:
        observation, info = env.reset()

env.close()