Implementation:Farama Foundation Gymnasium PusherEnv V4
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Pusher v4 MuJoCo environment provided by Gymnasium.
Description
The Pusher v4 environment models a multi-jointed robot arm similar to a human arm. The goal is to move a target cylinder (called object) to a goal position using the robot's end effector (called fingertip). The robot has shoulder, elbow, forearm, and wrist joints, 7 actuated joints in total. The observation has 23 elements: joint positions (7), joint velocities (7), fingertip position (3), object position (3), and goal position (3). The reward is reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near. The environment never terminates; episodes end only through truncation (default 100 timesteps). Note: This version is only compatible with mujoco < 3.0.0.
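The reward formula names three terms without defining them. The sketch below recomputes the weighted sum under the assumption that reward_dist is the negative object-to-goal distance, reward_near is the negative fingertip-to-object distance, and reward_ctrl is the negative sum of squared action components. The function `pusher_reward` is a hypothetical helper for illustration, not part of Gymnasium, and it uses only the standard library.

```python
import math
from typing import Sequence


def pusher_reward(
    fingertip: Sequence[float],
    obj: Sequence[float],
    goal: Sequence[float],
    action: Sequence[float],
) -> float:
    """Illustrative re-computation of the Pusher reward (not Gymnasium API)."""
    # Assumed term definitions; every term is non-positive.
    reward_near = -math.dist(obj, fingertip)   # fingertip-to-object distance
    reward_dist = -math.dist(obj, goal)        # object-to-goal distance
    reward_ctrl = -sum(a * a for a in action)  # squared torque magnitude
    # Weights as given in the reward formula above.
    return reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near


if __name__ == "__main__":
    # Object already at the goal, fingertip 5 units away, zero torque.
    print(pusher_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 0.0), [0.0] * 7))
```

With these weights the object-to-goal distance dominates, while the fingertip-to-object term gives a shaping signal before the arm reaches the object.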
Usage
Use this environment to reproduce results from papers that used Pusher-v4. For new projects, use Pusher-v5, which fixes the object density bug, computes the reward after the physics step, and is compatible with mujoco >= 3.0.0.
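When one script has to run against different mujoco installs, the environment id can be chosen from the installed version. The following is a minimal sketch based only on the 3.0.0 boundary stated above; `pusher_env_id` is a hypothetical helper, not a Gymnasium function, and it does not check any other version constraints.

```python
from importlib.metadata import PackageNotFoundError, version


def pusher_env_id(mujoco_version: str) -> str:
    """Pick a Pusher environment id from a mujoco version string (hypothetical helper)."""
    major = int(mujoco_version.split(".")[0])
    # Per the compatibility note: v4 needs mujoco < 3.0.0, v5 works with >= 3.0.0.
    return "Pusher-v4" if major < 3 else "Pusher-v5"


if __name__ == "__main__":
    try:
        print(pusher_env_id(version("mujoco")))
    except PackageNotFoundError:
        print("mujoco is not installed")
```

The returned string can be passed to `gym.make`. Results obtained with v4 and v5 are not directly comparable, since v5 changes the model and the reward timing.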
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/pusher_v4.py
Signature
class PusherEnv(MujocoEnv, utils.EzPickle):
    def __init__(self, **kwargs)
Import
import gymnasium as gym
env = gym.make("Pusher-v4")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (7,) | Yes | Torques applied to shoulder, elbow, forearm, and wrist joints, range [-2, 2] |
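Policy outputs often fall outside the torque range, so it can help to validate and clip them before calling `env.step`. The sketch below assumes the 7-element shape and the [-2, 2] bounds from the table; `clip_action` and the constants are illustrative names, not part of Gymnasium.

```python
from typing import List, Sequence

ACTION_DIM = 7                          # one torque per actuated joint
ACTION_LOW, ACTION_HIGH = -2.0, 2.0     # bounds from the Inputs table


def clip_action(action: Sequence[float]) -> List[float]:
    """Check the action length and clamp each torque into [-2, 2] (hypothetical helper)."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} torques, got {len(action)}")
    return [min(ACTION_HIGH, max(ACTION_LOW, float(a))) for a in action]


if __name__ == "__main__":
    print(clip_action([3.0, -3.0, 0.0, 1.0, 2.0, -2.0, 0.5]))
```

For NumPy arrays, `np.clip(action, -2.0, 2.0)` performs the same clamping in one call.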
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (23,) | State vector: qpos[:7], qvel[:7], tips_arm position (3), object position (3), goal position (3) |
| reward | float | reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near |
| terminated | bool | Always False (Pusher never terminates) |
| truncated | bool | Episode truncation (handled by TimeLimit wrapper, default 100 timesteps) |
| info | dict | Contains reward_dist, reward_ctrl |
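The flat 23-element observation is easier to work with when split into its named parts. The sketch below slices it in the order listed in the table; `PusherObs` and `split_observation` are hypothetical names introduced for illustration. Plain slicing is used, so the function works on lists as well as on the array returned by the environment.

```python
from typing import NamedTuple, Sequence


class PusherObs(NamedTuple):
    qpos: Sequence[float]       # joint positions, 7 values
    qvel: Sequence[float]       # joint velocities, 7 values
    fingertip: Sequence[float]  # tips_arm position, 3 values
    obj: Sequence[float]        # object position, 3 values
    goal: Sequence[float]       # goal position, 3 values


def split_observation(obs: Sequence[float]) -> PusherObs:
    """Slice a flat Pusher observation into named parts (hypothetical helper)."""
    if len(obs) != 23:
        raise ValueError(f"expected 23 elements, got {len(obs)}")
    return PusherObs(obs[0:7], obs[7:14], obs[14:17], obs[17:20], obs[20:23])


if __name__ == "__main__":
    parts = split_observation(list(range(23)))
    print(parts.fingertip, parts.obj, parts.goal)
```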
Usage Examples
import gymnasium as gym

env = gym.make("Pusher-v4")
observation, info = env.reset(seed=42)

for _ in range(100):
    # Sample a random 7-dimensional torque action.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False here; truncated is set by the time limit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()