Implementation:Farama Foundation Gymnasium PusherEnv V4

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, MuJoCo_Environments
Last Updated	2026-02-15 03:00 GMT

Overview

Concrete implementation of the Pusher v4 MuJoCo environment provided by Gymnasium.

Description

The Pusher v4 environment simulates a multi-jointed robot arm that resembles a human arm. The goal is to move a target cylinder (called object) to a goal position using the robot's end effector (called fingertip). The arm has shoulder, elbow, forearm, and wrist joints, 7 actuated joints in total.

The observation has 23 elements: joint positions (7), joint velocities (7), fingertip position (3), object position (3), and goal position (3).

The reward is reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near, where reward_dist is the negative distance between the object and the goal, reward_ctrl is the negative squared norm of the action, and reward_near is the negative distance between the fingertip and the object. All three terms are therefore zero or negative.

The Pusher never terminates; episodes end only through truncation (100 timesteps by default).

Note: This version is only compatible with mujoco < 3.0.0.
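
The reward formula above can be sketched in plain Python. This is a minimal illustration, not the environment's own code: the helper name pusher_reward and its arguments are hypothetical, and the term definitions (negative Euclidean distances and a negative squared action norm) are assumed to match the Gymnasium documentation for Pusher.

```python
import math
from typing import Dict, Sequence


def pusher_reward(
    fingertip: Sequence[float],
    obj: Sequence[float],
    goal: Sequence[float],
    action: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper that mirrors the documented reward terms."""
    # Penalty for the object being far from the goal.
    reward_dist = -math.dist(obj, goal)
    # Penalty for large torques: negative sum of squared action elements.
    reward_ctrl = -sum(a * a for a in action)
    # Penalty for the fingertip being far from the object.
    reward_near = -math.dist(fingertip, obj)
    # Weighted sum as stated in the description.
    total = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near
    return {
        "reward": total,
        "reward_dist": reward_dist,
        "reward_ctrl": reward_ctrl,
        "reward_near": reward_near,
    }
```

Because every term is a penalty, the best achievable reward is 0: the object sits on the goal, the fingertip touches the object, and no torque is applied.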

Usage

Use this environment to reproduce results from papers that used Pusher-v4. For new projects, use Pusher-v5, which fixes the object density bug, computes the reward from the state after the physics step, and is compatible with mujoco >= 3.0.0.
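
The version guidance above can be turned into a small selection helper. This is only a sketch: the function names are hypothetical, the version string is passed in rather than read from the installed package, and only the major version number is inspected.

```python
def pusher_v4_supported(mujoco_version: str) -> bool:
    """Return True if the given mujoco version string is below 3.0.0."""
    # Only the major component matters for the "< 3.0.0" rule.
    major = int(mujoco_version.split(".")[0])
    return major < 3


def choose_pusher_env_id(mujoco_version: str, reproduce_v4: bool) -> str:
    """Pick an environment id following the usage guidance (hypothetical helper)."""
    if not reproduce_v4:
        # New projects: prefer v5.
        return "Pusher-v5"
    if not pusher_v4_supported(mujoco_version):
        raise RuntimeError(
            "Pusher-v4 needs mujoco < 3.0.0, found " + mujoco_version
        )
    return "Pusher-v4"
```

In practice the version string would typically come from the installed bindings (for example mujoco.__version__, assuming that attribute is available in your installation), and the returned id would be passed to gym.make.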

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/mujoco/pusher_v4.py

Signature

class PusherEnv(MujocoEnv, utils.EzPickle):
    def __init__(self, **kwargs)

Import

import gymnasium as gym
env = gym.make("Pusher-v4")

I/O Contract

Inputs

Name	Type	Required	Description
action	np.ndarray (7,)	Yes	Torques applied to the shoulder, elbow, forearm, and wrist joints; each element lies in [-2, 2]
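
A policy may produce torques outside the documented range. The sketch below clamps a 7-element action into [-2, 2] before it is passed to env.step. The helper name and constants are hypothetical, and how the environment itself treats out-of-range actions is not covered here.

```python
from typing import List, Sequence

ACTION_LOW = -2.0   # lower torque bound from the I/O contract
ACTION_HIGH = 2.0   # upper torque bound from the I/O contract
ACTION_DIM = 7      # one torque per actuated joint


def clip_pusher_action(action: Sequence[float]) -> List[float]:
    """Clamp each torque into [ACTION_LOW, ACTION_HIGH] (hypothetical helper)."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} elements, got {len(action)}")
    return [min(max(float(a), ACTION_LOW), ACTION_HIGH) for a in action]
```

The result is a plain list; wrap it in np.asarray(..., dtype=np.float32) if the caller needs an array.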

Outputs

Name	Type	Description
observation	np.ndarray (23,)	State vector: qpos[:7], qvel[:7], tips_arm position (3), object position (3), goal position (3)
reward	float	reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near
terminated	bool	Always False (Pusher never terminates)
truncated	bool	True when the TimeLimit wrapper ends the episode (default 100 timesteps)
info	dict	Contains reward_dist, reward_ctrl
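
The layout of the 23-element observation can be made explicit by slicing it into named parts. This is a minimal sketch that assumes the component order listed in the table; the function name and dictionary keys are hypothetical.

```python
from typing import Any, Dict, Sequence


def split_pusher_observation(obs: Sequence[Any]) -> Dict[str, Sequence[Any]]:
    """Slice a Pusher observation into named components (hypothetical helper)."""
    if len(obs) != 23:
        raise ValueError(f"expected 23 elements, got {len(obs)}")
    return {
        "joint_positions": obs[0:7],       # qpos[:7]
        "joint_velocities": obs[7:14],     # qvel[:7]
        "fingertip_position": obs[14:17],  # tips_arm position
        "object_position": obs[17:20],     # object position
        "goal_position": obs[20:23],       # goal position
    }
```

The slicing works on both Python lists and NumPy arrays, so the observation returned by env.step can be passed in directly.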

Usage Examples

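import gymnasium as gym

# Pusher-v4 requires mujoco < 3.0.0 (see Description).
env = gym.make("Pusher-v4")
observation, info = env.reset(seed=42)

for _ in range(100):
    # Sample a random 7-element torque vector from the action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False; truncated becomes True at the time limit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()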
