

Implementation:Farama Foundation Gymnasium PusherEnv V5

From Leeroopedia
Revision as of 12:37, 16 February 2026 by Admin (Auto-imported from implementations/Farama_Foundation_Gymnasium_PusherEnv_V5.md)
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Pusher v5 MuJoCo environment provided by Gymnasium.

Description

The Pusher v5 environment is a multi-jointed robot arm very similar to a human arm. The goal is to move a target cylinder (the object) to a goal position using the robot's end effector (the fingertip). The arm has shoulder, elbow, forearm, and wrist joints, for 7 actuated joints in total. The observation contains the joint positions (7), joint velocities (7), fingertip position (3), object position (3), and goal position (3), for a total of 23 elements. The reward is reward_dist + reward_ctrl + reward_near, where each term is scaled by its own configurable weight. Compared with earlier versions, v5 increases the object's density so that it is higher than that of air, computes the reward after the physics step rather than before it, and exposes the reward weights as constructor arguments.
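The reward composition described above can be sketched in plain Python. This is a minimal, non-authoritative sketch, not the Gymnasium source: it assumes, as in the Gymnasium documentation, that reward_near is the negative fingertip-to-object distance, reward_dist is the negative object-to-goal distance, and reward_ctrl is the negative sum of squared action components, each multiplied by its weight. The default weights mirror the constructor defaults in the signature below; the function name pusher_reward is hypothetical, and plain tuples stand in for the NumPy arrays the environment uses.

```python
import math


def pusher_reward(fingertip, obj, goal, action,
                  reward_near_weight=0.5,
                  reward_dist_weight=1.0,
                  reward_control_weight=0.1):
    """Hypothetical sketch of the Pusher-v5 reward terms (each term is a penalty <= 0)."""
    # Penalize distance between the fingertip (tips_arm) and the object: encourages reaching it.
    reward_near = -math.dist(fingertip, obj) * reward_near_weight
    # Penalize distance between the object and the goal: the main task objective.
    reward_dist = -math.dist(obj, goal) * reward_dist_weight
    # Penalize large torques via the squared magnitude of the 7-element action.
    reward_ctrl = -sum(a * a for a in action) * reward_control_weight

    reward = reward_dist + reward_ctrl + reward_near
    info = {"reward_dist": reward_dist, "reward_ctrl": reward_ctrl, "reward_near": reward_near}
    return reward, info


# Example: fingertip 5 units from the object, object 5 units from the goal, all torques = 1.
total, terms = pusher_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0), [1.0] * 7)
print(total, terms)
```

Because v5 computes the reward after the physics step, the fingertip, object, and goal positions used here correspond to the same state that the returned observation describes, so the reward terms can in principle be recomputed from the observation and the action.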

Usage

Use this environment to benchmark RL algorithms on a robotic manipulation task. Pusher tests an agent's ability to coordinate a multi-joint arm to push an object to a target location. The v5 version is recommended for new research because of its bug fixes and configurable reward weights.

Code Reference

Source Location

Signature

class PusherEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "pusher_v5.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        reward_near_weight: float = 0.5,
        reward_dist_weight: float = 1,
        reward_control_weight: float = 0.1,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("Pusher-v5")

I/O Contract

Inputs

Name Type Required Description
action np.ndarray (7,) Yes Torques applied at the 7 arm joints (shoulder, elbow, forearm, and wrist); each element lies in [-2, 2]
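A policy's raw output is not guaranteed to respect the action contract above. The following is a minimal sketch, assuming the 7-element shape and the [-2, 2] bounds from the table, of a hypothetical helper (clip_action) that validates the length and clamps each torque before it is passed to env.step.

```python
ACTION_DIM = 7                    # one torque per actuated arm joint
ACTION_LOW, ACTION_HIGH = -2.0, 2.0  # bounds stated in the Inputs table


def clip_action(raw):
    """Hypothetical helper: validate length and clamp each torque into [-2, 2]."""
    if len(raw) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} torques, got {len(raw)}")
    return [min(max(float(a), ACTION_LOW), ACTION_HIGH) for a in raw]


print(clip_action([3, -5, 0.5, 2, -2, 0, 1.9]))
```

With Gymnasium and NumPy available, the equivalent one-liner is np.clip(action, env.action_space.low, env.action_space.high), which reads the bounds from the environment instead of hard-coding them.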

Outputs

Name Type Description
observation np.ndarray (23,) State vector: qpos[:7], qvel[:7], fingertip (tips_arm) position (3), object position (3), goal position (3)
reward float Sum reward_dist + reward_ctrl + reward_near, each term scaled by its weight (reward_dist_weight, reward_control_weight, reward_near_weight)
terminated bool Always False (Pusher never terminates)
truncated bool True when the TimeLimit wrapper added by gym.make ends the episode (100 timesteps by default)
info dict Contains the individual reward terms reward_dist, reward_ctrl, and reward_near
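The 23-element observation is a flat concatenation, so downstream code usually slices it by offset. Below is a minimal sketch, assuming exactly the ordering listed in the Outputs table, of a hypothetical helper (split_observation) that maps offsets to named parts. It works on a Python list or a 1-D NumPy array, since it only uses len() and slicing.

```python
# Ordering and sizes taken from the observation row of the Outputs table.
OBS_LAYOUT = [
    ("qpos", 7),       # arm joint positions
    ("qvel", 7),       # arm joint velocities
    ("fingertip", 3),  # tips_arm position (x, y, z)
    ("object", 3),     # object (cylinder) position
    ("goal", 3),       # goal position
]


def split_observation(obs):
    """Hypothetical helper: split a flat Pusher-v5 observation into named slices."""
    expected = sum(size for _, size in OBS_LAYOUT)
    if len(obs) != expected:
        raise ValueError(f"expected {expected} elements, got {len(obs)}")
    parts, start = {}, 0
    for name, size in OBS_LAYOUT:
        parts[name] = list(obs[start:start + size])
        start += size
    return parts


print(split_observation(list(range(23)))["object"])
```

The fingertip, object, and goal slices (indices 14 to 22) are the positions that the reward's distance terms are measured between.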

Usage Examples

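import gymnasium as gym

env = gym.make("Pusher-v5")  # gym.make wraps the env in TimeLimit (100 steps by default)
observation, info = env.reset(seed=42)  # seed the first reset for reproducibility

for _ in range(100):
    action = env.action_space.sample()  # random torques, each in [-2, 2]
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False for Pusher; truncated signals the time limit
    if terminated or truncated:
        observation, info = env.reset()

env.close()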
