Principle:Farama Foundation Gymnasium MuJoCo Locomotion
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Continuous_Control |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
High-fidelity, physics-based 3D locomotion tasks with continuous control provide standard benchmarks for evaluating how well reinforcement learning algorithms coordinate articulated bodies.
Description
MuJoCo (Multi-Joint dynamics with Contact) locomotion environments simulate articulated rigid body systems in three dimensions using the MuJoCo physics engine. These environments model robots ranging from simple single-joint pendulums to complex multi-legged creatures, where the agent must learn to coordinate continuous torque commands across many actuated joints to achieve locomotion, manipulation, or balancing objectives. The physics simulation handles contact dynamics, friction, joint constraints, and tendon mechanics with high numerical accuracy.
The environment suite covers a range of morphologies and tasks: hopping (Hopper), bipedal walking (Walker2d, Humanoid), standing up from the ground (HumanoidStandup), quadrupedal locomotion (Ant), swimming (Swimmer), running (HalfCheetah), balancing (InvertedPendulum, InvertedDoublePendulum), and manipulation (Reacher, Pusher). Each environment exposes a continuous observation space (joint positions and velocities, plus external contact forces in some environments) and a continuous action space (joint torques). The environments are available in both v4 and v5 variants, with v5 providing improved configurability, updated default parameters, and better alignment with the underlying MuJoCo model specifications.
These environments are among the most widely used continuous control benchmarks in the RL literature. Many major policy gradient, actor-critic, and model-based RL algorithms have been evaluated on MuJoCo tasks, making them a de facto standard for comparing continuous control performance. Their computational efficiency, deterministic dynamics, and well-defined reward structures make them suitable for large-scale empirical studies.
Usage
Use MuJoCo locomotion environments for developing, benchmarking, and comparing continuous control RL algorithms. They are the standard choice for evaluating policy optimization methods such as PPO, SAC, TD3, and TRPO. Use v5 environments for new research to benefit from improved defaults and configurability. These environments are appropriate when testing algorithms on moderate to high-dimensional continuous action spaces with realistic physical dynamics.
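A minimal sketch of how such an environment is typically driven, assuming the standard Gymnasium interface in which `reset` returns `(obs, info)` and `step` returns a five-element tuple. The helper name `random_rollout` is hypothetical and not part of Gymnasium; it accepts any environment object that follows that interface.

```python
def random_rollout(env, n_steps, seed=0):
    """Run a random policy for up to n_steps; return (total_reward, steps_taken).

    `env` is assumed to follow the Gymnasium API: reset(seed=...) returns
    (obs, info) and step(action) returns
    (obs, reward, terminated, truncated, info).
    """
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    steps_taken = 0
    for _ in range(n_steps):
        action = env.action_space.sample()  # random action from the action space
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps_taken += 1
        if terminated or truncated:
            break  # episode ended early (unhealthy state or time limit)
    return total_reward, steps_taken
```

With Gymnasium and MuJoCo installed, a call might look like `random_rollout(gym.make("HalfCheetah-v5"), 100)` after `import gymnasium as gym`. A random policy is only a smoke test; a learned policy such as PPO or SAC would replace the `sample()` call.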
Theoretical Basis
MuJoCo solves the forward dynamics equation for articulated bodies:
$$M(q)\,\ddot{q} + c(q, \dot{q}) = \tau + J^T f$$
where $M(q)$ is the joint-space inertia matrix, $q$ is the vector of generalized coordinates, $c(q, \dot{q})$ represents Coriolis and gravitational forces, $\tau$ is the vector of applied torques (actions), $J^T$ is the constraint Jacobian transpose, and $f$ is the vector of contact and constraint forces.
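As a numerical illustration, the sketch below solves for the joint accelerations of a toy two-degree-of-freedom system, assuming the dynamics balance inertia times acceleration plus bias forces against applied torques plus Jacobian-transpose times constraint forces. All matrices and vectors are made-up values, and the constraint force is supplied by hand; MuJoCo itself computes it with a constraint solver, which this sketch does not attempt to reproduce.

```python
import numpy as np

def forward_acceleration(M, c, tau, J, f):
    """Solve M @ qacc + c = tau + J.T @ f for the joint accelerations qacc."""
    rhs = tau + J.T @ f - c
    return np.linalg.solve(M, rhs)

# Toy 2-DOF system with one scalar constraint (illustrative numbers only).
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # joint-space inertia matrix (symmetric, positive definite)
c = np.array([0.3, -0.1])       # Coriolis and gravitational bias forces
tau = np.array([1.0, 0.0])      # applied torques (the agent's action)
J = np.array([[0.0, 1.0]])      # constraint Jacobian, one row per constraint
f = np.array([0.4])             # constraint force, assumed known here

qacc = forward_acceleration(M, c, tau, J, f)
```

Using `np.linalg.solve` rather than forming an explicit inverse of the inertia matrix is the usual numerically safer choice.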
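The standard reward formulation for locomotion tasks follows:
$$r = v_x - c_{\text{ctrl}} - c_{\text{contact}} + r_{\text{healthy}}$$
where $v_x$ is forward velocity, $c_{\text{ctrl}}$ penalizes large actions, $c_{\text{contact}}$ penalizes contact forces, and $r_{\text{healthy}}$ is a survival bonus.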
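One agent step, in pseudocode:
action = clip(agent_action, -1, 1) * torque_scale  # bound and scale the command
data.ctrl[:] = action  # write torques to the actuators
mujoco.mj_step(model, data, nstep=frame_skip)  # advance simulation by frame_skip inner steps
obs = concatenate(qpos, qvel, cfrc_ext)  # build observation
reward = forward_velocity - ctrl_cost - contact_cost + healthy_reward
terminated = not is_healthy(data)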
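The reward terms named above can be sketched as a plain Python function. The function name, the quadratic form of the two cost terms, and every default weight here are illustrative assumptions; individual environments define their own weights and may omit some terms.

```python
def locomotion_reward(x_before, x_after, dt, action, contact_forces,
                      ctrl_cost_weight=0.5, contact_cost_weight=5e-4,
                      healthy_reward=1.0, is_healthy=True):
    """Return (reward, terms) for one agent step.

    The weights are illustrative defaults, not values read from any
    particular environment.
    """
    # Forward velocity as a finite difference of the torso x-position.
    forward_velocity = (x_after - x_before) / dt
    # Quadratic penalty on the size of the action.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    # Quadratic penalty on external contact forces.
    contact_cost = contact_cost_weight * sum(f * f for f in contact_forces)
    # Survival bonus is paid only while the agent is in a healthy state.
    survive = healthy_reward if is_healthy else 0.0
    reward = forward_velocity - ctrl_cost - contact_cost + survive
    terms = {
        "forward_velocity": forward_velocity,
        "ctrl_cost": ctrl_cost,
        "contact_cost": contact_cost,
        "healthy_reward": survive,
    }
    return reward, terms
```

Returning the individual terms alongside the total makes it easier to see which component dominates during training.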
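MuJoCo's default integrator is semi-implicit Euler, although a model file can select another scheme such as fourth-order Runge-Kutta. The timestep and frame skip are configured per environment: a common setting is a 0.01 s inner timestep with a frame skip of 5, yielding 0.05 s of simulated time per agent step.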
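The relationship between the inner timestep, the frame skip, and the time per agent step can be illustrated with a one-dimensional toy integrator. This is a sketch of the semi-implicit Euler idea only (velocity is updated first, then the new velocity advances the position); the function name is hypothetical and MuJoCo's own integrator handles many details not shown here.

```python
def semi_implicit_euler(q, v, accel_fn, timestep, frame_skip):
    """Advance position q and velocity v by frame_skip inner steps.

    Semi-implicit Euler updates the velocity first, then uses the *new*
    velocity to update the position.
    """
    for _ in range(frame_skip):
        v = v + timestep * accel_fn(q, v)
        q = q + timestep * v
    return q, v

timestep, frame_skip = 0.01, 5
agent_dt = timestep * frame_skip  # simulated time covered by one agent step

# Constant acceleration of 2.0, starting from rest at the origin.
q, v = semi_implicit_euler(0.0, 0.0, lambda pos, vel: 2.0, timestep, frame_skip)
```

The agent therefore chooses one action per `agent_dt` of simulated time, while the physics advances in finer increments of `timestep`.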
Related Pages
- Implementation:Farama_Foundation_Gymnasium_AntEnv_V4
- Implementation:Farama_Foundation_Gymnasium_AntEnv_V5
- Implementation:Farama_Foundation_Gymnasium_HalfCheetahEnv_V4
- Implementation:Farama_Foundation_Gymnasium_HalfCheetahEnv_V5
- Implementation:Farama_Foundation_Gymnasium_HopperEnv_V4
- Implementation:Farama_Foundation_Gymnasium_HopperEnv_V5
- Implementation:Farama_Foundation_Gymnasium_HumanoidEnv_V4
- Implementation:Farama_Foundation_Gymnasium_HumanoidEnv_V5
- Implementation:Farama_Foundation_Gymnasium_HumanoidStandupEnv_V4
- Implementation:Farama_Foundation_Gymnasium_HumanoidStandupEnv_V5
- Implementation:Farama_Foundation_Gymnasium_InvertedDoublePendulumEnv_V4
- Implementation:Farama_Foundation_Gymnasium_InvertedDoublePendulumEnv_V5
- Implementation:Farama_Foundation_Gymnasium_InvertedPendulumEnv_V5
- Implementation:Farama_Foundation_Gymnasium_PusherEnv_V4
- Implementation:Farama_Foundation_Gymnasium_PusherEnv_V5
- Implementation:Farama_Foundation_Gymnasium_ReacherEnv_V4
- Implementation:Farama_Foundation_Gymnasium_ReacherEnv_V5
- Implementation:Farama_Foundation_Gymnasium_SwimmerEnv_V4
- Implementation:Farama_Foundation_Gymnasium_SwimmerEnv_V5
- Implementation:Farama_Foundation_Gymnasium_Walker2dEnv_V4
- Implementation:Farama_Foundation_Gymnasium_Walker2dEnv_V5
- Implementation:Farama_Foundation_Gymnasium_MuJoCo_Utils