
Implementation:Farama Foundation Gymnasium AntEnv V5

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Ant v5 MuJoCo locomotion environment provided by Gymnasium.

Description

The Ant environment is based on the work of Schulman, Moritz, Levine, Jordan, and Abbeel in "High-Dimensional Continuous Control Using Generalized Advantage Estimation". The ant is a 3D quadruped robot consisting of a torso (a free rotational body) with four legs attached to it, where each leg has two body parts. The goal is to coordinate the four legs so that the ant moves forward, in the positive x direction, by applying torques to the eight hinges that connect the body parts. The reward function combines a healthy reward, a forward reward based on x-velocity, a control cost penalty, and a contact cost penalty. With terminate_when_unhealthy=True (the default), the episode terminates when the ant becomes unhealthy, meaning that its z-coordinate leaves healthy_z_range (default (0.2, 1.0)) or any state value is non-finite.
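The reward and health check described above can be sketched with plain NumPy. This is an illustrative sketch, not the Gymnasium source: the function names are hypothetical, the default weights are copied from the constructor signature on this page, and the quadratic form of the two cost terms, the clipping of contact forces to contact_force_range, and the choice of index 2 as the z-coordinate are assumptions.

```python
import numpy as np


def is_healthy(state, healthy_z_range=(0.2, 1.0)):
    """Return True if every state value is finite and z lies in the healthy range.

    Assumption: index 2 of `state` holds the torso z-coordinate.
    """
    min_z, max_z = healthy_z_range
    return bool(np.isfinite(state).all() and min_z <= state[2] <= max_z)


def ant_reward(
    x_velocity,
    action,
    contact_forces,
    healthy,
    forward_reward_weight=1.0,
    ctrl_cost_weight=0.5,
    contact_cost_weight=5e-4,
    healthy_reward=1.0,
    contact_force_range=(-1.0, 1.0),
):
    """healthy_reward + forward_reward - ctrl_cost - contact_cost."""
    forward_reward = forward_reward_weight * x_velocity
    # Assumption: the healthy reward is only paid while the ant is healthy.
    survive = healthy_reward if healthy else 0.0
    # Assumption: both costs are weighted sums of squares.
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))
    clipped = np.clip(contact_forces, *contact_force_range)
    contact_cost = contact_cost_weight * float(np.sum(np.square(clipped)))
    return survive + forward_reward - ctrl_cost - contact_cost
```

With zero action and zero contact forces, a healthy ant moving at x-velocity 1.0 would receive 1.0 + 1.0 = 2.0 under these assumptions.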

Usage

Use this environment to benchmark reinforcement learning algorithms for continuous control and locomotion. It is commonly used in research papers to evaluate policy gradient methods, model-based RL, and hierarchical RL approaches. Compared with v4, the v5 version adds support for custom MuJoCo XML models (xml_file), a configurable frame skip (frame_skip), and additional reward and observation options such as forward_reward_weight and include_cfrc_ext_in_observation.
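A minimal sketch of overriding v5 constructor arguments through gym.make follows. The keyword names come from the signature on this page; the chosen values, the ANT_V5_OVERRIDES name, and the make_ant helper are hypothetical and only illustrate the pattern. Creating the environment requires Gymnasium with its MuJoCo dependencies installed, so the import is deferred until the helper is called.

```python
# Example overrides; the values are arbitrary and for illustration only.
ANT_V5_OVERRIDES = {
    "frame_skip": 10,                   # default is 5
    "ctrl_cost_weight": 0.1,            # default is 0.5
    "terminate_when_unhealthy": False,  # default is True
}


def make_ant(**overrides):
    """Create Ant-v5, forwarding keyword overrides to the AntEnv constructor."""
    import gymnasium as gym  # deferred: needs the MuJoCo extra installed

    return gym.make("Ant-v5", **overrides)


# Usage (not executed here):
# env = make_ant(**ANT_V5_OVERRIDES)
```

A custom model would be passed the same way, for example make_ant(xml_file=...) with a path to your own MuJoCo XML file.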

Code Reference

Source Location

Signature

class AntEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "ant.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1,
        ctrl_cost_weight: float = 0.5,
        contact_cost_weight: float = 5e-4,
        healthy_reward: float = 1.0,
        main_body: int | str = 1,
        terminate_when_unhealthy: bool = True,
        healthy_z_range: tuple[float, float] = (0.2, 1.0),
        contact_force_range: tuple[float, float] = (-1.0, 1.0),
        reset_noise_scale: float = 0.1,
        exclude_current_positions_from_observation: bool = True,
        include_cfrc_ext_in_observation: bool = True,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("Ant-v5")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (8,) | Yes | Torques applied to the 8 hinge joints connecting the leg segments, each in the range [-1, 1]
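As a small illustration of the input contract, the sketch below validates the shape of a raw policy output and clips it into [-1, 1] before it is passed to env.step. The helper name to_valid_action and the constants are hypothetical; the use of float32 is an assumption about the action dtype.

```python
import numpy as np

ACTION_DIM = 8
ACTION_LOW, ACTION_HIGH = -1.0, 1.0


def to_valid_action(raw):
    """Return an (8,) float32 array with every torque clipped to [-1, 1]."""
    action = np.asarray(raw, dtype=np.float32)
    if action.shape != (ACTION_DIM,):
        raise ValueError(f"expected shape ({ACTION_DIM},), got {action.shape}")
    return np.clip(action, ACTION_LOW, ACTION_HIGH)
```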

Outputs

Name | Type | Description
observation | np.ndarray (105,) | State vector: qpos (13 elements), qvel (14 elements), cfrc_ext (78 elements); the torso x and y positions are excluded from qpos by default (exclude_current_positions_from_observation=True)
reward | float | healthy_reward + forward_reward - ctrl_cost - contact_cost
terminated | bool | True if the ant is unhealthy (z outside [0.2, 1.0] by default, or a non-finite state value) and terminate_when_unhealthy is True
truncated | bool | True if the episode is cut off by the TimeLimit wrapper (1000 steps by default for Ant-v5)
info | dict | Contains x_position, y_position, distance_from_origin, x_velocity, y_velocity, reward_forward, reward_ctrl, reward_contact, reward_survive
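The default 105-element observation can be split back into its three parts, as in the sketch below. It assumes the parts are concatenated in the order listed above (qpos, then qvel, then cfrc_ext) and that the default observation options are in use; split_observation is a hypothetical helper name.

```python
import numpy as np

QPOS_DIM, QVEL_DIM, CFRC_DIM = 13, 14, 78
OBS_DIM = QPOS_DIM + QVEL_DIM + CFRC_DIM  # 105 with the default options


def split_observation(obs):
    """Split a default Ant-v5 observation into qpos, qvel and cfrc_ext parts."""
    obs = np.asarray(obs)
    if obs.shape != (OBS_DIM,):
        raise ValueError(f"expected shape ({OBS_DIM},), got {obs.shape}")
    qvel_start = QPOS_DIM
    cfrc_start = QPOS_DIM + QVEL_DIM
    return {
        "qpos": obs[:qvel_start],
        "qvel": obs[qvel_start:cfrc_start],
        "cfrc_ext": obs[cfrc_start:],
    }
```

If include_cfrc_ext_in_observation or exclude_current_positions_from_observation is changed, the sizes above no longer apply and the slices would need to be adjusted.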

Usage Examples

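import gymnasium as gym

# Create the environment and reset it with a fixed seed for reproducibility.
env = gym.make("Ant-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random action from the 8-dimensional action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one ends for either reason.
    if terminated or truncated:
        observation, info = env.reset()

# Release simulator and rendering resources.
env.close()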
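For evaluation it is often more useful to run one episode at a time and record its return. The sketch below is one way to do that for any environment that follows the reset/step contract used above; run_episode and its arguments are hypothetical names, and the policy is any callable that maps an observation to an action.

```python
def run_episode(env, policy, max_steps=1000, seed=None):
    """Run one episode and return (total_reward, steps, terminated, truncated)."""
    observation, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated) and steps < max_steps:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return total_reward, steps, terminated, truncated
```

With the environment from the example above, a random-policy baseline could be obtained with run_episode(env, lambda obs: env.action_space.sample(), seed=42).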
