Implementation:Farama Foundation Gymnasium AntEnv V4

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Ant v4 MuJoCo locomotion environment provided by Gymnasium.

Description

The Ant v4 environment implements a 3D quadruped ant robot using the MuJoCo bindings (mujoco >= 2.1.3). The ant has a torso and four legs, each consisting of two body parts connected by hinge joints. The goal is to move forward by applying torques to the eight hinges. The reward is the forward velocity plus a survival bonus (healthy_reward) minus a control cost on the applied torques. Contact forces can optionally be included in the observation, and a contact cost in the reward, via the use_contact_forces parameter. Ant-v4 is a legacy version; Ant-v5 is recommended for new projects.
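The reward terms described above can be combined as in the following minimal sketch. It assumes the forward reward equals the torso's x velocity and that the control cost is ctrl_cost_weight times the sum of squared torques. The helper name ant_reward and its arguments are hypothetical and not part of Gymnasium.

```python
def ant_reward(x_velocity, action, healthy_reward=1.0,
               ctrl_cost_weight=0.5, contact_cost=0.0):
    """Hypothetical helper: combine the reward terms named in the description."""
    # Assumption: the forward reward is the x velocity of the torso.
    forward_reward = x_velocity
    # Assumption: the control cost is a weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    # contact_cost stays 0.0 unless contact forces are enabled.
    return forward_reward + healthy_reward - ctrl_cost - contact_cost


# Eight torques of 0.5 each: ctrl_cost = 0.5 * (8 * 0.25) = 1.0
print(ant_reward(x_velocity=2.0, action=[0.5] * 8))  # 2.0 + 1.0 - 1.0 = 2.0
```

Under these assumptions, a stationary ant applying zero torque would still earn the survival bonus of 1.0 per step.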

Usage

Use this environment to reproduce results from papers that specifically used Ant-v4. For new research, consider Ant-v5, which fixes several bugs and provides more configuration options. Note that v4 has a known issue: healthy_reward is paid on every step, including steps on which the ant is unhealthy.
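The healthy_reward issue can be illustrated with a small sketch. It assumes the v4 survival bonus is gated by "is_healthy or terminate_when_unhealthy", which would explain why the bonus is still paid on an unhealthy step under the default settings. Both function names are hypothetical, and the strict variant is only one plausible corrected behavior, not a confirmed description of v5.

```python
def healthy_reward_v4(is_healthy, terminate_when_unhealthy=True, healthy_reward=1.0):
    """Sketch of the v4 behavior noted above (assumed gating, not copied from source)."""
    # With terminate_when_unhealthy=True the bonus is paid even when unhealthy.
    return float(is_healthy or terminate_when_unhealthy) * healthy_reward


def healthy_reward_strict(is_healthy, healthy_reward=1.0):
    """Hypothetical corrected variant: pay the bonus only while healthy."""
    return float(is_healthy) * healthy_reward


print(healthy_reward_v4(is_healthy=False))      # 1.0 (bonus paid although unhealthy)
print(healthy_reward_strict(is_healthy=False))  # 0.0
```

When comparing returns across versions, this difference would show up as at most one extra healthy_reward on the terminating step of each episode.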

Code Reference

Source Location

Signature

class AntEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file="ant.xml",
        ctrl_cost_weight=0.5,
        use_contact_forces=False,
        contact_cost_weight=5e-4,
        healthy_reward=1.0,
        terminate_when_unhealthy=True,
        healthy_z_range=(0.2, 1.0),
        contact_force_range=(-1.0, 1.0),
        reset_noise_scale=0.1,
        exclude_current_positions_from_observation=True,
        **kwargs,
    )
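To show how the healthy_z_range and terminate_when_unhealthy parameters in the signature interact, here is a minimal sketch of the health check. It assumes the ant is healthy when every state value is finite and the torso height z lies within healthy_z_range (bounds inclusive). The helper names and the separate z argument are hypothetical.

```python
import math


def is_healthy(state, z, healthy_z_range=(0.2, 1.0)):
    """Hypothetical check: finite state and torso height inside the range."""
    min_z, max_z = healthy_z_range
    return all(math.isfinite(v) for v in state) and min_z <= z <= max_z


def is_terminated(state, z, terminate_when_unhealthy=True,
                  healthy_z_range=(0.2, 1.0)):
    """Terminate only if termination is enabled and the ant is unhealthy."""
    return terminate_when_unhealthy and not is_healthy(state, z, healthy_z_range)


print(is_terminated([0.0, 0.1], z=0.5))   # False: height inside [0.2, 1.0]
print(is_terminated([0.0, 0.1], z=1.5))   # True: torso too high
print(is_terminated([0.0, 0.1], z=1.5, terminate_when_unhealthy=False))  # False
```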

Import

import gymnasium as gym
env = gym.make("Ant-v4")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (8,) | Yes | Torques applied to the 8 hinge joints connecting leg segments, range [-1, 1]
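If a policy can emit values outside the documented torque range, one option is to clamp them before calling env.step. The sketch below uses plain Python lists for portability; the helper name clip_action is hypothetical, and whether clamping is appropriate depends on your training setup.

```python
def clip_action(action, low=-1.0, high=1.0, size=8):
    """Hypothetical helper: validate length and clamp each torque to [low, high]."""
    if len(action) != size:
        raise ValueError(f"expected {size} torques, got {len(action)}")
    return [min(max(float(a), low), high) for a in action]


raw = [2.0, -3.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(clip_action(raw))  # [1.0, -1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```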

Outputs

Name | Type | Description
observation | np.ndarray (27,) | State vector: qpos (13 values, x/y excluded) followed by qvel (14 values); 84 contact-force values are appended if use_contact_forces=True, giving shape (111,)
reward | float | forward_reward + healthy_reward - ctrl_cost (and - contact_cost if use_contact_forces=True)
terminated | bool | True if the ant is unhealthy (z outside [0.2, 1.0] or non-finite state) and terminate_when_unhealthy=True
truncated | bool | Episode truncation (handled by the TimeLimit wrapper)
info | dict | Contains reward_forward, reward_ctrl, reward_survive, x_position, y_position, distance_from_origin, x_velocity, y_velocity, forward_reward
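The observation sizes in the table follow from simple arithmetic, sketched below. The full qpos length of 15 is inferred from the table (13 values once x and y are excluded). The function name is hypothetical.

```python
def ant_v4_obs_size(use_contact_forces=False,
                    exclude_current_positions_from_observation=True):
    """Hypothetical helper: observation length implied by the table above."""
    # 15 position values in total; x and y are dropped by default.
    qpos = 13 if exclude_current_positions_from_observation else 15
    qvel = 14
    contact = 84 if use_contact_forces else 0
    return qpos + qvel + contact


print(ant_v4_obs_size())                         # 27
print(ant_v4_obs_size(use_contact_forces=True))  # 111
```

A check like this can be compared against env.observation_space.shape to size a policy network's input layer before training.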

Usage Examples

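import gymnasium as gym

# Requires the MuJoCo dependencies, e.g. pip install "gymnasium[mujoco]"
env = gym.make("Ant-v4")
observation, info = env.reset(seed=42)  # seed once for a reproducible start

for _ in range(1000):
    action = env.action_space.sample()  # random torques in [-1, 1], shape (8,)
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode when the ant is unhealthy or the time limit is reached
    if terminated or truncated:
        observation, info = env.reset()

env.close()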
