Implementation:Farama Foundation Gymnasium AntEnv V4
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Ant v4 MuJoCo locomotion environment provided by Gymnasium.
Description
The Ant v4 environment implements the 3D quadruped ant robot using the MuJoCo bindings (mujoco >= 2.1.3). The ant has a torso with four legs, each consisting of two body parts connected by hinge joints. The goal is to move forward by applying torques to the eight hinges. The reward is the sum of a forward-velocity term and a healthy (survival) reward, minus a control cost penalty. Contact forces can optionally be included in the observation space via the use_contact_forces parameter. This is the legacy version; v5 is recommended for new projects.
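The reward composition described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the function name `ant_v4_reward` and its arguments are hypothetical, the forward term is assumed to equal the torso's x velocity, and both costs are assumed to be weighted sums of squares.

```python
def ant_v4_reward(
    x_velocity,
    action,
    ctrl_cost_weight=0.5,
    healthy_reward=1.0,
    contact_forces=None,
    contact_cost_weight=5e-4,
):
    """Hypothetical sketch of the per-step reward; not the Gymnasium source."""
    # Assumption: the forward reward is the torso velocity along x.
    forward_reward = x_velocity
    # Assumption: the control cost is a weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    reward = forward_reward + healthy_reward - ctrl_cost
    # The contact cost only applies when contact forces are in use.
    # Assumption: the forces passed in are already clipped to contact_force_range.
    if contact_forces is not None:
        reward -= contact_cost_weight * sum(f * f for f in contact_forces)
    return reward


# Moving forward at 0.5 units/s while applying half torque on all eight hinges.
example_reward = ant_v4_reward(x_velocity=0.5, action=[0.5] * 8)
```

With the default weights, the control cost grows quadratically with torque, so large actions are only worthwhile when they produce a matching gain in forward velocity.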
Usage
Use this environment to reproduce results from papers that used Ant-v4 specifically. For new research, consider Ant-v5, which fixes several bugs and provides more configuration options. Note that v4 has a known issue: healthy_reward is given on every step, even when the ant is unhealthy.
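The healthy_reward issue noted above can be illustrated with a small sketch. Both function names are hypothetical, and the exact form of the v4 expression (`is_healthy or terminate_when_unhealthy`) is an assumption about how the quirk arises rather than a quotation of the source; the second function shows the behavior one would expect instead.

```python
def healthy_reward_v4_like(is_healthy, terminate_when_unhealthy=True, healthy_reward=1.0):
    """Hypothetical model of the v4 quirk (assumed form, not the library code)."""
    # Assumption: with terminate_when_unhealthy=True the bonus is paid
    # regardless of health, which matches the issue described in the text.
    return float(is_healthy or terminate_when_unhealthy) * healthy_reward


def healthy_reward_expected(is_healthy, healthy_reward=1.0):
    """Bonus paid only while the ant is healthy."""
    return float(is_healthy) * healthy_reward


# On an unhealthy step the two versions disagree under the default settings.
v4_bonus = healthy_reward_v4_like(is_healthy=False)
expected_bonus = healthy_reward_expected(is_healthy=False)
```

When comparing returns against published Ant-v4 numbers, this difference matters mainly on the final step of an episode that ends because the ant became unhealthy.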
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/ant_v4.py
Signature
class AntEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file="ant.xml",
        ctrl_cost_weight=0.5,
        use_contact_forces=False,
        contact_cost_weight=5e-4,
        healthy_reward=1.0,
        terminate_when_unhealthy=True,
        healthy_z_range=(0.2, 1.0),
        contact_force_range=(-1.0, 1.0),
        reset_noise_scale=0.1,
        exclude_current_positions_from_observation=True,
        **kwargs,
    )
Import
import gymnasium as gym
env = gym.make("Ant-v4")
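Constructor arguments from the signature can be overridden by passing them as keyword arguments to `gym.make`. The sketch below is one possible way to guard against misspelled names; `ANT_V4_ARGS`, `check_ant_v4_overrides`, and `make_ant_v4` are hypothetical helpers, not part of Gymnasium. The check is deliberately strict, so arguments forwarded through `**kwargs` (for example `render_mode`) would need to be added to the allowed set.

```python
# Argument names and defaults copied from the signature above.
ANT_V4_ARGS = {
    "xml_file": "ant.xml",
    "ctrl_cost_weight": 0.5,
    "use_contact_forces": False,
    "contact_cost_weight": 5e-4,
    "healthy_reward": 1.0,
    "terminate_when_unhealthy": True,
    "healthy_z_range": (0.2, 1.0),
    "contact_force_range": (-1.0, 1.0),
    "reset_noise_scale": 0.1,
    "exclude_current_positions_from_observation": True,
}


def check_ant_v4_overrides(**overrides):
    """Hypothetical helper: reject names that are not in the signature."""
    unknown = sorted(set(overrides) - set(ANT_V4_ARGS))
    if unknown:
        raise TypeError(f"unknown Ant-v4 argument(s): {unknown}")
    return overrides


def make_ant_v4(**overrides):
    """Hypothetical helper: build Ant-v4 with validated overrides."""
    # Imported lazily so the validation above works without MuJoCo installed.
    import gymnasium as gym

    return gym.make("Ant-v4", **check_ant_v4_overrides(**overrides))
```

For example, `make_ant_v4(use_contact_forces=True, reset_noise_scale=0.05)` would create an environment with contact forces included in the observation and a smaller reset noise.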
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (8,) | Yes | Torques applied to the 8 hinge joints connecting leg segments, range [-1, 1] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (27,) | State vector: qpos (13 values, x/y excluded) followed by qvel (14 values); with use_contact_forces=True, 84 contact-force values are appended for a total of 111 |
| reward | float | forward_reward + healthy_reward - ctrl_cost (optionally - contact_cost) |
| terminated | bool | True if terminate_when_unhealthy=True and the ant is unhealthy (z outside healthy_z_range, [0.2, 1.0] by default, or non-finite state) |
| truncated | bool | Episode truncation (handled by TimeLimit wrapper) |
| info | dict | Contains reward_forward, reward_ctrl, reward_survive, x_position, y_position, distance_from_origin, x_velocity, y_velocity, forward_reward |
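The observation sizes and the termination condition in the table can be checked with a short sketch. The helper names are hypothetical; the component sizes (15 qpos values before the x/y exclusion, 14 qvel values, 84 contact-force values) are inferred from the table above, and the health test is a simplified reading of the terminated row.

```python
import math


def ant_v4_obs_size(use_contact_forces=False, exclude_current_positions_from_observation=True):
    """Hypothetical helper: observation length implied by the table above."""
    qpos_size, qvel_size, contact_size = 15, 14, 84
    size = qpos_size + qvel_size
    if exclude_current_positions_from_observation:
        size -= 2  # drop the torso x and y positions
    if use_contact_forces:
        size += contact_size
    return size


def ant_is_healthy(z, state, healthy_z_range=(0.2, 1.0)):
    """Hypothetical helper: healthy means finite state and z inside the range."""
    min_z, max_z = healthy_z_range
    return all(math.isfinite(v) for v in state) and min_z <= z <= max_z


def ant_terminated(z, state, terminate_when_unhealthy=True, healthy_z_range=(0.2, 1.0)):
    """Termination only triggers when terminate_when_unhealthy is enabled."""
    return terminate_when_unhealthy and not ant_is_healthy(z, state, healthy_z_range)


default_size = ant_v4_obs_size()
contact_size = ant_v4_obs_size(use_contact_forces=True)
```

Policies trained on the 27-dimensional observation cannot be reused directly when use_contact_forces=True, since the input length changes to 111.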
Usage Examples
import gymnasium as gym

env = gym.make("Ant-v4")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random action from the 8-dimensional action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one ends.
    if terminated or truncated:
        observation, info = env.reset()

env.close()