Implementation:Farama Foundation Gymnasium AntEnv V5
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Ant v5 MuJoCo locomotion environment provided by Gymnasium.
Description
The Ant environment is based on the one introduced by Schulman, Moritz, Levine, Jordan, and Abbeel in "High-Dimensional Continuous Control Using Generalized Advantage Estimation". The ant is a 3D quadruped robot consisting of a torso (a free rotational body) with four legs attached to it, where each leg has two body parts. The goal is to coordinate the four legs to move in the forward (right) direction by applying torque to the eight hinges connecting the body parts. The reward function combines a healthy reward, a forward reward based on x-velocity, a control cost penalty, and a contact cost penalty. With terminate_when_unhealthy enabled (the default), the episode terminates when the ant becomes unhealthy, that is, when the torso z-coordinate leaves healthy_z_range or any state value is non-finite.
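The reward and health check described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function name `ant_step_reward` is hypothetical, the quadratic form of the two penalties and the rule that the healthy reward is paid only while healthy are assumptions, and the default weights simply mirror the constructor signature on this page.

```python
import math


def ant_step_reward(
    x_velocity,
    action,
    contact_forces,
    z,
    state,
    forward_reward_weight=1.0,
    ctrl_cost_weight=0.5,
    contact_cost_weight=5e-4,
    healthy_reward=1.0,
    healthy_z_range=(0.2, 1.0),
    contact_force_range=(-1.0, 1.0),
):
    """Return (reward, is_healthy) for one step (hypothetical helper)."""
    # Healthy: every state value is finite and the torso height is in range.
    min_z, max_z = healthy_z_range
    is_healthy = all(math.isfinite(v) for v in state) and min_z <= z <= max_z

    forward_reward = forward_reward_weight * x_velocity
    # Assumption: the healthy reward is only paid while the ant is healthy.
    survive_reward = healthy_reward if is_healthy else 0.0

    # Assumption: quadratic penalties on torques and on clipped contact forces.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    lo, hi = contact_force_range
    contact_cost = contact_cost_weight * sum(
        min(max(f, lo), hi) ** 2 for f in contact_forces
    )

    reward = survive_reward + forward_reward - ctrl_cost - contact_cost
    return reward, is_healthy
```

Keeping the four terms separate makes it easy to see how changing one weight, for example ctrl_cost_weight, shifts the balance between moving fast and moving cheaply.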
Usage
Use this environment to benchmark reinforcement learning algorithms for continuous control and locomotion. It is commonly used in research papers to evaluate policy gradient methods, model-based RL, and hierarchical RL approaches. Compared with v4, v5 adds support for custom MuJoCo XML models (xml_file), a configurable frame_skip, and arguments such as main_body and include_cfrc_ext_in_observation that control how the reward and observation are computed.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/ant_v5.py
Signature
class AntEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "ant.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1,
        ctrl_cost_weight: float = 0.5,
        contact_cost_weight: float = 5e-4,
        healthy_reward: float = 1.0,
        main_body: int | str = 1,
        terminate_when_unhealthy: bool = True,
        healthy_z_range: tuple[float, float] = (0.2, 1.0),
        contact_force_range: tuple[float, float] = (-1.0, 1.0),
        reset_noise_scale: float = 0.1,
        exclude_current_positions_from_observation: bool = True,
        include_cfrc_ext_in_observation: bool = True,
        **kwargs,
    )
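The constructor arguments above can be overridden through keyword arguments to gym.make. The sketch below is only an illustration: `ANT_OVERRIDES` and `make_ant` are hypothetical names, and the values are arbitrary examples rather than recommended settings.

```python
# Keyword overrides that gym.make forwards to AntEnv.__init__.
# The names come from the signature above; the values are illustrative only.
ANT_OVERRIDES = {
    "ctrl_cost_weight": 0.1,
    "healthy_z_range": (0.3, 1.0),
    "terminate_when_unhealthy": False,
    "include_cfrc_ext_in_observation": False,
}


def make_ant(**overrides):
    """Create Ant-v5 with constructor overrides (hypothetical helper)."""
    # Imported lazily so the override dictionary can be inspected
    # on machines where Gymnasium's MuJoCo dependencies are not installed.
    import gymnasium as gym

    return gym.make("Ant-v5", **overrides)
```

Calling `make_ant(**ANT_OVERRIDES)` requires Gymnasium with its MuJoCo dependencies installed. Arguments that are not overridden keep the defaults listed in the signature.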
Import
import gymnasium as gym
env = gym.make("Ant-v5")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (8,) | Yes | Torques applied to the 8 hinge joints connecting leg segments, range [-1, 1] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (105,) | State vector: qpos (13 elements), qvel (14 elements), cfrc_ext (78 elements); the torso x and y positions are excluded by default |
| reward | float | healthy_reward + forward_reward - ctrl_cost - contact_cost |
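| terminated | bool | True if the ant is unhealthy (z outside [0.2, 1.0] or non-finite state) and terminate_when_unhealthy is True |
| truncated | bool | True when the episode is cut off by the TimeLimit wrapper (1000 steps for the registered Ant-v5) rather than by the environment itself |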
| info | dict | Contains x_position, y_position, distance_from_origin, x_velocity, y_velocity, reward_forward, reward_ctrl, reward_contact, reward_survive |
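The observation layout in the table can be made concrete with a small slicing helper. This is a sketch that assumes the default settings and that the three segments appear in the order listed in the table (qpos, then qvel, then cfrc_ext). `OBS_LAYOUT` and `split_observation` are hypothetical names, not part of Gymnasium.

```python
# Segment sizes taken from the Outputs table (default settings).
# Assumption: segments are concatenated in this order.
OBS_LAYOUT = (("qpos", 13), ("qvel", 14), ("cfrc_ext", 78))


def split_observation(observation):
    """Split a flat 105-element observation into named segments."""
    expected = sum(size for _, size in OBS_LAYOUT)
    if len(observation) != expected:
        raise ValueError(f"expected {expected} elements, got {len(observation)}")

    parts, start = {}, 0
    for name, size in OBS_LAYOUT:
        # Slicing works for both Python lists and NumPy arrays.
        parts[name] = observation[start:start + size]
        start += size
    return parts
```

The length check fails for non-default configurations, for example when include_cfrc_ext_in_observation is False, which is intentional: the layout above only describes the default observation.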
Usage Examples
import gymnasium as gym
env = gym.make("Ant-v5")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
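Building on the loop above, the reward terms reported in info can be summed over an episode to see which component dominates the return. The sketch below is a hypothetical helper (`run_episode` is not part of Gymnasium). It assumes the environment follows the five-value step API shown above and eventually sets terminated or truncated, as an environment created with gym.make does through its time limit.

```python
from collections import defaultdict


def run_episode(env, policy, seed=None):
    """Run one episode and sum the reward terms found in info."""
    totals = defaultdict(float)
    observation, info = env.reset(seed=seed)
    done = False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        totals["return"] += float(reward)
        # Keys such as reward_forward and reward_ctrl are listed in the Outputs table.
        for key, value in info.items():
            if key.startswith("reward_"):
                totals[key] += float(value)
        done = terminated or truncated
    return dict(totals)
```

With a real environment, a random policy can be passed as `lambda obs: env.action_space.sample()`, for example `run_episode(env, lambda obs: env.action_space.sample(), seed=42)`.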