Implementation:Farama Foundation Gymnasium SwimmerEnv V5
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Swimmer v5 MuJoCo locomotion environment provided by Gymnasium.
Description
The Swimmer v5 environment corresponds to the Swimmer described in Rémi Coulom's PhD thesis "Reinforcement Learning Using Neural Networks, with Applications to Motor Control". The default swimmer consists of three segments (links) connected by two rotor joints. It is suspended in a 2D pool, and the goal is to move to the right as fast as possible by applying torque to the rotors and exploiting fluid friction. The per-step reward is forward_reward - ctrl_cost. The Swimmer never terminates; episodes end only through truncation. Compared with earlier versions, v5 adds support for custom MuJoCo XML models (xml_file), a configurable frame_skip, an observation_structure attribute describing how the observation is composed, a non-empty info dictionary returned by reset, and consistent reward naming in info (reward_forward instead of reward_fwd).
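The reward decomposition can be sketched in plain Python. This is a minimal illustration, not the library's code: it assumes forward_reward is forward_reward_weight times the x-velocity of the swimmer over one step, ctrl_cost is ctrl_cost_weight times the sum of squared torques, and the step duration dt is 0.04 s (frame_skip of 4 with an assumed 0.01 s simulation timestep). The function name swimmer_reward is hypothetical.

```python
def swimmer_reward(x_before, x_after, action, dt=0.04,
                   forward_reward_weight=1.0, ctrl_cost_weight=1e-4):
    """Sketch of reward = forward_reward - ctrl_cost for one step.

    Assumptions (not taken from the page itself): dt = 0.04 s, forward
    reward proportional to x-velocity, quadratic control cost.
    """
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    reward = forward_reward - ctrl_cost
    # Mirror the info keys named on this page; the control term is
    # stored as a negative contribution here (an assumption).
    reward_info = {"reward_forward": forward_reward, "reward_ctrl": -ctrl_cost}
    return reward, reward_info


# Moving 0.02 units to the right in one step with full torque on both rotors.
reward, reward_info = swimmer_reward(0.0, 0.02, [1.0, -1.0])
```

With the default weights the control cost is small next to the forward term, so the reward is dominated by how fast the swimmer moves along x.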
Usage
Use this environment to benchmark RL algorithms on a swimming locomotion task. The Swimmer is useful for testing algorithms on a problem with simple dynamics but a challenging reward landscape. The v5 version is recommended for new research.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/swimmer_v5.py
Signature
class SwimmerEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "swimmer.xml",
        frame_skip: int = 4,
        default_camera_config: dict[str, float | int] = None,
        forward_reward_weight: float = 1.0,
        ctrl_cost_weight: float = 1e-4,
        reset_noise_scale: float = 0.1,
        exclude_current_positions_from_observation: bool = True,
        **kwargs,
    ): ...
Import
import gymnasium as gym
env = gym.make("Swimmer-v5")
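Constructor arguments from the signature can be forwarded through gym.make as keyword arguments. The sketch below collects a few overrides in a dictionary; the keyword names come from the signature, while the values and the helper name make_custom_swimmer are illustrative assumptions. Gymnasium is imported inside the helper so that the configuration can be inspected without a MuJoCo installation.

```python
# Keyword names follow the SwimmerEnv signature; the values are illustrative.
swimmer_kwargs = {
    "forward_reward_weight": 2.0,   # weight forward progress more heavily
    "ctrl_cost_weight": 1e-3,       # penalize large torques more strongly
    "reset_noise_scale": 0.05,      # smaller perturbation of the initial state
    "exclude_current_positions_from_observation": False,  # keep x/y in the observation
}


def make_custom_swimmer(**overrides):
    """Hypothetical helper: create Swimmer-v5 with the overrides above."""
    # Lazy import: requires gymnasium installed with MuJoCo support.
    import gymnasium as gym

    return gym.make("Swimmer-v5", **{**swimmer_kwargs, **overrides})
```

Calling make_custom_swimmer() returns a wrapped environment. With exclude_current_positions_from_observation set to False, the observation should grow from 8 to 10 entries, because the x and y positions are kept.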
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (2,) | Yes | Torques applied to the two rotor joints, range [-1, 1] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (8,) | State vector: 3 angles from qpos (x/y positions excluded by default) followed by 5 velocities from qvel |
| reward | float | forward_reward - ctrl_cost |
| terminated | bool | Always False (Swimmer never terminates) |
| truncated | bool | True when the episode is cut off by the TimeLimit wrapper (1000 steps by default) |
| info | dict | Contains x_position, y_position, distance_from_origin, x_velocity, y_velocity, reward_forward, reward_ctrl |
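The observation layout in the table can be illustrated with a small stand-alone sketch. It assumes qpos is ordered as [x, y, tip angle, rotor angle 1, rotor angle 2] and qvel holds the five matching velocities; build_observation is a hypothetical name and plain lists stand in for NumPy arrays.

```python
def build_observation(qpos, qvel, exclude_current_positions=True):
    """Concatenate positions and velocities into one observation vector.

    Assumed layout: qpos = [x, y, tip angle, rotor 1, rotor 2],
    qvel = the five corresponding velocities.
    """
    position = list(qpos)
    velocity = list(qvel)
    if exclude_current_positions:
        # Drop the global x/y coordinates so only the angles remain.
        position = position[2:]
    return position + velocity


qpos = [0.3, -0.1, 0.05, 0.2, -0.2]
qvel = [0.5, 0.0, 0.1, -0.3, 0.3]
default_obs = build_observation(qpos, qvel)  # 3 angles + 5 velocities
full_obs = build_observation(qpos, qvel, exclude_current_positions=False)
```

Dropping x and y keeps the observation independent of where the swimmer is in the pool; the positions remain available through info as x_position and y_position.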
Usage Examples
import gymnasium as gym

env = gym.make("Swimmer-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random action from the action space
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one ends
    if terminated or truncated:
        observation, info = env.reset()

env.close()