Implementation:Farama Foundation Gymnasium SwimmerEnv V5

From Leeroopedia
Revision as of 12:37, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Farama_Foundation_Gymnasium_SwimmerEnv_V5.md)
Knowledge Sources
Domains Reinforcement_Learning, MuJoCo_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Swimmer v5 MuJoCo locomotion environment provided by Gymnasium.

Description

The Swimmer v5 environment corresponds to the swimmer described in Rémi Coulom's PhD thesis "Reinforcement Learning Using Neural Networks, with Applications to Motor Control". The default swimmer consists of three segments (links) connected by two rotor joints. It is suspended in a two-dimensional pool, and the goal is to move to the right as fast as possible by applying torque to the rotors and exploiting fluid friction. The reward is forward_reward - ctrl_cost. The Swimmer never terminates; episodes end only through truncation. Compared with earlier versions, v5 adds support for custom MuJoCo XML models, a configurable frame_skip, an observation_structure attribute, a non-empty info dictionary returned by reset, and consistent reward naming (reward_forward instead of reward_fwd).
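The reward decomposition described above can be sketched in plain Python. This is an illustrative reconstruction, not the library's source: the helper name `swimmer_reward` and its arguments are hypothetical, and it assumes that the forward term is the x-velocity over one step scaled by forward_reward_weight and that the control cost is the sum of squared action components scaled by ctrl_cost_weight.

```python
def swimmer_reward(x_before, x_after, dt, action,
                   forward_reward_weight=1.0, ctrl_cost_weight=1e-4):
    """Hypothetical helper mirroring reward = forward_reward - ctrl_cost."""
    # Assumption: forward progress is measured as the x-velocity over one step.
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity
    # Assumption: control cost is the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    reward = forward_reward - ctrl_cost
    return reward, forward_reward, ctrl_cost


if __name__ == "__main__":
    reward, forward_reward, ctrl_cost = swimmer_reward(
        x_before=0.0, x_after=0.04, dt=0.04, action=[1.0, -1.0]
    )
    print(reward, forward_reward, ctrl_cost)
```

With the default weights the control cost is several orders of magnitude smaller than the forward term, so under these assumptions the reward is dominated by forward velocity.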

Usage

Use this environment to benchmark RL algorithms on a swimming locomotion task. The Swimmer is useful for testing algorithms on a task with simple dynamics but a challenging reward landscape. The v5 version is recommended for new research.

Code Reference

Source Location

Signature

class SwimmerEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "swimmer.xml",
        frame_skip: int = 4,
        default_camera_config: dict[str, float | int] = {},
        forward_reward_weight: float = 1.0,
        ctrl_cost_weight: float = 1e-4,
        reset_noise_scale: float = 0.1,
        exclude_current_positions_from_observation: bool = True,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("Swimmer-v5")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (2,) | Yes | Torques applied to the two rotor joints, range [-1, 1]

Outputs

Name | Type | Description
observation | np.ndarray (8,) | State vector: qpos (3 angles, x/y excluded by default), qvel (5 velocities)
reward | float | forward_reward - ctrl_cost
terminated | bool | Always False (Swimmer never terminates)
truncated | bool | Episode truncation (handled by TimeLimit wrapper)
info | dict | Contains x_position, y_position, distance_from_origin, x_velocity, y_velocity, reward_forward, reward_ctrl
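The observation layout listed above can be made explicit with a small helper. This is a minimal sketch under stated assumptions: the name `split_observation` is hypothetical, the position entries are assumed to come before the velocity entries, and the observation is assumed to grow to 10 entries (5 positions, 5 velocities) when exclude_current_positions_from_observation is False.

```python
def split_observation(obs, exclude_current_positions=True):
    """Hypothetical helper: split a Swimmer observation into qpos and qvel parts."""
    obs = list(obs)
    # Assumption: 3 angles when x/y are excluded, otherwise x, y plus 3 angles.
    n_qpos = 3 if exclude_current_positions else 5
    n_qvel = 5
    if len(obs) != n_qpos + n_qvel:
        raise ValueError(
            f"expected {n_qpos + n_qvel} entries, got {len(obs)}"
        )
    # Assumption: positions come first, velocities second.
    return {"qpos": obs[:n_qpos], "qvel": obs[n_qpos:]}


if __name__ == "__main__":
    parts = split_observation([0.1, 0.2, 0.3, 1.0, 2.0, 3.0, 4.0, 5.0])
    print(parts["qpos"], parts["qvel"])
```

The length check is there so that a mismatch between the flag and the actual observation fails loudly instead of silently mislabeling entries.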

Usage Examples

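import gymnasium as gym

# Create the environment and seed the first reset for reproducibility.
env = gym.make("Swimmer-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random action (two torques in [-1, 1]).
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Swimmer never terminates, so in practice only truncation ends an episode.
    if terminated or truncated:
        observation, info = env.reset()

env.close()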
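Because the Swimmer ends episodes only through truncation, a rollout helper has to watch the truncated flag rather than terminated. The sketch below is one possible way to accumulate an episode's return together with the reward_forward and reward_ctrl entries of info. The function `run_episode`, its `max_steps` safety cap, and the `policy` callable are hypothetical names introduced for illustration; the only assumption about the environment is that it follows the reset/step interface shown above.

```python
def run_episode(env, policy, seed=None, max_steps=10_000):
    """Hypothetical helper: run one episode and sum the reward components."""
    observation, info = env.reset(seed=seed)
    totals = {"return": 0.0, "reward_forward": 0.0, "reward_ctrl": 0.0, "steps": 0}
    # max_steps is a safety cap in case the environment is never truncated.
    while totals["steps"] < max_steps:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        totals["return"] += float(reward)
        # Missing keys are treated as zero so the helper also works on other envs.
        totals["reward_forward"] += float(info.get("reward_forward", 0.0))
        totals["reward_ctrl"] += float(info.get("reward_ctrl", 0.0))
        totals["steps"] += 1
        if terminated or truncated:
            break
    return totals
```

With Gymnasium installed, the helper could be given an environment created by gym.make("Swimmer-v5") and a policy that ignores the observation and returns a sample from that environment's action_space.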
