Implementation:Farama Foundation Gymnasium SwimmerEnv V4

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, MuJoCo_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Swimmer v4 MuJoCo locomotion environment provided by Gymnasium.

Description

The Swimmer v4 environment implements a multi-segment swimmer in a two-dimensional pool using the MuJoCo bindings (mujoco >= 2.1.3). The default swimmer consists of three links connected by two rotor joints. It is suspended in the pool, and the goal is to move to the right as fast as possible by applying torques to the rotors and exploiting fluid friction. The reward is forward_reward - ctrl_cost. The environment never sets terminated; episodes end only through truncation. The observation concatenates position entries from qpos (3 angles by default, or 5 elements including the x/y coordinates when exclude_current_positions_from_observation is False) with 5 velocity entries from qvel, giving 8 or 10 elements in total.
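The reward described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper name swimmer_reward is hypothetical, the forward term is assumed to be the weighted x-velocity over one control step, the control cost is assumed to be the weighted sum of squared torques, and the default dt of 0.04 seconds is an assumption about the simulation step rather than something stated on this page.

```python
def swimmer_reward(
    x_before,
    x_after,
    action,
    dt=0.04,  # assumed control-step duration in seconds
    forward_reward_weight=1.0,
    ctrl_cost_weight=1e-4,
):
    """Hypothetical helper: returns (reward, forward_reward, ctrl_cost)."""
    # Forward progress is measured as the x-velocity over one control step.
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity

    # Control cost penalizes large torques (assumed: sum of squared actions).
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)

    return forward_reward - ctrl_cost, forward_reward, ctrl_cost


# Example: the swimmer advanced 0.04 units in x while applying full torques.
reward, forward_reward, ctrl_cost = swimmer_reward(0.0, 0.04, [1.0, -1.0])
```

With the default weights the control cost is several orders of magnitude smaller than the forward term, so the reward is dominated by forward velocity.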

Usage

Use this environment to reproduce results from papers that used Swimmer-v4. For new research, consider Swimmer-v5, which adds support for custom XML models, a configurable frame_skip, observation_structure, and consistent reward naming.

Code Reference

Source Location

Signature

class SwimmerEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        forward_reward_weight=1.0,
        ctrl_cost_weight=1e-4,
        reset_noise_scale=0.1,
        exclude_current_positions_from_observation=True,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("Swimmer-v4")

I/O Contract

Inputs

Name   | Type            | Required | Description
action | np.ndarray (2,) | Yes      | Torques applied to the two rotor joints, range [-1, 1]

Outputs

Name        | Type            | Description
observation | np.ndarray (8,) | State vector: qpos (3 angles, x/y excluded by default), qvel (5 velocities)
reward      | float           | forward_reward - ctrl_cost
terminated  | bool            | Always False (Swimmer never terminates)
truncated   | bool            | Episode truncation (handled by TimeLimit wrapper)
info        | dict            | Contains reward_fwd, reward_ctrl, x_position, y_position, distance_from_origin, x_velocity, y_velocity, forward_reward
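To make the observation layout concrete, the following sketch assembles an observation from position and velocity lists. It is an illustration under assumptions: build_observation is a hypothetical helper, qpos is assumed to be ordered as x, y, then the three angles, and plain lists stand in for the NumPy arrays the environment returns.

```python
def build_observation(qpos, qvel, exclude_current_positions_from_observation=True):
    """Hypothetical helper: concatenate positions and velocities into one vector."""
    position = list(qpos)
    velocity = list(qvel)

    # Assumed layout: qpos = [x, y, angle_0, angle_1, angle_2].
    # Dropping the first two entries removes the global x/y coordinates.
    if exclude_current_positions_from_observation:
        position = position[2:]

    return position + velocity


qpos = [0.1, 0.2, 0.3, 0.4, 0.5]  # x, y, three angles (illustrative values)
qvel = [1.0, 2.0, 3.0, 4.0, 5.0]  # five velocities (illustrative values)

default_obs = build_observation(qpos, qvel)       # 3 angles + 5 velocities
full_obs = build_observation(qpos, qvel, False)   # 5 positions + 5 velocities
```

Excluding x/y keeps the observation independent of where the swimmer is in the pool; the excluded coordinates remain available through x_position and y_position in info.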

Usage Examples

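import gymnasium as gym

# Create the environment and seed the first reset for reproducibility.
env = gym.make("Swimmer-v4")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample random torques for the two rotor joints.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False for Swimmer; truncated ends the episode.
    if terminated or truncated:
        observation, info = env.reset()

env.close()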

Related Pages
