Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Google deepmind Dm control Suite Fish

From Leeroopedia
Metadata Value
Implementation Suite Fish
Domain Reinforcement_Learning, Control
Source Google_deepmind_Dm_control
Last Updated 2026-02-15 04:00 GMT

Overview

Concrete tool for controlling a 3D fish to remain upright or swim to a target provided by the dm_control Control Suite.

Description

The Fish domain simulates a 3D fish with a segmented tail and two fins operating in a fluid-like environment. The fish has seven actuated joints: two tail segments, a tail twist, and roll/pitch controls for the left and right fins. The Physics subclass provides methods for reading the torso uprightness (z-axis projection), torso velocity from sensors, joint angles and velocities, and the vector from the fish's mouth to a target position in the mouth's local coordinate frame.

Two benchmarking tasks are registered: upright and swim. The Upright task initializes the fish with a random orientation (random quaternion) and random joint angles in [-0.2, 0.2], hides the target geometry, and rewards the agent with a smooth tolerance-based signal for keeping the torso z-axis aligned with the world z-axis. The Swim task additionally randomizes a target position in the workspace and provides a composite reward that weighs proximity to the target (7/8) and uprightness (1/8).

Both tasks use a default time limit of 40 seconds and a control timestep of 0.04 seconds.

Usage

Use this implementation for 3D locomotion and orientation control benchmarks in a fluid environment. Load via suite.load(domain_name='fish', task_name='upright') or suite.load(domain_name='fish', task_name='swim').

Code Reference

Source Location

Signature

# Task factory functions
def upright(time_limit=40, random=None, environment_kwargs=None)
def swim(time_limit=40, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def upright(self)            # z-axis projection of torso to world z
    def torso_velocity(self)     # velocities and angular velocities from sensors
    def joint_velocities(self)   # velocities of the 7 actuated joints
    def joint_angles(self)       # positions of the 7 actuated joints
    def mouth_to_target(self)    # vector from mouth to target in mouth frame

# Task classes
class Upright(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

class Swim(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='fish', task_name='swim')

I/O Contract

Inputs

Name Type Required Description
time_limit float No Maximum episode duration in seconds (default 40).
random int, numpy.random.RandomState, or None No Random seed or RNG instance for reproducibility.
environment_kwargs dict or None No Additional keyword arguments forwarded to the Environment constructor.

Outputs

Name Type Description
environment dm_control.rl.control.Environment A fully initialised environment conforming to the dm_env.Environment interface.

Observations

Key Type Description
joint_angles numpy array (7,) Positions of the 7 actuated joints.
upright float Projection from torso z-axis to world z-axis.
velocity numpy array Joint velocities.
target numpy array (3,) Vector from mouth to target in mouth local frame (Swim task only).

Usage Examples

from dm_control import suite

# Load the upright task
env = suite.load(domain_name='fish', task_name='upright')

# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)

# Load the swim task
env_swim = suite.load(domain_name='fish', task_name='swim')

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment