Implementation:Google deepmind Dm control Suite Fish

Metadata	Value
Implementation	Suite Fish
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete tool for controlling a 3D fish to remain upright or swim to a target provided by the dm_control Control Suite.

Description

The Fish domain simulates a 3D fish with a segmented tail and two fins operating in a fluid-like environment. The fish has seven actuated joints: two tail segments, a tail twist, and roll/pitch controls for the left and right fins. The Physics subclass provides methods for reading the torso uprightness (z-axis projection), torso velocity from sensors, joint angles and velocities, and the vector from the fish's mouth to a target position in the mouth's local coordinate frame.

Two benchmarking tasks are registered: upright and swim. The Upright task initializes the fish with a random orientation (random quaternion) and random joint angles in [-0.2, 0.2], hides the target geometry, and rewards the agent with a smooth tolerance-based signal for keeping the torso z-axis aligned with the world z-axis. The Swim task additionally randomizes a target position in the workspace and provides a composite reward that weighs proximity to the target (7/8) and uprightness (1/8).

Both tasks use a default time limit of 40 seconds and a control timestep of 0.04 seconds.

Usage

Use this implementation for 3D locomotion and orientation control benchmarks in a fluid environment. Load via suite.load(domain_name='fish', task_name='upright') or suite.load(domain_name='fish', task_name='swim').

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/fish.py
Lines: 1-172

Signature

# Task factory functions
def upright(time_limit=40, random=None, environment_kwargs=None)
def swim(time_limit=40, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def upright(self)            # z-axis projection of torso to world z
    def torso_velocity(self)     # velocities and angular velocities from sensors
    def joint_velocities(self)   # velocities of the 7 actuated joints
    def joint_angles(self)       # positions of the 7 actuated joints
    def mouth_to_target(self)    # vector from mouth to target in mouth frame

# Task classes
class Upright(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

class Swim(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='fish', task_name='swim')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 40).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`joint_angles`	numpy array (7,)	Positions of the 7 actuated joints.
`upright`	float	Projection from torso z-axis to world z-axis.
`velocity`	numpy array	Joint velocities.
`target`	numpy array (3,)	Vector from mouth to target in mouth local frame (Swim task only).

Usage Examples

from dm_control import suite

# Load the upright task
env = suite.load(domain_name='fish', task_name='upright')

# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)

# Load the swim task
env_swim = suite.load(domain_name='fish', task_name='swim')

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment