Implementation:Google deepmind Dm control Suite LQR

From Leeroopedia
| Metadata | Value |
|---|---|
| Implementation | Suite LQR |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |

Overview

Concrete tool for procedurally generated Linear Quadratic Regulator (LQR) control tasks provided by the dm_control Control Suite.

Description

The LQR domain procedurally generates spring-damper chain environments with linear dynamics, which makes them amenable to exact solution via the discrete algebraic Riccati equation. The _make_model function constructs an MJCF XML string by chaining together bodies connected by joints whose stiffness and damping values are sampled at random from stiffness_range (default (15, 25)) and damping_range (default (0, 0), i.e. no damping unless overridden). Actuators are attached to the first n_actuators bodies, and spatial tendons are added between consecutive bodies for visualisation.
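The chaining idea can be illustrated with a minimal sketch that uses only the Python standard library. This is not the actual _make_model code: the function name make_chain_xml, the body, joint and motor names, the positions, the geom settings, and the use of slide joints are all assumptions made for illustration, and the visualisation tendons are omitted.

```python
import random
import xml.etree.ElementTree as ET


def make_chain_xml(n_bodies, n_actuators, rng, stiffness_range=(15, 25),
                   damping_range=(0, 0)):
    """Builds an illustrative MJCF string (hypothetical names and layout)."""
    if not 0 < n_actuators <= n_bodies:
        raise ValueError("need 0 < n_actuators <= n_bodies")
    root = ET.Element("mujoco", model="chain_sketch")
    worldbody = ET.SubElement(root, "worldbody")
    actuator = ET.SubElement(root, "actuator")
    parent = worldbody
    for i in range(n_bodies):
        # Each body is nested inside the previous one, forming a chain.
        body = ET.SubElement(parent, "body", name=f"body_{i}", pos="0.25 0 0")
        ET.SubElement(
            body, "joint", name=f"joint_{i}", type="slide", axis="1 0 0",
            stiffness=str(rng.uniform(*stiffness_range)),
            damping=str(rng.uniform(*damping_range)),
        )
        ET.SubElement(body, "geom", type="sphere", size="0.05", mass="1")
        # Only the first n_actuators joints receive a motor.
        if i < n_actuators:
            ET.SubElement(actuator, "motor", name=f"motor_{i}",
                          joint=f"joint_{i}")
        parent = body
    return ET.tostring(root, encoding="unicode")


xml_string = make_chain_xml(n_bodies=2, n_actuators=1, rng=random.Random(0))
```

Passing the random number generator in explicitly keeps the generated model reproducible for a given seed, which mirrors the role of the random argument in the signatures listed under Code Reference.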

The Physics subclass adds a single method, state_norm, which returns the L2 norm of the full physics state (positions and velocities). The LQRLevel task class initializes each episode with a random state sampled from a unit sphere and scaled by sqrt(2). The reward is one minus a quadratic cost on positions and controls: 1 - (0.5 * ||positions||^2 + control_cost_coef * 0.5 * ||controls||^2), so it reaches its maximum of 1 only at the origin with zero control. The episode terminates early when the state norm falls below a tolerance of 1e-6.
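The reward, initialisation, and termination rules above can be sketched in plain Python. The helper names below are hypothetical, and the sketch assumes that the sphere sample is written to the positions while the velocities start at zero, which the description above does not confirm. Under that assumption the initial state cost is 0.5 * 2 = 1, so an episode starts with a reward of roughly 0 when no control is applied.

```python
import math
import random


def sample_initial_positions(n_dof, rng, radius=math.sqrt(2)):
    """Draws a point uniformly on a sphere of the given radius."""
    gauss = [rng.gauss(0.0, 1.0) for _ in range(n_dof)]
    norm = math.sqrt(sum(g * g for g in gauss))
    return [radius * g / norm for g in gauss]


def quadratic_reward(positions, controls, control_cost_coef=0.1):
    """Returns 1 minus the quadratic position and control cost."""
    state_cost = 0.5 * sum(p * p for p in positions)
    control_cost = 0.5 * sum(u * u for u in controls)
    return 1.0 - (state_cost + control_cost_coef * control_cost)


def should_terminate(positions, velocities, tol=1e-6):
    """True once the L2 norm of the full state drops below tol."""
    state_norm = math.sqrt(sum(x * x for x in positions + velocities))
    return state_norm < tol


rng = random.Random(0)
positions = sample_initial_positions(n_dof=2, rng=rng)
reward_at_start = quadratic_reward(positions, controls=[0.0])
```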

Two task configurations are registered: lqr_2_1 (2 bodies, 1 actuator) and lqr_6_2 (6 bodies, 2 actuators). Both use a control cost coefficient of 0.1 and an infinite default time limit, so an episode ends only through the early-termination condition unless a finite time_limit is supplied. The get_evaluation method provides a sparse evaluation metric that returns 1 when the state norm is below 0.01 and 0 otherwise.

Usage

Use this implementation for benchmarking control algorithms on systems with known optimal solutions. Load via suite.load(domain_name='lqr', task_name='lqr_2_1') or suite.load(domain_name='lqr', task_name='lqr_6_2'). The companion lqr_solver module can compute the optimal policy for comparison.
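To make the idea of a known optimal solution concrete, here is a minimal sketch of the discrete algebraic Riccati equation for a scalar system x' = a*x + b*u with stage cost 0.5 * (q*x^2 + r*u^2). It is not the lqr_solver implementation: the function name solve_scalar_dare and the fixed-point iteration are assumptions chosen for illustration, and the real tasks are multi-dimensional.

```python
def solve_scalar_dare(a, b, q, r, tol=1e-12, max_iters=10_000):
    """Iterates p <- q + a^2 p - (a b p)^2 / (r + b^2 p) to a fixed point."""
    p = q
    for _ in range(max_iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    else:
        raise RuntimeError("Riccati iteration did not converge")
    # Optimal feedback gain for the control law u = -k * x.
    k = a * b * p / (r + b * b * p)
    return p, k


# Example: a marginally stable system with control cost 0.1.
p, k = solve_scalar_dare(a=1.0, b=1.0, q=1.0, r=0.1)
closed_loop = 1.0 - 1.0 * k  # a - b * k, must lie inside the unit circle
```

For these example values the fixed point satisfies p^2 - p - 0.1 = 0, and the closed-loop coefficient a - b*k has magnitude below 1, so the controlled state decays towards the origin.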

Code Reference

Source Location

Signature

# Task factory functions
def lqr_2_1(time_limit=float('inf'), random=None, environment_kwargs=None)
def lqr_6_2(time_limit=float('inf'), random=None, environment_kwargs=None)

# Internal helpers
def _make_lqr(n_bodies, n_actuators, control_cost_coef, time_limit, random,
              environment_kwargs)
def _make_model(n_bodies, n_actuators, random,
                stiffness_range=(15, 25), damping_range=(0, 0))
def _make_body(body_id, stiffness_range, damping_range, random)
def get_model_and_assets(n_bodies, n_actuators, random)

# Physics subclass
class Physics(mujoco.Physics):
    def state_norm(self)   # L2 norm of the full physics state

# Task class
class LQRLevel(base.Task):
    def __init__(self, control_cost_coef, random=None)
    @property
    def control_cost_coef(self)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)
    def get_evaluation(self, physics)
    def get_termination(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='lqr', task_name='lqr_2_1')

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| time_limit | float | No | Maximum episode duration in seconds (default infinity). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility and model generation. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |

Outputs

| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface. |

Observations

| Key | Type | Description |
|---|---|---|
| position | numpy array | Generalized positions of all bodies. |
| velocity | numpy array | Generalized velocities of all bodies. |

Usage Examples

from dm_control import suite
from dm_control.suite import lqr_solver

# Load the 2-body, 1-actuator LQR task
env = suite.load(domain_name='lqr', task_name='lqr_2_1')

# Run an episode with a placeholder action. The default time limit is
# infinite, so cap the number of steps in case the state never converges.
action_spec = env.action_spec()
time_step = env.reset()
for _ in range(1000):
    if time_step.last():
        break
    action = action_spec.generate_value()
    time_step = env.step(action)

# Load the 6-body, 2-actuator variant
env_large = suite.load(domain_name='lqr', task_name='lqr_6_2')

# Compare with the optimal policy using lqr_solver
p, k, beta = lqr_solver.solve(env_large)
