Implementation:Google deepmind Dm control Suite LQR
| Metadata | Value |
|---|---|
| Implementation | Suite LQR |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
A concrete implementation of procedurally generated Linear Quadratic Regulator (LQR) control tasks, provided by the dm_control Control Suite.
Description
The LQR domain procedurally generates chains of bodies coupled by springs and dampers. The resulting dynamics are linear, so with the quadratic cost the optimal controller can be computed exactly from the discrete algebraic Riccati equation. The _make_model function constructs the MJCF XML string dynamically by chaining together bodies connected by joints whose stiffness and damping are randomly sampled from stiffness_range and damping_range. Actuators are attached to the first n_actuators bodies, and spatial tendons are added between consecutive bodies for visualization.
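The chaining described above can be sketched with the standard library alone. This is a minimal illustration, not the code in lqr.py: the function name make_chain_xml, the element names (body_0, joint_0, motor_0, site_0, tendon_0), and the slide joint type are assumptions, and the output has not been validated against MuJoCo.

```python
import random
import xml.etree.ElementTree as ET


def make_chain_xml(n_bodies, n_actuators, rng, stiffness_range=(15, 25)):
    """Builds an illustrative MJCF-like string for a chain of sprung bodies.

    Hypothetical helper: names and attributes are assumptions for this sketch.
    """
    if n_bodies < 1 or n_actuators < 1:
        raise ValueError("At least 1 body and 1 actuator required.")
    if n_actuators > n_bodies:
        raise ValueError("At most 1 actuator per body.")

    root = ET.Element("mujoco")
    parent = ET.SubElement(root, "worldbody")
    actuator = ET.SubElement(root, "actuator")
    tendon = ET.SubElement(root, "tendon")

    for i in range(n_bodies):
        # Each body is nested inside the previous one, forming a chain.
        body = ET.SubElement(parent, "body", name=f"body_{i}")
        stiffness = rng.uniform(*stiffness_range)
        ET.SubElement(body, "joint", name=f"joint_{i}", type="slide",
                      stiffness=f"{stiffness:.3f}")
        ET.SubElement(body, "site", name=f"site_{i}")

        # Only the first n_actuators bodies get a motor.
        if i < n_actuators:
            ET.SubElement(actuator, "motor", name=f"motor_{i}",
                          joint=f"joint_{i}")

        # A tendon links each body's site to the next body's site.
        if i < n_bodies - 1:
            spatial = ET.SubElement(tendon, "spatial", name=f"tendon_{i}")
            ET.SubElement(spatial, "site", site=f"site_{i}")
            ET.SubElement(spatial, "site", site=f"site_{i + 1}")

        parent = body

    return ET.tostring(root, encoding="unicode")


xml_string = make_chain_xml(n_bodies=3, n_actuators=2, rng=random.Random(0))
```

Passing the random generator in explicitly mirrors the random argument of the real helpers and keeps the generated model reproducible for a given seed.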
The Physics subclass adds a single method, state_norm, which returns the L2 norm of the full physics state (positions and velocities). The LQRLevel task class starts each episode from a random point on the unit sphere, scaled by sqrt(2). The reward is one minus a quadratic cost: 1 - (0.5 * ||positions||^2 + control_cost_coef * 0.5 * ||controls||^2), so it is at most 1 and reaches 1 only when all positions and controls are zero. The episode terminates early once the state norm falls below a tolerance of 1e-6.
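The reward, initial-state sampling, and termination rule can be sketched in plain Python. The function names below are hypothetical, and the sketch assumes the sampled vector is used as the positions with zero initial velocities; it illustrates the formulas stated above rather than reproducing the LQRLevel code.

```python
import math
import random


def sample_initial_positions(n_dof, rng):
    """Draws a random direction and rescales it to length sqrt(2)."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(n_dof)]
    norm = math.sqrt(sum(v * v for v in direction))
    return [math.sqrt(2) * v / norm for v in direction]


def quadratic_reward(positions, controls, control_cost_coef):
    """Returns 1 - (0.5*||positions||^2 + coef * 0.5*||controls||^2)."""
    state_cost = 0.5 * sum(p * p for p in positions)
    control_cost = 0.5 * sum(u * u for u in controls)
    return 1.0 - (state_cost + control_cost_coef * control_cost)


def should_terminate(state, tol=1e-6):
    """True once the L2 norm of the full state drops below the tolerance."""
    return math.sqrt(sum(s * s for s in state)) < tol


rng = random.Random(0)
q0 = sample_initial_positions(n_dof=2, rng=rng)
r0 = quadratic_reward(q0, controls=[0.0], control_cost_coef=0.1)
```

Under these assumptions the sqrt(2) scaling makes the initial state cost 0.5 * 2 = 1, so the first reward with zero control is 0 (up to rounding) and rises toward 1 as the positions are driven to the origin.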
Two task configurations are registered: lqr_2_1 (2 bodies, 1 actuator) and lqr_6_2 (6 bodies, 2 actuators). Both use a control cost coefficient of 0.1 and an infinite default time limit, so an episode ends only through early termination unless a finite time_limit is passed. The get_evaluation method provides a sparse evaluation metric, separate from the reward, that returns 1 when the state norm is below 0.01 and 0 otherwise.
Usage
Use this implementation for benchmarking control algorithms on systems with known optimal solutions. Load via suite.load(domain_name='lqr', task_name='lqr_2_1') or suite.load(domain_name='lqr', task_name='lqr_6_2'). The companion lqr_solver module can compute the optimal policy for comparison.
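To make "known optimal solution" concrete, here is a minimal scalar sketch of the discrete Riccati fixed-point iteration. It is a toy one-dimensional problem, not the matrix computation that lqr_solver performs; the function name solve_scalar_dare and the symbols a, b, q, r are assumptions introduced for this example.

```python
def solve_scalar_dare(a, b, q, r, iterations=1000, tol=1e-12):
    """Iterates the scalar discrete Riccati recursion to a fixed point.

    Dynamics: x_next = a * x + b * u.
    Per-step cost: 0.5 * q * x**2 + 0.5 * r * u**2.
    Returns (p, k): cost-to-go 0.5 * p * x**2 and linear policy u = k * x.
    """
    p = q
    for _ in range(iterations):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = -(a * b * p) / (r + b * b * p)
    return p, k


# An open-loop unstable toy system (|a| > 1) with control cost weight 0.1.
p, k = solve_scalar_dare(a=1.1, b=0.5, q=1.0, r=0.1)

# Closed-loop rollout: the state should shrink each step under u = k * x.
x = 1.0
trajectory = [x]
for _ in range(10):
    x = 1.1 * x + 0.5 * (k * x)
    trajectory.append(x)
```

The closed-loop factor a + b * k plays the role that beta plays in the multi-dimensional case: when its magnitude is below 1, the state contracts geometrically, which gives a reference against which a learned policy can be compared.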
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/suite/lqr.py
- Lines: 1-267
Signature
# Task factory functions
def lqr_2_1(time_limit=float('inf'), random=None, environment_kwargs=None)
def lqr_6_2(time_limit=float('inf'), random=None, environment_kwargs=None)
# Internal helpers
def _make_lqr(n_bodies, n_actuators, control_cost_coef, time_limit, random,
              environment_kwargs)
def _make_model(n_bodies, n_actuators, random,
                stiffness_range=(15, 25), damping_range=(0, 0))
def _make_body(body_id, stiffness_range, damping_range, random)
def get_model_and_assets(n_bodies, n_actuators, random)
# Physics subclass
class Physics(mujoco.Physics):
    def state_norm(self)  # L2 norm of the full physics state
# Task class
class LQRLevel(base.Task):
    def __init__(self, control_cost_coef, random=None)
    def control_cost_coef  # property
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)
    def get_evaluation(self, physics)
    def get_termination(self, physics)
Import
from dm_control import suite
env = suite.load(domain_name='lqr', task_name='lqr_2_1')
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| time_limit | float | No | Maximum episode duration in seconds (default infinity). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility and model generation. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |
Outputs
| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment | A fully initialized environment conforming to the dm_env.Environment interface. |
Observations
| Key | Type | Description |
|---|---|---|
| position | numpy array | Generalized positions of all bodies. |
| velocity | numpy array | Generalized velocities of all bodies. |
Usage Examples
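from dm_control import suite
from dm_control.suite import lqr_solver
# Load the 2-body, 1-actuator LQR task
env = suite.load(domain_name='lqr', task_name='lqr_2_1')
# Run an episode with a placeholder action taken from the action spec.
# The default time limit is infinite, so cap the step count (max_steps is an
# illustrative value) to guarantee the loop ends.
max_steps = 1000
time_step = env.reset()
for _ in range(max_steps):
    if time_step.last():
        break
    action = env.action_spec().generate_value()
    time_step = env.step(action)
# Load the 6-body, 2-actuator variant
env_large = suite.load(domain_name='lqr', task_name='lqr_6_2')
# Compare with the optimal policy using lqr_solver.
# solve returns the value-function matrix p, the linear feedback gain k,
# and the closed-loop convergence rate beta.
p, k, beta = lqr_solver.solve(env_large)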