Implementation:Google deepmind Dm control LQR Solver
| Metadata | Value |
|---|---|
| Implementation | LQR Solver |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
Concrete tool for computing the optimal value function and linear policy for LQR environments provided by the dm_control Control Suite.
Description
The LQR Solver module computes the analytical optimal solution for the infinite-horizon discrete-time Linear Quadratic Regulator problem. Given an LQR environment, the solve function extracts the system dynamics matrices (state transition matrix a, control transition matrix b, state cost Hessian q, and control cost Hessian r) from the MuJoCo model parameters (mass matrix, joint stiffness, damping, and timestep).
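The assembly of a, b, q and r can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the module's code. It assumes a semi-implicit Euler discretisation of a linear mass-spring-damper system with state [position, velocity], a state cost that penalises positions only, and a control cost scaled by a single coefficient. The function name build_lqr_matrices and all argument names are hypothetical, and the numbers describe a toy single-joint system rather than a MuJoCo model.

```python
import numpy as np


def build_lqr_matrices(mass, stiffness, damping, dt, actuator_moment,
                       control_cost_coef):
    """Assembles discrete-time (a, b, q, r) for a linear mass-spring-damper.

    Hypothetical helper: mass, stiffness and damping are (n, n) arrays and
    actuator_moment is (m, n).
    """
    n = mass.shape[0]
    m = actuator_moment.shape[0]

    # Acceleration as a linear map of the state x = [position, velocity]:
    # acc = -M^{-1} (K * position + D * velocity) = j @ x.
    j = np.linalg.solve(-mass, np.hstack((stiffness, damping)))

    # Semi-implicit Euler (assumed): update the velocity first, then advance
    # the position using the new velocity.
    velocity_rows = j
    position_rows = dt * j + np.hstack((np.zeros((n, n)), np.eye(n)))
    a = np.eye(2 * n) + dt * np.vstack((position_rows, velocity_rows))

    # Controls act through the acceleration M^{-1} @ actuator_moment.T @ u.
    bc = np.linalg.solve(mass, actuator_moment.T)
    b = dt * np.vstack((dt * bc, bc))

    # Assumed costs: unit penalty on positions, none on velocities, and a
    # uniform penalty on every control.
    q = np.diag(np.hstack((np.ones(n), np.zeros(n))))
    r = control_cost_coef * np.eye(m)
    return a, b, q, r


# Toy single-joint system (illustrative values only).
a, b, q, r = build_lqr_matrices(
    mass=np.array([[1.0]]),
    stiffness=np.array([[2.0]]),
    damping=np.array([[0.5]]),
    dt=0.1,
    actuator_moment=np.array([[1.0]]),
    control_cost_coef=0.1,
)
print(a)
print(b)
```

With n degrees of freedom and m controls, a is (2n, 2n), b is (2n, m), q is (2n, 2n) and r is (m, m), which matches the shapes the DARE solver expects.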
The function then solves the discrete algebraic Riccati equation (DARE) using scipy.linalg.solve_discrete_are to obtain the optimal cost-to-go Hessian p. From p it derives the optimal linear feedback gain matrix k, so that the optimal control is u = k * x (a matrix-vector product). It also computes beta, the largest eigenvalue magnitude of the closed-loop transition matrix (a + b * k), which sets the convergence rate: under the optimal policy the state decays towards zero like beta^t after t timesteps. If beta >= 1.0, the controlled system is unstable and a RuntimeError is raised.
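The DARE, gain, and stability steps can be sketched on a toy system as below. This is a minimal sketch, not the module's implementation: the helper names solve_lqr and riccati_iteration are hypothetical, the matrices describe a hand-written double integrator rather than a dm_control environment, and the fixed-point Riccati iteration is a simple substitute used only when SciPy cannot be imported.

```python
import numpy as np

try:
    from scipy.linalg import solve_discrete_are
except ImportError:  # SciPy is optional for this sketch.
    solve_discrete_are = None


def riccati_iteration(a, b, q, r, tol=1e-10, max_iter=100_000):
    """Fallback: iterates the Riccati recursion until p stops changing."""
    p = q.copy()
    for _ in range(max_iter):
        bp = b.T @ p
        gain_term = np.linalg.solve(r + bp @ b, bp @ a)
        p_next = q + a.T @ p @ a - (a.T @ p @ b) @ gain_term
        if np.max(np.abs(p_next - p)) < tol:
            return p_next
        p = p_next
    raise RuntimeError('Riccati iteration did not converge.')


def solve_lqr(a, b, q, r):
    """Returns (p, k, beta) for x' = a x + b u with quadratic costs q, r."""
    solve_dare = solve_discrete_are or riccati_iteration
    p = solve_dare(a, b, q, r)
    # Optimal gain: k = -(b' p b + r)^{-1} b' p a, so that u = k @ x.
    k = -np.linalg.solve(b.T @ p @ b + r, b.T @ p @ a)
    # Largest eigenvalue magnitude of the closed-loop matrix.
    beta = np.abs(np.linalg.eigvals(a + b @ k)).max()
    if beta >= 1.0:
        raise RuntimeError('Controlled system is unstable.')
    return p, k, beta


# Toy double integrator (unit mass, dt = 0.1), not a dm_control model.
a = np.array([[1.0, 0.1], [0.0, 1.0]])
b = np.array([[0.01], [0.1]])
q = np.diag([1.0, 0.0])
r = 0.1 * np.eye(1)

p, k, beta = solve_lqr(a, b, q, r)
print(f"gain shape: {k.shape}, beta: {beta:.4f}")
```

Solving a linear system with np.linalg.solve instead of forming an explicit inverse keeps the gain computation numerically better behaved when b' p b + r is poorly conditioned.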
This solver is designed specifically for the procedurally generated LQR environments in the lqr domain module. It provides a ground-truth baseline for evaluating learned control policies.
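To illustrate how the optimal solution serves as a baseline, the sketch below uses a hypothetical scalar system whose Riccati equation has a closed-form solution. It assumes a stage cost of 0.5 * (q * x^2 + r * u^2), consistent with V(x) = 0.5 * x' * p * x, and uses a hand-picked gain of -0.9 as a stand-in for a learned policy. None of the names or values come from dm_control.

```python
import math

# Hypothetical scalar system x' = a*x + b*u with stage cost
# 0.5 * (q*x^2 + r*u^2). All values are illustrative.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

# With a = b = q = r = 1 the scalar Riccati equation
# p = q + a^2 p - (a b p)^2 / (r + b^2 p) reduces to p^2 - p - 1 = 0.
p = (1.0 + math.sqrt(5.0)) / 2.0
k_opt = -(b * p * a) / (r + b * p * b)
beta = abs(a + b * k_opt)


def rollout_cost(k, x0, steps=200):
    """Accumulates the quadratic cost of the linear policy u = k * x."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = k * x
        total += 0.5 * (q * x * x + r * u * u)
        x = a * x + b * u
    return total


x0 = 2.0
optimal_cost = 0.5 * p * x0 * x0          # V(x0) from the Riccati solution
baseline_cost = rollout_cost(k_opt, x0)   # rollout under the optimal gain
candidate_cost = rollout_cost(-0.9, x0)   # stand-in for a learned policy
gap = candidate_cost - optimal_cost

print(f"optimal={optimal_cost:.4f} rollout={baseline_cost:.4f}")
print(f"candidate={candidate_cost:.4f} gap={gap:.4f} beta={beta:.4f}")
```

The rollout under the optimal gain reproduces V(x0), and any other stabilising gain accumulates a strictly larger cost, so the gap gives a direct measure of how far a learned policy is from optimal.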
Usage
Use this implementation to obtain the optimal policy for LQR environments as a reference baseline. Import the module with from dm_control.suite import lqr_solver, then call lqr_solver.solve(env).
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/suite/lqr_solver.py
- Lines: 1-81
Signature
def solve(env):
    """Returns the optimal value and policy for LQR problem.
    Args:
        env: An instance of control.EnvironmentV2 with LQR level.
    Returns:
        p: Hessian of the optimal cost-to-go, V(x) = .5 * x' * p * x.
        k: Optimal linear policy gain matrix, u = k * x.
        beta: Maximum eigenvalue of (a + b * k), convergence rate.
    Raises:
        RuntimeError: If the controlled system is unstable.
    """
Import
from dm_control.suite import lqr_solver
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| env | dm_control.rl.control.Environment | Yes | An LQR environment instance (created via the lqr domain). |
Outputs
| Name | Type | Description |
|---|---|---|
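| p | numpy array (2n, 2n) | Hessian of the optimal total cost-to-go: V(x) = 0.5 * x' * p * x. Here n is the number of degrees of freedom, so the state has length 2n. |
| k | numpy array (m, 2n) | Optimal linear feedback gain matrix: u = k * x, where m is the number of controls. |
| beta | float | Largest eigenvalue magnitude of the closed-loop transition matrix (a + b * k); under the optimal policy the state converges to 0 like beta^t after t timesteps. |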
Exceptions
| Exception | Condition |
|---|---|
| RuntimeError | Raised if the controlled system is unstable (beta >= 1.0). |
Usage Examples
from dm_control import suite
from dm_control.suite import lqr_solver
import numpy as np
# Load an LQR environment
env = suite.load(domain_name='lqr', task_name='lqr_2_1')
# Compute the optimal policy
p, k, beta = lqr_solver.solve(env)
print(f"Convergence rate (beta): {beta:.4f}")
print(f"Optimal gain matrix shape: {k.shape}")
# Use the optimal policy
time_step = env.reset()
while not time_step.last():
    # Stack position and velocity into the full state vector.
    state = np.concatenate([
        time_step.observation['position'],
        time_step.observation['velocity'],
    ])
    # Apply the optimal linear feedback u = k * x.
    action = k.dot(state)
    time_step = env.step(action)